Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9626193
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: April 18, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9626192
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: April 18, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9612842
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: April 4, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Publication number: 20170090924
    Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
    Type: Application
    Filed: September 26, 2015
    Publication date: March 30, 2017
    Applicant: Intel Corporation
    Inventors: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha OuId-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar
  • Publication number: 20170090921
    Abstract: A processor includes a decode unit to decode an instruction indicating a source packed data operand having source data elements and indicating a destination storage location. Each of the source data elements has a source data element value and a source data element position. An execution unit, in response to the instruction, stores a result packed data operand having result data elements each having a result data element value and a result data element position. Each result data element value is one of: (1) equal to a source data element position of a source data element, closest to one end of the source operand, having a source data element value equal to the result data element position of the result data element; and (2) a replacement value, when no source data element has a source data element value equal to the result data element position of the result data element.
    Type: Application
    Filed: September 25, 2015
    Publication date: March 30, 2017
    Applicant: INTEL CORPORATION
    Inventors: Christopher J. Hughes, Jong Soo Park
  • Patent number: 9600442
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
    Type: Grant
    Filed: July 18, 2014
    Date of Patent: March 21, 2017
    Assignee: Intel Corporation
    Inventor: Christopher J Hughes
  • Patent number: 9582422
    Abstract: Two techniques address bottlenecking in processors. The first is indirect prefetching. The technique can be especially useful for graph analytics and sparse matrix applications. For graph analytics and sparse matrix applications, the addresses of most random memory accesses come from an index array B which is sequentially scanned by an application. The random accesses are actually indirect accesses in the form A[B[i]]. A hardware component is introduced to detect this pattern. The hardware can then read B a certain distance ahead, and prefetch the corresponding element in A. For example, if the “prefetch distance” is k, when B[i] is accessed, the hardware reads B[i+k], and then A[B[i+k]. For partial cacheline accessing, the indirect accesses are usually accessing random memory locations and only accessing a small portion of a cacheline. Instead of loading the whole cacheline into L1 cache, the second technique only loads a part of the cacheline.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: February 28, 2017
    Assignee: INTEL CORPORATION
    Inventors: Xiangyao Yu, Christopher J. Hughes, Nadathur Rajagopalan Satish
  • Patent number: 9575765
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: February 21, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Publication number: 20170039068
    Abstract: Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.
    Type: Application
    Filed: October 20, 2016
    Publication date: February 9, 2017
    Inventors: Sanjeev KUMAR, Christopher J. HUGHES, Partha KUNDU, Anthony NGUYEN
  • Patent number: 9563429
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: February 7, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9563425
    Abstract: Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.
    Type: Grant
    Filed: November 28, 2012
    Date of Patent: February 7, 2017
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Changkyu Kim, Daehyun Kim, Victor W. Lee, Jong Soo Park
  • Publication number: 20160378651
    Abstract: A processor includes a processing core to generate a memory request for an application data in an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to specify a caching priority (CP) to the application data for the application. The caching priority identifies importance of the application data in a cache.
    Type: Application
    Filed: June 24, 2015
    Publication date: December 29, 2016
    Inventors: Subramanya R. Dulloor, Rajesh M. Sankaran, David A. Koufaty, Christopher J. Hughes, Jong Soo Park, Sheng Li
  • Publication number: 20160378473
    Abstract: A processor includes a front end to receive an instruction, a decoder to decode the instruction, a core to execute the first instruction, and a retirement unit to retire the first instruction. The core includes logic to execute the first instruction, including logic to repeatedly record a translation lookaside buffer (TLB) until a designated number of records are determined, and flush the TLB after a flush interval.
    Type: Application
    Filed: June 26, 2015
    Publication date: December 29, 2016
    Inventors: Kshitij A. Doshi, Christopher J. Hughes
  • Patent number: 9529715
    Abstract: Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: December 27, 2016
    Assignee: Intel Corporation
    Inventors: Sanjeev Kumar, Christopher J. Hughes, Partha Kundu, Anthony Nguyen
  • Publication number: 20160357556
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction to continue a data speculative execution (DSX) and to determine that a DSX loop iteration is to be committed, commit speculative stores associated with the DSX loop iteration, and start a new DSX loop iteration.
    Type: Application
    Filed: December 24, 2014
    Publication date: December 8, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR
  • Patent number: 9513905
    Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: December 6, 2016
    Assignee: Intel Corporation
    Inventors: Mikhail Smelyanskiy, Sanjeev Kumar, Daehyun Kim, Jatin Chhugani, Changkyu Kim, Christopher J. Hughes, Victor W. Lee, Anthony D. Nguyen, Yen-Kuang Chen
  • Publication number: 20160335086
    Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
    Type: Application
    Filed: July 25, 2016
    Publication date: November 17, 2016
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
  • Publication number: 20160321185
    Abstract: In one embodiment, a processor includes a control logic to determine whether to enable an incoming data block associated with a first priority to displace, in a cache memory coupled to the processor, a candidate victim data block associated with a second priority and stored in the cache memory, based at least in part on the first and second priorities, a first access history associated with the incoming data block and a second access history associated with the candidate victim data block. Other embodiments are described and claimed.
    Type: Application
    Filed: April 28, 2015
    Publication date: November 3, 2016
    Inventors: Kshitij A. Doshi, Christopher J. Hughes
  • Publication number: 20160299763
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: June 21, 2016
    Publication date: October 13, 2016
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 9411584
    Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: August 9, 2016
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar