Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10387158
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction inside a data speculative execution (DSX) and roll back execution to a stored address and clear a DSX status indication in a DSX status register, and thereby abort the DSX.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar
  • Publication number: 20190250921
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguously a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Application
    Filed: April 29, 2019
    Publication date: August 15, 2019
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Publication number: 20190243761
    Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
    Type: Application
    Filed: April 11, 2019
    Publication date: August 8, 2019
    Applicant: Intel Corporation
    Inventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
  • Publication number: 20190236013
    Abstract: Technologies for migration of dynamic home tile mapping are described. An apparatus includes means for receiving coherence messages from other processor cores on the die, means for recording locations from which the coherence messages originate and means for determining distances between the requested home tiles and the locations from which the coherence messages originate. The apparatus includes means for determining whether an average distance between a particular home tile, whose identifier is stored in the home tile table, exceeds a threshold. When the average distance exceeds the defined threshold, the apparatus includes means for migrating the particular home tile to another location.
    Type: Application
    Filed: April 12, 2019
    Publication date: August 1, 2019
    Inventors: Christopher J. Hughes, Daehyun Kim, Jong Soo Park, Richard M. Yoo
  • Publication number: 20190228049
    Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality of cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.
    Type: Application
    Filed: March 30, 2019
    Publication date: July 25, 2019
    Applicant: Intel Corporation
    Inventors: Kamlesh R. Pillai, Christopher J. Hughes
  • Publication number: 20190205139
    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.
    Type: Application
    Filed: December 29, 2017
    Publication date: July 4, 2019
    Inventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson
  • Publication number: 20190179762
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction is to indicate a packed data register of the plurality of packed data registers that is to have source packed memory indices. The source packed memory indices are to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
    Type: Application
    Filed: February 15, 2019
    Publication date: June 13, 2019
    Inventor: Christopher J. Hughes
  • Patent number: 10318295
    Abstract: A processor of an aspect includes a decode unit to decode a transaction end plus commit to persistence instruction. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to atomically ensure that data associated with all prior store to memory operations made to a persistent memory, which are to have been accepted to memory when performance of the instruction begins, but which are not necessarily to have been stored in the persistent memory when the performance of the instruction begins, are to be stored in the persistent memory before the instruction becomes globally visible. The execution unit, in response to the instruction, is also to atomically end a transactional memory transaction before the instruction becomes globally visible.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: June 11, 2019
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Christopher J. Hughes
  • Publication number: 20190171396
    Abstract: A processor includes a processing core to generate a memory request for an application data in an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to specify a caching priority (CP) to the application data for the application. The caching priority identifies importance of the application data in a cache.
    Type: Application
    Filed: November 13, 2018
    Publication date: June 6, 2019
    Inventors: Subramanya R. Dulloor, Rajesh M. Sankaran, David A. Koufaty, Christopher J. Hughes, Jong Soo Park, Sheng Li
  • Patent number: 10303525
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: May 28, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar, Hideki Ido, Youfeng Wu, Cheng Wang
  • Patent number: 10303606
    Abstract: Technologies for migration of dynamic home tile mapping are described. A cache controller can receive coherence messages from other processor cores on the die. The cache controller records locations from which the coherence messages originate and determines distances between the requested home tiles and the locations from which the coherence messages originate. The cache controller determines whether an average distance between a particular home tile, whose identifier is stored in the home tile table, exceeds a threshold. When the average distance exceeds the defined threshold, the cache controller migrates the particular home tile to another location.
    Type: Grant
    Filed: March 21, 2017
    Date of Patent: May 28, 2019
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Daehyun Kim, Jong Soo Park, Richard M. Yoo
  • Patent number: 10296459
    Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
  • Patent number: 10275257
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguously a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: May 22, 2017
    Date of Patent: April 30, 2019
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Publication number: 20190121642
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 20, 2018
    Publication date: April 25, 2019
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Publication number: 20190121643
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 20, 2018
    Publication date: April 25, 2019
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Publication number: 20190121644
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for DSX comprises execution hardware to execute instructions to begin and end a data speculative execution (DSX) and speculative instructions during the DSX, and DSX tracking hardware to track speculative memory accesses and detect ordering violations in a DSX of speculative instructions using a sequence number, addresses of instruction accesses, and whether an instruction being tracked is a write, and to trigger a mis-speculation upon an ordering violation.
    Type: Application
    Filed: December 24, 2014
    Publication date: April 25, 2019
    Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar
  • Patent number: 10268579
    Abstract: Embodiments of the invention relate to a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode, the transactional cache is utilized to perform read and write memory operations, and in the software mode, the regular cache is utilized to perform read and write memory operations.
    Type: Grant
    Filed: April 1, 2017
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Sanjeev Kumar, Christopher J. Hughes, Partha Kundu, Anthony Nguyen
  • Publication number: 20190102196
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 4, 2019
    Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
  • Patent number: 10229670
    Abstract: Methods and systems to translate input labels of arcs of a network, corresponding to a sequence of states of the network, to a list of output grammar elements of the arcs, corresponding to a sequence of grammar elements. The network may include a plurality of speech recognition models combined with a weighted finite state machine transducer (WFST). Traversal may include active arc traversal, and may include active arc propagation. Arcs may be processed in parallel, including arcs originating from multiple source states and directed to a common destination state. Self-loops associated with states may be modeled within outgoing arcs of the states, which may reduce synchronization operations. Tasks may be ordered with respect to cache-data locality to associate tasks with processing threads based at least in part on whether another task associated with a corresponding data object was previously assigned to the thread.
    Type: Grant
    Filed: June 24, 2013
    Date of Patent: March 12, 2019
    Assignee: Intel Corporation
    Inventors: Kisun You, Christopher J. Hughes, Yen-Kuang Chen
  • Patent number: 10210091
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction is to indicate a packed data register of the plurality of packed data registers that is to have source packed memory indices. The source packed memory indices are to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
    Type: Grant
    Filed: February 15, 2017
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventor: Christopher J. Hughes