Patents by Inventor Charles R. Yount

Charles R. Yount has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10540177
    Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.
    Type: Grant
    Filed: February 21, 2017
    Date of Patent: January 21, 2020
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
  • Patent number: 10509726
    Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform scatters, and prefetch (to a specified cache) contents of target locations for future scatters from arbitrary locations in memory. The execution unit includes logic to load, for each target location of a scatter or prefetch operation, an index value to be used in computing the address in memory for the operation. The index value may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction, the index value retrieved for the location, and a prefetch offset (for prefetch operations), with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction to be scattered to the memory.
    Type: Grant
    Filed: December 20, 2015
    Date of Patent: December 17, 2019
    Assignee: Intel Corporation
    Inventors: Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Antonio C. Valles
  • Patent number: 10318291
    Abstract: A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
    Type: Grant
    Filed: May 3, 2017
    Date of Patent: June 11, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
  • Patent number: 10248488
    Abstract: Systems, methods, and apparatuses for fault tolerance and detection are described. For example, an apparatus including circuitry to replicate input sources of an instruction; arithmetic logic unit (ALU) circuitry to execute the instruction with replicated input sources using single instruction, multiple data (SIMD) hardware to produce a packed data result; and comparison circuitry coupled to the ALU circuitry to evaluate the packed data result and output a singular data result into a destination of the instruction is described.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: April 2, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount
  • Patent number: 10061746
    Abstract: A processor includes a front end to fetch an instruction. The instruction is to calculate a data point using inputs from a plurality of adjacent source data in a plurality of dimensions. The processor includes a decoder to decode the instruction. The processor also includes a core to, based on the decoded instruction, perform a plurality of tabular vector read operations to read the plurality of adjacent source data and perform a tabular vector calculation to execute the instruction. The tabular vector calculation is based upon results of performing the plurality of tabular vector read operations. The core is further to write results of the tabular vector calculation.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventor: Charles R. Yount
  • Patent number: 9971391
    Abstract: A method of assessing energy efficiency of a High-performance computing (HPC) system, including: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system, executing the plurality of HPC workloads on the SUT, and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance per power values for each executed workload at each selected power constraint is shown.
    Type: Grant
    Filed: December 23, 2015
    Date of Patent: May 15, 2018
    Assignee: Intel Corporation
    Inventors: Devadatta Bodas, Meenakshi Arunachalam, Ilya Sharapov, Charles R. Yount, Scott B. Huck, Ramakrishna Huggahalli, Justin J. Song, Brian J. Griffith, Muralidhar Rajappa, Lingdan (Linda) Zeng
  • Patent number: 9928063
    Abstract: Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.
    Type: Grant
    Filed: September 16, 2016
    Date of Patent: March 27, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Publication number: 20170357514
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Application
    Filed: August 25, 2017
    Publication date: December 14, 2017
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
  • Patent number: 9804844
    Abstract: Instructions and logic provide vector load-op and/or store-op with stride functionality. Some embodiments, responsive to an instruction specifying: a set of loads, a second operation, destination register, operand register, memory address, and stride length; execution units read values in a mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates the element has not been loaded from memory and a second value indicates that the element does not need to be, or has already been loaded. For each having the first value, the data element is loaded from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. Then the second operation is performed using corresponding data in the destination and operand registers to generate results. The instruction may be restarted after faults.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: October 31, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Publication number: 20170300326
    Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.
    Type: Application
    Filed: February 21, 2017
    Publication date: October 19, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
  • Patent number: 9792119
    Abstract: Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.
    Type: Grant
    Filed: September 16, 2016
    Date of Patent: October 17, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Publication number: 20170269935
    Abstract: Instructions and logic provide vector loads and/or stores with stride and mask functionality. In one implementation, a processor is provided that includes decode circuitry to decode an instruction specifying a memory address and a stride length for a set of load operations corresponding to a first plurality of data elements of a destination register. The processor further includes one or more execution units, responsive to the decoded first instruction, to load a first data element from the memory address into a first data element of the destination register, and load a second data element from a memory address that is non-zero multiple of the stride length into a first data element of the destination register.
    Type: Application
    Filed: June 2, 2017
    Publication date: September 21, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Patent number: 9747101
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: August 29, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
  • Publication number: 20170235572
    Abstract: A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
    Type: Application
    Filed: May 3, 2017
    Publication date: August 17, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
  • Publication number: 20170185132
    Abstract: A method of assessing energy efficiency of a High-performance computing (HPC) system, including: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system, executing the plurality of HPC workloads on the SUT, and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance per power values for each executed workload at each selected power constraint is shown.
    Type: Application
    Filed: December 23, 2015
    Publication date: June 29, 2017
    Inventors: Devadatta Bodas, Meenakshi Arunachalam, Ilya Sharapov, Charles R. Yount, Scott B. Huck, Ramakrishna Huggahalli, Justin J. Song, Brian J. Griffith, Muralidhar Rajappa, Lingdan (Linda) Zeng
  • Publication number: 20170185465
    Abstract: Systems, methods, and apparatuses for fault tolerance and detection are described. For example, an apparatus including circuitry to replicate input sources of an instruction; arithmetic logic unit (ALU) circuitry to execute the instruction with replicated input sources using single instruction, multiple data (SIMD) hardware to produce a packed data result; and comparison circuitry coupled to the ALU circuitry to evaluate the packed data result and output a singular data result into a destination of the instruction is described.
    Type: Application
    Filed: December 29, 2015
    Publication date: June 29, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount
  • Publication number: 20170177349
    Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform a gather, and prefetch (to a specified cache) elements for a future gather from arbitrary locations in memory. The execution unit includes logic to load, for each element to be gathered or prefetched, an index value to be used in computing the address in memory for the element. The index value may be retrieved from an array of indices that is identified for the instruction. The execution unit includes logic to compute the address based on the sum of a base address that is specified for the instruction and the index value that was retrieved for the data element, with or without scaling. The execution unit includes logic to store gathered data elements in contiguous locations in a destination vector register that is specified for the instruction.
    Type: Application
    Filed: December 21, 2015
    Publication date: June 22, 2017
    Inventors: Charles R. Yount, Antonio C. Valles, Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20170177363
    Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices and gather elements from random locations or locations in sparse memory based on those indices. The execution unit includes logic to load, for each data element to be gathered by the instruction, as needed, an index value to be used in computing the address in memory of a particular data element to be gathered. The index value may be retrieved from an array of indices that is identified for the instruction. The execution unit includes logic to compute the address as the sum of a base address that is specified for the instruction and the index value that was retrieved for the data element, with or without scaling. The execution unit includes logic to store the gathered data elements in contiguous locations in a destination vector register that is specified for the instruction.
    Type: Application
    Filed: December 22, 2015
    Publication date: June 22, 2017
    Inventors: Charles R. Yount, Indraneil M. Gokhale, Antonio C. Valles, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20170177346
    Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform scatters, and prefetch (to a specified cache) contents of target locations for future scatters from arbitrary locations in memory. The execution unit includes logic to load, for each target location of a scatter or prefetch operation, an index value to be used in computing the address in memory for the operation. The index value may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction, the index value retrieved for the location, and a prefetch offset (for prefetch operations), with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction to be scattered to the memory.
    Type: Application
    Filed: December 20, 2015
    Publication date: June 22, 2017
    Inventors: Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Antonio C. Valles
  • Publication number: 20170177360
    Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices and scatter elements to locations in sparse memory based on those indices. The execution unit includes logic to load, for each data element to be scattered by the instruction, as needed, an index value to be used in computing the address in memory at which a particular data element is to be written. The index values may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction and the index values retrieved for the data element locations, with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction and store them to the computed locations.
    Type: Application
    Filed: December 21, 2015
    Publication date: June 22, 2017
    Inventors: Indraneil M. Gokhale, Charles R. Yount, Antonio C. Valles, Elmoustapha Ould-Ahmed-Vall
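
The gather and scatter abstracts above (publications 20170177363 and 20170177360) both describe computing each element's address as a base address plus an index taken from an array of indices, with optional scaling. The following is a minimal software sketch of that address arithmetic only, not the patented hardware logic. The function names, the list used to stand in for memory, and the `scale` parameter are illustrative assumptions that do not appear in the listing.

```python
# Illustrative model only: a Python list stands in for memory, and the
# names element_address, gather and scatter are hypothetical.

def element_address(base, index, scale=1):
    """Address of one element: base address plus the (optionally scaled) index."""
    return base + index * scale


def gather(memory, base, indices, scale=1):
    """Read one element per index and pack the results contiguously,
    as a destination vector register would hold them."""
    return [memory[element_address(base, i, scale)] for i in indices]


def scatter(memory, base, indices, values, scale=1):
    """Write contiguous source values out to the computed locations."""
    for i, v in zip(indices, values):
        memory[element_address(base, i, scale)] = v


if __name__ == "__main__":
    mem = list(range(100, 132))
    print(gather(mem, 4, [0, 3, 1], scale=2))
```

A real instruction would also handle element width, masking, and faults; those are left out here to keep the address calculation visible.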
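
The strided load-op abstract above (patent 9804844) describes a mask whose fields mark which elements still need loading, with each field switched to a second value once its element has been loaded, so that the instruction can be restarted after a fault. A rough sketch of that flow, under stated assumptions: 1 is taken as the "not yet loaded" value and 0 as the "done or not needed" value, lane k is assumed to read from the address plus k times the stride, and the function name and list-based memory are hypothetical.

```python
import operator

# Illustrative model only. Assumed encoding: mask value 1 means "still to be
# loaded", 0 means "already loaded or not needed". strided_load_op is a
# hypothetical name, not an instruction mnemonic.

def strided_load_op(memory, addr, stride, mask, dest, operand, op):
    """Load pending lanes from addr + lane * stride, clearing each mask
    field as its lane completes, then apply op lane by lane."""
    for lane in range(len(mask)):
        if mask[lane]:
            dest[lane] = memory[addr + lane * stride]
            # Progress is recorded in the mask, so a restart after a
            # fault would skip lanes that have already been loaded.
            mask[lane] = 0
    return [op(d, o) for d, o in zip(dest, operand)]


if __name__ == "__main__":
    mem = list(range(0, 200, 10))
    mask = [1, 0, 1, 1]
    dest = [0, 99, 0, 0]
    print(strided_load_op(mem, 1, 3, mask, dest, [1, 1, 1, 1], operator.add))
```

Because the mask is updated in place, calling the function a second time with the same mask performs no further loads and only repeats the second operation.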
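
The horizontal majority voting abstracts above (patents 9928063 and 9792119) describe reading the source values selected by a mask and storing a single result, computed from the majority of those values, into the corresponding destination fields. The sketch below is one possible reading, assuming "majority" means the most frequent whole value among the selected lanes; the abstracts do not say how ties are resolved or whether voting is per value or per bit, so both choices here are assumptions, as is the function name.

```python
from collections import Counter

# Illustrative model only. Assumptions: "majority" is the most frequent
# whole value among the masked lanes, and ties go to the value seen first.
# horizontal_majority is a hypothetical name.

def horizontal_majority(src, mask, dest):
    """Return a copy of dest in which every masked lane holds the
    majority value of the masked lanes of src."""
    selected = [v for v, m in zip(src, mask) if m]
    if not selected:
        return list(dest)  # nothing selected, destination unchanged
    winner, _count = Counter(selected).most_common(1)[0]
    return [winner if m else d for d, m in zip(dest, mask)]


if __name__ == "__main__":
    src = [7, 7, 3, 7, 9, 9, 9, 9]
    mask = [1, 1, 1, 1, 0, 0, 0, 0]
    print(horizontal_majority(src, mask, [0] * 8))
```

Lanes outside the mask are left as they were in the destination, which matches the abstract's statement that the result is stored only to the corresponding data fields.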