Patents by Inventor Charles R. Yount

Charles R. Yount has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient zero-based decompression

Patent number: 10540177

Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.

Type: Grant

Filed: February 21, 2017

Date of Patent: January 21, 2020

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
Instructions and logic for load-indices-and-prefetch-scatters operations

Patent number: 10509726

Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform scatters, and prefetch (to a specified cache) contents of target locations for future scatters from arbitrary locations in memory. The execution unit includes logic to load, for each target location of a scatter or prefetch operation, an index value to be used in computing the address in memory for the operation. The index value may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction, the index value retrieved for the location, and a prefetch offset (for prefetch operations), with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction to be scattered to the memory.

Type: Grant

Filed: December 20, 2015

Date of Patent: December 17, 2019

Assignee: Intel Corporation

Inventors: Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Antonio C. Valles
Providing vector horizontal compare functionality within a vector register

Patent number: 10318291

Abstract: A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.

Type: Grant

Filed: May 3, 2017

Date of Patent: June 11, 2019

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
Fault tolerance and detection by replication of input data and evaluating a packed data execution result

Patent number: 10248488

Abstract: Systems, methods, and apparatuses for fault tolerance and detection are described. For example, an apparatus including circuitry to replicate input sources of an instruction; arithmetic logic unit (ALU) circuitry to execute the instruction with replicated input sources using single instruction, multiple data (SIMD) hardware to produce a packed data result; and comparison circuitry coupled to the ALU circuitry to evaluate the packed data result and output a singular data result into a destination of the instruction is described.

Type: Grant

Filed: December 29, 2015

Date of Patent: April 2, 2019

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount
Instruction and logic for a vector format for processing computations

Patent number: 10061746

Abstract: A processor includes a front end to fetch an instruction. The instruction is to calculate a data point using inputs from a plurality of adjacent source data in a plurality of dimensions. The processor includes a decoder to decode the instruction. The processor also includes a core to, based on the decoded instruction, perform a plurality of tabular vector read operations to read the plurality of adjacent source data and perform a tabular vector calculation to execute the instruction. The tabular vector calculation is based upon results of performing the plurality of tabular vector read operations. The core is further to write results of the tabular vector calculation.

Type: Grant

Filed: September 26, 2014

Date of Patent: August 28, 2018

Assignee: Intel Corporation

Inventor: Charles R. Yount
Method to assess energy efficiency of HPC system operated with and without power constraints

Patent number: 9971391

Abstract: A method of assessing energy efficiency of a High-performance computing (HPC) system, including: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system, executing the plurality of HPC workloads on the SUT, and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance per power values for each executed workload at each selected power constraint is shown.

Type: Grant

Filed: December 23, 2015

Date of Patent: May 15, 2018

Assignee: Intel Corporation

Inventors: Devadatta Bodas, Meenakshi Arunachalam, Ilya Sharapov, Charles R. Yount, Scott B. Huck, Ramakrishna Huggahalli, Justin J. Song, Brian J. Griffith, Muralidhar Rajappa, Lingdan (Linda) Zeng
Instruction and logic to provide vector horizontal majority voting functionality

Patent number: 9928063

Abstract: Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.

Type: Grant

Filed: September 16, 2016

Date of Patent: March 27, 2018

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
INSTRUCTION AND LOGIC TO PROVIDE VECTOR SCATTER-OP AND GATHER-OP FUNCTIONALITY

Publication number: 20170357514

Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.

Type: Application

Filed: August 25, 2017

Publication date: December 14, 2017

Applicant: lntel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
Instruction and logic to provide stride-based vector load-op functionality with mask duplication

Patent number: 9804844

Abstract: Instructions and logic provide vector load-op and/or store-op with stride functionality. Some embodiments, responsive to an instruction specifying: a set of loads, a second operation, destination register, operand register, memory address, and stride length; execution units read values in a mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates the element has not been loaded from memory and a second value indicates that the element does not need to be, or has already been loaded. For each having the first value, the data element is loaded from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. Then the second operation is performed using corresponding data in the destination and operand registers to generate results. The instruction may be restarted after faults.

Type: Grant

Filed: September 26, 2011

Date of Patent: October 31, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
EFFICIENT ZERO-BASED DECOMPRESSION

Publication number: 20170300326

Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.

Type: Application

Filed: February 21, 2017

Publication date: October 19, 2017

Inventors: ELMOUSTAPHA OULD-AHMED-VALL, SULEYMAN SAIR, KSHITIJ A. DOSHI, CHARLES R. YOUNT, BRET L. TOLL
Instruction and logic to provide vector horizontal majority voting functionality

Patent number: 9792119

Abstract: Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.

Type: Grant

Filed: September 16, 2016

Date of Patent: October 17, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
INSTRUCTION AND LOGIC TO PROVIDE VECTOR LOADS AND STORES WITH STRIDES AND MASKING FUNCTIONALITY

Publication number: 20170269935

Abstract: Instructions and logic provide vector loads and/or stores with stride and mask functionality. In one implementation, a processor is provided that includes decode circuitry to decode an instruction specifying a memory address and a stride length for a set of load operations corresponding to a first plurality of data elements of a destination register. The processor further includes one or more execution units, responsive to the decoded first instruction, to load a first data element from the memory address into a first data element of the destination register, and load a second data element from a memory address that is non-zero multiple of the stride length into a first data element of the destination register.

Type: Application

Filed: June 2, 2017

Publication date: September 21, 2017

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
Gather-op instruction to duplicate a mask and perform an operation on vector elements gathered via tracked offset-based gathering

Patent number: 9747101

Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.

Type: Grant

Filed: September 26, 2011

Date of Patent: August 29, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
PROVIDING VECTOR HORIZONTAL COMPARE FUNCTIONALITY WITHIN A VECTOR REGISTER

Publication number: 20170235572

Abstract: A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.

Type: Application

Filed: May 3, 2017

Publication date: August 17, 2017

Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
Method to assess energy efficiency of hpc system operated with & without power constraints

Publication number: 20170185132

Abstract: A method of assessing energy efficiency of a High-performance computing (HPC) system, including: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system, executing the plurality of HPC workloads on the SUT, and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance per power values for each executed workload at each selected power constraint is shown.

Type: Application

Filed: December 23, 2015

Publication date: June 29, 2017

Inventors: Devadatta Bodas, Meenakshi Arunachalam, Ilya Sharapov, Charles R. Yount, Scott B. Huck, Ramakrishna Huggahalli, Justin J. Song, Brian J. Griffith, Muralidhar Rajappa, Lingdan (Linda) Zeng
Systems, Methods, and Apparatuses for Fault Tolerance and Detection

Publication number: 20170185465

Abstract: Systems, methods, and apparatuses for fault tolerance and detection are described. For example, an apparatus including circuitry to replicate input sources of an instruction; arithmetic logic unit (ALU) circuitry to execute the instruction with replicated input sources using single instruction, multiple data (SIMD) hardware to produce a packed data result; and comparison circuitry coupled to the ALU circuitry to evaluate the packed data result and output a singular data result into a destination of the instruction is described.

Type: Application

Filed: December 29, 2015

Publication date: June 29, 2017

Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount
Instructions and Logic for Load-Indices-and-Prefetch-Gathers Operations

Publication number: 20170177349

Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform a gather, and prefetch (to a specified cache) elements for a future gather from arbitrary locations in memory. The execution unit includes logic to load, for each element to be gathered or prefetched, an index value to be used in computing the address in memory for the element. The index value may be retrieved from an array of indices that is identified for the instruction. The execution unit includes logic to compute the address based on the sum of a base address that is specified for the instruction and the index value that was retrieved for the data element, with or without scaling. The execution unit includes logic to store gathered data elements in contiguous locations in a destination vector register that is specified for the instruction.

Type: Application

Filed: December 21, 2015

Publication date: June 22, 2017

Inventors: Charles R. Yount, Antonio C. Valles, Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall
Instructions and Logic for Load-Indices-and-Gather Operations

Publication number: 20170177363

Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices and gather elements from random locations or locations in sparse memory based on those indices. The execution unit includes logic to load, for each data element to be gathered by the instruction, as needed, an index value to be used in computing the address in memory of a particular data element to be gathered. The index value may be retrieved from an array of indices that is identified for the instruction. The execution unit includes logic to compute the address as the sum of a base address that is specified for the instruction and the index value that was retrieved for the data element, with or without scaling. The execution unit includes logic to store the gathered data elements in contiguous locations in a destination vector register that is specified for the instruction.

Type: Application

Filed: December 22, 2015

Publication date: June 22, 2017

Inventors: Charles R. Yount, Indraneil M. Gokhale, Antonio C. Valles, Elmoustapha Ould-Ahmed-Vall
Instructions and Logic for Load-Indices-and-Prefetch-Scatters Operations

Publication number: 20170177346

Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform scatters, and prefetch (to a specified cache) contents of target locations for future scatters from arbitrary locations in memory. The execution unit includes logic to load, for each target location of a scatter or prefetch operation, an index value to be used in computing the address in memory for the operation. The index value may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction, the index value retrieved for the location, and a prefetch offset (for prefetch operations), with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction to be scattered to the memory.

Type: Application

Filed: December 20, 2015

Publication date: June 22, 2017

Inventors: Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Antonio C. Valles
Instructions and Logic for Load-Indices-and-Scatter Operations

Publication number: 20170177360

Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices and scatter elements to locations in sparse memory based on those indices. The execution unit includes logic to load, for each data element to be scattered by the instruction, as needed, an index value to be used in computing the address in memory at which a particular data element is to be written. The index values may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction and the index values retrieved for the data element locations, with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction and store them to the computed locations.

Type: Application

Filed: December 21, 2015

Publication date: June 22, 2017

Inventors: Indraneil M. Gokhale, Charles R. Yount, Antonio C. Valles, Elmoustapha Ould-Ahmed-Vall

1 2 next