Patents by Inventor Suleyman Sair

Suleyman Sair has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20170357514
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Application
    Filed: August 25, 2017
    Publication date: December 14, 2017
    Applicant: lntel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
  • Publication number: 20170322905
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Application
    Filed: April 24, 2017
    Publication date: November 9, 2017
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, SULEYMAN SAIR
  • Patent number: 9804844
    Abstract: Instructions and logic provide vector load-op and/or store-op with stride functionality. Some embodiments, responsive to an instruction specifying: a set of loads, a second operation, destination register, operand register, memory address, and stride length; execution units read values in a mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates the element has not been loaded from memory and a second value indicates that the element does not need to be, or has already been loaded. For each having the first value, the data element is loaded from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. Then the second operation is performed using corresponding data in the destination and operand registers to generate results. The instruction may be restarted after faults.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: October 31, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Publication number: 20170300326
    Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.
    Type: Application
    Filed: February 21, 2017
    Publication date: October 19, 2017
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, SULEYMAN SAIR, KSHITIJ A. DOSHI, CHARLES R. YOUNT, BRET L. TOLL
  • Patent number: 9792119
    Abstract: Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.
    Type: Grant
    Filed: September 16, 2016
    Date of Patent: October 17, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Publication number: 20170269935
    Abstract: Instructions and logic provide vector loads and/or stores with stride and mask functionality. In one implementation, a processor is provided that includes decode circuitry to decode an instruction specifying a memory address and a stride length for a set of load operations corresponding to a first plurality of data elements of a destination register. The processor further includes one or more execution units, responsive to the decoded first instruction, to load a first data element from the memory address into a first data element of the destination register, and load a second data element from a memory address that is non-zero multiple of the stride length into a first data element of the destination register.
    Type: Application
    Filed: June 2, 2017
    Publication date: September 21, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Patent number: 9747101
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: August 29, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
  • Publication number: 20170235572
    Abstract: A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
    Type: Application
    Filed: May 3, 2017
    Publication date: August 17, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
  • Publication number: 20170185465
    Abstract: Systems, methods, and apparatuses for fault tolerance and detection are described. For example, an apparatus including circuitry to replicate input sources of an instruction; arithmetic logic unit (ALU) circuitry to execute the instruction with replicated input sources using single instruction, multiple data (SIMD) hardware to produce a packed data result; and comparison circuitry coupled to the ALU circuitry to evaluate the packed data result and output a singular data result into a destination of the instruction is described.
    Type: Application
    Filed: December 29, 2015
    Publication date: June 29, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount
  • Publication number: 20170177344
    Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into preliminary vector registers. The source data is to be unaligned as resident in the vector registers. The core includes logic to apply blend instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective interim vector registers, and to apply further blend instructions to contents of the interim vector registers to cause additional indexed elements from the structures to be loaded into respective source vector registers.
    Type: Application
    Filed: December 18, 2015
    Publication date: June 22, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Publication number: 20170177345
    Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from a plurality of structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into a plurality of preliminary vector registers with a first indexed layout of elements and a second indexed layout of elements. A plurality of the preliminary vector registers are to be loaded with the first indexed layout of elements. A common register of the preliminary vector registers are to be loaded with the second indexed layout of elements. The core also includes logic to apply permute instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective source vector registers.
    Type: Application
    Filed: December 18, 2015
    Publication date: June 22, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Publication number: 20170177356
    Abstract: Systems, methods, and apparatuses for strided access are described. In some embodiments, a plurality of registers are loaded with data from an array of structures. Then data elements that that are not needed in a permute operation are overwritten with index values with a write mask. The register now contains a mix of data and index values. When this same write mask is passed to the permute instruction which overwrites the index register as destination, the data values are preserved and index values are overwritten with data coming from the other two source registers as controlled by the index values.
    Type: Application
    Filed: December 18, 2015
    Publication date: June 22, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Publication number: 20170177355
    Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a final register to be used to execute the instruction. The core also includes logic to load source data into a plurality of preliminary vector registers to align a defined element of one of the preliminary vector registers in a position that corresponds to a required position in the final register for execution. The core includes logic to apply permute instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the structures to be loaded into respective source vector registers.
    Type: Application
    Filed: December 18, 2015
    Publication date: June 22, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Patent number: 9672036
    Abstract: Instructions and logic provide vector loads and/or stores with stride and mask functionality. Some embodiments, responsive to an instruction specifying: a set of loads, destination register, mask register, memory address, and stride length; execution units read values in the mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates the element has not been loaded from memory and a second value indicates that the element does not need to be, or has already been loaded. For each having the first value, the corresponding multiple of said stride length is generated according to the data field's position in the mask register to load the data element from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. These instructions can restart after faults.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Patent number: 9665371
    Abstract: Instructions and logic provide vector horizontal compare functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read values from data fields of the specified size in the source operand, corresponding to the mask and compare the values for equality. In some embodiments, responsive to a detection of inequality, a trap may be taken. In some alternative embodiments, a flag may be set. In other alternative embodiments, a mask field may be set to a masked state for the corresponding unequal value(s). In some embodiments, responsive to all unmasked data fields of the source operand being equal to a particular value, that value may be broadcast to all data fields of the specified size in the destination operand.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: May 30, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
  • Patent number: 9632980
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: April 25, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
  • Patent number: 9575757
    Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: February 21, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
  • Publication number: 20170003962
    Abstract: Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.
    Type: Application
    Filed: September 16, 2016
    Publication date: January 5, 2017
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Suleyman Sair, Charles R. Yount
  • Patent number: 9513917
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Grant
    Filed: January 31, 2014
    Date of Patent: December 6, 2016
    Assignee: Intel Corporation
    Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
  • Patent number: 9459866
    Abstract: A processor core that includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifying a source vector register that includes a plurality of source data elements including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of the source data elements equal are encoded in the destination vector register as the predetermined compression value followed by a run length for that run.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: October 4, 2016
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll