Patents by Inventor Mikhail Plotnikov

Mikhail Plotnikov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10740100
    Abstract: A method performed by a processor includes receiving an instruction. The instruction indicates a source operand, a stride, at least one set of strided data element positions out of all sets of strided data element positions for the indicated stride, and at least one destination packed data register. The method also includes storing, in response to the instruction, for each of the indicated at least one set of strided data element positions, a corresponding result packed data operand in a corresponding destination packed data register of the processor. Each result packed data operand includes a plurality of data elements, which are from the corresponding indicated set of strided data element positions of the source operand. The strided data element positions of the set are separated from one another by integer multiples of the indicated stride. Other methods, processors, systems, and machine readable media are also disclosed.
    Type: Grant
    Filed: January 29, 2019
    Date of Patent: August 11, 2020
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20200210183
    Abstract: Systems, apparatuses and methods may provide for technology that identifies that an iterative loop includes a first code portion that executes in response to a condition being satisfied, generates a first vector mask that is to represent one or more instances of the condition being satisfied for one or more values of a first vector of values, and one or more instances of the condition being unsatisfied for the first vector of values, where the first vector of values is to correspond to one or more first iterations of the iterative loop, and conducts a vectorization process of the iterative loop based on the first vector mask.
    Type: Application
    Filed: March 6, 2020
    Publication date: July 2, 2020
    Applicant: Intel Corporation
    Inventors: Ilya Burylov, Mikhail Plotnikov, Hideki Ido, Ruslan Arutyunyan
  • Publication number: 20200174790
    Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements and determining a number of instances of each distinct data value within the vector. A system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, a source vector identifier, and an immediate value, wherein the execution circuit is to, for each data element position of a source vector, determine a number of matching data element positions in the source vector storing a same data value as stored at the data element position, the matching data element positions located between the data element position and a least significant data element position of the source vector, and store in a corresponding data element position of a destination vector identified by the destination vector identifier, a value representing the number of matching data element positions.
    Type: Application
    Filed: June 30, 2017
    Publication date: June 4, 2020
    Applicant: Intel Corporation
    Inventors: Mikhail Plotnikov, Christopher J. Hughes, Andrey Naraikin
  • Publication number: 20200142699
    Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and, in response to determining that the memory request does not satisfy the data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
    Type: Application
    Filed: June 30, 2017
    Publication date: May 7, 2020
    Applicant: Intel Corporation
    Inventors: William M. Brown, Mikhail Plotnikov, Christopher J. Hughes
  • Publication number: 20200073659
    Abstract: A method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements are described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data elements includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.
    Type: Application
    Filed: March 31, 2017
    Publication date: March 5, 2020
    Applicant: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 10545761
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: January 28, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10503505
    Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
    Type: Grant
    Filed: April 2, 2018
    Date of Patent: December 10, 2019
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
  • Publication number: 20190369992
    Abstract: A processor includes a decode circuit to decode an instruction into a decoded instruction and an execution circuit to execute the decoded instruction to sum one or more values of one or more contiguous elements of an input vector that form a block to produce an accumulated value for the block and store the accumulated value for the block in a destination vector, where an input mask dictates the one or more contiguous elements of the input vector that form the block.
    Type: Application
    Filed: February 17, 2017
    Publication date: December 5, 2019
    Inventor: Mikhail Plotnikov
  • Publication number: 20190347104
    Abstract: A processor includes a decode circuit to decode an instruction into a decoded instruction and an execution circuit to execute the decoded instruction to access a first bit of a first input vector located at a bit position indicated by an element of a second input vector, stride over bits of the first input vector using a stride to access bits of the first input vector that are located at a strided bit position with respect to the first bit of the first input vector, and store the first bit of the first input vector and the bits of the first input vector that are located at a strided bit position with respect to the first bit of the first input vector as consecutive bits in a destination vector.
    Type: Application
    Filed: February 28, 2017
    Publication date: November 14, 2019
    Inventors: Mikhail Plotnikov, Igor Ermolaev
  • Publication number: 20190347101
    Abstract: Disclosed embodiments relate to vector compress2 and expand2 instructions with two memory locations. In one example, a system includes a memory and a processor that includes circuits to fetch, decode, and execute an instruction that includes an opcode, a first destination operand identifier, a second destination operand identifier, a source operand identifier, and a control mask, wherein, for each element of the source operand, the execution circuit is to generate a result by performing one of compression and expansion of the element; and, based on the value of a bit of the control mask corresponding to the element, store the result to a first location identified by the first destination operand identifier and increment the first destination operand identifier by a size of the result, and, otherwise, store the result to a second location identified by the second destination operand identifier and increment the second destination operand identifier by the size of the result.
    Type: Application
    Filed: April 6, 2017
    Publication date: November 14, 2019
    Applicant: Intel Corporation
    Inventors: Mikhail Plotnikov, Igor Ermolaev, Alexander Bobyr
  • Patent number: 10452398
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: October 22, 2019
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Publication number: 20190278577
    Abstract: Methods, apparatus, and system to optimize compilation of source code into vectorized compiled code, notwithstanding the presence of output dependencies which might otherwise preclude vectorization.
    Type: Application
    Filed: July 1, 2016
    Publication date: September 12, 2019
    Inventors: Mikhail Plotnikov, Hideki Ido, Xinmin Tian, Sergey Preis, Milind B. Girkar, Maxim Shutov
  • Publication number: 20190227798
    Abstract: A method performed by a processor includes receiving an instruction. The instruction indicates a source operand, a stride, at least one set of strided data element positions out of all sets of strided data element positions for the indicated stride, and at least one destination packed data register. The method also includes storing, in response to the instruction, for each of the indicated at least one set of strided data element positions, a corresponding result packed data operand in a corresponding destination packed data register of the processor. Each result packed data operand includes a plurality of data elements, which are from the corresponding indicated set of strided data element positions of the source operand. The strided data element positions of the set are separated from one another by integer multiples of the indicated stride. Other methods, processors, systems, and machine readable media are also disclosed.
    Type: Application
    Filed: January 29, 2019
    Publication date: July 25, 2019
    Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20190146792
    Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
    Type: Application
    Filed: January 16, 2019
    Publication date: May 16, 2019
    Inventors: Mikhail Plotnikov, Igor Ermolaev
  • Patent number: 10282204
    Abstract: Systems, methods, and apparatuses for strided loads are described. In an embodiment, an instruction that includes at least an opcode, a field for at least two packed data source operands, a field for a packed data destination operand, and an immediate is designated as a strided load instruction. This instruction is executed to load packed data elements from the at least two packed data source operands using a stride and to store results of the strided loads in the packed data destination operand, starting from a defined position determined in part from the immediate.
    Type: Grant
    Filed: July 2, 2016
    Date of Patent: May 7, 2019
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20190129721
    Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
    Type: Application
    Filed: December 27, 2018
    Publication date: May 2, 2019
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20190121642
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 20, 2018
    Publication date: April 25, 2019
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Publication number: 20190121643
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 20, 2018
    Publication date: April 25, 2019
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10268479
    Abstract: Systems, apparatuses, and methods for executing an instruction are described. The instruction includes fields for a first source operand, a second source operand, and a destination operand. A decoded instruction causes a reduction of broadcasted packed data elements of a first packed data source with a reduction operation and stores a result of each of the reductions in a packed data destination, wherein the packed data elements of the first packed data source to be used in the reduction are dictated by a result of a comparison of broadcasted values of packed data elements stored in a second packed data source to the packed data elements stored in the second packed data source without broadcasting.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Jesus Corbal, Robert Valentine
  • Patent number: 10223113
    Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
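
Several of the abstracts above describe per-element vector operations that are easier to follow with a small scalar model. The Python sketch below is only an illustration based on a plain reading of three abstracts: the strided extraction of patent 10740100, the vector leading zero count of patent 10545761, and the prior-match count of publication 20200174790. The function names, the list-based "registers", and the default 32-bit element width are assumptions made for this example; they are not taken from the patents and do not describe any actual hardware encoding.

```python
from typing import List, Sequence


def extract_strided(source: Sequence[int], stride: int,
                    offsets: Sequence[int]) -> List[List[int]]:
    """Scalar model of strided extraction (hypothetical helper name).

    For each selected offset, collect the source elements at positions
    offset, offset + stride, offset + 2 * stride, and so on. Each inner
    list stands in for one destination packed data register.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    for offset in offsets:
        if not 0 <= offset < stride:
            raise ValueError("each offset must be in range(stride)")
    return [list(source[offset::stride]) for offset in offsets]


def vector_lzcnt(values: Sequence[int], width: int = 32) -> List[int]:
    """Count leading (most significant) zero bits in each element.

    The element width is an assumption for this sketch; an all-zero
    element yields the full width.
    """
    counts = []
    for value in values:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the element width")
        counts.append(width - value.bit_length())
    return counts


def count_prior_matches(values: Sequence[int]) -> List[int]:
    """For each position, count lower-indexed positions with the same value.

    Index 0 is treated as the least significant element position, which
    is an assumption about how the abstract's wording maps onto a list.
    """
    return [
        sum(1 for j in range(i) if values[j] == values[i])
        for i in range(len(values))
    ]


if __name__ == "__main__":
    # Two of the three sets of positions for stride 3: {0, 3, 6} and {1, 4, 7}.
    print(extract_strided(list(range(8)), 3, [0, 1]))
    print(vector_lzcnt([1, 0, 0x80000000, 0xFF]))
    print(count_prior_matches([5, 7, 5, 5, 7]))
```

In this model a prior-match count of zero marks the first occurrence of a value, which is the kind of information the abstracts describe using to resolve dependencies in gather-modify-scatter operations.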