Patents by Inventor Mikhail Plotnikov

Mikhail Plotnikov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Deinterleave strided data elements processors, methods, systems, and instructions

Patent number: 10740100

Abstract: A method performed by a processor includes receiving an instruction. The instruction indicating a source operand, indicating a stride, indicating at least one set of strided data element positions out of all sets of strided data element positions for the indicated stride, and indicating at least one destination packed data register. The method also includes storing, in response to the instruction, for each of the indicated at least one set of strided data element positions, a corresponding result packed data operand, in a corresponding destination packed data register of the processor. Each result packed data operand including a plurality of data elements, which are from the corresponding indicated set of strided data element positions of the source operand. The strided data element positions of the set are separated from one another by integer multiples of the indicated stride. Other methods, processors, systems, and machine readable media are also disclosed.
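The deinterleave operation described in this abstract can be modeled in scalar code. The following is a minimal sketch, not the patented implementation: the function name `deinterleave`, its parameters, and the use of Python lists in place of packed data registers are all assumptions made for illustration.

```python
def deinterleave(source, stride, selected_sets):
    """Scalar model: return one result list per selected set of strided positions.

    `selected_sets` holds starting offsets in range(stride); set k contains
    source positions k, k + stride, k + 2 * stride, and so on.
    (Hypothetical helper; lists stand in for packed data registers.)
    """
    results = []
    for start in selected_sets:
        if not 0 <= start < stride:
            raise ValueError("set index must be in range(stride)")
        # Positions in one set are separated by integer multiples of the stride.
        results.append(source[start::stride])
    return results


# Interleaved x/y/z triples with stride 3; select the x set (0) and the z set (2).
xyz = [10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33]
xs, zs = deinterleave(xyz, 3, [0, 2])
print(xs)  # [10, 11, 12, 13]
print(zs)  # [30, 31, 32, 33]
```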

Type: Grant

Filed: January 29, 2019

Date of Patent: August 11, 2020

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall

VECTORIZATION OF LOOPS BASED ON VECTOR MASKS AND VECTOR COUNT DISTANCES

Publication number: 20200210183

Abstract: Systems, apparatuses and methods may provide for technology that identifies that an iterative loop includes a first code portion that executes in response to a condition being satisfied, generates a first vector mask that is to represent one or more instances of the condition being satisfied for one or more values of a first vector of values, and one or more instances of the condition being unsatisfied for the first vector of values, where the first vector of values is to correspond to one or more first iterations of the iterative loop, and conducts a vectorization process of the iterative loop based on the first vector mask.

Type: Application

Filed: March 6, 2020

Publication date: July 2, 2020

Applicant: Intel Corporation

Inventors: Ilya Burylov, Mikhail Plotnikov, Hideki Ido, Ruslan Arutyunyan

METHOD AND APPARATUS FOR VECTORIZING HISTOGRAM LOOPS

Publication number: 20200174790

Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements and determining a number of instances of each distinct data value within the vector. A system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, a source vector identifier, and an immediate value, wherein the execution circuit is to, for each data element position of a source vector, determine a number of matching data element positions in the source vector storing a same data value as stored at the data element position, the matching data element positions located between the data element position and a least significant data element position of the source vector, and store in a corresponding data element position of a destination vector identified by the destination vector identifier, a value representing the number of matching data element positions.
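The per-element count described in this abstract can be illustrated with a scalar reference model. This is a sketch under stated assumptions: the name `prior_match_counts` is hypothetical, index 0 is taken to be the least significant element position, and the immediate value mentioned in the abstract is not modeled.

```python
def prior_match_counts(source):
    """For each position i, count earlier positions j < i holding the same value.

    Assumption: index 0 is the least significant data element position.
    """
    counts = []
    for i, value in enumerate(source):
        counts.append(sum(1 for j in range(i) if source[j] == value))
    return counts


print(prior_match_counts([3, 7, 3, 3, 7, 1]))  # [0, 0, 1, 2, 1, 0]
```

In a histogram loop, such counts tell each lane how many earlier lanes in the same vector update the same bin, which is the information needed to combine conflicting increments.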

Type: Application

Filed: June 30, 2017

Publication date: June 4, 2020

Applicant: Intel Corporation

Inventors: Mikhail PLOTNIKOV, Christopher J. HUGHES, Andrey NARAIKIN

METHOD AND APPARATUS FOR DATA-READY MEMORY OPERATIONS

Publication number: 20200142699

Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the memory request for the data element does not satisfy the at least one data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.

Type: Application

Filed: June 30, 2017

Publication date: May 7, 2020

Applicant: Intel Corporation

Inventors: William M. BROWN, Mikhail PLOTNIKOV, Christopher J. HUGHES

METHOD AND APPARATUS FOR CONVERTING SCATTER CONTROL ELEMENTS TO GATHER CONTROL ELEMENTS USED TO SORT VECTOR DATA ELEMENTS

Publication number: 20200073659

Abstract: Method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements is described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data elements includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.

Type: Application

Filed: March 31, 2017

Publication date: March 5, 2020

Applicant: Intel Corporation

Inventor: Mikhail Plotnikov

Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality

Patent number: 10545761

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
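The per-field leading zero count at the core of this abstract can be sketched as a scalar model. The function name `vector_lzcnt` and the default 32-bit field width are assumptions for illustration; the abstract does not fix a width.

```python
def vector_lzcnt(elements, width=32):
    """Count the most significant contiguous zero bits in each data field.

    Assumption: each field is an unsigned integer of `width` bits.
    """
    counts = []
    for x in elements:
        if not 0 <= x < (1 << width):
            raise ValueError("element does not fit in the field width")
        # bit_length() is the index of the highest set bit plus one (0 for x == 0).
        counts.append(width - x.bit_length())
    return counts


print(vector_lzcnt([0, 1, 0x80000000, 0x00FF0000]))  # [32, 31, 0, 8]
```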

Type: Grant

Filed: December 20, 2018

Date of Patent: January 28, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine

Read and write masks update instruction for vectorization of recursive computations over independent data

Patent number: 10503505

Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.

Type: Grant

Filed: April 2, 2018

Date of Patent: December 10, 2019

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes

VECTOR INSTRUCTION FOR ACCUMULATING AND COMPRESSING VALUES BASED ON INPUT MASK

Publication number: 20190369992

Abstract: A processor includes a decode circuit to decode an instruction into a decoded instruction and an execution circuit to execute the decoded instruction to sum one or more values of one or more contiguous elements of an input vector that form a block to produce an accumulated value for the block and store the accumulated value for the block in a destination vector, where an input mask dictates the one or more contiguous elements of the input vector that form the block.
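A scalar sketch may help picture the accumulate-and-compress behavior. The abstract does not say how the input mask encodes blocks, so this example assumes that a set mask bit marks the first element of a new block and that element 0 always starts a block. The name `accumulate_blocks` is hypothetical.

```python
def accumulate_blocks(values, block_start_mask):
    """Sum each block of contiguous elements and pack the sums together.

    Assumption (not stated in the abstract): a set mask bit marks the first
    element of a new block, and element 0 always starts a block.
    """
    if len(values) != len(block_start_mask):
        raise ValueError("values and mask must have the same length")
    sums = []
    for i, (value, starts_block) in enumerate(zip(values, block_start_mask)):
        if starts_block or i == 0:
            sums.append(value)   # open a new block
        else:
            sums[-1] += value    # extend the current block
    return sums


print(accumulate_blocks([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 1, 0]))  # [6, 4, 11]
```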

Type: Application

Filed: February 17, 2017

Publication date: December 5, 2019

Inventor: Mikhail PLOTNIKOV

STRIDESHIFT INSTRUCTION FOR TRANSPOSING BITS INSIDE VECTOR REGISTER

Publication number: 20190347104

Abstract: A processor includes a decode circuit to decode an instruction into a decoded instruction and an execution circuit to execute the decoded instruction to access a first bit of a first input vector located at a bit position indicated by an element of a second input vector, stride over bits of the first input vector using a stride to access bits of the first input vector that are located at a strided bit position with respect to the first bit of the first input vector, and store the first bit of the first input vector and the bits of the first input vector that are located at a strided bit position with respect to the first bit of the first input vector as consecutive bits in a destination vector.

Type: Application

Filed: February 28, 2017

Publication date: November 14, 2019

Inventors: Mikhail PLOTNIKOV, Igor ERMOLAEV

VECTOR COMPRESS2 AND EXPAND2 INSTRUCTIONS WITH TWO MEMORY LOCATIONS

Publication number: 20190347101

Abstract: Disclosed embodiments relate to vector compress2 and expand2 instructions with two memory locations. In one example, a system includes a memory and a processor that includes circuits to fetch, decode, and execute the instruction that includes an opcode, a first destination operand identifier, a second destination operand identifier, a source operand identifier, and a control mask, wherein, for each element of the source operand, the execution circuit is to generate a result by performing one of compression and expansion of the element; and, based on the value of a bit of the control mask corresponding to the element, store the result to a first location identified by the first destination operand identifier and increment the first destination operand identifier by a size of the result, and, otherwise, store the result to a second location identified by the second destination operand identifier and increment the second destination operand identifier by the size of the result.
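The compression direction of this instruction can be sketched as a mask-driven partition into two output streams. This is an illustrative model only: `compress2` is a hypothetical Python function, two lists stand in for the two memory locations, and appending to a list stands in for "store, then increment the destination pointer".

```python
def compress2(source, mask):
    """Partition source elements into two packed streams according to a mask.

    Elements whose mask bit is set go to the first destination; all others go
    to the second. Appending models storing and then advancing the pointer.
    """
    first, second = [], []
    for element, bit in zip(source, mask):
        (first if bit else second).append(element)
    return first, second


data = [5, -2, 7, -9, 4]
positive, non_positive = compress2(data, [x > 0 for x in data])
print(positive)      # [5, 7, 4]
print(non_positive)  # [-2, -9]
```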

Type: Application

Filed: April 6, 2017

Publication date: November 14, 2019

Applicant: Intel Corporation

Inventors: Mikhail PLOTNIKOV, Igor ERMOLAEV, Alexander BOBYR

Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality

Patent number: 10452398

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Grant

Filed: December 20, 2018

Date of Patent: October 22, 2019

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine

VECTORIZE STORE INSTRUCTIONS METHOD AND APPARATUS

Publication number: 20190278577

Abstract: Methods, apparatus, and system to optimize compilation of source code into vectorized compiled code, notwithstanding the presence of output dependencies which might otherwise preclude vectorization.

Type: Application

Filed: July 1, 2016

Publication date: September 12, 2019

Inventors: Mikhail PLOTNIKOV, Hideki IDO, Xinmin TIAN, Sergey PREIS, Milind B. GIRKAR, Maxim SHUTOV

DEINTERLEAVE STRIDED DATA ELEMENTS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20190227798

Abstract: A method performed by a processor includes receiving an instruction. The instruction indicating a source operand, indicating a stride, indicating at least one set of strided data element positions out of all sets of strided data element positions for the indicated stride, and indicating at least one destination packed data register. The method also includes storing, in response to the instruction, for each of the indicated at least one set of strided data element positions, a corresponding result packed data operand, in a corresponding destination packed data register of the processor. Each result packed data operand including a plurality of data elements, which are from the corresponding indicated set of strided data element positions of the source operand. The strided data element positions of the set are separated from one another by integer multiples of the indicated stride. Other methods, processors, systems, and machine readable media are also disclosed.

Type: Application

Filed: January 29, 2019

Publication date: July 25, 2019

Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall

APPARATUSES, METHODS, AND SYSTEMS FOR ELEMENT SORTING OF VECTORS

Publication number: 20190146792

Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
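One way to picture sorting with a comparison matrix is a rank sort, sketched below. The abstract does not name the two operations, so this example assumes a strict greater-than above the main diagonal and a greater-than-or-equal below it, which gives equal elements distinct ranks. The name `rank_sort` is hypothetical.

```python
def rank_sort(values):
    """Sort by building an n-by-n comparison matrix and summing each row.

    Assumption: entries above the main diagonal use '>', entries below it
    use '>=' (the "different operation"), so ties receive distinct ranks.
    """
    n = len(values)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j > i:    # above the main diagonal
                matrix[i][j] = int(values[i] > values[j])
            elif j < i:  # below the main diagonal
                matrix[i][j] = int(values[i] >= values[j])
    # The row sum is the number of elements that must precede element i.
    ranks = [sum(row) for row in matrix]
    result = [None] * n
    for i, rank in enumerate(ranks):
        result[rank] = values[i]
    return result


print(rank_sort([3, 1, 3, 2]))  # [1, 2, 3, 3]
```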

Type: Application

Filed: January 16, 2019

Publication date: May 16, 2019

Inventors: MIKHAIL PLOTNIKOV, IGOR ERMOLAEV

Systems, apparatuses, and methods for strided load

Patent number: 10282204

Abstract: Systems, methods, and apparatuses for strided loads are described. In an embodiment, an instruction to include at least an opcode, a field for at least two packed data source operands, a field for a packed data destination operand, and an immediate is designated as a strided load instruction. This instruction is executed to load packed data elements from the at least two packed data source operands using a stride and storing results of the strided loads in the packed data destination operand starting from a defined position determined in part from the immediate.

Type: Grant

Filed: July 2, 2016

Date of Patent: May 7, 2019

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall

COLLAPSING OF MULTIPLE NESTED LOOPS, METHODS, AND INSTRUCTIONS

Publication number: 20190129721

Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
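A multi-dimensional loop counter update can be thought of as an odometer-style increment that lets one flat loop replace a loop nest. The sketch below is a software analogy, not the instruction's defined semantics; `advance` and `collapsed_iterations` are hypothetical names, and counters are ordered outermost first.

```python
def advance(counters, bounds, amount=1):
    """Advance a tuple of nested-loop counters by `amount` iterations.

    Counters are ordered outermost first. Returns (new_counters, done),
    where done is True once the update runs past the outermost bound.
    """
    counters = list(counters)
    carry = amount
    for dim in reversed(range(len(counters))):
        total = counters[dim] + carry
        counters[dim] = total % bounds[dim]
        carry = total // bounds[dim]
        if carry == 0:
            break
    return counters, carry > 0


def collapsed_iterations(bounds):
    """Visit every index tuple of a loop nest using a single flat loop."""
    counters = [0] * len(bounds)
    done = any(b == 0 for b in bounds)
    while not done:
        yield tuple(counters)
        counters, done = advance(counters, bounds)


print(list(collapsed_iterations((2, 3))))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Passing an `amount` equal to the vector length would advance the counters by one full vector of iterations per step.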

Type: Application

Filed: December 27, 2018

Publication date: May 2, 2019

Inventors: MIKHAIL PLOTNIKOV, ANDREY NARAIKIN, ELMOUSTAPHA OULD-AHMED-VALL

METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY

Publication number: 20190121642

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Application

Filed: December 20, 2018

Publication date: April 25, 2019

Inventors: Christopher J. HUGHES, Mikhail PLOTNIKOV, Andrey NARAIKIN, Robert VALENTINE

METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY

Publication number: 20190121643

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Application

Filed: December 20, 2018

Publication date: April 25, 2019

Inventors: Christopher J. HUGHES, Mikhail PLOTNIKOV, Andrey NARAIKIN, Robert VALENTINE

Systems, apparatuses, and methods for broadcast compare addition

Patent number: 10268479

Abstract: Systems, apparatuses, and methods for executing an instruction. The instruction includes fields for a first source operand, a second source operand, and a destination operand. A decoded instruction causes a reduction of broadcasted packed data elements of a first packed data source with a reduction operation and stores a result of each of the reductions in a packed data destination, wherein the packed data elements of the first packed data source to be used in the reduction are dictated by a result of a comparison of broadcasted values of packed data elements stored in a second packed data source to the packed data elements stored in the second packed data source without broadcasting.

Type: Grant

Filed: December 30, 2016

Date of Patent: April 23, 2019

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Jesus Corbal, Robert Valentine

Processors, methods, systems, and instructions to store consecutive source elements to unmasked result elements with propagation to masked result elements

Patent number: 10223113

Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
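The store-with-propagation behavior can be sketched in scalar form. Several details are assumptions rather than statements from the abstract: the "first end" is taken to be index 0, a set mask bit denotes an unmasked element, masked elements before the first unmasked one receive a hypothetical `fill` value, and masked elements after the last unmasked one keep propagating it. The name `expand_with_propagation` is also hypothetical.

```python
def expand_with_propagation(source, mask, fill=0):
    """Place consecutive source elements at unmasked positions; masked
    positions repeat the nearest unmasked result toward index 0.

    Assumptions: mask bit 1 means unmasked; `fill` is used for masked
    positions that precede the first unmasked one.
    """
    if sum(1 for bit in mask if bit) > len(source):
        raise ValueError("not enough source elements for the unmasked positions")
    result = []
    next_source = 0
    last = fill
    for bit in mask:
        if bit:
            last = source[next_source]  # take the next consecutive source element
            next_source += 1
        result.append(last)             # masked positions reuse the last value
    return result


print(expand_with_propagation([10, 20, 30, 40], [0, 1, 0, 0, 1, 1, 0, 1]))
# [0, 10, 10, 10, 20, 30, 30, 40]
```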

Type: Grant

Filed: March 27, 2014

Date of Patent: March 5, 2019

Assignee: Intel Corporation

Inventor: Mikhail Plotnikov
