Patents by Inventor Alexander Y. Ostanevich

Alexander Y. Ostanevich has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus to create register windows for parallel iterations to achieve high performance in HW-SW codesigned loop accelerator

Patent number: 10241801

Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.

Type: Grant

Filed: December 23, 2016

Date of Patent: March 26, 2019

Assignee: INTEL CORPORATION

Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Method to do control speculation on loads in a high performance strand-based loop accelerator

Patent number: 10241789

Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.

Type: Grant

Filed: December 27, 2016

Date of Patent: March 26, 2019

Assignee: INTEL CORPORATION

Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Apparatus and methods to support counted loop exits in a multi-strand loop processor

Patent number: 10241794

Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.

Type: Grant

Filed: December 27, 2016

Date of Patent: March 26, 2019

Assignee: Intel Corporation

Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Method and apparatus to efficiently handle allocation of memory ordering buffers in a multi-strand out-of-order loop processor

Patent number: 10235171

Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.

Type: Grant

Filed: December 27, 2016

Date of Patent: March 19, 2019

Assignee: INTEL CORPORATION

Inventors: Alexander Y. Ostanevich, Jayesh Iyer, Sergey P. Scherbinin, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
METHOD AND APPARATUS TO EFFICIENTLY HANDLE ALLOCATION OF MEMORY ORDERING BUFFERS IN A MULTI-STRAND OUT-OF-ORDER LOOP PROCESSOR

Publication number: 20180181397

Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.

Type: Application

Filed: December 27, 2016

Publication date: June 28, 2018

Inventors: Alexander Y. OSTANEVICH, Jayesh IYER, Sergey P. SCHERBININ, Dmitry M. MASLENNIKOV, Denis G. MOTIN, Alexander V. ERMOLOVICH, Andrey CHUDNOVETS, Sergey A. ROZHKOV, Boris A. BABAYAN
APPARATUS AND METHODS TO SUPPORT COUNTED LOOP EXITS IN A MULTI-STRAND LOOP PROCESSOR

Publication number: 20180181400

Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.

Type: Application

Filed: December 27, 2016

Publication date: June 28, 2018

Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
METHOD TO DO CONTROL SPECULATION ON LOADS IN A HIGH PERFORMANCE STRAND-BASED LOOP ACCELERATOR

Publication number: 20180181396

Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.

Type: Application

Filed: December 27, 2016

Publication date: June 28, 2018

Inventors: Alexander Y. OSTANEVICH, Sergey P. SCHERBININ, Jayesh IYER, Dmitry M. MASLENNIKOV, Denis G. MOTIN, Alexander V. ERMOLOVICH, Andrey CHUDNOVETS, Sergey A. ROZHKOV, Boris A. BABAYAN
METHOD AND APPARATUS TO CREATE REGISTER WINDOWS FOR PARALLEL ITERATIONS TO ACHIEVE HIGH PERFORMANCE IN HW-SW CODESIGNED LOOP ACCELERATOR

Publication number: 20180181405

Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.

Type: Application

Filed: December 23, 2016

Publication date: June 28, 2018

Inventors: Jayesh IYER, Sergey P. SCHERBININ, Alexander Y. OSTANEVICH, Dmitry M. MASLENNIKOV, Denis G. MOTIN, Alexander V. ERMOLOVICH, Andrey CHUDNOVETS, Sergey A. ROZHKOV, Boris A. BABAYAN
APPARATUS AND METHODS OF DECOMPOSING LOOPS TO IMPROVE PERFORMANCE AND POWER EFFICIENCY

Publication number: 20180181398

Abstract: Embodiments described herein relate to apparatus and methods for decomposing loops to improve performance and power efficiency. In one embodiment, a processor includes: a loop accelerator including a plurality of strand execution circuits, a binary translator to: receive a plurality of instructions from an instruction storage, to determine whether the plurality of instructions include loop instructions, and, in response to determining that they do, to divide the loop instructions into two or more jobs using at least one job creation rule, to assign the two or more jobs to two or more strands using at least one strand creation rule, and to cause the loop accelerator to execute at least two of the two or more strands in parallel using the plurality of strand execution circuits.

Type: Application

Filed: December 28, 2016

Publication date: June 28, 2018

Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
HARDWARE-ASSISTED SOFTWARE VERIFICATION AND SECURE EXECUTION

Publication number: 20170090929

Abstract: In an example, there is disclosed a computing apparatus, including a processor operable to execute a plurality of instructions forming a program; and a verification engine, operable to: receive an execution control data (ECD) for the program; and monitor execution of only some instructions of the program to ensure that they are consistent with the ECD. In some embodiments, the monitoring engine may include a correctness monitoring unit (CMU) in processor hardware. There is also disclosed one or more computer-readable storage mediums having stored thereon executable instructions for providing a monitoring engine, and a computer-implemented method of providing a monitoring engine.

Type: Application

Filed: September 25, 2015

Publication date: March 30, 2017

Applicant: McAfee, Inc.

Inventors: Igor Muttik, Boris A. Babayan, Alexander V. Ermolovich, Alexander Y. Ostanevich, Sergey A. Rozhkov
Data dependence testing for loop fusion with code replication, array contraction, and loop interchange

Patent number: 8677338

Abstract: Methods and apparatus to data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.

Type: Grant

Filed: June 4, 2008

Date of Patent: March 18, 2014

Assignee: Intel Corporation

Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
Improving data locality and parallelism by code replication

Patent number: 8453134

Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.

Type: Grant

Filed: June 4, 2008

Date of Patent: May 28, 2013

Assignee: Intel Corporation

Inventors: John L. Ng, Alexander Y. Ostanevich, Alexander L. Sushentsov
IMPROVING DATA LOCALITY AND PARALLELISM BY CODE REPLICATION AND ARRAY CONTRACTION

Publication number: 20090307674

Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.

Type: Application

Filed: June 4, 2008

Publication date: December 10, 2009

Inventors: John L. NG, Alexander Y. OSTANEVICH, Alexander L. SUSHENTSOV
DATA DEPENDENCE TESTING FOR LOOP FUSION WITH CODE REPLICATION, ARRAY CONTRACTION, AND LOOP INTERCHANGE

Publication number: 20090307675

Abstract: Methods and apparatus to data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.

Type: Application

Filed: June 4, 2008

Publication date: December 10, 2009

Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
Hardware supported software pipelined loop prologue optimization

Patent number: 6954927

Abstract: A method for optimizing a software pipelineable loop in a software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) and a number of loop operations (Np) from the pipelined stages to peel based on IN and Tld is then determined. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated and the results of the peeled loop operations and a result of an original loop operation is assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and original loop operation are also assigned.

Type: Grant

Filed: October 4, 2001

Date of Patent: October 11, 2005

Assignee: Elbrus International

Inventor: Alexander Y. Ostanevich
Register economy heuristic for a cycle driven multiple issue instruction scheduler

Patent number: 6718541

Abstract: A method for scheduling operations utilized by an optimizing compiler to reduce register pressure on a target hardware platform assigns register economy priority (REP) values to each operation in a basic block. For each time slot, operations are scheduled in order of their lowest REP values.

Type: Grant

Filed: December 21, 2000

Date of Patent: April 6, 2004

Assignee: Elbrus International Limited

Inventors: Alexander Y. Ostanevich, Vladimir Y. Volkonsky
Profile driven code motion and scheduling

Patent number: 6594824

Abstract: A method and apparatus for generating an optimized intermediate representation of source code for a computer program are described. An initial intermediate representation is extracted from the source code by organizing it as a plurality of basic blocks that each contain at least one program instruction ordered according to respective estimated profit values. A goal function that measures the degree of optimization of the program is calculated in accordance with its intermediate representation. The effect on the goal function of modifying the intermediate representation by moving an instruction from one of the basic blocks to each of its predecessors is tested iteratively and adopting the modified intermediate representation if it causes a reduction in the goal function.

Type: Grant

Filed: February 17, 2000

Date of Patent: July 15, 2003

Assignee: Elbrus International Limited

Inventors: Vladimir Y. Volkonsky, Alexander Y. Ostanevich, Alexander L. Sushentsov
Hardware supported software pipelined loop prologue optimization

Publication number: 20020133813

Abstract: A method for optimizing a software pipelineable loop in a software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) and a number of loop operations (Np) from the pipelined stages to peel based on IN and Tld is then determined. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated and the results of the peeled loop operations and a result of an original loop operation is assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and original loop operation are also assigned.

Type: Application

Filed: October 4, 2001

Publication date: September 19, 2002

Applicant: Elbrus International

Inventor: Alexander Y. Ostanevich
List scheduling algorithm for a cycle-driven instruction scheduler

Publication number: 20020083423

Abstract: A method for scheduling a plurality of operations of one or more types of operations including a plurality of computing resources is provided. The method includes building a list of partial lists for the one or more types of operations where the partial lists include one or more operations. A current partial list of a type of operation is determined. A computing resource for an operation in the current partial list is then allocated. The method then determines if additional computing resources for the type of operation are available for the current partial list. If so, the method reiterates back to determining a current partial list. If additional computing resources are not available, the method performs the steps of excluding the current partial list from the list and if the list includes any other partial lists, reiterating back to determining a current partial list.

Type: Application

Filed: October 4, 2001

Publication date: June 27, 2002

Applicant: Elbrus International

Inventors: Alexander Y. Ostanevich, Vladimir Y. Volkonsky
Register economy heuristic for a cycle driven multiple issue instruction scheduler

Publication number: 20020013937

Abstract: A method for scheduling operations utilized by an optimizing compiler to reduce register pressure on a target hardware platform assigns register economy priority (REP) values to each operation in a basic block. For each time slot, operations are scheduled in order of their lowest REP values.

Type: Application

Filed: December 21, 2000

Publication date: January 31, 2002

Inventors: Alexander Y. Ostanevich, Vladimir Y. Volkonsky