Patents by Inventor Sergey A. Rozhkov

Sergey A. Rozhkov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10579378
    Abstract: An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: March 3, 2020
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Victor W. Lee, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241789
    Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 26, 2019
    Assignee: Intel Corporation
    Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241801
    Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: March 26, 2019
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241794
    Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 26, 2019
    Assignee: Intel Corporation
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10235171
    Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 19, 2019
    Assignee: Intel Corporation
    Inventors: Alexander Y. Ostanevich, Jayesh Iyer, Sergey P. Scherbinin, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181397
    Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.
    Type: Application
    Filed: December 27, 2016
    Publication date: June 28, 2018
    Inventors: Alexander Y. Ostanevich, Jayesh Iyer, Sergey P. Scherbinin, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181400
    Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
    Type: Application
    Filed: December 27, 2016
    Publication date: June 28, 2018
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181396
    Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
    Type: Application
    Filed: December 27, 2016
    Publication date: June 28, 2018
    Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181405
    Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
    Type: Application
    Filed: December 23, 2016
    Publication date: June 28, 2018
    Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181398
    Abstract: Embodiments described herein relate to apparatus and methods for decomposing loops to improve performance and power efficiency. In one embodiment, a processor includes: a loop accelerator including a plurality of strand execution circuits, a binary translator to: receive a plurality of instructions from an instruction storage, to determine whether the plurality of instructions include loop instructions, and, in response to determining that they do, to divide the loop instructions into two or more jobs using at least one job creation rule, to assign the two or more jobs to two or more strands using at least one strand creation rule, and to cause the loop accelerator to execute at least two of the two or more strands in parallel using the plurality of strand execution circuits.
    Type: Application
    Filed: December 28, 2016
    Publication date: June 28, 2018
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20170090929
    Abstract: In an example, there is disclosed a computing apparatus, including a processor operable to execute a plurality of instructions forming a program; and a verification engine, operable to: receive an execution control data (ECD) for the program; and monitor execution of only some instructions of the program to ensure that they are consistent with the ECD. In some embodiments, the monitoring engine may include a correctness monitoring unit (CMU) in processor hardware. There is also disclosed one or more computer-readable storage mediums having stored thereon executable instructions for providing a monitoring engine, and a computer-implemented method of providing a monitoring engine.
    Type: Application
    Filed: September 25, 2015
    Publication date: March 30, 2017
    Applicant: McAfee, Inc.
    Inventors: Igor Muttik, Boris A. Babayan, Alexander V. Ermolovich, Alexander Y. Ostanevich, Sergey A. Rozhkov
  • Publication number: 20160055004
    Abstract: An apparatus and method are described for non-speculative execution of conditional instructions. For example, one embodiment of a processor comprises: a register set including a first register to store a set of one or more condition bits; non-speculative execution logic to execute a first instruction to identify a first target instruction strand in response to a first conditional value read from the set of condition bits, the first instruction to wait until the first conditional value becomes known before causing the first target instruction strand to be fetched and executed, the non-speculative execution logic to execute a second instruction to identify an end of the first target instruction strand and responsively identify a new current instruction pointer for instructions which follow the second instruction; and out-of-order execution logic to fetch and execute the instructions which follow the second instruction prior to the execution of the second instruction.
    Type: Application
    Filed: August 21, 2014
    Publication date: February 25, 2016
    Inventors: Edward T. Grochowski, Milind B. Girkar, Victor W. Lee, Dmitry M. Maslennikov, Robert Valentine, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20150277910
    Abstract: An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.
    Type: Application
    Filed: March 27, 2014
    Publication date: October 1, 2015
    Inventors: Edward T. Grochowski, Victor W. Lee, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 8261250
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Grant
    Filed: January 10, 2011
    Date of Patent: September 4, 2012
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Publication number: 20110107067
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Application
    Filed: January 10, 2011
    Publication date: May 5, 2011
    Applicant: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Patent number: 7895587
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: February 22, 2011
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Publication number: 20100274972
    Abstract: Systems, methods, and apparatuses for parallel computing are described. In some embodiments, a processor is described that includes a front end and a back end. The front end includes an instruction cache to store instructions of a strand. The back end includes a scheduler, register file, and execution resources to execute the strand's instructions.
    Type: Application
    Filed: December 23, 2009
    Publication date: October 28, 2010
    Inventors: Boris Babayan, Vladimir L. Gnatyuk, Sergey Yu. Shishlov, Sergey P. Scherbinin, Alexander V. Butuzov, Vladimir M. Pentkovski, Denis M. Khartikov, Sergey A. Rozhkov, Roman A. Khvatov
  • Publication number: 20070006193
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Application
    Filed: September 8, 2006
    Publication date: January 4, 2007
    Applicant: Elbrus International
    Inventors: Boris Babaian, Yuli Sakhin, Vladimir Volkonskiy, Sergey Rozhkov, Vladimir Tikhorsky, Feodor Gruzdov, Leonid Nazarov, Mikhail Chudakov
  • Patent number: 7143401
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Grant
    Filed: February 20, 2001
    Date of Patent: November 28, 2006
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Patent number: 7065750
    Abstract: Precise exceptions handling in the optimized binary translated code is achieved by transitioning execution to the non-optimized step-by-step foreign code execution means in accordance with one of the several coherent foreign states designated during the optimized translation of the foreign code. A method to improve the operation by avoiding complete foreign state updates in the optimized code, an apparatus to track the switching between the states and a method to recompute the complete foreign state in accordance to the current state identification, execution context and additional documentation provided during the translation time are proposed.
    Type: Grant
    Filed: April 18, 2001
    Date of Patent: June 20, 2006
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Andrew V. Yakushev, Sergey A. Rozhkov, Vladimir M. Gushchin
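
Illustrative sketch: the abstract of patent 10235171 (also published as 20180181397) describes admitting an orderable instruction into an ordering buffer only when its real program order (RPO) is less than or equal to an RPO limit derived from a delta value and the RPO of the eldest undispatched instruction. The Python model below is only a rough software analogy of that check, not the patented circuit. The class and method names are hypothetical, and the formula "limit = eldest RPO + delta" is an assumption; the abstract says only that the limit is "based on" those two values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderingBufferModel:
    """Toy model of an ordering buffer gated by an RPO limit (hypothetical)."""

    delta: int  # assumed fixed window size added to the eldest RPO
    entries: List[int] = field(default_factory=list)  # RPOs awaiting retirement

    def rpo_limit(self, eldest_undispatched_rpo: int) -> int:
        # Assumption: the limit is the simple sum of the two inputs.
        return eldest_undispatched_rpo + self.delta

    def try_insert(self, rpo: int, eldest_undispatched_rpo: int) -> bool:
        """Insert an entry only if its RPO does not exceed the current limit."""
        if rpo <= self.rpo_limit(eldest_undispatched_rpo):
            self.entries.append(rpo)
            return True
        return False  # too far ahead of the eldest undispatched instruction


buf = OrderingBufferModel(delta=4)
accepted = buf.try_insert(rpo=10, eldest_undispatched_rpo=8)  # limit is 12
rejected = buf.try_insert(rpo=13, eldest_undispatched_rpo=8)  # exceeds limit
```

In this model the first call is accepted and the second is refused, which bounds how far ahead of the eldest undispatched instruction the buffer can fill.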