Patents by Inventor Jayesh Iyer

Jayesh Iyer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10884735
    Abstract: A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
    Type: Grant
    Filed: February 26, 2018
    Date of Patent: January 5, 2021
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Jamison Collins, Sebastian Winkel, Howard Chen
  • Patent number: 10346170
    Abstract: In one embodiment, a processor includes logic, responsive to a first instruction, to perform an operation on a first source operand and a second source operand associated with the first instruction and write a result of the operation to a destination location comprising a third source operand. The write may be a partial write of the destination location to maintain an unmodified portion of the third source operand. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 5, 2015
    Date of Patent: July 9, 2019
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Jamison D. Collins, Sebastian Winkel
  • Patent number: 10324724
    Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
  • Patent number: 10241789
    Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 26, 2019
    Assignee: INTEL CORPORATION
    Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241794
    Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 26, 2019
    Assignee: Intel Corporation
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241801
    Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: March 26, 2019
    Assignee: INTEL CORPORATION
    Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10235171
    Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 19, 2019
    Assignee: INTEL CORPORATION
    Inventors: Alexander Y. Ostanevich, Jayesh Iyer, Sergey P. Scherbinin, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20190042247
    Abstract: A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
    Type: Application
    Filed: February 26, 2018
    Publication date: February 7, 2019
    Inventors: Jayesh Iyer, Jamison Collins, Sebastian Winkel, Howard Chen
  • Patent number: 10133582
    Abstract: A processor includes a first logic to execute an instruction stream out-of-order, the instruction stream divided into a plurality of strands, the instruction stream and each strand ordered by program order (PO). The processor also includes a second logic to determine an oldest undispatched instruction in the instruction stream and store an associated PO value of the oldest undispatched instruction as an executed instruction pointer. The instruction stream includes dispatched and undispatched instructions. The processor also includes a third logic to determine a most recently retired instruction in the instruction stream and store an associated PO value of the most recently retired instruction as a retirement pointer, a fourth logic to select a range of instructions between the retirement pointer and the executed instruction pointer, and a fifth logic to identify the range of instructions as eligible for retirement.
    Type: Grant
    Filed: December 23, 2013
    Date of Patent: November 20, 2018
    Assignee: Intel Corporation
    Inventors: Nikolay Kosarev, Sergey Y. Shishlov, Jayesh Iyer, Alexander V. Butuzov, Boris A. Babayan, Andrey Kluchnikov
  • Patent number: 10095623
    Abstract: Methods and apparatuses to control access to a multiple bank data cache are described. In one embodiment, a processor includes conflict resolution logic to detect multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle and to grant access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache. In another embodiment, a method includes detecting multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle, and granting access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache.
    Type: Grant
    Filed: October 18, 2016
    Date of Patent: October 9, 2018
    Assignee: INTEL CORPORATION
    Inventors: Andrey Kluchnikov, Jayesh Iyer, Sergey Y. Shishlov, Boris A. Babayan
  • Publication number: 20180181398
    Abstract: Embodiments described herein relate to apparatus and methods for decomposing loops to improve performance and power efficiency. In one embodiment, a processor includes: a loop accelerator including a plurality of strand execution circuits, a binary translator to: receive a plurality of instructions from an instruction storage, to determine whether the plurality of instructions include loop instructions, and, in response to determining that they do, to divide the loop instructions into two or more jobs using at least one job creation rule, to assign the two or more jobs to two or more strands using at least one strand creation rule, and to cause the loop accelerator to execute at least two of the two or more strands in parallel using the plurality of strand execution circuits.
    Type: Application
    Filed: December 28, 2016
    Publication date: June 28, 2018
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181405
    Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
    Type: Application
    Filed: December 23, 2016
    Publication date: June 28, 2018
    Inventors: Jayesh IYER, Sergey P. SCHERBININ, Alexander Y. OSTANEVICH, Dmitry M. MASLENNIKOV, Denis G. MOTIN, Alexander V. ERMOLOVICH, Andrey CHUDNOVETS, Sergey A. ROZHKOV, Boris A. BABAYAN
  • Publication number: 20180181396
    Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
    Type: Application
    Filed: December 27, 2016
    Publication date: June 28, 2018
    Inventors: Alexander Y. OSTANEVICH, Sergey P. SCHERBININ, Jayesh IYER, Dmitry M. MASLENNIKOV, Denis G. MOTIN, Alexander V. ERMOLOVICH, Andrey CHUDNOVETS, Sergey A. ROZHKOV, Boris A. BABAYAN
  • Publication number: 20180181400
    Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
    Type: Application
    Filed: December 27, 2016
    Publication date: June 28, 2018
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Publication number: 20180181397
    Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.
    Type: Application
    Filed: December 27, 2016
    Publication date: June 28, 2018
    Inventors: Alexander Y. OSTANEVICH, Jayesh IYER, Sergey P. SCHERBININ, Dmitry M. MASLENNIKOV, Denis G. MOTIN, Alexander V. ERMOLOVICH, Andrey CHUDNOVETS, Sergey A. ROZHKOV, Boris A. BABAYAN
  • Patent number: 9904546
    Abstract: A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: February 27, 2018
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Jamison D. Collins, Sebastian Winkel, Howard H. Chen
  • Patent number: 9858226
    Abstract: In one embodiment a system comprises an integrated circuit, a plurality of voltage regulators; and a data bus coupled to the integrated circuit and the plurality of voltage regulators. In some embodiments the integrated circuit comprises logic to embed a timing signal on the data bus. Other embodiments may be described.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: January 2, 2018
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Edward R. Stanford, Waseem Kraipak
  • Patent number: 9811340
    Abstract: A computer system, a processor in a computer and a computer-implemented method executable on a computer processor involve dividing a set of computer instructions arranged in a sequential program order into a plurality of instruction sequences. Instructions within each sequence are arranged according to the program order. An increment value is assigned to a preceding instruction in each sequence. The increment value is equal to a difference between a program order value of a subsequent instruction in the sequence and a program order value of the preceding instruction. The processor calculates the program order value of each subsequent instruction based on the program order value and the increment value of a corresponding preceding instruction in the same sequence.
    Type: Grant
    Filed: June 18, 2012
    Date of Patent: November 7, 2017
    Assignee: Intel Corporation
    Inventors: Nikolay Kosarev, Jayesh Iyer, Sergey Shishlov, Andrey Kluchnikov, Alexander Butuzov, Boris A. Babayan, Vladimir Penkovski, Sergey V. Bulenkov
  • Publication number: 20170177343
    Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
    Type: Application
    Filed: December 16, 2015
    Publication date: June 22, 2017
    Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
  • Patent number: 9645819
    Abstract: A computer system, a computer processor and a method executable on a computer processor involve placing each sequence of a plurality of sequences of computer instructions being scheduled for execution in the processor into a separate queue. The head instruction from each queue is stored into a first storage unit prior to determining whether the head instruction is ready for scheduling. For each instruction in the first storage unit that is determined to be ready, the instruction is moved from the first storage unit to a second storage unit. During a first processor cycle, each instruction in the first storage unit that is determined to be not ready is retained in the first storage unit, and the determining of whether the instruction is ready is repeated during the next processor cycle. Scheduling logic performs scheduling of instructions contained in the second storage unit.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: May 9, 2017
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Nikolay Kosarev, Sergey Shishlov, Alexey Sivtsov, Alexander Butuzov, Boris A. Babayan, Vladimir Penkovski