Patents by Inventor Paul Caprioli

Paul Caprioli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150186290
    Abstract: Detailed herein are systems, apparatuses, and methods for transparent page level instruction translation. Exemplary embodiments include an instruction translation lookaside buffer (iTLB), wherein each iTLB entry includes a linear address of a page in memory, a physical address of the page in memory, and a remapping indicator.
    Type: Application
    Filed: December 27, 2013
    Publication date: July 2, 2015
    Inventors: Paul CAPRIOLI, Vedvyas SHANBHOGUE, Koichi YAMADA
  • Patent number: 9032381
    Abstract: State recovery methods and apparatus for computing platforms are disclosed. An example method includes inserting a first instruction into optimized code to cause a first portion of a register in a first state to be saved to memory before execution of a region of the optimized code; and maintaining a value indicative of a manner in which a second portion of the register in the first state is to be restored in connection with a state recovery from the optimized code.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: May 12, 2015
    Assignee: Intel Corporation
    Inventors: Abhay S. Kanhere, Saurabh Shukla, Suriya Subramanian, Paul Caprioli
  • Publication number: 20150095625
    Abstract: Mechanisms for reducing memory access violations are disclosed. Sets of instructions may be identified and the identified sets of instructions may be re-translated or optimized to generate other sets of instructions. Execution of the other sets of instructions is analyzed to determine whether additional memory access violations occur. When additional memory access violations occur, further sets of instructions may be generated or re-translation/optimization of instructions may be disabled.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Inventors: Wessam M. Hassanein, Abhay S. Kanhere, Paul Caprioli
  • Patent number: 8826257
    Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. The reordering problem being when a first memory operation has been reordered prior to and aliases to a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations to the processor is communicated.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: September 2, 2014
    Assignee: Intel Corporation
    Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
  • Publication number: 20140156933
    Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
  • Publication number: 20140156978
    Abstract: A processor includes an instruction pipeline for executing instructions including a branching instruction, a counter for counting times that the branching instruction is taken, a register for storing a global branch history as a function of a value of the counter, and a branch prediction unit for predicting branching based on the global branch history.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Inventors: Muawya M. AL-OTOOM, Paul CAPRIOLI, Jeffrey J. COOK
  • Patent number: 8732438
    Abstract: Embodiments of the present invention execute an anti-prefetch instruction. These embodiments start by decoding instructions in a decode unit in a processor to prepare the instructions for execution. Upon decoding an anti-prefetch instruction, these embodiments stall the decode unit to prevent decoding subsequent instructions. These embodiments then execute the anti-prefetch instruction, wherein executing the anti-prefetch instruction involves: (1) sending a prefetch request for a cache line in an L1 cache; (2) determining if the prefetch request hits in the L1 cache; (3) if the prefetch request hits in the L1 cache, determining if the cache line contains a predetermined value; and (4) conditionally performing subsequent operations based on whether the prefetch request hits in the L1 cache or the value of the data in the cache line.
    Type: Grant
    Filed: April 16, 2008
    Date of Patent: May 20, 2014
    Assignee: Oracle America, Inc.
    Inventors: Paul Caprioli, Sherman H. Yip, Gideon N. Levinsky
  • Publication number: 20140095842
    Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
  • Publication number: 20140089271
    Abstract: Method and apparatus to efficiently detect violations of data dependency relationships. A memory address associated with a computer instruction may be obtained. A current state of the memory address may be identified. The current state may include whether the memory address is associated with a read or a store instruction, and whether the memory address is associated with a set or a check. A previously accumulated state associated with the memory address may be retrieved from a data structure. The previously accumulated state may include whether the memory address was previously associated with a read or a store instruction, and whether the memory address was previously associated with a set or a check. If a transition from the previously accumulated state to the current state is invalid, a failure condition may be signaled.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Inventors: Muawya M. AL-OTOOM, Paul CAPRIOLI, Ryan CARLSON, Ho-Seop KIM, Omar SHAIKH
  • Publication number: 20140007066
    Abstract: State recovery methods and apparatus for computing platforms are disclosed. An example method includes inserting a first instruction into optimized code to cause a first portion of a register in a first state to be saved to memory before execution of a region of the optimized code; and maintaining a value indicative of a manner in which a second portion of the register in the first state is to be restored in connection with a state recovery from the optimized code.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Abhay S. Kanhere, Saurabh Shukla, Suriya Subramanian, Paul Caprioli
  • Publication number: 20130311758
    Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
    Type: Application
    Filed: March 30, 2012
    Publication date: November 21, 2013
    Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
  • Publication number: 20130305019
    Abstract: A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determinations returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page.
    Type: Application
    Filed: September 30, 2011
    Publication date: November 14, 2013
    Inventors: Paul Caprioli, Martin G. Dixon, Brett L. Toll, Muawya M. Al-Otoom, Omar M. Shaikh
  • Patent number: 8572356
    Abstract: Techniques and structures are disclosed for a processor supporting checkpointing to operate effectively in scouting mode while a maximum number of supported checkpoints are active. Operation in scouting mode may include using bypass logic and a set of register storage locations to store and/or forward in-flight instruction results that were calculated during scouting mode. These forwarded results may be used during scouting mode to calculate memory load addresses for yet other in-flight instructions, and the processor may accordingly cause data to be prefetched from these calculated memory load addresses. The set of register storage locations may comprise a working register file or an active portion of a multiported register file.
    Type: Grant
    Filed: January 5, 2010
    Date of Patent: October 29, 2013
    Assignee: Oracle America, Inc.
    Inventors: Sherman H. Yip, Paul Caprioli
  • Publication number: 20130283249
    Abstract: A micro-architecture may provide a hardware and software co-designed dynamic binary translation. The micro-architecture may invoke a method to perform a dynamic binary translation. The method may comprise executing original software code compiled targeting a first instruction set, using processor hardware to detect a hot spot in the software code and passing control to a binary translation translator, determining a hot spot region for translation, generating the translated code using a second instruction set, placing the translated code in a translation cache, executing the translated code from the translated cache, and transitioning back to the original software code after the translated code finishes execution.
    Type: Application
    Filed: September 30, 2011
    Publication date: October 24, 2013
    Inventors: Abhay S. Kanhere, Paul Caprioli, Koichi Yamada, Suriya Madras-Subramanian, Suresh Srinivas
  • Publication number: 20130275715
    Abstract: An apparatus and method is described herein for providing structures to support software memory re-ordering within atomic sections of code. Upon a start or end of a critical section, speculative bits of a translation buffer are reset. When a speculative memory access causes an address translation of a virtual address to a physical address, the translation buffer is searched to determine if another entry (a different virtual address) includes the same physical address. And if another entry does include the same physical address, the speculative execution is failed to provide protection from invalid execution resulting from the memory re-ordering.
    Type: Application
    Filed: December 8, 2011
    Publication date: October 17, 2013
    Inventors: Paul Caprioli, Abhay S. Kanhere
  • Publication number: 20130262838
    Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. The reordering problem being when a first memory operation has been reordered prior to and aliases to a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations to the processor is communicated.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 3, 2013
    Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
  • Patent number: 8484434
    Abstract: Embodiments of the present invention provide a system that generates an index for a cache memory. The system starts by receiving a request to access the cache memory, wherein the request includes address information. The system then obtains non-address information associated with the request. Next, the system generates the index using the address information and the non-address information. The system then uses the index to fulfill access the cache memory.
    Type: Grant
    Filed: February 22, 2012
    Date of Patent: July 9, 2013
    Assignee: Oracle America, Inc.
    Inventors: Paul Caprioli, Martin Karlsson, Shailender Chaudhry
  • Patent number: 8447931
    Abstract: One embodiment of the present invention provides a processor that supports multiple-issue execution. This processor includes a register file, which contains an array of memory cells, wherein the memory cells contain bits for architectural registers of the processor. The register file also includes multiple read ports and multiple write ports to support multiple-issue execution. During operation, if multiple read ports simultaneously read from a given register, the register file is configured to: read each bit of the given register out of the array of memory cells through a single bitline associated with the bit; and to use a driver located outside of the array of memory cells to drive the bit to the multiple read ports. In this way, each memory cell only has to drive a single bitline (instead of multiple bitlines) during a multiple-port read operation, thereby allowing memory cells to use smaller and more power-efficient drivers for read operations.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: May 21, 2013
    Assignee: Oracle America, Inc.
    Inventors: Shailender Chaudhry, Paul Caprioli, Marc Tremblay
  • Patent number: 8364900
    Abstract: Embodiments of the present invention provide a system that replaces an entry in a least-recently-used way in a skewed-associative cache. The system starts by receiving a cache line address. The system then generates two or more indices using the cache line address. Next, the system generates two or more intermediate indices using the two or more indices. The system then uses at least one of the two or more indices or the two or more intermediate indices to perform a lookup in one or more lookup tables, wherein the lookup returns a value which identifies a least-recently-used way. Next, the system replaces the entry in the least-recently-used way.
    Type: Grant
    Filed: February 12, 2008
    Date of Patent: January 29, 2013
    Assignee: Oracle America, Inc.
    Inventors: Paul Caprioli, Sherman H. Yip, Shailender Chaudhry
  • Patent number: 8316366
    Abstract: Embodiments of the present invention provide a system that executes a transaction on a simultaneous speculative threading (SST) processor. In these embodiments, the processor includes a primary strand and a subordinate strand. Upon encountering a transaction with the primary strand while executing instructions non-transactionally, the processor checkpoints the primary strand and executes the transaction with the primary strand while continuing to non-transactionally execute deferred instructions with the subordinate strand. When the subordinate strand non-transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate the first strand ID. When the primary strand transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a second strand ID.
    Type: Grant
    Filed: April 2, 2008
    Date of Patent: November 20, 2012
    Assignee: Oracle America, Inc.
    Inventors: Sherman H. Yip, Paul Caprioli, Marc Tremblay