Patents by Inventor Gregory F. Grohoski

Gregory F. Grohoski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8417961
    Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: April 9, 2013
    Assignee: Oracle International Corporation
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
  • Patent number: 8412911
    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: April 2, 2013
    Assignee: Oracle America, Inc.
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail
  • Patent number: 8356185
    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: January 15, 2013
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
  • Patent number: 8335911
    Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: December 18, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Gregory F. Grohoski
  • Publication number: 20120290821
    Abstract: Techniques and structures are disclosed relating to a branch target cache (BTC) in a processor. In one embodiment, the BTC is usable to predict whether a control transfer instruction is to be taken, and, if applicable, a target address for the instruction. The BTC may operate in conjunction with a delayed branch predictor (DBP) that is more accurate but slower than the BTC. If the BTC indicates that a control transfer instruction is predicted to be taken, the processor begins to fetch instructions at the target address indicated by the BTC, but may discard those instructions if the DBP subsequently determines that the control transfer instruction was predicted incorrectly. Branch prediction information output from the BTC and the DBP may be used to update the branch target cache for subsequent predictions. In various embodiments, the BTC may simultaneously store entries for multiple processor threads, and may be fully associative.
    Type: Application
    Filed: May 11, 2011
    Publication date: November 15, 2012
    Inventors: Manish K. Shah, Gregory F. Grohoski
  • Patent number: 8301865
    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: October 30, 2012
    Assignee: Oracle America, Inc.
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
  • Publication number: 20120233442
    Abstract: Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The first request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.
    Type: Application
    Filed: March 11, 2011
    Publication date: September 13, 2012
    Inventors: Manish K. Shah, Gregory F. Grohoski, Zeid H. Samoail
  • Publication number: 20120216020
    Abstract: Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2?, and to perform a modular addition using a value R2 to produce a value R1'.
    Type: Application
    Filed: February 21, 2011
    Publication date: August 23, 2012
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Manish K. Shah
  • Patent number: 8195923
    Abstract: Systems and methods for efficient instruction support of an multiple features for opcodes of an instruction set. A processor detects a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
    Type: Grant
    Filed: April 7, 2009
    Date of Patent: June 5, 2012
    Assignee: Oracle America, Inc.
    Inventors: Lawrence A. Spracklen, Gregory F. Grohoski, Christopher H. Olson, Robert T. Golla
  • Patent number: 8190864
    Abstract: Advanced programmable interrupt control for a multithreaded multicore processor that supports software compatible with x86 processors. Embodiments provide interrupt control for increased threads with minimal additional hardware by including in each processor core, a core advanced interrupt controller (core APIC) configured to determine a lowest priority thread of its corresponding processor core. Each core APIC reports its lowest priority thread level as a core priority to an input/output APIC. The I/O APIC routes interrupt requests to the core APIC with the lowest core priority. The selected core APIC then routes the interrupt request to the corresponding lowest priority thread. Each core APIC detects changes in priority levels of its corresponding processor core threads, and notifies the I/O APIC of any change to the corresponding core priority. Each core APIC may notify the I/O APIC as the core priority changes, or when the I/O APIC requests status from each core APIC.
    Type: Grant
    Filed: October 25, 2007
    Date of Patent: May 29, 2012
    Assignee: Oracle America, Inc.
    Inventors: Paul J. Jordan, Gregory F. Grohoski
  • Publication number: 20110276790
    Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instruction from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, N spans at least two registers of general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R??1.
    Type: Application
    Filed: May 7, 2010
    Publication date: November 10, 2011
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence Spracklen, Nils Gura
  • Publication number: 20110276783
    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
    Type: Application
    Filed: May 4, 2010
    Publication date: November 10, 2011
    Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
  • Publication number: 20110231636
    Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
  • Patent number: 7937556
    Abstract: In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: May 3, 2011
    Assignee: Oracle America, Inc.
    Inventors: Paul J. Jordan, Manish K. Shah, Gregory F. Grohoski
  • Publication number: 20110087895
    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.
    Type: Application
    Filed: October 8, 2009
    Publication date: April 14, 2011
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
  • Publication number: 20110087866
    Abstract: A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.
    Type: Application
    Filed: October 14, 2009
    Publication date: April 14, 2011
    Inventors: Manish K. Shah, Gregory F. Grohoski, Robert T. Golla, Jama I. Barreh
  • Publication number: 20110078425
    Abstract: A multithreaded microprocessor includes an instruction fetch unit that may fetch and maintain a plurality of instructions belonging to one or more threads and one or more execution units that may concurrently execute the one or more threads. The instruction fetch unit includes a target branch prediction unit that may provide a predicted branch target address in response to receiving an instruction fetch address of a current indirect branch instruction. The branch prediction unit includes a primary storage and a control unit. The storage includes a plurality of entries, and each entry may store a predicted branch target address corresponding to a previous indirect branch instruction. The control unit may generate an index value for accessing the storage using a portion of the instruction fetch address of the current indirect branch instruction, and branch direction history information associated with a currently executing thread of the one or more threads.
    Type: Application
    Filed: September 25, 2009
    Publication date: March 31, 2011
    Inventors: Manish K. Shah, Gregory F. Grohoski
  • Publication number: 20100332786
    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail
  • Publication number: 20100332787
    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
  • Publication number: 20100329450
    Abstract: Some embodiments of the present invention provide a processor, which includes a set of general-purpose registers and at least one execution unit. Each general-purpose register in the set of general-purpose registers is at least 64 bits wide, and the execution unit supports one or more Data Encryption Standard (DES) instructions. Specifically, the execution unit may support a permutation-rotation instruction for performing DES permutation operations and DES rotation operations. The execution unit may also support a round instruction to perform a DES round operation. Since the DES instructions use general-purpose registers instead of special-purpose registers to perform DES-specific operations, the processor's circuit complexity and area are reduced. Furthermore, in some embodiments, since the DES instructions require at most two operands, the number of bits required to specify the location of the operands are reduced, thereby enabling a larger number of instructions to be supported by the processor.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Applicant: SUN MICROSYSTEMS, INC.
    Inventors: Leonard D. Rarick, Christopher H. Olson, Gregory F. Grohoski