Patents by Inventor Gregory F. Grohoski

Gregory F. Grohoski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Apparatus and method for implementing instruction support for performing a cyclic redundancy check (CRC)

Patent number: 8417961

Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.

Type: Grant

Filed: March 16, 2010

Date of Patent: April 9, 2013

Assignee: Oracle International Corporation

Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
System and method to invalidate obsolete address translations

Patent number: 8412911

Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.

Type: Grant

Filed: June 29, 2009

Date of Patent: April 2, 2013

Assignee: Oracle America, Inc.

Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail
Apparatus and method for local operand bypassing for cryptographic instructions

Patent number: 8356185

Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.

Type: Grant

Filed: October 8, 2009

Date of Patent: January 15, 2013

Assignee: Oracle America, Inc.

Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
Dynamic allocation of resources in a threaded, heterogeneous processor

Patent number: 8335911

Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.

Type: Grant

Filed: September 30, 2009

Date of Patent: December 18, 2012

Assignee: Oracle America, Inc.

Inventors: Robert T. Golla, Gregory F. Grohoski
LOW-LATENCY BRANCH TARGET CACHE

Publication number: 20120290821

Abstract: Techniques and structures are disclosed relating to a branch target cache (BTC) in a processor. In one embodiment, the BTC is usable to predict whether a control transfer instruction is to be taken, and, if applicable, a target address for the instruction. The BTC may operate in conjunction with a delayed branch predictor (DBP) that is more accurate but slower than the BTC. If the BTC indicates that a control transfer instruction is predicted to be taken, the processor begins to fetch instructions at the target address indicated by the BTC, but may discard those instructions if the DBP subsequently determines that the control transfer instruction was predicted incorrectly. Branch prediction information output from the BTC and the DBP may be used to update the branch target cache for subsequent predictions. In various embodiments, the BTC may simultaneously store entries for multiple processor threads, and may be fully associative.

Type: Application

Filed: May 11, 2011

Publication date: November 15, 2012

Inventors: Manish K. Shah, Gregory F. Grohoski
System and method to manage address translation requests

Patent number: 8301865

Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

Type: Grant

Filed: June 29, 2009

Date of Patent: October 30, 2012

Assignee: Oracle America, Inc.

Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
RETURN ADDRESS PREDICTION IN MULTITHREADED PROCESSORS

Publication number: 20120233442

Abstract: Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The first request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.

Type: Application

Filed: March 11, 2011

Publication date: September 13, 2012

Inventors: Manish K. Shah, Gregory F. Grohoski, Zeid H. Samoail
INSTRUCTION SUPPORT FOR PERFORMING STREAM CIPHER

Publication number: 20120216020

Abstract: Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2?, and to perform a modular addition using a value R2 to produce a value R1'.

Type: Application

Filed: February 21, 2011

Publication date: August 23, 2012

Inventors: Christopher H. Olson, Gregory F. Grohoski, Manish K. Shah
Methods and mechanisms to support multiple features for a number of opcodes

Patent number: 8195923

Abstract: Systems and methods for efficient instruction support of an multiple features for opcodes of an instruction set. A processor detects a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.

Type: Grant

Filed: April 7, 2009

Date of Patent: June 5, 2012

Assignee: Oracle America, Inc.

Inventors: Lawrence A. Spracklen, Gregory F. Grohoski, Christopher H. Olson, Robert T. Golla
APIC implementation for a highly-threaded x86 processor

Patent number: 8190864

Abstract: Advanced programmable interrupt control for a multithreaded multicore processor that supports software compatible with x86 processors. Embodiments provide interrupt control for increased threads with minimal additional hardware by including in each processor core, a core advanced interrupt controller (core APIC) configured to determine a lowest priority thread of its corresponding processor core. Each core APIC reports its lowest priority thread level as a core priority to an input/output APIC. The I/O APIC routes interrupt requests to the core APIC with the lowest core priority. The selected core APIC then routes the interrupt request to the corresponding lowest priority thread. Each core APIC detects changes in priority levels of its corresponding processor core threads, and notifies the I/O APIC of any change to the corresponding core priority. Each core APIC may notify the I/O APIC as the core priority changes, or when the I/O APIC requests status from each core APIC.

Type: Grant

Filed: October 25, 2007

Date of Patent: May 29, 2012

Assignee: Oracle America, Inc.

Inventors: Paul J. Jordan, Gregory F. Grohoski
INSTRUCTION SUPPORT FOR PERFORMING MONTGOMERY MULTIPLICATION

Publication number: 20110276790

Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instruction from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, N spans at least two registers of general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R??1.

Type: Application

Filed: May 7, 2010

Publication date: November 10, 2011

Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence Spracklen, Nils Gura
THREAD FAIRNESS ON A MULTI-THREADED PROCESSOR WITH MULTI-CYCLE CRYPTOGRAPHIC OPERATIONS

Publication number: 20110276783

Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.

Type: Application

Filed: May 4, 2010

Publication date: November 10, 2011

Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR PERFORMING A CYCLIC REDUNDANCY CHECK (CRC)

Publication number: 20110231636

Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.

Type: Application

Filed: March 16, 2010

Publication date: September 22, 2011

Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
Minimizing TLB comparison size

Patent number: 7937556

Abstract: In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.

Type: Grant

Filed: April 30, 2008

Date of Patent: May 3, 2011

Assignee: Oracle America, Inc.

Inventors: Paul J. Jordan, Manish K. Shah, Gregory F. Grohoski
APPARATUS AND METHOD FOR LOCAL OPERAND BYPASSING FOR CRYPTOGRAPHIC INSTRUCTIONS

Publication number: 20110087895

Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.

Type: Application

Filed: October 8, 2009

Publication date: April 14, 2011

Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
PERCEPTRON-BASED BRANCH PREDICTION MECHANISM FOR PREDICTING CONDITIONAL BRANCH INSTRUCTIONS ON A MULTITHREADED PROCESSOR

Publication number: 20110087866

Abstract: A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.

Type: Application

Filed: October 14, 2009

Publication date: April 14, 2011

Inventors: Manish K. Shah, Gregory F. Grohoski, Robert T. Golla, Jama I. Barreh
BRANCH PREDICTION MECHANISM FOR PREDICTING INDIRECT BRANCH TARGETS

Publication number: 20110078425

Abstract: A multithreaded microprocessor includes an instruction fetch unit that may fetch and maintain a plurality of instructions belonging to one or more threads and one or more execution units that may concurrently execute the one or more threads. The instruction fetch unit includes a target branch prediction unit that may provide a predicted branch target address in response to receiving an instruction fetch address of a current indirect branch instruction. The branch prediction unit includes a primary storage and a control unit. The storage includes a plurality of entries, and each entry may store a predicted branch target address corresponding to a previous indirect branch instruction. The control unit may generate an index value for accessing the storage using a portion of the instruction fetch address of the current indirect branch instruction, and branch direction history information associated with a currently executing thread of the one or more threads.

Type: Application

Filed: September 25, 2009

Publication date: March 31, 2011

Inventors: Manish K. Shah, Gregory F. Grohoski
System and Method to Invalidate Obsolete Address Translations

Publication number: 20100332786

Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.

Type: Application

Filed: June 29, 2009

Publication date: December 30, 2010

Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail
System and Method to Manage Address Translation Requests

Publication number: 20100332787

Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

Type: Application

Filed: June 29, 2009

Publication date: December 30, 2010

Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
INSTRUCTIONS FOR PERFORMING DATA ENCRYPTION STANDARD (DES) COMPUTATIONS USING GENERAL-PURPOSE REGISTERS

Publication number: 20100329450

Abstract: Some embodiments of the present invention provide a processor, which includes a set of general-purpose registers and at least one execution unit. Each general-purpose register in the set of general-purpose registers is at least 64 bits wide, and the execution unit supports one or more Data Encryption Standard (DES) instructions. Specifically, the execution unit may support a permutation-rotation instruction for performing DES permutation operations and DES rotation operations. The execution unit may also support a round instruction to perform a DES round operation. Since the DES instructions use general-purpose registers instead of special-purpose registers to perform DES-specific operations, the processor's circuit complexity and area are reduced. Furthermore, in some embodiments, since the DES instructions require at most two operands, the number of bits required to specify the location of the operands are reduced, thereby enabling a larger number of instructions to be supported by the processor.

Type: Application

Filed: June 30, 2009

Publication date: December 30, 2010

Applicant: SUN MICROSYSTEMS, INC.

Inventors: Leonard D. Rarick, Christopher H. Olson, Gregory F. Grohoski

prev 1 2 3 4 next