Patents by Inventor Robert T. Golla

Robert T. Golla has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20130290675
    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
  • Patent number: 8560814
    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
    Type: Grant
    Filed: May 4, 2010
    Date of Patent: October 15, 2013
    Assignee: Oracle International Corporation
    Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
  • Patent number: 8504805
    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
    Type: Grant
    Filed: April 22, 2009
    Date of Patent: August 6, 2013
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittle, Yuan C. Chou, Jared C. Smolens
  • Patent number: 8458446
    Abstract: A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifiers and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: June 4, 2013
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Xiang Shan Li, Robert T. Golla
  • Patent number: 8438208
    Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M.
    Type: Grant
    Filed: June 19, 2009
    Date of Patent: May 7, 2013
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla, Paul J. Jordan
  • Patent number: 8429386
    Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that supports multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: April 23, 2013
    Assignee: Oracle America, Inc.
    Inventors: Paul J. Jordan, Robert T. Golla, Jama I. Barreh
  • Patent number: 8356185
    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: January 15, 2013
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
  • Patent number: 8347309
    Abstract: Systems and methods for efficient thread arbitration in a processor. A processor comprises a multi-threaded resource. The resource may include an array of entries which may be allocated by threads. A thread arbitration table corresponding to a given thread stores a high and a low threshold value in each table entry. A thread history shift register (HSR) indexes the table, wherein each bit of the HSR indicates whether the given thread is a thread hog. When the given thread has more allocated entries in the array than the high threshold of the table entry, the given thread is stalled from further allocating array entries. Similarly, when the given thread has fewer allocated entries in the array than the low threshold of the selected table entry, the given thread is permitted to allocate entries. In this manner, threads that hog dynamic resources can be mitigated such that more resources are available to other threads that are not thread hogs.
    Type: Grant
    Filed: July 29, 2009
    Date of Patent: January 1, 2013
    Assignee: Oracle America, Inc.
    Inventors: Jared C. Smolens, Robert T. Golla, Matthew B. Smittle
  • Patent number: 8335911
    Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: December 18, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Gregory F. Grohoski
  • Patent number: 8335912
    Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.
    Type: Grant
    Filed: April 22, 2009
    Date of Patent: December 18, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
  • Patent number: 8301865
    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: October 30, 2012
    Assignee: Oracle America, Inc.
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
  • Publication number: 20120233441
    Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.
    Type: Application
    Filed: March 7, 2011
    Publication date: September 13, 2012
    Inventors: Jama I. Barreh, Robert T. Golla, Manish K. Shah
  • Patent number: 8225034
    Abstract: In one embodiment, a storage buffer includes a plurality of storage locations configured to store a plurality of incoming instructions. The storage buffer also includes a shift FIFO that is coupled to the plurality of storage locations. The shift FIFO includes an entry configured to store an instruction that is next in a program order. In response to receiving a shift signal, control functionality that is coupled to the plurality of storage locations and to the shift FIFO may cause the instruction that is next in the program order to be moved from a given location of the plurality of storage locations to the entry of the shift FIFO.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: July 17, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Yue Chang, Jama I. Barreh
  • Patent number: 8195919
    Abstract: Determining an effective address of a memory with a three-operand add operation in single execution cycle of a multithreaded processor that can access both segmented memory and non-segmented memory. During that cycle, the processor determines whether a memory segment base is zero. If the segment base is zero, the processor can access a memory location at the effective address without adding the segment base. If the segment base is not zero, such as when executing legacy code, the processor consumes another cycle to add the segment base to the effective address. Similarly, the processor consumes another cycle if the effective address or the linear address is misaligned. An integer execution unit that performs the three-operand add using a carry-save adder coupled to a carry look-ahead adder. If the segment base is not zero, the effective address is fed back through the integer execution unit to add the segment base.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 5, 2012
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Robert T. Golla, Manish Shah, Jeffrey S. Brooks
  • Patent number: 8195923
    Abstract: Systems and methods for efficient instruction support of an multiple features for opcodes of an instruction set. A processor detects a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
    Type: Grant
    Filed: April 7, 2009
    Date of Patent: June 5, 2012
    Assignee: Oracle America, Inc.
    Inventors: Lawrence A. Spracklen, Gregory F. Grohoski, Christopher H. Olson, Robert T. Golla
  • Patent number: 8099586
    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: January 17, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
  • Patent number: 8095778
    Abstract: Sharing functional units within a multithreaded processor. In one embodiment, the multithreaded processor may include a multithreaded instruction source that may provide an instruction from each of a plurality of thread groups in a given cycle. A given thread group may include one or more instructions from one or more threads. The arbitration functionality may arbitrate between the plurality of thread groups for access to a functional unit such as a load store unit, for example, that may be shared between the thread groups.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: January 10, 2012
    Assignee: Open Computing Trust I & II
    Inventor: Robert T. Golla
  • Publication number: 20110276783
    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
    Type: Application
    Filed: May 4, 2010
    Publication date: November 10, 2011
    Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
  • Publication number: 20110138153
    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
    Type: Application
    Filed: February 14, 2011
    Publication date: June 9, 2011
    Inventor: Robert T. Golla
  • Patent number: 7941642
    Abstract: In one embodiment, a multithreaded processor includes a multithreaded instruction source that may provide a plurality of instructions each corresponding to a respective one of a plurality of threads. The multithreaded processor also includes a pick unit coupled to the multithreaded instruction source. The pick unit may select in a given cycle, a first divide instruction corresponding to one thread of the plurality of threads and a second divide instruction corresponding to another thread of the plurality of threads based upon a thread selection algorithm. Further, the multithreaded processor includes a storage coupled to a functional unit including a divider configured to execute the first divide instruction and the second divide instruction. The storage may store one of the first and the second divide instructions during execution of the other of the first and the second divide instructions.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: May 10, 2011
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Jeffrey S. Brooks, Christopher H. Olson