Patents by Inventor Robert T. Golla
Robert T. Golla has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20130290675
Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
Type: Application
Filed: April 26, 2012
Publication date: October 31, 2013
Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
-
Patent number: 8560814
Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
Type: Grant
Filed: May 4, 2010
Date of Patent: October 15, 2013
Assignee: Oracle International Corporation
Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
-
Patent number: 8504805
Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
Type: Grant
Filed: April 22, 2009
Date of Patent: August 6, 2013
Assignee: Oracle America, Inc.
Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittle, Yuan C. Chou, Jared C. Smolens
-
Patent number: 8458446
Abstract: A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifier and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.
Type: Grant
Filed: September 30, 2009
Date of Patent: June 4, 2013
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Xiang Shan Li, Robert T. Golla
-
Patent number: 8438208
Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M.
Type: Grant
Filed: June 19, 2009
Date of Patent: May 7, 2013
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla, Paul J. Jordan
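For illustration only, the sketch below shows one conventional way to build a product of operands wider than M bits out of M-bit multiplications (schoolbook multiplication over M-bit limbs). It is not taken from the patent: the function names, the little-endian limb layout, and the choice M = 64 are all assumptions made for this example.

```python
def to_limbs(value, count, m_bits=64):
    """Split a non-negative integer into `count` little-endian limbs of m_bits each."""
    mask = (1 << m_bits) - 1
    return [(value >> (i * m_bits)) & mask for i in range(count)]


def from_limbs(limbs, m_bits=64):
    """Reassemble little-endian limbs into a single integer."""
    return sum(limb << (i * m_bits) for i, limb in enumerate(limbs))


def multiply_limbs(a_limbs, b_limbs, m_bits=64):
    """Schoolbook product; each a * b below is a single M-bit by M-bit multiply."""
    mask = (1 << m_bits) - 1
    result = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            # Accumulate the partial product plus the carry from the previous limb.
            total = result[i + j] + a * b + carry
            result[i + j] = total & mask
            carry = total >> m_bits
        # This position has not been written by any earlier row.
        result[i + len(b_limbs)] = carry
    return result
```

Each inner-loop step stands in for one pass through an M-bit multiplier; hardware described in the abstract may sequence and accumulate partial products quite differently.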
-
Patent number: 8429386
Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that support multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
Type: Grant
Filed: June 30, 2009
Date of Patent: April 23, 2013
Assignee: Oracle America, Inc.
Inventors: Paul J. Jordan, Robert T. Golla, Jama I. Barreh
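The allocate-and-free cycle in this abstract can be pictured with a small software model. This is a minimal sketch under stated assumptions, not the patented design: the class name TagPool, its methods, and the free-list ordering are hypothetical, and the sketch does not model how a tag value is used to recover program order.

```python
from collections import deque


class TagPool:
    """Hypothetical shared pool of tag values; no tag is reserved for any thread."""

    def __init__(self, num_tags):
        self.free = deque(range(num_tags))
        self.owner = {}  # tag value -> thread id currently holding it

    def allocate(self, thread_id):
        """Assign a free tag to an instruction group, or return None if the pool is empty."""
        if not self.free:
            return None  # the caller would have to stall until a tag is freed
        tag = self.free.popleft()
        self.owner[tag] = thread_id
        return tag

    def release(self, tag):
        """Free a tag once its instruction group has committed, making it reusable."""
        del self.owner[tag]
        self.free.append(tag)
```

Because the pool is shared, a tag released by one thread can be handed to a different thread on the next allocation, which is the property the abstract emphasizes.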
-
Patent number: 8356185
Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.
Type: Grant
Filed: October 8, 2009
Date of Patent: January 15, 2013
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
-
Patent number: 8347309
Abstract: Systems and methods for efficient thread arbitration in a processor. A processor comprises a multi-threaded resource. The resource may include an array of entries which may be allocated by threads. A thread arbitration table corresponding to a given thread stores a high and a low threshold value in each table entry. A thread history shift register (HSR) indexes the table, wherein each bit of the HSR indicates whether the given thread is a thread hog. When the given thread has more allocated entries in the array than the high threshold of the table entry, the given thread is stalled from further allocating array entries. Similarly, when the given thread has fewer allocated entries in the array than the low threshold of the selected table entry, the given thread is permitted to allocate entries. In this manner, threads that hog dynamic resources can be mitigated such that more resources are available to other threads that are not thread hogs.
Type: Grant
Filed: July 29, 2009
Date of Patent: January 1, 2013
Assignee: Oracle America, Inc.
Inventors: Jared C. Smolens, Robert T. Golla, Matthew B. Smittle
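The high/low threshold rule in this abstract behaves like a hysteresis switch, which the following sketch models in software. Several details are assumptions made for the example rather than statements from the patent: the class and method names, the 2-bit history width, the threshold values in the usage note, and the choice to keep the previous stall state when the entry count lies between the two thresholds.

```python
from dataclasses import dataclass


@dataclass
class ThreadArbiter:
    # Hypothetical per-thread table: index = HSR value, entry = (low, high) thresholds.
    table: list
    hsr_bits: int = 2   # assumed history width
    hsr: int = 0        # history shift register; each bit records one hog observation
    stalled: bool = False

    def record_hog(self, is_hog):
        """Shift the latest hog/not-hog observation into the history register."""
        mask = (1 << self.hsr_bits) - 1
        self.hsr = ((self.hsr << 1) | int(is_hog)) & mask

    def may_allocate(self, allocated):
        """Apply the thresholds selected by the HSR to the thread's current entry count."""
        low, high = self.table[self.hsr]
        if allocated > high:
            self.stalled = True    # above the high threshold: stop allocating
        elif allocated < low:
            self.stalled = False   # below the low threshold: allocation permitted again
        # Between the thresholds the previous state is kept (an assumption of this sketch).
        return not self.stalled
```

With a table such as [(8, 12), (6, 10), (6, 10), (2, 4)], a thread whose recent history marks it as a hog is indexed to the tighter thresholds and is stalled at a much lower entry count than a well-behaved thread.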
-
Patent number: 8335911
Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
Type: Grant
Filed: September 30, 2009
Date of Patent: December 18, 2012
Assignee: Oracle America, Inc.
Inventors: Robert T. Golla, Gregory F. Grohoski
-
Patent number: 8335912
Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.
Type: Grant
Filed: April 22, 2009
Date of Patent: December 18, 2012
Assignee: Oracle America, Inc.
Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
-
Patent number: 8301865
Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
Type: Grant
Filed: June 29, 2009
Date of Patent: October 30, 2012
Assignee: Oracle America, Inc.
Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
-
Publication number: 20120233441
Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality of entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.
Type: Application
Filed: March 7, 2011
Publication date: September 13, 2012
Inventors: Jama I. Barreh, Robert T. Golla, Manish K. Shah
-
Patent number: 8225034
Abstract: In one embodiment, a storage buffer includes a plurality of storage locations configured to store a plurality of incoming instructions. The storage buffer also includes a shift FIFO that is coupled to the plurality of storage locations. The shift FIFO includes an entry configured to store an instruction that is next in a program order. In response to receiving a shift signal, control functionality that is coupled to the plurality of storage locations and to the shift FIFO may cause the instruction that is next in the program order to be moved from a given location of the plurality of storage locations to the entry of the shift FIFO.
Type: Grant
Filed: June 30, 2004
Date of Patent: July 17, 2012
Assignee: Oracle America, Inc.
Inventors: Robert T. Golla, Yue Chang, Jama I. Barreh
-
Patent number: 8195919
Abstract: Determining an effective address of a memory with a three-operand add operation in a single execution cycle of a multithreaded processor that can access both segmented memory and non-segmented memory. During that cycle, the processor determines whether a memory segment base is zero. If the segment base is zero, the processor can access a memory location at the effective address without adding the segment base. If the segment base is not zero, such as when executing legacy code, the processor consumes another cycle to add the segment base to the effective address. Similarly, the processor consumes another cycle if the effective address or the linear address is misaligned. An integer execution unit performs the three-operand add using a carry-save adder coupled to a carry look-ahead adder. If the segment base is not zero, the effective address is fed back through the integer execution unit to add the segment base.
Type: Grant
Filed: October 29, 2007
Date of Patent: June 5, 2012
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Robert T. Golla, Manish Shah, Jeffrey S. Brooks
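The carry-save step named in this abstract can be shown arithmetically. The sketch below is a software model under stated assumptions, not the patented circuit: the 64-bit width, the operand names base, index and displacement, and the function names are hypothetical, and the final carry-propagating add is modeled with ordinary integer addition rather than a carry look-ahead adder.

```python
MASK = (1 << 64) - 1  # assumed 64-bit address width


def carry_save_add(a, b, c):
    """Reduce three operands to a sum word and a carry word with no carry propagation."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK, carry & MASK


def three_operand_add(base, index, displacement):
    """Effective address = base + index + displacement, modulo 2**64."""
    partial_sum, carry = carry_save_add(base, index, displacement)
    # One carry-propagating add finishes the job (a carry look-ahead adder in hardware).
    return (partial_sum + carry) & MASK


def effective_to_linear(effective, segment_base):
    """Return (linear address, extra cycles); a zero segment base costs no extra pass."""
    if segment_base == 0:
        return effective, 0
    return (effective + segment_base) & MASK, 1
```

The carry-save stage has constant depth regardless of word width, which is why three operands can be reduced to two cheaply before the single carry-propagating add.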
-
Patent number: 8195923
Abstract: Systems and methods for efficient instruction support of multiple features for opcodes of an instruction set. A processor detects that a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
Type: Grant
Filed: April 7, 2009
Date of Patent: June 5, 2012
Assignee: Oracle America, Inc.
Inventors: Lawrence A. Spracklen, Gregory F. Grohoski, Christopher H. Olson, Robert T. Golla
-
Patent number: 8099586
Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
Type: Grant
Filed: December 30, 2008
Date of Patent: January 17, 2012
Assignee: Oracle America, Inc.
Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
-
Patent number: 8095778
Abstract: Sharing functional units within a multithreaded processor. In one embodiment, the multithreaded processor may include a multithreaded instruction source that may provide an instruction from each of a plurality of thread groups in a given cycle. A given thread group may include one or more instructions from one or more threads. The arbitration functionality may arbitrate between the plurality of thread groups for access to a functional unit such as a load store unit, for example, that may be shared between the thread groups.
Type: Grant
Filed: June 30, 2004
Date of Patent: January 10, 2012
Assignee: Open Computing Trust I & II
Inventor: Robert T. Golla
-
Publication number: 20110276783
Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
Type: Application
Filed: May 4, 2010
Publication date: November 10, 2011
Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
-
Publication number: 20110138153
Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
Type: Application
Filed: February 14, 2011
Publication date: June 9, 2011
Inventor: Robert T. Golla
-
Patent number: 7941642
Abstract: In one embodiment, a multithreaded processor includes a multithreaded instruction source that may provide a plurality of instructions each corresponding to a respective one of a plurality of threads. The multithreaded processor also includes a pick unit coupled to the multithreaded instruction source. The pick unit may select in a given cycle, a first divide instruction corresponding to one thread of the plurality of threads and a second divide instruction corresponding to another thread of the plurality of threads based upon a thread selection algorithm. Further, the multithreaded processor includes a storage coupled to a functional unit including a divider configured to execute the first divide instruction and the second divide instruction. The storage may store one of the first and the second divide instructions during execution of the other of the first and the second divide instructions.
Type: Grant
Filed: June 30, 2004
Date of Patent: May 10, 2011
Assignee: Oracle America, Inc.
Inventors: Robert T. Golla, Jeffrey S. Brooks, Christopher H. Olson