Patents by Inventor Robert T. Golla

Robert T. Golla has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20110087895
    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency.
    Type: Application
    Filed: October 8, 2009
    Publication date: April 14, 2011
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Robert T. Golla
  • Publication number: 20110078697
    Abstract: Systems and methods for efficient out-of-order dynamic deallocation of entries within a shared storage resource in a processor. A processor comprises a unified pick queue that includes an array configured to dynamically allocate any entry of a plurality of entries for a decoded and renamed instruction. This instruction may correspond to any available active threads supported by the processor. The processor includes circuitry configured to determine whether an instruction corresponding to an allocated entry of the plurality of entries is dependent on a speculative instruction and whether the instruction has a fixed instruction execution latency. In response to determining the instruction is not dependent on a speculative instruction, the instruction has a fixed instruction execution latency, and said latency has transpired, the circuitry may deallocate the instruction from the allocated entry.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: Matthew B. Smittle, Robert T. Golla
  • Publication number: 20110078414
    Abstract: A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifiers and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: Christopher H. Olson, Xiang Shan Li, Robert T. Golla
  • Patent number: 7890734
    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: February 15, 2011
    Assignee: Open Computing Trust I & II
    Inventor: Robert T. Golla
  • Publication number: 20110029978
    Abstract: Systems and methods for efficient thread arbitration in a processor. A processor comprises a multi-threaded resource. The resource may include an array 8of entries which may be allocated by threads. A thread arbitration table corresponding to a given thread stores a high and a low threshold value in each table entry. A thread history shift register (HSR) indexes the table, wherein each bit of the HSR indicates whether the given thread is a thread hog. When the given thread has more allocated entries in the array than the high threshold of the table entry, the given thread is stalled from further allocating array entries. Similarly, when the given thread has fewer allocated entries in the array than the low threshold of the selected table entry, the given thread is permitted to allocate entries. In this manner, threads that hog dynamic resources can be mitigated such that more resources are available to other threads that are not thread hogs.
    Type: Application
    Filed: July 29, 2009
    Publication date: February 3, 2011
    Inventors: Jared C. Smolens, Robert T. Golla, Matthew B. Smittle
  • Publication number: 20100332787
    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
  • Publication number: 20100333098
    Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that supports multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Inventors: Paul J. Jordan, Robert T. Golla, Jama I. Barreh
  • Publication number: 20100332804
    Abstract: Systems and methods for efficient picking of instructions for out-of-order issue and execution in a processor. In one embodiment, a processor comprises a unified pick queue that is dynamically allocated. Each entry is configured to store age and dependency information relative to other decoded instructions. Also, each entry stores a picked field, which when asserted indicates the decoded instruction has already been picked for out-of-order issue and execution. When asserted, a trigger field indicates a result of a corresponding decoded instruction will be available a predetermined number of clock cycles afterward. A younger instruction dependent on a result of an older instruction is ready to be picked before the result of the older instruction is available. In this case, the older instruction has asserted picked and trigger fields.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Robert T. Golla, Matthew B. Smittle, Mark A. Luttrell, Xiang Shan Li
  • Publication number: 20100332806
    Abstract: Systems and methods for identification of dependent instructions on speculative load operations in a processor. A processor allocates entries of a unified pick queue for decoded and renamed instructions. Each entry of a corresponding dependency matrix is configured to store a dependency bit for each other instruction in the pick queue. The processor speculates that loads will hit in the data cache, hit in the TLB and not have a read after write (RAW) hazard. For each unresolved load, the pick queue tracks dependent instructions via dependency vectors based upon the dependency matrix. If a load speculation is found to be incorrect, dependent instructions in the pick queue are reset to allow for subsequent picking, and dependent instructions in flight are canceled. On completion of a load miss, dependent operations are re-issued. On resolution of a TLB miss or RAW hazard, the original load is replayed and dependent operations are issued again from the pick queue.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Inventors: Robert T. Golla, Matthew B. Smittle, Xiang Shan Li
  • Patent number: 7861063
    Abstract: In one embodiment, a processor comprises a fetch unit and a pick unit. The fetch unit is configured to fetch instructions for execution by the processor. The pick unit is configured to schedule instructions fetched by the fetch unit for execution in the processor. The pick unit is configured to inhibit scheduling a delayed control transfer instruction (DCTI) until a delay slot instruction of the DCTI is available for scheduling. For example, in some embodiments, the pick unit may inhibit scheduling until the delay slot instruction is written to an instruction buffer, until the delay slot instruction is fetched, etc.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: December 28, 2010
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh
  • Publication number: 20100325394
    Abstract: A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots, and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window.
    Type: Application
    Filed: June 23, 2009
    Publication date: December 23, 2010
    Inventors: Robert T. Golla, Gregory F. Grohoski
  • Publication number: 20100325188
    Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M.
    Type: Application
    Filed: June 19, 2009
    Publication date: December 23, 2010
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla, Paul J. Jordan
  • Publication number: 20100318998
    Abstract: A system and method for managing the dynamic sharing of processor resources between threads in a multi-threaded processor are disclosed. Out-of-order allocation and deallocation may be employed to efficiently use the various resources of the processor. Each element of an allocate vector may indicate whether a corresponding resource is available for allocation. A search of the allocate vector may be performed to identify resources available for allocation. Upon allocation of a resource, a thread identifier associated with the thread to which the resource is allocated may be associated with the allocate vector entry corresponding to the allocated resource. Multiple instances of a particular resource type may be allocated or deallocated in a single processor execution cycle. Each element of a deallocate vector may indicate whether a corresponding resource is ready for deallocation. Examples of resources that may be dynamically shared between threads are reorder buffers, load buffers and store buffers.
    Type: Application
    Filed: June 16, 2009
    Publication date: December 16, 2010
    Inventor: Robert T. Golla
  • Publication number: 20100306510
    Abstract: Systems and methods for providing single cycle movement of data between a floating-point register file (FRF) and a general purpose or integer register file (RF) of a microprocessor system are provided. The system may include an integer execution unit operative to execute instructions with single cycle latency, a floating-point execution unit, a working register file (WRF), an FRF, and an IRF. To achieve the single cycle movement functionality, the integer execution unit may physically own the WRF, IRF, and FRF, and may monitor and control any dependencies between them. Thus, since the integer execution unit has direct read access to both the IRF and the FRF, data may be moved between the two register files using the single cycle operation of the integer execution unit, without the need to store and load the data from memory.
    Type: Application
    Filed: June 2, 2009
    Publication date: December 2, 2010
    Applicant: Sun Microsystems, Inc.
    Inventors: Christopher Olson, Robert T. Golla, Jeffrey S. Brooks
  • Publication number: 20100299499
    Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
    Type: Application
    Filed: September 30, 2009
    Publication date: November 25, 2010
    Inventors: Robert T. Golla, Gregory F. Grohoski
  • Publication number: 20100274993
    Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
  • Publication number: 20100274961
    Abstract: Techniques and systems are described herein to maintain a mapping of logical to physical registers—for example, in the context of a multithreaded processor that supports renaming. A mapping unit may have a plurality of entries, each of which stores rename information for a dedicated one of a set of physical registers available to the processor for renaming. This physically-indexed mapping unit may support multiple threads, and may comprise a content-addressable memory (CAM) in certain embodiments. The mapping unit may support various combinations of read operations (to determine if a logical register is mapped to a physical register), write operations (to create or modify one or more entries containing mapping information), thread flush operations, and commit operations. More than one of such operations may be performed substantially simultaneously in certain embodiments.
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Robert T. Golla, Jama I. Barreh, Howard L. Levy
  • Publication number: 20100274994
    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittel, Yuan C. Chou, Jared C. Smolens
  • Publication number: 20100257338
    Abstract: Systems and methods for efficient instruction support of an multiple features for opcodes of an instruction set. A processor detects a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
    Type: Application
    Filed: April 7, 2009
    Publication date: October 7, 2010
    Inventors: Lawrence A. Spracklen, Gregory F. Grohoski, Christopher H. Olson, Robert T. Golla
  • Publication number: 20100250966
    Abstract: A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction.
    Type: Application
    Filed: March 31, 2009
    Publication date: September 30, 2010
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla