Instruction Issuing Patents (Class 712/214)
  • Publication number: 20080282067
    Abstract: A multi-threaded in-order superscalar processor 2 includes an issue stage 12 including issue circuitry 22, 24 for selecting instructions to be issued to execution units 14, 16 in dependence upon a currently selected issue policy. A plurality of different issue policies are provided by associated different policy circuitry 28, 30, 32 and a selection between which of these instances of the policy circuitry 28, 30, 32 is active is made by policy selecting circuitry 34 in dependence upon detected dynamic behaviour of the processor 2.
    Type: Application
    Filed: March 27, 2008
    Publication date: November 13, 2008
    Applicant: ARM Limited
    Inventors: Emre Ozer, Stuart David Biles
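    A toy software model can make the policy-switching idea concrete. The sketch below is not ARM's policy circuitry: it assumes a hypothetical stall monitor that swaps between a round-robin and an oldest-waiting-thread issue policy once a stall-cycle count crosses a threshold (the two policies and the threshold are illustrative).
      # Illustrative model of dynamic issue-policy selection (cf. 20080282067).

      def round_robin_policy(ready, last_issued, wait_cycles):
          """Pick the next ready thread after the one issued last cycle."""
          n = len(ready)
          for offset in range(1, n + 1):
              tid = (last_issued + offset) % n
              if ready[tid]:
                  return tid
          return None  # nothing ready this cycle

      def oldest_first_policy(ready, last_issued, wait_cycles):
          """Pick the ready thread that has been waiting the longest."""
          candidates = [t for t, r in enumerate(ready) if r]
          return max(candidates, key=lambda t: wait_cycles[t]) if candidates else None

      class PolicySelector:
          """Switch the active issue policy based on observed dynamic behaviour."""
          def __init__(self, stall_threshold=8):
              self.stall_threshold = stall_threshold
              self.recent_stalls = 0
              self.policy = round_robin_policy

          def observe(self, stalled_this_cycle):
              # Sustained stalling suggests the current policy is starving a thread.
              self.recent_stalls = self.recent_stalls + 1 if stalled_this_cycle else 0
              self.policy = (oldest_first_policy
                             if self.recent_stalls >= self.stall_threshold
                             else round_robin_policy)

          def select(self, ready, last_issued, wait_cycles):
              return self.policy(ready, last_issued, wait_cycles)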
  • Patent number: 7444498
    Abstract: The present invention allows a microprocessor to identify and speculatively execute future load instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition, which would otherwise cause the microprocessor or thread of execution to be idle. The data for such future load instructions can be prefetched from a distant cache or main memory such that when the load instruction is re-executed (non-speculatively executed) after the stall condition expires, its data will either reside in the L1 cache or be en route to the processor, resulting in a reduced execution latency. When an extended stall condition is detected, load lookahead prefetch is started, allowing speculative execution of instructions that would normally have been stalled.
    Type: Grant
    Filed: December 17, 2004
    Date of Patent: October 28, 2008
    Assignee: International Business Machines Corporation
    Inventors: Richard James Eickemeyer, Hung Qui Le, Dung Quoc Nguyen, Benjamin Walter Stolt, Brian William Thompto
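    A rough software analogue of the load-lookahead idea: during an extended stall, scan ahead for loads whose addresses do not depend on the stalled result and prefetch them, so the later non-speculative re-execution hits in the cache. The instruction representation and the prefetch() stand-in below are assumptions for illustration, not IBM's hardware mechanism.
      # Sketch of load-lookahead prefetch during an extended stall (cf. 7444498).

      prefetch_queue = []

      def prefetch(address):
          """Stand-in for issuing a cache prefetch request."""
          prefetch_queue.append(address)

      def lookahead_prefetch(window, stalled_dest_regs):
          """Walk the instructions behind the stall and prefetch independent loads.

          window            -- list of dicts: {'op', 'srcs', 'dest', 'addr'}
          stalled_dest_regs -- registers produced by the stalled instruction(s)
          """
          poisoned = set(stalled_dest_regs)        # values not yet available
          for insn in window:
              if poisoned & set(insn['srcs']):
                  poisoned.add(insn['dest'])       # depends on the stall, result unreliable
                  continue
              if insn['op'] == 'load':
                  prefetch(insn['addr'])           # warm the cache for the re-execution

      if __name__ == '__main__':
          window = [
              {'op': 'add',  'srcs': ['r1'], 'dest': 'r2', 'addr': None},
              {'op': 'load', 'srcs': ['r3'], 'dest': 'r4', 'addr': 0x1000},
              {'op': 'load', 'srcs': ['r2'], 'dest': 'r5', 'addr': 0x2000},
          ]
          lookahead_prefetch(window, stalled_dest_regs=['r1'])
          print(prefetch_queue)  # [4096]: only the load independent of r1 is prefetched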
  • Publication number: 20080256337
    Abstract: In the field of data communications, it is desirable to track bits of a bit sequence remaining to be decoded by a decoder. A method of decoding the bit sequence that corresponds to a PDU comprises reading in a bit sequence and processing the bit sequence. In order to maintain a record of the bits remaining to be processed, a data stack is used during decoding of the bit sequence.
    Type: Application
    Filed: April 14, 2008
    Publication date: October 16, 2008
    Applicant: AGILENT TECHNOLOGIES, INC.
    Inventors: Kevin Mitchell, Tony Kirkham
  • Patent number: 7437538
    Abstract: An apparatus and method for floating-point special case handling. In one embodiment, a processor may include a first execution unit configured to execute a longer-latency floating-point instruction, and a second execution unit configured to execute a shorter-latency floating-point instruction. In response to the longer-latency floating-point instruction being issued to the first execution unit, the second execution unit may be further configured to detect whether a result of the longer-latency floating-point instruction is determinable from one or more operands of the longer-latency floating-point instruction independently of the first execution unit executing the longer-latency floating-point instruction. In response to detecting that the result is determinable, the second execution unit may be further configured to flush the longer-latency floating-point instruction from the first execution unit and to determine the result.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: October 14, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
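    The early-out test amounts to asking whether the result of a long-latency operation (divide, say) already follows from the operands. Below is a minimal numeric sketch for a handful of IEEE-754 special cases; the case list is an illustrative subset, not Sun's detection logic, and a None return means the full divider must run.
      import math

      def div_special_case(a, b):
          """Return a/b if it is determinable from the operands alone, else None."""
          sign = math.copysign(1.0, a) * math.copysign(1.0, b)
          if math.isnan(a) or math.isnan(b):
              return math.nan                  # NaN operands propagate
          if math.isinf(a) and math.isinf(b):
              return math.nan                  # inf / inf is invalid
          if b == 0.0:
              return math.nan if a == 0.0 else sign * math.inf
          if a == 0.0 or math.isinf(b):
              return sign * 0.0                # zero result, correctly signed
          if math.isinf(a):
              return sign * math.inf
          return None                          # ordinary operands: use the divider

      if __name__ == '__main__':
          print(div_special_case(3.0, 0.0))    # inf  -> flush the divide, forward this
          print(div_special_case(3.0, 4.0))    # None -> the long-latency unit must run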
  • Publication number: 20080229076
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.
    Type: Application
    Filed: May 23, 2008
    Publication date: September 18, 2008
    Inventor: Jeffry E. Gonion
  • Publication number: 20080222394
    Abstract: Systems and methods for distributing thread instructions in the pipeline of a multi-threading digital processor are disclosed. More particularly, hardware and software are disclosed for successively selecting threads in an ordered sequence for execution in the processor pipeline. If a thread to be selected cannot execute, then a complementary thread is selected for execution.
    Type: Application
    Filed: May 21, 2008
    Publication date: September 11, 2008
    Inventors: Claude Basso, Jean Louis Calvignac, Chih-jen Chang, Gordon Taylor David, Harm Peter Hofstee, Fabrice Jean Verplanken, Colin Beaton Verrilli
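    A compact reading of the selection rule: walk the threads in a fixed order each cycle, and when the scheduled thread cannot execute, substitute its complementary partner. The pairing used below (offset by half the thread count) is an assumption made for illustration.
      # Sketch of ordered thread selection with a complementary fallback (cf. 20080222394).

      def select_thread(cycle, can_execute, num_threads):
          """Pick a thread for this cycle, or None if neither candidate can run."""
          primary = cycle % num_threads            # fixed ordered sequence
          if can_execute[primary]:
              return primary
          complementary = (primary + num_threads // 2) % num_threads
          if can_execute[complementary]:
              return complementary                 # partner takes the slot
          return None                              # slot goes unused this cycle

      if __name__ == '__main__':
          # Thread 1 is blocked, so its partner (thread 3 of 4) runs in its slot.
          print([select_thread(c, [True, False, True, True], 4) for c in range(4)])
          # -> [0, 3, 2, 3]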
  • Publication number: 20080215854
    Abstract: A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, a stall signal is raised to delay the main processor, it is determined whether there is enough space in the co-processor for the non-implemented instruction, and, if there is enough space, the instruction set of the co-processor is reconfigured by adding the non-implemented instruction. The stall signal is then cleared and the instruction is executed.
    Type: Application
    Filed: May 15, 2008
    Publication date: September 4, 2008
    Inventors: Sameh W. Asaad, Richard Gerard Hofmann
  • Publication number: 20080209176
    Abstract: A multi-core microprocessor has a plurality of processor cores which are coupled to a bridge element. The bridge element sends transactions to and/or receives transactions from the processor cores, where each transaction has one or more packets. The transactions include atomic transactions. The bridge element comprises a buffer unit storing a time stamp for each packet sent or received. Furthermore, a multi-core multi-node processor system is provided that has debug hardware to capture and time stamp intra-node and/or inter-node transaction packets. Atomic operations are, for example, atomic read-modify-write instructions.
    Type: Application
    Filed: August 9, 2007
    Publication date: August 28, 2008
    Inventors: Padmaraj Singh, Todd Foster, Dennis Lastor
  • Publication number: 20080209178
    Abstract: A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In one embodiment of the method, the oldest ready instruction in the queue during a particular cycle is identified.
    Type: Application
    Filed: May 2, 2008
    Publication date: August 28, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: William Elton Burky, Raymond Cheung Yeung
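    In software terms, the per-entry compare is a match between the designated instruction's destination and each queued entry's sources, and alongside it the oldest ready entry is identified. The sketch below just surfaces those two pieces of information (field names are assumptions); how the issue logic combines them is left open, as in the abstract.
      def evaluate_queue(queue, designated):
          """Mimic the per-entry compares of 20080209178.

          queue      -- list of dicts ordered oldest-first: {'dest', 'srcs', 'ready'}
          designated -- the entry issuing this cycle, or None

          Returns (dependents, oldest_ready): entries whose sources match the
          designated instruction's destination, and the oldest entry marked ready.
          """
          dependents = []
          if designated is not None:
              dependents = [e for e in queue if designated['dest'] in e['srcs']]
          oldest_ready = next((e for e in queue if e['ready']), None)
          return dependents, oldest_ready

      if __name__ == '__main__':
          q = [{'dest': 'r2', 'srcs': ['r1'], 'ready': False},
               {'dest': 'r4', 'srcs': ['r3'], 'ready': True}]
          print(evaluate_queue(q, designated={'dest': 'r1', 'srcs': [], 'ready': True}))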
  • Publication number: 20080209177
    Abstract: A method and system for maintaining a best-case demand redispatch of an instruction to allow for maximizing the time a rejected thread may execute in lookahead execution mode, while maintaining the smallest L1 cache miss penalty supported by the memory subsystem. In response to a demand miss, a load/store unit sends a fetch request to the next level cache. The cache line of the demand miss is examined to identify the critical sector. Once the critical sector is identified, a best-case data return time is determined based on the fastest time the next level cache is able to return the critical sector of the cache line. The load/store unit then sends a speculative warning to the dispatch unit to coincide with the best-case data return, wherein the speculative warning prepares the dispatch unit to resend the instruction for execution as soon as data is available to the processor core.
    Type: Application
    Filed: May 1, 2008
    Publication date: August 28, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Scott Bruce Frommer, Sheldon B. Levenstein, Bruce Joseph Ronchetti, Anthony Saporito
  • Patent number: 7418576
    Abstract: A graphics processor buffers vertex threads and pixel threads. The different types of threads issue instructions corresponding to different sets of operations. A plurality of different types of execution units are provided, each type of execution unit servicing a different class of operations, such as an execution unit supporting texture operations, an execution unit supporting blending operations, and an execution unit supporting mathematical operations. Current instructions of the threads are buffered and prioritized in a common instruction buffer. A set of high priority instructions is issued per cycle to the plurality of different types of execution units.
    Type: Grant
    Filed: November 17, 2004
    Date of Patent: August 26, 2008
    Assignee: Nvidia Corporation
    Inventors: John E. Lindholm, Brett W. Coon
  • Publication number: 20080189521
    Abstract: A method for optimizing throughput in a microprocessor that is capable of processing multiple threads of instructions simultaneously. Instruction issue logic is provided between the input buffers and the pipeline of the microprocessor. The instruction issue logic speculatively issues instructions from a given thread based on the probability that the required operands will be available when the instruction reaches the stage in the pipeline where they are required. Issue of an instruction is blocked if the current pipeline conditions indicate that there is a significant probability that the instruction will need to stall in a shared resource to wait for operands. Once the probability that the instruction will stall is below a certain threshold, based on current pipeline conditions, the instruction is allowed to issue.
    Type: Application
    Filed: April 17, 2008
    Publication date: August 7, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Victor Roberts Augsburg, Jeffrey Todd Bridges, Michael Scott Mcilvaine, Thomas Andrew Sartorius, Rodney Wayne Smith
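    The gating rule is a threshold test on an estimated stall probability derived from current pipeline conditions. The estimator below is a made-up stand-in (a simple weighted sum over assumed signals); only the block-until-below-threshold structure mirrors the abstract.
      # Illustrative threshold model of speculative-issue gating (cf. 20080189521).

      def estimate_stall_probability(operand_gap_cycles, shared_resource_busy):
          """Crude stand-in for the patent's pipeline-condition estimate.

          operand_gap_cycles   -- cycles by which the needed operand is expected
                                  to arrive after the instruction would need it
          shared_resource_busy -- whether the shared pipeline resource is occupied
          """
          p = min(1.0, 0.25 * max(0, operand_gap_cycles))  # later operand, higher risk
          if shared_resource_busy:
              p = min(1.0, p + 0.3)                        # contention adds risk
          return p

      def may_issue(operand_gap_cycles, shared_resource_busy, threshold=0.5):
          """Block speculative issue while the estimated stall probability is high."""
          return estimate_stall_probability(operand_gap_cycles,
                                            shared_resource_busy) < threshold

      if __name__ == '__main__':
          print(may_issue(0, False))   # True: operands on time, issue speculatively
          print(may_issue(4, True))    # False: hold the instruction in its buffer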
  • Publication number: 20080189520
    Abstract: A method for dispatching instructions in a data processing system having a memory for storing instructions and a plurality of central processing units, where each central processing unit includes a circuit that provides data indicating internal performance. The method has the steps of receiving internal performance data signals from a pool of central processing units, selecting a central processing unit according to the received internal performance data, and dispatching instructions from the memory to the selected central processing unit.
    Type: Application
    Filed: February 6, 2007
    Publication date: August 7, 2008
    Inventors: Deepak K. Singh, Francois Ibrahim Atallah
  • Patent number: 7409530
    Abstract: A VLIW instruction format is introduced having a set of control bits which identify subinstruction sharing conditions. At compilation, the VLIW instruction is analyzed to identify subinstruction sharing opportunities. Such opportunities are encoded in the control bits of the instruction. Before the instruction is moved into the instruction cache, the instruction is compressed into the new format to delete select redundant occurrences of a subinstruction. Specifically, where a subinstruction is to be shared by corresponding functional processing units of respective clusters, the subinstruction need only appear in the instruction once. The redundant appearance is deleted. The control bits are decoded at instruction parsing time to route a shared subinstruction to the associated functional processing units.
    Type: Grant
    Filed: December 17, 2004
    Date of Patent: August 5, 2008
    Assignee: University of Washington
    Inventors: Donglok Kim, Stefan G. Berg, Weiyun Sun, Yongmin Kim
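    The compression step deletes a subinstruction that two clusters would both execute and records the sharing in control bits; the parser re-fans it out. A list-based sketch with a simplified one-bit-per-slot sharing encoding (an assumption, not the patent's control-bit format):
      def compress_vliw(subinstructions):
          """Drop duplicate subinstructions and record sharing in control bits.

          Returns (share_bits, packed): share_bits[i] is True when slot i reuses
          the previous slot's subinstruction and was therefore not stored again.
          """
          share_bits, packed = [], []
          for i, sub in enumerate(subinstructions):
              if i > 0 and sub == subinstructions[i - 1]:
                  share_bits.append(True)          # shared: do not store again
              else:
                  share_bits.append(False)
                  packed.append(sub)
          return share_bits, packed

      def expand_vliw(share_bits, packed):
          """Inverse routing step performed at instruction-parse time."""
          out, it = [], iter(packed)
          for shared in share_bits:
              out.append(out[-1] if shared else next(it))
          return out

      if __name__ == '__main__':
          word = ['mul r1,r2', 'mul r1,r2', 'ld r3', 'ld r3']
          bits, packed = compress_vliw(word)
          assert expand_vliw(bits, packed) == word
          print(bits, packed)   # [False, True, False, True] ['mul r1,r2', 'ld r3']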
  • Publication number: 20080177981
    Abstract: A method and apparatus for executing instructions in a pipeline processor. The method decreases the latency between an instruction cache and a pipeline processor when bubbles occur in the processing stream due to an execution of a branch correction, or when an interrupt changes the sequence of an instruction stream. The latency is reduced when a decode stage for detecting branch prediction and a related instruction queue location have invalid data representing a bubble in the processing stream. Instructions for execution are inserted in parallel into the decode stage and instruction queue, thereby reducing the effective length of the pipeline by one cycle.
    Type: Application
    Filed: October 8, 2007
    Publication date: July 24, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James N. Dieffenderfer, Richard W. Doing, Brian M. Stempel, Steven R. Testa, Kenichi Tsuchiya
  • Patent number: 7401206
    Abstract: An apparatus and method for fine-grained multithreading in a multipipelined processor core. According to one embodiment, a processor may include instruction fetch logic configured to assign a given one of a plurality of threads to a corresponding one of a plurality of thread groups, where each of the plurality of thread groups may comprise a subset of the plurality of threads, to issue a first instruction from one of the plurality of threads during one execution cycle, and to issue a second instruction from another one of the plurality of threads during a successive execution cycle. The processor may further include a plurality of execution units, each configured to execute instructions issued from a respective thread group.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: July 15, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Ricky C. Hetherington, Gregory F. Grohoski, Robert T. Golla
  • Publication number: 20080168260
    Abstract: A method is provided for processing instructions by a processor, in which instructions are queued in an instruction pipeline in a queued order. A first instruction is identified from the queued instructions in the instruction pipeline, the first instruction being identified as having a dependency which is satisfiable within a number of instruction cycles after a current instruction in the instruction pipeline is issued. The first instruction is placed in a side buffer and at least one second instruction is issued from the remaining queued instructions while the first instruction remains in the side buffer. Then, the first instruction is issued from the side buffer after issuing the at least one second instruction in the queued order when the dependency of the first instruction has cleared and after the number of instruction cycles have passed.
    Type: Application
    Filed: January 8, 2007
    Publication date: July 10, 2008
    Inventors: Victor Zyuban, Michael K. Gschwind, John-David Wellman
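    One way to picture the side buffer: an instruction whose dependency clears within a known number of cycles is parked, younger queued instructions issue in order meanwhile, and the parked instruction issues once its countdown expires. The cycle bookkeeping below is an illustrative assumption, not the publication's pipeline.
      class SideBufferIssuer:
          """Tiny in-order model of the side-buffer idea (cf. 20080168260).

          Each instruction is a dict {'name': str, 'dep_cycles': int}; dep_cycles
          is how many more cycles until its dependency clears (0 = ready now).
          """
          def __init__(self, program):
              self.queue = list(program)
              self.side = None                 # parked instruction, if any
              self.side_wait = 0               # cycles until its dependency clears

          def tick(self):
              """Advance one cycle; return the name of the issued instruction, or None."""
              if self.side is not None:
                  if self.side_wait == 0:      # dependency cleared: issue the parked insn
                      issued, self.side = self.side, None
                      return issued['name']
                  self.side_wait -= 1
              if self.queue and self.queue[0]['dep_cycles'] > 0 and self.side is None:
                  parked = self.queue.pop(0)   # move the waiting instruction aside
                  self.side, self.side_wait = parked, parked['dep_cycles']
              if self.queue:
                  return self.queue.pop(0)['name']   # younger instructions issue meanwhile
              return None

      if __name__ == '__main__':
          prog = [{'name': 'A', 'dep_cycles': 1},
                  {'name': 'B', 'dep_cycles': 0},
                  {'name': 'C', 'dep_cycles': 0}]
          issuer = SideBufferIssuer(prog)
          print([issuer.tick() for _ in range(4)])   # ['B', 'C', 'A', None]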
  • Publication number: 20080162884
    Abstract: A processor core and method of executing instructions, both of which utilize schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the shared cache.
    Type: Application
    Filed: January 2, 2007
    Publication date: July 3, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
  • Publication number: 20080162887
    Abstract: Method, system and computer program product for generating effective addresses in a data processing system. A method, in a data processing system, for generating an effective address includes generating a first portion of the effective address by calculating a first plurality of effective address bits of the effective address, and generating a second portion of the effective address by guessing a second plurality of effective address bits of the effective address. By intelligently guessing a plurality of the effective address bits that form the effective address, the effective address can be generated and sent to a translation unit more quickly than in a system in which all the effective address bits of the effective address are calculated. The method and system are particularly suitable for generating effective addresses in a CAM-based effective address translation design in a multi-threaded environment.
    Type: Application
    Filed: March 14, 2008
    Publication date: July 3, 2008
    Inventors: Rachel Marie Flood, Scott Bruce Frommer, David Allen Hrusecky, Sheldon B. Levenstein, Michael Thomas Vaden
  • Publication number: 20080162886
    Abstract: A method and apparatus for enabling a Software Transactional Memory (STM) with precompiled binaries is herein described. Upon encountering an access operation in a transaction, an annotation field associated with a memory location referenced by the access is checked. In response to the memory location representing a previous similar access within the transaction, the access is performed without access barriers. However, if the annotation field is in a default state representing no previous access during the pendency of the transaction, then a mode of the processor is determined. If the processor mode is in implicit mode, an access handler/barrier is asynchronously executed. Conversely, in an explicit mode, a flag is set instead of asynchronously executing the handler. In addition, during compilation, convert-explicit and convert-implicit instructions are inserted to intelligently convert modes for precompiled and newly compiled binaries.
    Type: Application
    Filed: December 28, 2006
    Publication date: July 3, 2008
    Inventors: Bratin Saha, Ali-Reza Adl-Tabatabai, Quinn Jacobsen
  • Publication number: 20080162885
    Abstract: A method and apparatus for ensuring integrity of transaction exit functions is herein described. Dead local data in a transaction is prevented from overwriting local variables associated with a transaction exit function. In a write-buffering Software Transactional Memory (STM) system, a commit function is associated with a private stack to store local variables to ensure write-back of local dead data in a write-buffer does not corrupt the commit function. Similarly, in a roll-back STM, an abort function is associated with a private stack to store local variables to ensure the roll-back of a program stack with local dead data from a write log does not corrupt the abort function. Alternatively, one stack may be used for the transaction including a first function and an exit function. Here, local dead variables are detected and prevented from overwriting local variables of the exit function.
    Type: Application
    Filed: December 28, 2006
    Publication date: July 3, 2008
    Inventors: Cheng Wang, Youfeng Wu, Bratin Saha, Ali-Reza Adl-Tabatabai
  • Patent number: 7395082
    Abstract: Methods and systems for application framework development for wireless devices are provided herein. Aspects of the method may include acquiring an MMI event from an MMI event queue within the MMI wireless framework. An identity of the acquired MMI event may be determined and the acquired MMI event may be dispatched to an event handler based on the determined identity of the acquired event. If the acquired MMI event comprises a timing event, the acquired MMI event may be dispatched to an MMI event owner within the MMI wireless framework. If the acquired MMI event comprises a keypad event, the acquired MMI event may be dispatched to a currently active MMI view within the MMI wireless framework. If the acquired MMI event comprises an addressed event, the acquired MMI event may be dispatched to a destination handler within the MMI wireless framework.
    Type: Grant
    Filed: August 25, 2004
    Date of Patent: July 1, 2008
    Assignee: Broadcom Corporation
    Inventors: Derek John Foster, Lori Yoshida, Richard Zhang
  • Patent number: 7392367
    Abstract: A method, apparatus, system, and signal-bearing medium that in various embodiments determine whether to execute a command in a queue or whether to wait until another command or commands have completed. The determination is based on a combination of an in-use vector and a scorecard vector. The in-use vector indicates which slots in various queues contain commands. The scorecard vector indicates the dependencies between various queues. In this way, the scorecard vector, and thus the queue dependencies, can be set and modified after the logic that processes the commands has been designed.
    Type: Grant
    Filed: June 19, 2006
    Date of Patent: June 24, 2008
    Assignee: International Business Machines Corporation
    Inventors: Scott D. Clark, Scott M. Willenborg
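    The execute/wait decision reduces to a bitwise test: the head command of a queue may run only if none of the slots its scorecard names are marked in the in-use vector. A minimal bitmask sketch (widths and the example encoding are assumptions):
      def can_execute(in_use_vector, scorecard_for_queue):
          """Return True when the command at the head of a queue may execute.

          in_use_vector       -- bitmask: bit i set means queue slot i holds a
                                 command that has not completed yet.
          scorecard_for_queue -- bitmask: bit i set means this queue must wait
                                 for slot i.  Because the scorecard is data, the
                                 dependency rules can be changed after the
                                 command-processing logic has been designed.
          """
          return (in_use_vector & scorecard_for_queue) == 0

      if __name__ == '__main__':
          in_use    = 0b0110      # slots 1 and 2 are occupied
          scorecard = 0b0100      # this queue depends on slot 2
          print(can_execute(in_use, scorecard))            # False: must wait
          print(can_execute(in_use & ~0b0100, scorecard))  # True once slot 2 drains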
  • Patent number: 7392366
    Abstract: A multithreaded processor, fetch control for a multithreaded processor and a method of fetching in the multithreaded processor. Processor event and use (EU) signals are monitored for downstream pipeline conditions indicating pipeline execution thread states. Instruction cache fetches are skipped for any thread that is incapable of receiving fetched cache contents, e.g., because the thread is full or stalled. Also, consecutive fetches may be selected for the same thread, e.g., on a branch mis-predict. Thus, the processor avoids wasting power on unnecessary or place keeper fetches.
    Type: Grant
    Filed: September 16, 2005
    Date of Patent: June 24, 2008
    Assignee: International Business Machines Corp.
    Inventors: Pradip Bose, Alper Buyuktosunoglu, Richard J. Eickemeyer, Lee E. Eisen, Philip G. Emma, John B. Griswell, Zhigang Hu, Hung Q. Le, Douglas R. Logan, Balaram Sinharoy
  • Publication number: 20080148021
    Abstract: An issue unit includes a first instruction stage, a second instruction stage, and issue control logic. During a first instruction cycle, the issue unit performs two tasks, which are 1) the instructions located in the first instruction stage are moved to a second instruction stage, and 2) the issue control logic determines whether to issue or stall the instructions that are moved to the second instruction stage based upon their particular instruction attributes and the issue control unit's previous state. During a second instruction cycle that immediately follows the first instruction cycle, the second instruction stage's instructions are either issued or stalled based upon the issue control logic's decision from the first instruction cycle.
    Type: Application
    Filed: February 25, 2008
    Publication date: June 19, 2008
    Inventors: Jonathan James DeMent, Kurt Alan Feiste, Robert Alan Philhower, David Shippy
  • Publication number: 20080140999
    Abstract: A data processing method with multiple issue multiple datapath architecture in a video signal processor (VSP) is provided. In the method, commands are received from an external signal processor. The received commands are routed to a plurality of separate command sequencers, an input/output (IO) processor or a plurality of configuration registers according to different command types. Each of the separate command sequencers packs the received commands into a plurality of instruction packets and sends the instruction packets to a plurality of instruction dispatch units, in which each of the instruction packets includes one or more instructions. The instruction packets are dispatched to respective function units for performing operations in response to the received instruction packets.
    Type: Application
    Filed: December 10, 2007
    Publication date: June 12, 2008
    Applicant: NEMOCHIPS, INC.
    Inventor: Danian Gong
  • Publication number: 20080140998
    Abstract: A microprocessor core includes a plurality of inputs that indicate whether a corresponding plurality of independently occurring events has occurred. The inputs are non-memory address inputs. The core also includes a yield instruction in its instruction set architecture, comprising a user-visible output operand and an explicit input operand. The input operand specifies one or more of the independently occurring events. The yield instruction instructs the microprocessor core to suspend issuing for execution instructions of a program thread until at least one of the independently occurring events specified by the input operand has occurred. The program thread contains the yield instruction. The yield instruction further instructs the microprocessor core to return a value in the output operand indicating which of the independently occurring events occurred to cause the microprocessor core to resume issuing the instructions of the program thread.
    Type: Application
    Filed: December 3, 2007
    Publication date: June 12, 2008
    Applicant: MIPS TECHNOLOGIES, INC.
    Inventor: Kevin D. Kissell
  • Publication number: 20080133885
    Abstract: A hierarchical microprocessor. An embodiment of a hierarchical microprocessor includes a plurality of first-level instruction pipeline elements; a plurality of execution clusters, where each execution cluster is operatively coupled with each of the first-level instruction pipeline elements. Each execution cluster includes a plurality of second-level instruction pipeline elements, where each of the second-level instruction pipeline elements corresponds with a respective first-level instruction pipeline element, and one or more instruction execution units operatively coupled with each of the second-level instruction pipeline elements, where the microprocessor is configured to execute multiple execution threads using the plurality of first-level instruction pipeline elements and the plurality of execution clusters.
    Type: Application
    Filed: October 31, 2007
    Publication date: June 5, 2008
    Applicant: CENTAURUS DATA LLC
    Inventor: Andrew Forsyth Glew
  • Publication number: 20080133889
    Abstract: A hierarchical instruction scheduler included in a hierarchical microprocessor comprising a plurality of execution clusters. In one embodiment, a hierarchical instruction scheduler comprises a first-level instruction scheduler configured to receive instructions for execution; store first operand status information for respective operands of the instructions; and dispatch the instructions to respective execution clusters based on the instructions' respective first operand status information.
    Type: Application
    Filed: October 31, 2007
    Publication date: June 5, 2008
    Applicant: CENTAURUS DATA LLC
    Inventor: Andrew Forsyth Glew
  • Publication number: 20080133888
    Abstract: The data processor executes an instruction that directs a write to a reference register of another instruction flow and an instruction that directs reference register invalidation. The data processor is arranged so that processors (CPU1 and CPU2) that each execute a simple instruction flow together provide the functions of a typical data processor. When executing the instruction directing a write to a reference register of another instruction flow, the processor confirms whether the write register is invalid; if it is not, the processor waits for the register to be made invalid, and performs the write once it is invalid. After executing the instruction directing reference register invalidation, the processor invalidates the register to which a reference has been made. While the reference register is invalid, execution of the referring instruction is suspended until it is made valid.
    Type: Application
    Filed: February 16, 2007
    Publication date: June 5, 2008
    Inventor: Fumio Arakawa
  • Patent number: 7383425
    Abstract: This invention is directed to a method and apparatus for providing low, predictable latencies in processing IP packets. The apparatus provides a specialized microprocessor or hardwired circuitry to process IP packets for video communications and control of the video source without an operating system. The method relates to operation of a microprocessor which is suitably arranged to carry out the steps of the method. The method includes details of operation of the specialized microprocessor.
    Type: Grant
    Filed: February 27, 2004
    Date of Patent: June 3, 2008
    Assignee: Pleora Technologies Inc.
    Inventors: Eric Boisvert, Alain Rivard, George Chamberlain
  • Publication number: 20080122843
    Abstract: A logic unit is provided for performing operations in multiple threads on vertex data. The logic unit comprises a macro instruction register file, a flow control instruction register file, and a flow controller. The macro instruction register file stores macro blocks with each macro block including at least one instruction. The flow control instruction register file stores flow control instructions with each flow control instruction including at least one called macro block and dependency information of the called macro block.
    Type: Application
    Filed: July 20, 2006
    Publication date: May 29, 2008
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: Hsine-Chu Chung, Ko-Fang Wang, Chit-Keng Huang
  • Patent number: 7380105
    Abstract: A method and apparatus for improving the operation of a computer processor by utilizing an asymmetric clustered processor architecture are disclosed. The asymmetric clustered processor apparatus includes a narrow cluster, a wide cluster, a steering logic utilizing a cluster predictor for providing a decoded instruction to either the narrow cluster or the wide cluster; address registers which are not part of the ISA, and a translation look-aside buffer for translating the virtual address of a load/store instruction in parallel with an execute stage. The method includes the steps of: predictably steering the instruction to either a W-bit Wide integer cluster or an N-bit Narrow integer cluster, managing the Address register file, and processing any instruction in the Wide integer cluster but processing only N-bit instructions in the Narrow integer cluster.
    Type: Grant
    Filed: June 16, 2006
    Date of Patent: May 27, 2008
    Assignee: The Regents of the University of California
    Inventors: Alexander V. Veidenbaum, Adrian Cristal Kestelman, Mateo Valero Cortes, Ruben Gonzalez Garcia
  • Patent number: 7380104
    Abstract: A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In one embodiment of the method, the oldest ready instruction in the queue during a particular cycle is identified.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: May 27, 2008
    Assignee: International Business Machines Corporation
    Inventors: William Elton Burky, Raymond Cheung Yeung
  • Patent number: 7376954
    Abstract: A mechanism for assuring quality of service for a context in a digital processor has a first scheduling register dedicated to the context, the register having N out of M bits set, and a first scheduler that consults the register to assign issue slots to the context. The first scheduler grants issue slots for the context by referencing the N bits in the first register, and repeats a pattern of assignments of issue slots after referencing the M bits of the first register.
    Type: Grant
    Filed: October 10, 2003
    Date of Patent: May 20, 2008
    Assignee: MIPS Technologies, Inc.
    Inventor: Kevin D. Kissell
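    The guarantee follows from walking an M-bit register in which N bits are set: each set bit grants the context an issue slot, and after M references the pattern repeats. A direct sketch (the 3-of-8 pattern is only an example):
      class QoSScheduler:
          """Grant issue slots to a context by cycling through an M-bit register.

          Setting N of the M bits gives the context roughly N/M of the issue
          slots over every window of M cycles; the remaining slots can go to
          other work.
          """
          def __init__(self, schedule_bits):
              self.bits = schedule_bits       # list of 0/1 of length M
              self.pos = 0

          def grant_slot_to_context(self):
              """Consult the next bit; True means this cycle's slot goes to the context."""
              grant = bool(self.bits[self.pos])
              self.pos = (self.pos + 1) % len(self.bits)   # pattern repeats after M bits
              return grant

      if __name__ == '__main__':
          # N = 3 of M = 8 bits set: the context gets 3 slots in every 8 cycles.
          sched = QoSScheduler([1, 0, 0, 1, 0, 0, 1, 0])
          grants = [sched.grant_slot_to_context() for _ in range(16)]
          print(sum(grants), "of", len(grants), "slots granted")   # 6 of 16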
  • Patent number: 7370176
    Abstract: A system and method for a high frequency stall design is presented. An issue unit includes a first instruction stage, a second instruction stage, and issue control logic. During a first instruction cycle, the issue unit performs two tasks, which are 1) the instructions located in the first instruction stage are moved to a second instruction stage, and 2) the issue control logic determines whether to issue or stall the instructions that are moved to the second instruction stage based upon their particular instruction attributes and the issue control unit's previous state. During a second instruction cycle that immediately follows the first instruction cycle, the second instruction stage's instructions are either issued or stalled based upon the issue control logic's decision from the first instruction cycle.
    Type: Grant
    Filed: August 16, 2005
    Date of Patent: May 6, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jonathan James DeMent, Kurt Alan Feiste, Robert Alan Philhower, David Shippy
  • Patent number: 7366878
    Abstract: A processor buffers asynchronous threads. Current instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one math operation and at least one texture cache access operation. Instructions within each phase are qualified and prioritized, with texture cache access operations in a subsequent phase not qualified until all of the texture cache access operations in a current phase have completed. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the instructions. The instructions may also be qualified based on an age of each instruction, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute current instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
    Type: Grant
    Filed: April 13, 2006
    Date of Patent: April 29, 2008
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
  • Patent number: 7366877
    Abstract: A method for optimizing throughput in a microprocessor that is capable of processing multiple threads of instructions simultaneously. Instruction issue logic is provided between the input buffers and the pipeline of the microprocessor. The instruction issue logic speculatively issues instructions from a given thread based on the probability that the required operands will be available when the instruction reaches the stage in the pipeline where they are required. Issue of an instruction is blocked if the current pipeline conditions indicate that there is a significant probability that the instruction will need to stall in a shared resource to wait for operands. Once the probability that the instruction will stall is below a certain threshold, based on current pipeline conditions, the instruction is allowed to issue.
    Type: Grant
    Filed: September 17, 2003
    Date of Patent: April 29, 2008
    Assignee: International Business Machines Corporation
    Inventors: Victor Roberts Augsburg, Jeffrey Todd Bridges, Michael Scott McIlvaine, Thomas Andrew Sartorius, Rodney Wayne Smith
  • Patent number: 7363467
    Abstract: An apparatus and method for a processor microarchitecture that quickly and efficiently takes large steps through program segments without fetching all intervening instructions. The microarchitecture processes descriptors of trace sequences in program order so as to locate and dispatch descriptors of dependence chains that are used to fetch and execute the instructions of the dependence chain in data flow order.
    Type: Grant
    Filed: January 3, 2002
    Date of Patent: April 22, 2008
    Assignee: Intel Corporation
    Inventors: Sriram Vajapeyam, Bohuslav Rychlik, John P. Shen
  • Patent number: 7363625
    Abstract: An SMT system is designed to allow software alteration of thread priority. In one case, the system signals a change in a thread priority based on the state of instruction execution and in particular when the instruction has completed execution. To alter the priority of a thread, the software uses a special form of a “no operation” (NOP) instruction (hereafter termed thread priority NOP). When the thread priority NOP is dispatched, its special NOP is decoded in the decode unit of the IDU into an operation that writes a special code into the completion table for the thread priority NOP. A “trouble” bit is also set in the completion table that indicates which instruction group contains the thread priority NOP. The trouble bit indicates that special processing is required after instruction completion. The thread priority instruction is processed after completion using the special code to change a thread's priority.
    Type: Grant
    Filed: April 24, 2003
    Date of Patent: April 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: William E. Burky, Ronald N. Kalla, David A. Schroter, Balaram Sinharoy
  • Patent number: 7360064
    Abstract: The present invention provides a network multithreaded processor, such as a network processor, including a thread interleaver that implements fine-grained thread decisions to avoid underutilization of instruction execution resources in spite of large communication latencies. In an upper pipeline, an instruction unit determines an instruction fetch sequence responsive to an instruction queue depth on a per thread basis. In a lower pipeline, a thread interleaver determines a thread interleave sequence responsive to thread conditions including thread latency conditions. The thread interleaver selects threads using a two-level round robin arbitration. Thread latency signals are active responsive to thread latencies such as thread stalls, cache misses, and interlocks. During the subsequent one or more clock cycles, the thread is ineligible for arbitration. In one embodiment, other thread conditions affect selection decisions such as local priority, global stalls, and late stalls.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: April 15, 2008
    Assignee: Cisco Technology, Inc.
    Inventors: Donald Steiss, Earl T Cohen, John J Williams, Jr.
  • Publication number: 20080086622
    Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. The execution core, which is configured to execute the first instruction operation, comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication that corresponds to an identified replay case of the subset is asserted.
    Type: Application
    Filed: October 10, 2006
    Publication date: April 10, 2008
    Applicant: P.A. Semi, Inc.
    Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
  • Patent number: 7350056
    Abstract: An information handling system includes a processor that issues instructions out of program order. The processor includes an issue queue that may advance instructions toward issue even though some instructions in the queue are not ready to issue. The issue queue includes a matrix of storage cells configured in rows and columns, including a first row that couples to execution units. Instructions advance toward issuance from row to row as unoccupied storage cells appear. Unoccupied cells appear when instructions advance toward the first row and upon issuance. When a particular row includes an instruction that is not ready to issue, a stall condition occurs for that instruction. However, to prevent the entire issue queue and processor from stalling, a ready-to-issue instruction in another row may bypass the row including the stalled or not-ready-to-issue instruction. Out-of-order issuance of instructions to the execution units thus continues.
    Type: Grant
    Filed: September 27, 2005
    Date of Patent: March 25, 2008
    Assignee: International Business Machines Corporation
    Inventors: Christopher Michael Abernathy, Jonathan James DeMent, Kurt Alan Feiste, David Shippy
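    The bypass behaviour can be modelled with an ordered list in which a ready entry is allowed to issue past stalled entries, while the stalled entries keep their places. The ready-flag interface below is an assumption for illustration:
      def issue_one(queue):
          """Issue the first ready instruction, letting it bypass stalled entries.

          queue -- list of dicts {'name', 'ready'}, ordered as the rows of the
                   issue matrix would be (index 0 is closest to the execution
                   units).  Entries that are not ready stay in place, so the
                   queue as a whole never stalls.
          """
          for i, entry in enumerate(queue):
              if entry['ready']:
                  queue.pop(i)          # the bypassing instruction leaves the queue
                  return entry['name']
          return None                   # everything is waiting on operands

      if __name__ == '__main__':
          q = [{'name': 'stalled-load', 'ready': False},
               {'name': 'add',          'ready': True}]
          print(issue_one(q))           # 'add' bypasses the stalled load
          print([e['name'] for e in q]) # ['stalled-load'] still waiting in place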
  • Publication number: 20080072017
    Abstract: A method for processing predetermined instructions in a processing system having a plurality of processing units includes providing a global program counter and setting a counter value of the global program counter as an instruction of the predetermined instructions is executed; assigning each processing unit a local program counter and setting a counter value of the local program counter according to a current instruction being executed by the processing unit; and enabling at least one of the processing units to execute a specific instruction of the predetermined instructions according to counter values stored in local program counters of the processing units and the global program counter.
    Type: Application
    Filed: September 19, 2006
    Publication date: March 20, 2008
    Inventor: Hsueh-Bing Yen
  • Patent number: 7343475
    Abstract: A processor including an integer processing unit and a data processing unit. The processor can be operated by a first instruction format or a second instruction format. The first instruction format includes only an instruction for the integer processing unit, and is executed in the integer processing unit alone. The second instruction format includes instructions for the integer processing unit and the data processing unit, the second instructions being executed in both the integer processing unit and the data processing unit in parallel. When an instruction in the first instruction format is to be executed, only in the integer processing unit, a control signal is generated in the integer processing unit and is supplied to the data processing unit to halt an operation of the data processing unit.
    Type: Grant
    Filed: July 12, 2006
    Date of Patent: March 11, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takashi Miyamori
  • Patent number: 7343474
    Abstract: In one embodiment, a processor comprises a plurality of pipeline stages and a first circuit operable at a first pipeline stage of the plurality of pipeline stages. The first circuit is configured to maintain a plurality of program counters (PCs), each of which corresponds to one of a plurality of threads that the processor is configured to have concurrently in process with respect to the plurality of pipeline stages. The first circuit is configured to provide a first PC to a second pipeline stage of the plurality of pipeline stages. The first PC is derived from one of the plurality of PCs that corresponds to a first thread of the plurality of threads, and a first instruction entering the second pipeline stage is from the first thread.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: March 11, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Paul J. Jordan, Robert T. Golla, Jama I. Barreh
  • Patent number: 7337303
    Abstract: A method and apparatus for controlling the issue rate of instructions for an instruction thread to be executed by a processor is provided. The rate at which instructions are to be executed for an instruction thread is stored, and requests are issued to cause instructions to execute in response to the stored rate. The rate at which instruction requests are issued is reduced in response to instruction executions and is increased in the absence of instruction executions. In a multi-threaded processor, instruction rate is controlled by storing the average rate at which each thread should execute instructions. A value representative of the number of instructions available and not yet issued is monitored and is decreased in response to instruction executions. Execution of instructions is prevented on a thread if the number of instructions available but not yet issued falls below a defined value. A ranking order is assigned to a plurality of instruction threads for execution on a multi-threaded processor.
    Type: Grant
    Filed: September 21, 2006
    Date of Patent: February 26, 2008
    Assignee: Imagination Technologies Limited
    Inventors: Adrian John Anderson, Martin John Woodhead
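    The rate control behaves much like a per-thread credit counter: credits accumulate at the thread's stored average rate, executed instructions consume them, and issue requests are withheld while the counter sits below a defined floor. The arithmetic below is an illustrative reading of the abstract, not the patented scheme itself.
      class ThreadRateController:
          """Token-bucket-like model of per-thread issue-rate control (cf. 7337303)."""
          def __init__(self, rate_per_cycle, floor=1.0, cap=8.0):
              self.rate = rate_per_cycle   # stored average execution rate for the thread
              self.floor = floor           # below this, the thread may not issue requests
              self.cap = cap
              self.credits = cap

          def cycle(self, executed):
              """Advance one cycle; 'executed' is how many of this thread's
              instructions completed execution this cycle.  Returns True when
              the thread may issue instruction requests."""
              self.credits = min(self.cap, self.credits + self.rate)  # grows when idle
              self.credits -= executed                                # shrinks on execution
              return self.credits >= self.floor

      if __name__ == '__main__':
          ctrl = ThreadRateController(rate_per_cycle=0.5, floor=1.0, cap=2.0)
          # A burst of executions drains the credits and throttles the thread.
          print([ctrl.cycle(executed=1) for _ in range(5)])  # [True, True, False, False, False]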
  • Publication number: 20080046694
    Abstract: A multiprocessor system includes a judging unit that judges whether a read command inputted to a global address crossbar is a read command to a memory on its own system board; an executing unit that, when the judging unit judges that the read command targets the memory on its own system board, speculatively executes the read command before global access, based on an address notified from the global address crossbar; a setting unit that queues data read from the memory in a data queue provided on a CPU rather than in a data queue provided on the memory; and an instructing unit that, based on notification from the global address crossbar, instructs the data queue provided on the CPU to discard the data or to transmit it to the CPU.
    Type: Application
    Filed: April 20, 2007
    Publication date: February 21, 2008
    Applicant: Fujitsu Limited
    Inventors: Toshikazu Ueki, Takaharu Ishizuka, Makoto Hataida, Takashi Yamamoto, Yuka Hosokawa, Takeshi Owaki, Daisuke Itou
  • Publication number: 20080046695
    Abstract: In a system controller that includes a CPU-issued request queue with a circuit preventing plural requests having identical addresses from being inputted to the queue, the latest request other than a cache replace request is retained by an input-request retaining section. Consequently, even if the address of an issued cache replace request matches an address of a request retained by the CPU-issued request queue, the cache replace request is not retried but is queued in the CPU-issued request queue, provided its address does not match the entire address retained by the input-request retaining section.
    Type: Application
    Filed: April 24, 2007
    Publication date: February 21, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Takaharu Ishizuka, Toshikazu Ueki, Makoto Hataida, Takashi Yamamoto, Yuka Hosokawa, Takeshi Owaki, Daisuke Itou
  • Publication number: 20080046692
    Abstract: Instruction execution delay is alterable after the system design has been finalized, thus enabling the system to dynamically account for various conditions that impact instruction execution. In some embodiments, the dynamic delay is determined by an application to be executed by the processing system. In other embodiments, the dynamic delay is determined by analyzing the history of previously executed instructions. In yet other embodiments, the dynamic delay is determined by assessing the processing resources available to a given application. Regardless, the delay may be dynamically altered on a per-instruction, multiple instruction, or application basis. Processor instruction execution may be controlled by determining a first delay value for a first set of one or more instructions and a second delay value for a second set of one or more instructions. Execution of the sets of instructions is delayed based on the corresponding delay value.
    Type: Application
    Filed: August 16, 2006
    Publication date: February 21, 2008
    Inventors: Gerald Paul Michalak, Kenneth Alan Dockser