Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 7676657
    Abstract: Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched into a buffer, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: March 9, 2010
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett Coon, Simon S. Moy
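    Illustrative Python sketch of the dispatch scheme summarized in 7676657: any buffered instruction whose operands are ready may issue, regardless of thread or fetch order. The buffer layout, Instr fields and NUM_UNITS constant are assumptions for the example, not the patented circuit.

      from dataclasses import dataclass

      NUM_UNITS = 2  # assumed number of available functional units

      @dataclass
      class Instr:
          thread: int
          op: str
          ready: bool    # operands available?

      def dispatch_cycle(buffer):
          """Pick up to NUM_UNITS ready instructions from the shared buffer.

          Selection ignores thread boundaries and fetch order: a ready
          instruction from thread 1 may issue before an earlier-fetched,
          not-yet-ready instruction from thread 0."""
          issued = []
          for instr in list(buffer):
              if len(issued) == NUM_UNITS:
                  break
              if instr.ready:
                  issued.append(instr)
                  buffer.remove(instr)
          return issued

      if __name__ == "__main__":
          buf = [Instr(0, "mul", ready=False),  # fetched first, operands not ready
                 Instr(1, "add", ready=True),
                 Instr(0, "ld", ready=True)]
          print(dispatch_cycle(buf))            # both ready instructions issue together
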
  • Patent number: 7672409
    Abstract: A method of multi-user detection in a given uplink and downlink time slot in a software-defined receiver, which includes filtering and sampling a received signal; forming a block-banded matrix A of the sampled signals; and solving d̂ = T⁻¹y, where T = AᴴA and y = Aᴴx. The method of solving for the matrix T includes a) computing Cholesky factors of the matrix T by approximating using the block-banded property of the matrices T and A; b) Schur decomposition for Cholesky factors of the matrix T and approximating the lower triangular Cholesky factor matrix R using the block Toeplitz property of matrix T; or c) Fourier transformation.
    Type: Grant
    Filed: July 15, 2005
    Date of Patent: March 2, 2010
    Assignee: Sandbridge Technologies, Inc.
    Inventor: Sanyogita Shamsunder
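    Illustrative sketch of the linear algebra in 7672409, solving d̂ = T⁻¹y with T = AᴴA and y = Aᴴx via a Cholesky factorization of T instead of an explicit inverse; the random A and x stand in for the filtered, sampled signal, and the block-banded/Toeplitz approximations of the patent are not modeled.

      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))
      x = rng.standard_normal(64) + 1j * rng.standard_normal(64)

      T = A.conj().T @ A          # T = A^H A (Hermitian, positive definite)
      y = A.conj().T @ x          # y = A^H x

      L = np.linalg.cholesky(T)   # T = L L^H, lower-triangular Cholesky factor
      z = np.linalg.solve(L, y)               # forward solve  L z = y
      d_hat = np.linalg.solve(L.conj().T, z)  # back solve     L^H d_hat = z

      assert np.allclose(T @ d_hat, y)        # d_hat satisfies T d_hat = y
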
  • Patent number: 7669038
    Abstract: A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In an embodiment comprising a method, the oldest ready instruction in the queue during a particular cycle is identified.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: February 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: William Elton Burky, Raymond Cheung Yeung
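    Illustrative sketch of the issue-queue selection in 7669038: per-entry destination-to-source compares flag entries dependent on the instruction designated to issue this cycle, and the oldest ready, non-dependent entry is chosen for the next cycle. The Entry fields and select_next helper are assumed names, and real designs may still issue dependents via result forwarding; this is a simplification.

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class Entry:
          age: int           # smaller = older
          dest: str
          srcs: tuple
          ready: bool

      def select_next(queue, issuing: Optional[Entry]):
          """Return the oldest ready entry that is not the issuing instruction and
          does not source the issuing instruction's destination register."""
          candidates = []
          for e in queue:
              if e is issuing:
                  continue
              # destination-to-source compare against the designated instruction
              depends = issuing is not None and issuing.dest in e.srcs
              if e.ready and not depends:
                  candidates.append(e)
          return min(candidates, key=lambda e: e.age) if candidates else None

      if __name__ == "__main__":
          q = [Entry(0, "r3", ("r1", "r2"), ready=True),   # designated to issue now
               Entry(1, "r5", ("r3", "r4"), ready=True),   # waits on r3
               Entry(2, "r7", ("r6",), ready=True)]
          print(select_next(q, issuing=q[0]))              # the entry with age 2 is chosen
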
  • Patent number: 7664929
    Abstract: A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.
    Type: Grant
    Filed: September 17, 2003
    Date of Patent: February 16, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Carlos Antonio Alba Pinto, Ramanathan Sethuraman, Srinivasan Balakrishnan, Harm Johannes Antonius Maria Peters, Rafael Peset Llopis
  • Patent number: 7660969
    Abstract: A concurrent instruction dispatch apparatus includes a group indicator for each of a plurality of threads that indicates which one of a plurality of groups of the threads the thread belongs to. A group priority indicator for each group indicates an instruction dispatch priority relative to the other groups. Selection logic selects a thread for dispatching an instruction thereof based on the group and group priority indicators. A bifurcated scheduler includes first scheduler logic that issues instructions of the threads to an execution unit, second scheduler logic that enforces a thread scheduling policy, and an interface. A group indicator indicates which group each thread belongs to, a priority for each group, and execution information for each thread. The first scheduler logic issues the instructions based on the group priorities and group indicators, and the second scheduler logic updates the group indicators based on the instruction execution information.
    Type: Grant
    Filed: January 5, 2007
    Date of Patent: February 9, 2010
    Assignee: MIPS Technologies, Inc.
    Inventors: Michael Gottlieb Jensen, Ryan C. Kinter
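    Illustrative sketch of group-based thread selection as summarized in 7660969: each thread carries a group indicator, each group a priority, and selection round-robins within the highest-priority group that has an issuable thread. The dictionary-based state and pick_thread function are assumptions for the example.

      def pick_thread(thread_group, group_priority, issuable, last_pick):
          """thread_group: {tid: gid}; group_priority: {gid: priority, higher wins};
          issuable: set of tids that can dispatch this cycle;
          last_pick: {gid: tid chosen previously}, used for round-robin fairness."""
          by_prio = sorted(set(thread_group.values()),
                           key=lambda g: group_priority[g], reverse=True)
          for gid in by_prio:
              members = sorted(t for t, g in thread_group.items()
                               if g == gid and t in issuable)
              if not members:
                  continue
              # rotate so selection resumes just after the previous pick in this group
              start = last_pick.get(gid, -1)
              members = [t for t in members if t > start] + [t for t in members if t <= start]
              choice = members[0]
              last_pick[gid] = choice
              return choice
          return None

      if __name__ == "__main__":
          groups = {0: "A", 1: "A", 2: "B"}
          priorities = {"A": 2, "B": 1}
          last = {}
          print(pick_thread(groups, priorities, {0, 1, 2}, last))  # 0: group A has priority
          print(pick_thread(groups, priorities, {0, 1, 2}, last))  # 1: round-robin within A
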
  • Patent number: 7656412
    Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: February 2, 2010
    Assignee: Microsoft Corporation
    Inventors: Denis Demandolx, Steven White
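    Illustrative sketch of the separable resampling in 7656412: a two-dimensional resampling is decomposed into two one-dimensional passes (rows, then columns), each of which could be dispatched to a GPU as its own set of instructions. Plain NumPy nearest-neighbour scaling is used only to show the decomposition; it is not the patented algorithm.

      import numpy as np

      def resample_rows(img, new_w):
          # first one-dimensional transformation: resample along rows
          idx = (np.arange(new_w) * img.shape[1] / new_w).astype(int)
          return img[:, idx]

      def resample_cols(img, new_h):
          # second one-dimensional transformation: resample along columns
          idx = (np.arange(new_h) * img.shape[0] / new_h).astype(int)
          return img[idx, :]

      def resample_2d(img, new_h, new_w):
          # the two passes are independent steps, mirroring the per-pass
          # instruction streams described in the abstract
          return resample_cols(resample_rows(img, new_w), new_h)

      if __name__ == "__main__":
          image = np.arange(16.0).reshape(4, 4)
          print(resample_2d(image, 2, 8).shape)   # (2, 8)
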
  • Patent number: 7647518
    Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication corresponding to an identified replay case of the subset is asserted.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: January 12, 2010
    Assignee: Apple Inc.
    Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
  • Patent number: 7636836
    Abstract: A dynamic multistreaming processor has instruction queues, each instruction queue corresponding to an instruction stream, and execution units. The dynamic multistreaming processor also has a dispatch stage to select at least one instruction from one of the instruction queues and to dispatch the selected at least one instruction to one of the execution units. Lastly, the dynamic multistreaming processor has a queue counter, associated with each instruction queue, for indicating the number of instructions in each queue, and a fetch counter, associated with each instruction queue, for indicating an address from which to obtain instructions when the associated instruction queue is not full. The dynamic multistreaming processor might also have fetch counters for indicating a next instruction address from which to obtain at least one instruction when the associated instruction queue is not full. The dynamic multistreaming processor could also have a second counter for indicating a next instruction address.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: December 22, 2009
    Assignee: MIPS Technologies, Inc.
    Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar, Enrique Musoll
  • Patent number: 7617494
    Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.
    Type: Grant
    Filed: July 1, 2003
    Date of Patent: November 10, 2009
    Assignee: STMicroelectronics S.r.l.
    Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
  • Patent number: 7603544
    Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
    Type: Grant
    Filed: September 12, 2005
    Date of Patent: October 13, 2009
    Assignee: Intel Corporation
    Inventor: Thomas A. Piazza
  • Patent number: 7600221
    Abstract: A processing architecture supports executing instructions in parallel after identifying at least one level of dependency associated with a set of traces within a segment of code. Each trace represents a sequence of logical instructions within the segment of code that can be executed in a corresponding operand stack. Scheduling information is generated based on a dependency order identified among the set of traces. Thus, multiple traces may be scheduled for parallel execution unless a dependency order indicates that a second trace is dependent upon a first trace. In this instance, the first trace is executed prior to the second trace. Trace dependencies may be identified at run-time as well as prior to execution of traces in parallel. Results associated with execution of a trace are stored in a temporary buffer (instead of memory) until after it is known that a data dependency was not detected at run-time.
    Type: Grant
    Filed: October 6, 2003
    Date of Patent: October 6, 2009
    Assignee: Sun Microsystems, Inc.
    Inventor: Achutha Raman Rangachari
  • Patent number: 7594078
    Abstract: A method and apparatus for D-cache miss prediction and scheduling is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group resulted in a cache miss during a previous execution of the first instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: September 22, 2009
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
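    Illustrative sketch of the miss-prediction scheduling in 7594078: a load that missed the D-cache on a previous execution is steered into a more-delayed pipeline of the cascaded unit so its consumers see the data without stalling. The miss-history set and two-pipeline model are assumptions for the example.

      PIPELINES = {"P0": 0, "P1": 4}        # assumed extra delay (cycles) per pipeline

      def schedule_issue_group(group, miss_history):
          """group: list of (tag, is_load); miss_history: tags that missed the D-cache before."""
          assignment = {}
          for tag, is_load in group:
              if is_load and tag in miss_history:
                  assignment[tag] = "P1"    # delayed pipeline hides the expected miss
              else:
                  assignment[tag] = "P0"
          return assignment

      if __name__ == "__main__":
          group = [("ld_a", True), ("add_b", False)]
          print(schedule_issue_group(group, miss_history={"ld_a"}))
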
  • Publication number: 20090228687
    Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution exists within the group of instructions, and which controls the instruction buffer so that the instructions of the group are supplied to the instruction decoding unit sequentially when such a factor exists and all at once when it does not.
    Type: Application
    Filed: March 9, 2006
    Publication date: September 10, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventor: Tetsu Hosoki
  • Publication number: 20090217001
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Application
    Filed: May 6, 2009
    Publication date: August 27, 2009
    Inventors: Cheryl D. Senter, Johannes Wang
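    Illustrative sketch of the out-of-order load test described in 20090217001: a load may bypass older stores only if there is no address collision (an older store to the same address) and no write pending (an older store whose address is not yet calculated). The StoreEntry shape and can_load_bypass name are assumptions.

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class StoreEntry:
          addr: Optional[int]     # None => address not yet calculated (write pending)

      def can_load_bypass(load_addr, older_stores):
          for st in older_stores:
              if st.addr is None:
                  return False            # write pending: must wait
              if st.addr == load_addr:
                  return False            # address collision with an older store
          return True

      if __name__ == "__main__":
          stores = [StoreEntry(0x100), StoreEntry(None)]
          print(can_load_bypass(0x200, stores))               # False: a store address is unknown
          print(can_load_bypass(0x200, [StoreEntry(0x100)]))  # True: safe to load early
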
  • Publication number: 20090210656
    Abstract: A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction.
    Type: Application
    Filed: February 20, 2008
    Publication date: August 20, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David S. Hutton, Khary J. Alexander, Fadi Y. Busaba, Bruce C. Giamei, John G. Rell, JR., Eric M. Schwarz, Chung-Lung Kevin Shum
  • Publication number: 20090210667
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine the dependency chain depth of all the instructions in the issue group, (3) schedule the instructions in order from the longest dependency chain depth to the shortest dependency chain depth, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Application
    Filed: February 19, 2008
    Publication date: August 20, 2009
    Inventor: David A. Luick
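    Illustrative sketch of the priority rule in 20090210667: compute each instruction's dependency-chain depth within the issue group and schedule the longest chains first. The instruction tuples and helper names are invented for the example.

      def chain_depth(idx, group, memo=None):
          """Depth = 1 + max depth over in-group consumers of this instruction's result."""
          memo = {} if memo is None else memo
          if idx in memo:
              return memo[idx]
          dest, _srcs = group[idx]
          consumers = [j for j, (_d, srcs) in enumerate(group) if dest in srcs]
          memo[idx] = 1 + max((chain_depth(j, group, memo) for j in consumers), default=0)
          return memo[idx]

      def schedule(group):
          # longest dependency chain first, per the abstract's priority rule
          return sorted(range(len(group)), key=lambda i: chain_depth(i, group), reverse=True)

      if __name__ == "__main__":
          # each instruction is (destination register, source registers)
          group = [("r1", ()), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ())]
          print(schedule(group))   # [0, 1, 2, 3]: instruction 0 heads the longest chain
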
  • Publication number: 20090210665
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receive an issue group of instructions, reorder the issue group of instructions using instruction type priority, and execute the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Application
    Filed: February 19, 2008
    Publication date: August 20, 2009
    Inventors: Jeffrey P. Bradford, David A. Luick
  • Publication number: 20090210666
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: receive an issue group of instructions; determine if at least one load instruction is in the issue group and, if so, schedule the at least one load instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; determine if there is an issue conflict for one of the plurality of execution pipelines and resolve the issue conflict by scheduling the at least one load instruction in a different execution pipeline; and schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Application
    Filed: February 19, 2008
    Publication date: August 20, 2009
    Inventor: David A. Luick
  • Publication number: 20090204792
    Abstract: Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times. Separate processor cores may be morphed to appear differently for different applications. For example, two processor cores each capable of executing N-wide issue groups of instructions may be morphed to appear as a single processor core capable of executing 2N-wide issue groups.
    Type: Application
    Filed: February 13, 2008
    Publication date: August 13, 2009
    Inventor: David A. Luick
  • Patent number: 7571301
    Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: August 4, 2009
    Assignee: Intel Corporation
    Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
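    Illustrative sketch of a post/wait structure built on a single counter, one of the variants named in 7571301 for parallelizing DOACROSS loops: iteration i may not enter its dependent region until iteration i-1 has posted. The class and thread names are assumptions, not the patented implementation.

      import threading

      class PostWait:
          def __init__(self):
              self._done = 0                      # single counter of completed iterations
              self._cv = threading.Condition()

          def wait_for(self, iteration):
              with self._cv:
                  self._cv.wait_for(lambda: self._done >= iteration)

          def post(self, iteration):
              with self._cv:
                  self._done = max(self._done, iteration)
                  self._cv.notify_all()

      def doacross_body(i, pw, out):
          pw.wait_for(i)              # wait until iteration i-1 has posted
          out.append(i)               # dependent work must run in order
          pw.post(i + 1)

      if __name__ == "__main__":
          pw, out = PostWait(), []
          pw.post(1)                  # iteration 1 has no predecessor
          threads = [threading.Thread(target=doacross_body, args=(i, pw, out))
                     for i in range(1, 6)]
          for t in reversed(threads): # start out of order on purpose
              t.start()
          for t in threads:
              t.join()
          print(out)                  # [1, 2, 3, 4, 5] despite the reversed start order
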
  • Publication number: 20090182987
    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit capable of being clocked at multiple different rates relative to a multithreaded issue unit, such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate, increasing overall instruction throughput.
    Type: Application
    Filed: January 11, 2008
    Publication date: July 16, 2009
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Publication number: 20090177868
    Abstract: An apparatus, system, and method are disclosed for discontiguous multiple issue of instructions. An assignment unit assigns a plurality of instruction blocks to a plurality of issue units. Each of the plurality of issue units comprises a renaming map that maps each architecturally visible register address to a rename register. Each issue unit maps each architecturally visible register in the decoded instruction to a register placeholder if the renaming map entry for that architecturally visible register is invalid, and otherwise maps it to a rename register if the rename register entry is valid. Each issue unit further receives predecessor mapping information from the renaming map of the issue unit's predecessor issue unit in response to the assignment unit identifying a relationship with the predecessor issue unit and the final mapping information being available from the predecessor issue unit.
    Type: Application
    Filed: January 3, 2008
    Publication date: July 9, 2009
    Inventor: Russell Lee Lewis
  • Publication number: 20090172359
    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
    Type: Application
    Filed: December 31, 2007
    Publication date: July 2, 2009
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Gene Shen, Sean Lie
  • Publication number: 20090164758
    Abstract: A mechanism for performing locked operations in a processing unit. A dispatch unit may dispatch a plurality of instructions including a locked instruction and a plurality of non-locked instructions. One or more of the non-locked instructions may be dispatched before and after the locked instruction. An execution unit may execute the plurality of instructions including the non-locked and locked instruction. A retirement unit may retire the locked instruction after execution of the locked instruction. During retirement, the processing unit may begin enforcing a previously obtained exclusive ownership of a cache line accessed by the locked instruction. Furthermore, the processing unit may stall the retirement of the one or more non-locked instructions dispatched after the locked instruction until after the writeback operation for the locked instruction is completed.
    Type: Application
    Filed: December 20, 2007
    Publication date: June 25, 2009
    Inventor: Michael J. Haertel
  • Publication number: 20090113181
    Abstract: A method and apparatus for executing instructions in a processor are provided. In one embodiment of the invention, the method includes receiving a plurality of instructions. The plurality of instructions includes first instructions in a first thread and second instructions in a second thread. The method further includes forming a common issue group including an instruction of a first instruction type and an instruction of a second instruction type. The method also includes issuing the common issue group to a first execution unit and a second execution unit. The instruction of the first instruction type is issued to the first execution unit and the instruction of the second instruction type is issued to the second execution unit.
    Type: Application
    Filed: October 24, 2007
    Publication date: April 30, 2009
    Inventors: Miguel Comparan, Brent Francis Hilgart, Brian Lee Koehler, Eric Oliver Mejdrich, Adam James Muff, Alfred Thomas Watson, III
  • Publication number: 20090106534
    Abstract: A system and computer-implementable method for implementing software-supported thread assist within a data processing system, wherein the data processing system supports processing instructions within at least a first thread and a second thread. An instruction dispatch unit (IDU) places the first thread into a sleep mode. The IDU separates an instruction stream for the second thread into at least a first independent instruction stream and a second independent instruction stream. The first independent instruction stream is processed utilizing facilities allocated to the first thread and the second independent instruction stream is processed utilizing facilities allocated to the second thread.
    Type: Application
    Filed: October 23, 2007
    Publication date: April 23, 2009
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Publication number: 20090089551
    Abstract: Provided are a method and apparatus for avoiding bank conflict. A first instruction, one of the access instructions predicted to cause the bank conflict, is replaced with a second instruction by moving its execute timing earlier so that the access instructions do not cause the bank conflict. A load/store unit that is scheduled to access the bank according to the first instruction then accesses the bank and reads out data at the execute timing of the second instruction, after which the read data is supplied to the load/store unit at the execute timing of the first instruction. Accordingly, even though access instructions predicted to cause the bank conflict are allocated to the load/store units, the bank conflict can be prevented, avoiding the performance degradation that would otherwise result from its occurrence.
    Type: Application
    Filed: February 27, 2008
    Publication date: April 2, 2009
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyoung-june Min, Chan-min Park, Suk-jin Kim, Won-jong Lee, Kwon-taek Kwon, Hee-seok Kim
  • Patent number: 7509483
    Abstract: A computing architecture and software techniques are described which modify the basic sequential instruction fetching mechanism of a processor by separating a program's control flow from its functional execution flow. A compiled sequential HLL program's static control structures are analyzed and a separate program based on its own unique instructions is created that primarily generates addresses for the selection of functional execution instructions. The original program is now represented by an instruction fetch program and a set of function/logic execution instructions. This basic split allows multiple instruction addresses to be generated in parallel to access multiple instruction memories. These multiple instruction memories contain only the function/logic instructions of the program and no control structure operations such as branches or calls. All the original program's control instructions are split from the original program and used to create the instruction addressing program.
    Type: Grant
    Filed: February 22, 2007
    Date of Patent: March 24, 2009
    Assignee: Renesky Tap III, Limited Liability Company
    Inventor: Gerald George Pechanek
  • Patent number: 7509482
    Abstract: A memory device stores entries waiting to be processed. Row numbers of matrix information correspond to storage positions within the memory device, column numbers correspond to positions within the order of the entries, and every matrix element corresponding to the storage position and the position within the order of the entry stored in this storage position has a predetermined value. An operation between the first vector information indicating storage positions of processable entries and each column of the matrix information is performed and the second vector information indicating positions within the order of the processable entries is generated. Then, a position to be processed is selected from among the positions of processable entries indicated by the second vector information, an element having the predetermined value in the column corresponding to the selected position is obtained, and an entry in the storage position corresponding to the element is processed.
    Type: Grant
    Filed: June 16, 2006
    Date of Patent: March 24, 2009
    Assignee: Fujitsu Limited
    Inventors: Takuji Takahashi, Masahiro Kuramoto
  • Publication number: 20090070559
    Abstract: A data processing circuit comprises a register file (14) having read ports and write ports. A plurality of functional units (21a-c) is coupled to receive operand data from a same combination of read ports. Each functional unit is coupled to a respective one of the write ports for writing a respective result. An instruction issue slot has outputs (11) for supplying register selection information to said combination of read ports and to the respective ones of the write ports. The output of the issue slot also supplies an operation code. The functional units (21a-c) in the plurality are arranged to respond to at least one value of the operation code by each executing a respective operation using the same operands from said same combination, each functional unit producing a respective result at a respective one of the write ports.
    Type: Application
    Filed: September 21, 2005
    Publication date: March 12, 2009
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
    Inventor: Antonius Adrianus Maria Van Wel
  • Publication number: 20090049279
    Abstract: The present invention provides a network multithreaded processor, such as a network processor, including a thread interleaver that implements fine-grained thread decisions to avoid underutilization of instruction execution resources in spite of large communication latencies. In an upper pipeline, an instruction unit determines an instruction fetch sequence responsive to an instruction queue depth on a per thread basis. In a lower pipeline, a thread interleaver determines a thread interleave sequence responsive to thread conditions including thread latency conditions. The thread interleaver selects threads using a two-level round robin arbitration. Thread latency signals are active responsive to thread latencies such as thread stalls, cache misses, and interlocks. During the subsequent one or more clock cycles, the thread is ineligible for arbitration. In one embodiment, other thread conditions affect selection decisions such as local priority, global stalls, and late stalls.
    Type: Application
    Filed: April 14, 2008
    Publication date: February 19, 2009
    Inventors: Donald E. Steiss, Earl T. Cohen, John J. Williams
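    Illustrative sketch of a two-level round-robin interleave decision in the spirit of 20090049279: threads stalled by a latency event (cache miss, interlock) are ineligible, high-priority eligible threads are arbitrated first, and the remaining threads are arbitrated second. The class fields and method names are assumptions for the example.

      class Interleaver:
          def __init__(self, n_threads):
              self.n = n_threads
              self.rr = {"high": -1, "low": -1}   # last pick per arbitration level

          def select(self, high_priority, stalled):
              """high_priority, stalled: sets of thread ids; returns chosen tid or None."""
              for level in ("high", "low"):
                  pool = [t for t in range(self.n)
                          if (t in high_priority) == (level == "high") and t not in stalled]
                  if not pool:
                      continue
                  start = self.rr[level]
                  # resume arbitration just after the previously chosen thread
                  ordered = [t for t in pool if t > start] + [t for t in pool if t <= start]
                  choice = ordered[0]
                  self.rr[level] = choice
                  return choice
              return None

      if __name__ == "__main__":
          il = Interleaver(4)
          print(il.select(high_priority={0, 2}, stalled={0}))     # 2: only eligible high-priority thread
          print(il.select(high_priority={0, 2}, stalled={0, 2}))  # 1: falls through to the second level
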
  • Publication number: 20090031114
    Abstract: To guarantee response time while strictly maintaining the priority specified by software, a processor (1) is provided which is a multithread processor having a thread multiplexer (10) and an issue information buffer (ISINF). An instruction code, and issue information (isid) added to the instruction code for instructions issued at and after the next operating cycle, are supplied to the thread multiplexer. The issue information is valid from the second and subsequent instruction flows, and is saved temporarily in the issue information buffer. This issue information is, for example, the position of an operating cycle which can issue a high priority instruction, i.e., information identifying a slot. According to the issue information, the thread multiplexer issues a low priority instruction at an operating cycle at which a high priority instruction is not issued.
    Type: Application
    Filed: September 30, 2008
    Publication date: January 29, 2009
    Inventor: FUMIO ARAKAWA
  • Patent number: 7475223
    Abstract: An improved method, apparatus, and computer instructions for grouping instructions. A set of instructions is received for placement into an instruction cache in the data processing system. Instructions in the set of instructions are grouped into a dispatch grouping of instructions prior to the set of instructions being placed in the instruction cache.
    Type: Grant
    Filed: February 3, 2005
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Brian R. Konigsburg, Hung Qui Le, David Stephen Levitan, John Wesley Ward, III
  • Publication number: 20090006816
    Abstract: A VLIW processor has a hierarchy of functional unit clusters that communicate through explicit control in the instruction stream and store data in register files at each level of the hierarchy. Explicit instructions transfer values between sub-clusters through a cluster-level switch network. Transfer instructions issue in dedicated instruction issue slots in parallel with instructions that perform computation in functional units. The switch network can perform permutations on the data being moved. The switch network also enables operands to be broadcast among the sub-clusters, the global register file and memory.
    Type: Application
    Filed: June 27, 2007
    Publication date: January 1, 2009
    Inventors: David J. Hoyle, Amitabh Menon
  • Patent number: 7472257
    Abstract: Processor (100) has a plurality of registers (120) for storing instructions for execution by the plurality of execution units (160). The plurality of registers (120) are coupled to the plurality of execution units (160) via distribution means (140). Distribution means (140) have a plurality of dispatch units (144) coupled to the plurality of execution units (160) and a reroutable network, e.g. a data communication bus (142), coupling the plurality of registers (120) to the plurality of dispatch units (144). The data communication bus (142) is controlled by control unit (148). Dispatch units (144) are arranged to detect dedicated instructions in the instruction flow, which signal the beginning of an inactive period of an execution unit (160a, 160b, 160c, 160d) in the plurality of execution units (160).
    Type: Grant
    Filed: November 20, 2002
    Date of Patent: December 30, 2008
    Assignee: NXP B.V.
    Inventor: Francesco Pessolano
  • Publication number: 20080313433
    Abstract: A processor is disclosed including several features allowing the processor to simultaneously execute instructions of multiple conditional execution instruction groups. Each conditional execution instruction group includes a conditional execution instruction and a code block specified by the conditional execution instruction. In one embodiment, the processor includes multiple registers for storing marking data pertaining to a number of instructions in each of multiple execution pipeline stages. In another embodiment, the processor includes write enable logic and an execution unit. The write enable logic produces write enable signals dependent upon received attributes, and the execution unit saves results of instructions of conditional execution instruction groups dependent upon the write enable signals.
    Type: Application
    Filed: August 21, 2008
    Publication date: December 18, 2008
    Applicant: VeriSilicon Holdings (Cayman Islands) Co. Ltd.
    Inventors: Hung Nguyen, Shannon Wichman
  • Patent number: 7457938
    Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: November 25, 2008
    Assignee: Intel Corporation
    Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton
  • Publication number: 20080288745
    Abstract: A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values.
    Type: Application
    Filed: July 11, 2008
    Publication date: November 20, 2008
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
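    Illustrative sketch of the hazard check described in 20080288745: given a vector of memory addresses, produce a predicate marking the leading elements that touch distinct addresses and so can be processed in parallel, with the first repeated address starting the next batch. This simplified rule and the function name are assumptions, not the claimed instruction semantics.

      def conflict_predicate(addresses):
          seen = set()
          predicate = []
          safe = True
          for a in addresses:
              if a in seen:
                  safe = False        # conflict with an earlier element in the vector
              predicate.append(safe)
              seen.add(a)
          return predicate

      if __name__ == "__main__":
          addrs = [0x10, 0x20, 0x10, 0x30]
          print(conflict_predicate(addrs))   # [True, True, False, False]
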
  • Patent number: 7454597
    Abstract: A processor core and method of executing instructions, both of which utilize schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the shared cache.
    Type: Grant
    Filed: January 2, 2007
    Date of Patent: November 18, 2008
    Assignee: International Business Machines Corporation
    Inventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
  • Patent number: 7447879
    Abstract: A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: November 4, 2008
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7447887
    Abstract: To guarantee response time while strictly maintaining the priority specified by software, a processor (1) is provided which is a multithread processor having a thread multiplexer (10) and an issue information buffer (ISINF). An instruction code, and issue information (isid) added to the instruction code for instructions issued at and after the next operating cycle, are supplied to the thread multiplexer. The issue information is valid from the second and subsequent instruction flows, and is saved temporarily in the issue information buffer. This issue information is, for example, the position of an operating cycle which can issue a high priority instruction, i.e., information identifying a slot. According to the issue information, the thread multiplexer issues a low priority instruction at an operating cycle at which a high priority instruction is not issued.
    Type: Grant
    Filed: July 7, 2006
    Date of Patent: November 4, 2008
    Assignee: Hitachi, Ltd.
    Inventor: Fumio Arakawa
  • Publication number: 20080263330
    Abstract: A processor has an interface portion and an internal environment. The interface portion comprises at least one port. The internal environment comprises an execution unit arranged to execute instructions in dependence on a first timing signal and to transfer data between the interior portion and the at least one port in dependence on the first timing signal; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions and the thread scheduler being arranged to schedule the threads in dependence on the first timing signal. The port is arranged to transfer data between the port and an external environment in dependence on a second timing signal, and to alter a ready signal in dependence on the second timing signal to indicate a transfer of data with the external environment. The thread scheduler is configured to schedule one or more associated threads for execution in dependence on the ready signal.
    Type: Application
    Filed: April 17, 2007
    Publication date: October 23, 2008
    Inventors: Michael David May, Peter Hedinger, Alastair Dixon
  • Patent number: 7441098
    Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.
    Type: Grant
    Filed: May 6, 2005
    Date of Patent: October 21, 2008
    Assignee: Broadcom Corporation
    Inventor: Sophie Wilson
  • Patent number: 7437544
    Abstract: A data processing apparatus and method are provided for executing a sequence of instructions including at least one multiple iteration instruction. The data processing apparatus comprises an instruction store for storing the sequence of instructions, and a processing unit for executing the sequence of instructions, the processing unit comprising at least a first processing path and a second processing path to enable at least two instructions of the sequence to be executed in parallel. When executing instructions in parallel, the first processing path executes an instruction which is earlier in the sequence than the instruction executing in the second processing path. The processing unit is operable when executing a multiple iteration instruction to allow a first iteration of the multiple iteration instruction to be executed in either the first processing path or the second processing path, but to cause all remaining iterations of the multiple iteration instruction to be executed in the first processing path.
    Type: Grant
    Filed: April 29, 2005
    Date of Patent: October 14, 2008
    Assignee: ARM Limited
    Inventors: Ann Sekli Chin, David James Williamson
  • Patent number: 7430651
    Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue has a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: September 30, 2008
    Assignee: Seiko-Epson Corporation
    Inventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
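    Illustrative sketch of the tag-monitoring idea in 7430651: tags sit in a queue in program order, each instruction's information lives in a register file location addressed by its tag, and reading the file through the queue's slots recovers program order. The class below is a simplified software model, not the patented circuit.

      from collections import deque

      class TagMonitor:
          def __init__(self, n_tags):
              self.queue = deque(range(n_tags))   # tags held in program order
              self.regfile = {}                   # tag -> per-instruction information

          def assign(self, info):
              tag = self.queue[len(self.regfile)] # next free tag, still program-ordered
              self.regfile[tag] = info
              return tag

          def complete_oldest(self):
              tag = self.queue.popleft()          # advance the queue on completion
              self.queue.append(tag)              # recycle the freed tag
              return self.regfile.pop(tag)

          def in_program_order(self):
              return [self.regfile[t] for t in self.queue if t in self.regfile]

      if __name__ == "__main__":
          tm = TagMonitor(4)
          for op in ("ld", "add", "st"):
              tm.assign({"op": op})
          print(tm.in_program_order())            # entries read back in program order
          print(tm.complete_oldest())             # the oldest ('ld') completes first
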
  • Patent number: 7430653
    Abstract: A pipelined instruction dispatch or grouping circuit allows instruction dispatch decisions to be made over multiple processor cycles. In one embodiment, the grouping circuit performs resource allocation and data dependency checks on an instruction group, based on a state vector which includes representation of source and destination registers of instructions within said instruction group and corresponding state vectors for instruction groups of a number of preceding processor cycles.
    Type: Grant
    Filed: August 2, 1999
    Date of Patent: September 30, 2008
    Assignee: Sun Microsystems, Inc.
    Inventor: Marc Tremblay
  • Patent number: 7430643
    Abstract: The present invention provides a method and apparatus for increased efficiency for translation lookaside buffers by collapsing redundant translation table entries into a single translation table entry (TTE). In the present invention, each thread of a multithreaded processor is provided with multiple context registers. Each of these context registers is compared independently to the context of the TTE. If any of the contexts match (and the other match conditions are satisfied), then the translation is allowed to proceed. Two applications attempting to share one page but that still keep separate pages can then employ three total contexts. One context is for one application's private use; one of the contexts is for the other application's private use; and a third context is for the shared page. In one embodiment of the invention, two contexts are implemented per thread. However, the teachings of the present invention can be extended to a higher number of contexts per thread.
    Type: Grant
    Filed: December 30, 2004
    Date of Patent: September 30, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Paul J. Jordan, William J. Kucharski, Roman M. Zajcew, Ashley N. Saulsbury, Quinn A. Jacobson
  • Publication number: 20080222336
    Abstract: To allow arithmetic circuits of sharable resources to be used by priority with a simple procedure. In a data processing system including central processing units and a plurality of arithmetic circuits, wherein the central processing units are able to supply a command to one arithmetic circuit based on one fetched instruction and supply a command to another arithmetic circuit based on another fetched instruction, a memory circuit is provided which stores first information indicating which arithmetic circuit is executing a command, and second information indicating which central processing unit has reserved the arithmetic circuit for execution of the next command. When an arithmetic circuit is already executing a command, reserving it for execution of the next command using the second information of the memory circuit makes it possible, after that execution, to assign operation commands quickly to the arithmetic circuits and cause them to execute the commands.
    Type: Application
    Filed: January 14, 2008
    Publication date: September 11, 2008
    Inventors: Yoshikazu KIYOSHIGE, Shunichi Iwata, Kesami Hagiwara, Akihiko Tomita
  • Patent number: 7421571
    Abstract: A multi-threaded processor is provided. The multi-threaded processor includes a first instruction fetch unit and a second instruction fetch unit. A multi-thread scheduler unit is coupled to the first instruction fetch unit and the second instruction fetch unit. An execution unit, which executes a first active thread and a second active thread, is coupled to the scheduler unit. The multi-threaded processor also includes a register file coupled to the execution unit. The register file switches one of the first active thread and the second active thread with a first inactive thread.
    Type: Grant
    Filed: August 25, 2006
    Date of Patent: September 2, 2008
    Assignee: Intel Corporation
    Inventor: Ken Shoemaker
  • Patent number: RE41012
    Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: November 24, 2009
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand
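    Illustrative sketch of register file indexing (RFI) as described in RE41012: selected operands are accessed through an index that is automatically advanced after each RFI instruction, so a block of registers can be streamed through ordinary instructions without vector hardware or repeat/loop constructs. The RFIState model and rfi_add function are assumptions for the example.

      class RFIState:
          def __init__(self, start, stride, count):
              self.index, self.stride, self.count = start, stride, count

          def next_reg(self):
              reg = self.index
              self.index = (self.index + self.stride) % self.count  # wrap within the block
              return reg

      def rfi_add(regs, dst_state, src_state, imm):
          # An "RFI" add: the register numbers come from the index state and the
          # state auto-increments; a non-RFI instruction would simply name registers.
          src = src_state.next_reg()
          dst = dst_state.next_reg()
          regs[dst] = regs[src] + imm

      if __name__ == "__main__":
          regs = list(range(8))                       # r0..r7
          src = RFIState(start=0, stride=1, count=8)  # walk r0, r1, r2, ...
          dst = RFIState(start=4, stride=1, count=8)  # write r4, r5, r6, ...
          for _ in range(3):
              rfi_add(regs, dst, src, imm=10)
          print(regs)                                 # r4..r6 now hold r0..r2 + 10
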