Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
-
Patent number: 7676657Abstract: Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel.Type: GrantFiled: October 10, 2006Date of Patent: March 9, 2010Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Brett Coon, Simon S. Moy
-
Patent number: 7672409Abstract: A method of multi-user detection in a given uplink and downlink time slot in a software-defined receiver which includes filtering and sampling a received signal; forming a block-banded matrix A of the sampled signals; and solving {circumflex over (d)}=T?1y, where T=(AHA), y=AHx. The methods of solving for the matrix T includes a) computing Cholesky factors of the matrix T by approximating using the block-banded property of the matrix T and A; b) Schur decomposition for Cholesky factors of the matrix T and approximating the lower triangular Cholesky factor matrix R using block Toeplitz property of matrix T; or c) Fourier Transformation.Type: GrantFiled: July 15, 2005Date of Patent: March 2, 2010Assignee: Sandbridge Technologies, Inc.Inventor: Sanyogita Shamsunder
-
Method and apparatus for back to back issue of dependent instructions in an out of order issue queue
Patent number: 7669038Abstract: A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In an embodiment comprising a method, the oldest ready instruction in the queue during a particular cycle is identified.Type: GrantFiled: May 2, 2008Date of Patent: February 23, 2010Assignee: International Business Machines CorporationInventors: William Elton Burky, Raymond Cheung Yeung -
Patent number: 7664929Abstract: A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.Type: GrantFiled: September 17, 2003Date of Patent: February 16, 2010Assignee: Koninklijke Philips Electronics N.V.Inventors: Carlos Antonio Alba Pinto, Ramanathan Sethuraman, Srinivasan Balakrishnan, Harm Johannes Antonius Maria Peters, Rafael Peset Llopis
-
Patent number: 7660969Abstract: A concurrent instruction dispatch apparatus includes a group indicator for each of a plurality of threads that indicates which one of a plurality of groups of the threads the thread belongs to. A group priority indicator for each group indicates an instruction dispatch priority relative to the other groups. Selection logic selects a thread for dispatching an instruction thereof based on the group and group priority indicators. A bifurcated scheduler includes first scheduler logic that issues instructions of the threads to an execution unit, second scheduler logic that enforces a thread scheduling policy, and an interface. A group indicator indicates which group each thread belongs to, a priority for each group, and execution information for each thread. The first scheduler logic issues the instructions based on the group priorities and group indicators, and the second scheduler logic updates the group indicators based on the instruction execution information.Type: GrantFiled: January 5, 2007Date of Patent: February 9, 2010Assignee: MIPS Technologies, Inc.Inventors: Michael Gottlieb Jensen, Ryan C. Kinter
-
Patent number: 7656412Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.Type: GrantFiled: December 21, 2005Date of Patent: February 2, 2010Assignee: Microsoft CorporationInventors: Denis Demandolx, Steven White
-
Patent number: 7647518Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledge indication is asserted that corresponds to an identified replay case of the subset.Type: GrantFiled: October 10, 2006Date of Patent: January 12, 2010Assignee: Apple Inc.Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
-
Patent number: 7636836Abstract: A dynamic multistreaming processor has instruction queues, each instruction queue corresponding to an instruction stream, and execution units. The dynamic multistreaming processor also has a dispatch stage to select at least one instruction from one of the instruction queues and to dispatch the selected at least one instruction to one of the execution units. Lastly the dynamic multistreaming processor has a queue counter, associated with each instruction queue, for indicating the number of instructions in each queue, and a fetch counter, associated with each instruction queue, for indicating an address from which to obtain instructions when the associated instruction queue is not full. The dynamic multistreaming processor might also have fetch counters for indicating a next instruction address from which to obtain at least one instruction when the associated instruction queue is not full. The dynamic multistreaming processor could also have a second counter for indicating a next instruction address.Type: GrantFiled: July 15, 2008Date of Patent: December 22, 2009Assignee: MIPS Technologies, Inc.Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar, Enrique Musoll
-
Patent number: 7617494Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.Type: GrantFiled: July 1, 2003Date of Patent: November 10, 2009Assignee: STMicroelectronics S.r.l.Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
-
Patent number: 7603544Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.Type: GrantFiled: September 12, 2005Date of Patent: October 13, 2009Assignee: Intel CorporationInventor: Thomas A. Piazza
-
Patent number: 7600221Abstract: A processing architecture supports executing instructions in parallel after identifying at least one level of dependency associated with a set of traces within a segment of code. Each trace represents a sequence of logical instructions within the segment of code that can be executed in a corresponding operand stack. Scheduling information is generated based on a dependency order identified among the set of traces. Thus, multiple traces may be scheduled for parallel execution unless a dependency order indicates that a second trace is dependent upon a first trace. In this instance, the first trace is executed prior to the second trace. Trace dependencies may be identified at run-time as well as prior to execution of traces in parallel. Results associated with execution of a trace are stored in a temporary buffer (instead of memory) until after it is known that a data dependency was not detected at run-time.Type: GrantFiled: October 6, 2003Date of Patent: October 6, 2009Assignee: Sun Microsystems, Inc.Inventor: Achutha Raman Rangachari
-
Patent number: 7594078Abstract: A method and apparatus for D-cache miss prediction and scheduling is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group resulted in a cache miss during a previous execution of the first instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.Type: GrantFiled: February 9, 2006Date of Patent: September 22, 2009Assignee: International Business Machines CorporationInventor: David A. Luick
-
Publication number: 20090228687Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution of the group of instructions exists in the group of instructions and supplies the group of instructions to the instruction decoding unit by controlling the instruction buffer so that the instructions of the group of instructions are sequentially supplied when the factor exists and all the instructions of the group of instructions are simultaneously supplied when the factor does not exist.Type: ApplicationFiled: March 9, 2006Publication date: September 10, 2009Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.Inventor: Tetsu Hosoki
-
Publication number: 20090217001Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.Type: ApplicationFiled: May 6, 2009Publication date: August 27, 2009Inventors: Cheryl D. Senter, Johannes Wang
-
Publication number: 20090210656Abstract: A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction.Type: ApplicationFiled: February 20, 2008Publication date: August 20, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: David S. Hutton, Khary J. Alexander, Fadi Y. Busaba, Bruce C. Giamei, John G. Rell, JR., Eric M. Schwarz, Chung-Lung Kevin Shum
-
Publication number: 20090210667Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine the dependency chain depth of all the instructions in the issue group, (3) schedule the instructions in an order of the longest dependency chain depth to shortest dependency chain depth, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.Type: ApplicationFiled: February 19, 2008Publication date: August 20, 2009Inventor: David A. Luick
-
Publication number: 20090210665Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit.Type: ApplicationFiled: February 19, 2008Publication date: August 20, 2009Inventors: Jeffrey P. Bradford, David A. Luick
-
Publication number: 20090210666Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: receive an issue group of instructions; determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; determine if there is a issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one load instruction in a different execution pipeline; and schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.Type: ApplicationFiled: February 19, 2008Publication date: August 20, 2009Inventor: David A. Luick
-
Publication number: 20090204792Abstract: Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times. Separate processor cores may be morphed to appear differently for different applications. For example, two processor cores each capable of executing N-wide issue groups of instructions may be morphed to appear as a single processor core capable of executing 2N-wide issue groups.Type: ApplicationFiled: February 13, 2008Publication date: August 13, 2009Inventor: David A. Luick
-
Patent number: 7571301Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.Type: GrantFiled: March 31, 2006Date of Patent: August 4, 2009Assignee: Intel CorporationInventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
-
Publication number: 20090182987Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate to increase overall instruction throughput.Type: ApplicationFiled: January 11, 2008Publication date: July 16, 2009Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
-
Publication number: 20090177868Abstract: An apparatus, system, and method are disclosed for discontiguous multiple issue of instructions. An assignment unit assigns a plurality of instruction blocks to a plurality of issue units. The plurality of issue units each comprises a renaming map that maps each architecturally visible register address to a rename register. Each issue unit maps each architecturally visible register in the decoded instruction to a register placeholder if the renaming map entry for that architecturally visible register is invalid else maps the architecturally visible register in the decoded instruction to a rename register if the rename register entry is valid. Each issue unit further receives predecessor mapping information from the renaming map of the issue unit's predecessor issue unit in response to the assignment unit identifying a relationship with the predecessor issue unit and the final mapping information being available from the predecessor issue unit.Type: ApplicationFiled: January 3, 2008Publication date: July 9, 2009Inventor: Russell Lee Lewis
-
Publication number: 20090172359Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.Type: ApplicationFiled: December 31, 2007Publication date: July 2, 2009Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Gene Shen, Sean Lie
-
Publication number: 20090164758Abstract: A mechanism for performing locked operations in a processing unit. A dispatch unit may dispatch a plurality of instructions including a locked instruction and a plurality of non-locked instructions. One or more of the non-locked instructions may be dispatched before and after the locked instruction. An execution unit may execute the plurality of instructions including the non-locked and locked instruction. A retirement unit may retire the locked instruction after execution of the locked instruction. During retirement, the processing unit may begin enforcing a previously obtained exclusive ownership of a cache line accessed by the locked instruction. Furthermore, the processing unit may stall the retirement of the one or more non-locked instructions dispatched after the locked instruction until after the writeback operation for the locked instruction is completed.Type: ApplicationFiled: December 20, 2007Publication date: June 25, 2009Inventor: Michael J. Haertel
-
Publication number: 20090113181Abstract: A method and apparatus for executing instructions in a processor are provided. In one embodiment of the invention, the method includes receiving a plurality of instructions. The plurality of instructions includes first instructions in a first thread and second instructions in a second thread. The method further includes forming a common issue group including an instruction of a first instruction type and an instruction of a second instruction type. The method also includes issuing the common issue group to a first execution unit and a second execution unit. The instruction of the first instruction type is issued to the first execution unit and the instruction of the second instruction type is issued to the second execution unit.Type: ApplicationFiled: October 24, 2007Publication date: April 30, 2009Inventors: Miguel Comparan, Brent Francis Hilgart, Brian Lee Koehler, Eric Oliver Mejdrich, Adam James Muff, Alfred Thomas Watson, III
-
System and Method for Implementing a Software-Supported Thread Assist Mechanism for a Microprocessor
Publication number: 20090106534Abstract: A system and computer-implementable method for implementing software-supported thread assist within a data processing system, wherein the data processing system supports processing instructions within at least a first thread and a second thread. An instruction dispatch unit (IDU) places the first thread into a sleep mode. The IDU separates an instruction stream for the second thread into at least a first independent instruction stream and a second independent instruction stream. The first independent instruction stream is processed utilizing facilities allocated to the first thread and the second independent instruction stream is processed utilizing facilities allocated to the second thread.Type: ApplicationFiled: October 23, 2007Publication date: April 23, 2009Inventors: Hung Q. Le, Dung Q. Nguyen -
Publication number: 20090089551Abstract: Provided are a method and apparatus for avoiding bank conflict. A first instruction that is one of access instructions that are predicted to cause the bank conflict is replaced with a second instruction by changing an execute timing of the first instruction to a timing prior to the execute timing of the first instruction so as for the access instructions not to cause the bank conflict. Next, a load/store unit that is scheduled to access the bank according to the first instruction accesses the bank and reads out a data from the bank at an execute timing of the second instruction, and after that, the load/store unit is allowed to be inputted the read data at the execute timing of the first instruction. Accordingly, although the access instructions that are predicted to cause the bank conflict are allocated to the load/store units, the bank conflict can be prevented, so that it is possible to avoid deterioration in performance due the occurrence of the bank conflict.Type: ApplicationFiled: February 27, 2008Publication date: April 2, 2009Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: Kyoung-june Min, Chan-min Park, Suk-jin Kim, Won-jong Lee, Kwon-taek Kwon, Hee-seok Kim
-
Patent number: 7509483Abstract: A computing architecture and software techniques are described which modifies the basic sequential instruction fetching mechanism of a processor by separating a program's control flow from its functional execution flow. A compiled sequential HLL program's static control structures are analyzed and a separate program based on its own unique instructions is created that primarily generates addresses for the selection of functional execution instructions. The original program is now represented by an instruction fetch program and a set of function/logic execution instructions. This basic split allows multiple instruction addresses to be generated in parallel to access multiple instruction memories. These multiple instruction memories contain only the function/logic instructions of the program and no control structure operations such as branches or calls. All the original program's control instructions are split from the original program and used to create the instruction addressing program.Type: GrantFiled: February 22, 2007Date of Patent: March 24, 2009Assignee: Renesky Tap III, Limited Liability CompanyInventor: Gerald George Pechanek
-
Patent number: 7509482Abstract: A memory device stores entries waiting to be processed. Row numbers of matrix information correspond to storage positions within the memory device, column numbers correspond to positions within the order of the entries, and every matrix element corresponding to the storage position and the position within the order of the entry stored in this storage position has a predetermined value. An operation between the first vector information indicating storage positions of processable entries and each column of the matrix information is performed and the second vector information indicating positions within the order of the processable entries is generated. Then, a position to be processed is selected from among the positions of processable entries indicated by the second vector information, an element having the predetermined value in the column corresponding to the selected position is obtained, and an entry in the storage position corresponding to the element is processed.Type: GrantFiled: June 16, 2006Date of Patent: March 24, 2009Assignee: Fujitsu LimitedInventors: Takuji Takahashi, Masahiro Kuramoto
-
Publication number: 20090070559Abstract: A data processing circuit comprises a register file (14) having read ports and write ports. A plurality of functional units (21 a-c), is coupled to receive operand data from a same combination of read ports. Each functional unit is coupled to a respective one of the write ports for writing a respective result. An instruction issue slot has outputs (11) for supplying register selection information to said combination read ports and to the respective ones of the write ports. The output of the issue slot also supplies an operation code. The functional units (21 a-c) in the plurality are arranged to respond to at least to one value of the operation code by each executing a respective operation using the same operands from said same combination and each functional unit producing a respective result at a respective ones of the write ports.Type: ApplicationFiled: September 21, 2005Publication date: March 12, 2009Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.Inventor: Antonius Adrianus Maria Van Wel
-
Publication number: 20090049279Abstract: The present invention provides a network multithreaded processor, such as a network processor, including a thread interleaver that implements fine-grained thread decisions to avoid underutilization of instruction execution resources in spite of large communication latencies. In an upper pipeline, an instruction unit determines an-instruction fetch sequence responsive to an instruction queue depth on a per thread basis. In a lower pipeline, a thread interleaver determines a thread interleave sequence responsive to thread conditions including thread latency conditions. The thread interleaver selects threads using a two-level round robin arbitration. Thread latency signals are active responsive to thread latencies such as thread stalls, cache misses, and interlocks. During the subsequent one or more clock cycles, the thread is ineligible for arbitration. In one embodiment, other thread conditions affect selection decisions such as local priority, global stalls, and late stalls.Type: ApplicationFiled: April 14, 2008Publication date: February 19, 2009Inventors: Donald E. Steiss, Earl T. Cohen, John J. Williams
-
Publication number: 20090031114Abstract: To guarantee response time while strictly maintaining the priority specified by software, a processor (1) which is a multithread processor having a thread multiplexer (10), and an issue information buffer (ISINF). An instruction code, and issue information (isid) for instructions issued at and after the next operating cycle which is added to the instruction code, are supplied to the thread multiplexer. The issue information is valid from the second and subsequent instruction flows, and is saved temporarily in an issue information buffer. This issue information is for example the position of an operating cycle which can issue a high priority instruction, i.e., information showing a slot. The thread multiplexer issues a low priority instruction at another operating cycle at which a high priority instruction is not issued according to the issue information.Type: ApplicationFiled: September 30, 2008Publication date: January 29, 2009Inventor: FUMIO ARAKAWA
-
Patent number: 7475223Abstract: An improved method, apparatus, and computer instructions for grouping instructions. A set of instructions is received for placement into an instruction cache in the data processing system. Instructions in the set of instructions are grouped into a dispatch grouping of instructions prior to the set of instructions being placed in the instruction cache.Type: GrantFiled: February 3, 2005Date of Patent: January 6, 2009Assignee: International Business Machines CorporationInventors: Brian R. Konigsburg, Hung Qui Le, David Stephen Levitan, John Wesley Ward, III
-
Publication number: 20090006816Abstract: A VLIW processor has a hierarchy of functional unit clusters that communicate through explicit control in the instruction stream and store data in register files at each level of the hierarchy. Explicit instructions transfer values between sub-clusters through a cluster level switch network. Transfer instructions issue in dedicated instruction issue slots in parallel with instructions that perform computation in functional units. The switch network can perform permutations on the data being moved. The switch network enables for operands to be broadcast between the sub-clusters, global register file and memory.Type: ApplicationFiled: June 27, 2007Publication date: January 1, 2009Inventors: David J. Hoyle, Amitabh Menon
-
Patent number: 7472257Abstract: Processor (100) has a plurality of registers (120) for storing instructions for execution by the plurality of execution units (160). The plurality of registers (120) are coupled to the plurality of execution units (160) via distribution means (140). Distribution means (140) have a plurality of dispatch units (144) coupled to the plurality of execution units (160) and a reroutable network, e.g. a data communication bus (142), coupling the plurality of execution units (120) to the plurality of dispatch units (144). The data communication bus (142) is controlled by control unit (148). Dispatch units (144) are arranged to detect dedicated instructions in the instruction flow, which signal the beginning of an inactive period of an execution unit (160a, 160b, 160c, 160d) in the plurality of execution units (160).Type: GrantFiled: November 20, 2002Date of Patent: December 30, 2008Assignee: NXP B.V.Inventor: Francesco Pessolano
-
Publication number: 20080313433Abstract: A processor is disclosed including several features allowing the processor to simultaneously execute instructions of multiple conditional execution instruction groups. Each conditional execution instruction group includes a conditional execution instruction and a code block specified by the conditional execution instruction. In one embodiment, the processor includes multiple registers for storing marking data pertaining to a number of instructions in each of multiple execution pipeline stages. In another embodiment, the processor includes write enable logic and an execution unit. The write enable logic produces write enable signals dependent upon received attributes, and the execution unit saves results of instructions of conditional execution instruction groups dependent upon the write enable signals.Type: ApplicationFiled: August 21, 2008Publication date: December 18, 2008Applicant: VeriSilicon Holdings (Cayman Islands) Co. Ltd.Inventors: Hung Nguyen, Shannon Wichman
-
Patent number: 7457938Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.Type: GrantFiled: September 30, 2005Date of Patent: November 25, 2008Assignee: Intel CorporationInventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton
-
Publication number: 20080288745Abstract: A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values.Type: ApplicationFiled: July 11, 2008Publication date: November 20, 2008Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Patent number: 7454597Abstract: A processor core and method of executing instructions, both of which utilizes schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the cache.Type: GrantFiled: January 2, 2007Date of Patent: November 18, 2008Assignee: International Business Machines CorporationInventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
-
Patent number: 7447879Abstract: A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.Type: GrantFiled: February 9, 2006Date of Patent: November 4, 2008Assignee: International Business Machines CorporationInventor: David A. Luick
-
Patent number: 7447887Abstract: To guarantee response time while strictly maintaining the priority specified by software, a processor (1) which is a multithread processor having a thread multiplexer (10), and an issue information buffer (ISINF). An instruction code, and issue information (isid) for instructions issued at and after the next operating cycle which is added to the instruction code, are supplied to the thread multiplexer. The issue information is valid from the second and subsequent instruction flows, and is saved temporarily in an issue information buffer. This issue information is for example the position of an operating cycle which can issue a high priority instruction, i.e., information showing a slot. The thread multiplexer issues a low priority instruction at another operating cycle at which a high priority instruction is not issued according to the issue information.Type: GrantFiled: July 7, 2006Date of Patent: November 4, 2008Assignee: Hitachi, Ltd.Inventor: Fumio Arakawa
-
Publication number: 20080263330Abstract: A processor has an interface portion and an internal environment. The interface portion comprises at least one port. The internal environment comprises an execution unit arranged to execute instructions in dependence on a first timing signal and to transfer data between the interior portion and the at least one port in dependence on the first timing signal; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions and the thread scheduler being arranged to schedule the threads in dependence on the first timing signal. The port is arranged to transfer data between the port and an external environment in dependence on a second timing signal, and to alter a ready signal in dependence on the second timing signal to indicate a transfer of data with the external environment. The thread scheduler is configured to schedule one or more associated threads for execution in dependence on the ready signal.Type: ApplicationFiled: April 17, 2007Publication date: October 23, 2008Inventors: Michael David May, Peter Hedinger, Alastair Dixon
-
Patent number: 7441098Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.Type: GrantFiled: May 6, 2005Date of Patent: October 21, 2008Assignee: Broadcom CorporationInventor: Sophie Wilson
-
Patent number: 7437544Abstract: A data processing apparatus and method are provided for executing a sequence of instructions including at least one multiple iteration instruction. The data processing apparatus comprises an instruction store for storing the sequence of instructions, and a processing unit for executing the sequence of instructions, the processing unit comprising at least a first processing path and a second processing path to enable at least two instructions of the sequence to be executed in parallel. When executing instructions in parallel, the first processing path executes an instruction which is earlier in the sequence than the instruction executing in the second processing path. The processing unit is operable when executing a multiple iteration instruction to allow a first iteration of the multiple iteration instruction to be executed in either the first processing path or the second processing path, but to cause all remaining iterations of the multiple iteration instruction to be executed in the first processing path.Type: GrantFiled: April 29, 2005Date of Patent: October 14, 2008Assignee: ARM LimitedInventors: Ann Sekli Chin, David James Williamson
-
Patent number: 7430651Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue having a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.Type: GrantFiled: January 25, 2006Date of Patent: September 30, 2008Assignee: Seiko-Epson CorporationInventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
-
Patent number: 7430653Abstract: A pipelined instruction dispatch or grouping circuit allows instruction dispatch decisions to be made over multiple processor cycles. In one embodiment, the grouping circuit performs resource allocation and data dependency checks on an instruction group, based on a state vector which includes representation of source and destination registers of instructions within said instruction group and corresponding state vectors for instruction groups of a number of preceding processor cycles.Type: GrantFiled: August 2, 1999Date of Patent: September 30, 2008Assignee: Sun Microsystems, Inc.Inventor: Marc Tremblay
-
Patent number: 7430643Abstract: The present invention provides a method and apparatus for increased efficiency for translation lookaside buffers by collapsing redundant translation table entries into a single translation table entry (TTE). In the present invention, each thread of a multithreaded processor is provided with multiple context registers. Each of these context registers is compared independently to the context of the TTE. If any of the contexts match (and the other match conditions are satisfied), then the translation is allowed to proceed. Two applications attempting to share one page but that still keep separate pages can then employ three total contexts. One context is for one application's private use; one of the contexts is for the other application's private use; and a third context is for the shared page. In one embodiment of the invention, two contexts are implemented per thread. However, the teachings of the present invention can be extended to a higher number of contexts per thread.Type: GrantFiled: December 30, 2004Date of Patent: September 30, 2008Assignee: Sun Microsystems, Inc.Inventors: Paul J. Jordan, William J. Kucharski, Roman M. Zajcew, Ashley N. Saulsbury, Quinn A. Jacobson
-
Publication number: 20080222336Abstract: To allow to use arithmetic circuits of sharable resources by priority with a simple procedure. In a data processing system including central processing units and a plurality of arithmetic circuits, wherein the central processing units are able to supply a command to one arithmetic circuit based on one fetched instruction and supply a command to other arithmetic circuit based on other fetched instruction, a memory circuit is provided which is used to store first information indicating which arithmetic circuit is executing a command, and second information indicating which central processing unit has reserved the arithmetic circuit for execution of the next command. When the arithmetic circuit is already executing a command, reservation of the arithmetic circuit for execution of the next command using the second information of the memory circuit, makes it possible, after the execution, to assign operation commands fast to the arithmetic circuits and cause them to execute the commands.Type: ApplicationFiled: January 14, 2008Publication date: September 11, 2008Inventors: Yoshikazu KIYOSHIGE, Shunichi Iwata, Kesami Hagiwara, Akihiko Tomita
-
Patent number: 7421571Abstract: A multi-threaded processor is provided. The multi-threading processor includes a first instruction fetch unit and a second instruction fetch unit. A multi-thread scheduler unit is coupled to the first instruction fetch unit and the second instruction fetch unit. An execution unit, which executes a first active thread and a second active thread is coupled to the scheduler unit. The multi-threading processor also includes a register file coupled to the execution unit. The register file switches one of the first active thread and the second active threads with a first inactive thread.Type: GrantFiled: August 25, 2006Date of Patent: September 2, 2008Assignee: Intel CorporationInventor: Ken Shoemaker
-
Patent number: RE41012Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.Type: GrantFiled: June 3, 2004Date of Patent: November 24, 2009Assignee: Altera CorporationInventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand