Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 7676657
    Abstract: Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched into a buffer, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: March 9, 2010
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett Coon, Simon S. Moy
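    Illustrative Python sketch of the dispatch scheme summarized in 7676657: any buffered instruction whose operands are ready may issue, regardless of thread or fetch order. The buffer layout, Instr fields and NUM_UNITS constant are assumptions for the example, not the patented circuit.

      from dataclasses import dataclass

      NUM_UNITS = 2  # assumed number of available functional units

      @dataclass
      class Instr:
          thread: int
          op: str
          ready: bool    # operands available?

      def dispatch_cycle(buffer):
          """Pick up to NUM_UNITS ready instructions from the shared buffer.

          Selection ignores thread boundaries and fetch order: a ready
          instruction from thread 1 may issue before an earlier-fetched,
          not-yet-ready instruction from thread 0."""
          issued = []
          for instr in list(buffer):
              if len(issued) == NUM_UNITS:
                  break
              if instr.ready:
                  issued.append(instr)
                  buffer.remove(instr)
          return issued

      if __name__ == "__main__":
          buf = [Instr(0, "mul", ready=False),  # fetched first, operands not ready
                 Instr(1, "add", ready=True),
                 Instr(0, "ld", ready=True)]
          print(dispatch_cycle(buf))            # both ready instructions issue together
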
  • Patent number: 7672409
    Abstract: A method of multi-user detection in a given uplink and downlink time slot in a software-defined receiver, which includes filtering and sampling a received signal; forming a block-banded matrix A of the sampled signals; and solving d̂ = T⁻¹y, where T = AᴴA and y = Aᴴx. The method of solving for the matrix T includes a) computing Cholesky factors of the matrix T by approximating using the block-banded property of the matrices T and A; b) Schur decomposition for Cholesky factors of the matrix T and approximating the lower triangular Cholesky factor matrix R using the block Toeplitz property of matrix T; or c) Fourier transformation.
    Type: Grant
    Filed: July 15, 2005
    Date of Patent: March 2, 2010
    Assignee: Sandbridge Technologies, Inc.
    Inventor: Sanyogita Shamsunder
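    Illustrative sketch of the linear algebra in 7672409, solving d̂ = T⁻¹y with T = AᴴA and y = Aᴴx via a Cholesky factorization of T instead of an explicit inverse; the random A and x stand in for the filtered, sampled signal, and the block-banded/Toeplitz approximations of the patent are not modeled.

      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))
      x = rng.standard_normal(64) + 1j * rng.standard_normal(64)

      T = A.conj().T @ A          # T = A^H A (Hermitian, positive definite)
      y = A.conj().T @ x          # y = A^H x

      L = np.linalg.cholesky(T)   # T = L L^H, lower-triangular Cholesky factor
      z = np.linalg.solve(L, y)               # forward solve  L z = y
      d_hat = np.linalg.solve(L.conj().T, z)  # back solve     L^H d_hat = z

      assert np.allclose(T @ d_hat, y)        # d_hat satisfies T d_hat = y
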
  • Patent number: 7669038
    Abstract: A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In an embodiment comprising a method, the oldest ready instruction in the queue during a particular cycle is identified.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: February 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: William Elton Burky, Raymond Cheung Yeung
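    Illustrative sketch of the issue-queue selection in 7669038: per-entry destination-to-source compares flag entries dependent on the instruction designated to issue this cycle, and the oldest ready, non-dependent entry is chosen for the next cycle. The Entry fields and select_next helper are assumed names, and real designs may still issue dependents via result forwarding; this is a simplification.

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class Entry:
          age: int           # smaller = older
          dest: str
          srcs: tuple
          ready: bool

      def select_next(queue, issuing: Optional[Entry]):
          """Return the oldest ready entry that is not the issuing instruction and
          does not source the issuing instruction's destination register."""
          candidates = []
          for e in queue:
              if e is issuing:
                  continue
              # destination-to-source compare against the designated instruction
              depends = issuing is not None and issuing.dest in e.srcs
              if e.ready and not depends:
                  candidates.append(e)
          return min(candidates, key=lambda e: e.age) if candidates else None

      if __name__ == "__main__":
          q = [Entry(0, "r3", ("r1", "r2"), ready=True),   # designated to issue now
               Entry(1, "r5", ("r3", "r4"), ready=True),   # waits on r3
               Entry(2, "r7", ("r6",), ready=True)]
          print(select_next(q, issuing=q[0]))              # the entry with age 2 is chosen
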
  • Patent number: 7664929
    Abstract: A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.
    Type: Grant
    Filed: September 17, 2003
    Date of Patent: February 16, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Carlos Antonio Alba Pinto, Ramanathan Sethuraman, Srinivasan Balakrishnan, Harm Johannes Antonius Maria Peters, Rafael Peset Llopis
  • Patent number: 7660969
    Abstract: A concurrent instruction dispatch apparatus includes a group indicator for each of a plurality of threads that indicates which one of a plurality of groups of the threads the thread belongs to. A group priority indicator for each group indicates an instruction dispatch priority relative to the other groups. Selection logic selects a thread for dispatching an instruction thereof based on the group and group priority indicators. A bifurcated scheduler includes first scheduler logic that issues instructions of the threads to an execution unit, second scheduler logic that enforces a thread scheduling policy, and an interface. A group indicator indicates which group each thread belongs to, a priority for each group, and execution information for each thread. The first scheduler logic issues the instructions based on the group priorities and group indicators, and the second scheduler logic updates the group indicators based on the instruction execution information.
    Type: Grant
    Filed: January 5, 2007
    Date of Patent: February 9, 2010
    Assignee: MIPS Technologies, Inc.
    Inventors: Michael Gottlieb Jensen, Ryan C. Kinter
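    Illustrative sketch of group-based thread selection as summarized in 7660969: each thread carries a group indicator, each group a priority, and selection round-robins within the highest-priority group that has an issuable thread. The dictionary-based state and pick_thread function are assumptions for the example.

      def pick_thread(thread_group, group_priority, issuable, last_pick):
          """thread_group: {tid: gid}; group_priority: {gid: priority, higher wins};
          issuable: set of tids that can dispatch this cycle;
          last_pick: {gid: tid chosen previously}, used for round-robin fairness."""
          by_prio = sorted(set(thread_group.values()),
                           key=lambda g: group_priority[g], reverse=True)
          for gid in by_prio:
              members = sorted(t for t, g in thread_group.items()
                               if g == gid and t in issuable)
              if not members:
                  continue
              # rotate so selection resumes just after the previous pick in this group
              start = last_pick.get(gid, -1)
              members = [t for t in members if t > start] + [t for t in members if t <= start]
              choice = members[0]
              last_pick[gid] = choice
              return choice
          return None

      if __name__ == "__main__":
          groups = {0: "A", 1: "A", 2: "B"}
          priorities = {"A": 2, "B": 1}
          last = {}
          print(pick_thread(groups, priorities, {0, 1, 2}, last))  # 0: group A has priority
          print(pick_thread(groups, priorities, {0, 1, 2}, last))  # 1: round-robin within A
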
  • Patent number: 7656412
    Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: February 2, 2010
    Assignee: Microsoft Corporation
    Inventors: Denis Demandolx, Steven White
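    Illustrative sketch of the separable resampling in 7656412: a two-dimensional resampling is decomposed into two one-dimensional passes (rows, then columns), each of which could be dispatched to a GPU as its own set of instructions. Plain NumPy nearest-neighbour scaling is used only to show the decomposition; it is not the patented algorithm.

      import numpy as np

      def resample_rows(img, new_w):
          # first one-dimensional transformation: resample along rows
          idx = (np.arange(new_w) * img.shape[1] / new_w).astype(int)
          return img[:, idx]

      def resample_cols(img, new_h):
          # second one-dimensional transformation: resample along columns
          idx = (np.arange(new_h) * img.shape[0] / new_h).astype(int)
          return img[idx, :]

      def resample_2d(img, new_h, new_w):
          # the two passes are independent steps, mirroring the per-pass
          # instruction streams described in the abstract
          return resample_cols(resample_rows(img, new_w), new_h)

      if __name__ == "__main__":
          image = np.arange(16.0).reshape(4, 4)
          print(resample_2d(image, 2, 8).shape)   # (2, 8)
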
  • Patent number: 7647518
    Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication corresponding to an identified replay case of the subset is asserted.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: January 12, 2010
    Assignee: Apple Inc.
    Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
  • Patent number: 7636836
    Abstract: A dynamic multistreaming processor has instruction queues, each instruction queue corresponding to an instruction stream, and execution units. The dynamic multistreaming processor also has a dispatch stage to select at least one instruction from one of the instruction queues and to dispatch the selected at least one instruction to one of the execution units. Lastly, the dynamic multistreaming processor has a queue counter, associated with each instruction queue, for indicating the number of instructions in each queue, and a fetch counter, associated with each instruction queue, for indicating an address from which to obtain instructions when the associated instruction queue is not full. The dynamic multistreaming processor might also have fetch counters for indicating a next instruction address from which to obtain at least one instruction when the associated instruction queue is not full. The dynamic multistreaming processor could also have a second counter for indicating a next instruction address.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: December 22, 2009
    Assignee: MIPS Technologies, Inc.
    Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar, Enrique Musoll
  • Patent number: 7617494
    Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.
    Type: Grant
    Filed: July 1, 2003
    Date of Patent: November 10, 2009
    Assignee: STMicroelectronics S.r.l.
    Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
  • Patent number: 7603544
    Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
    Type: Grant
    Filed: September 12, 2005
    Date of Patent: October 13, 2009
    Assignee: Intel Corporation
    Inventor: Thomas A. Piazza
  • Patent number: 7600221
    Abstract: A processing architecture supports executing instructions in parallel after identifying at least one level of dependency associated with a set of traces within a segment of code. Each trace represents a sequence of logical instructions within the segment of code that can be executed in a corresponding operand stack. Scheduling information is generated based on a dependency order identified among the set of traces. Thus, multiple traces may be scheduled for parallel execution unless a dependency order indicates that a second trace is dependent upon a first trace. In this instance, the first trace is executed prior to the second trace. Trace dependencies may be identified at run-time as well as prior to execution of traces in parallel. Results associated with execution of a trace are stored in a temporary buffer (instead of memory) until after it is known that a data dependency was not detected at run-time.
    Type: Grant
    Filed: October 6, 2003
    Date of Patent: October 6, 2009
    Assignee: Sun Microsystems, Inc.
    Inventor: Achutha Raman Rangachari
  • Patent number: 7594078
    Abstract: A method and apparatus for D-cache miss prediction and scheduling is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group resulted in a cache miss during a previous execution of the first instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: September 22, 2009
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
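    Illustrative sketch of the miss-prediction scheduling in 7594078: a load that missed the D-cache on a previous execution is steered into a more-delayed pipeline of the cascaded unit so its consumers see the data without stalling. The miss-history set and two-pipeline model are assumptions for the example.

      PIPELINES = {"P0": 0, "P1": 4}        # assumed extra delay (cycles) per pipeline

      def schedule_issue_group(group, miss_history):
          """group: list of (tag, is_load); miss_history: tags that missed the D-cache before."""
          assignment = {}
          for tag, is_load in group:
              if is_load and tag in miss_history:
                  assignment[tag] = "P1"    # delayed pipeline hides the expected miss
              else:
                  assignment[tag] = "P0"
          return assignment

      if __name__ == "__main__":
          group = [("ld_a", True), ("add_b", False)]
          print(schedule_issue_group(group, miss_history={"ld_a"}))
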
  • Publication number: 20090228687
    Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution exists within the group of instructions, and which controls the instruction buffer so that the instructions of the group are supplied to the instruction decoding unit sequentially when such a factor exists and all at once when it does not.
    Type: Application
    Filed: March 9, 2006
    Publication date: September 10, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventor: Tetsu Hosoki
  • Publication number: 20090217001
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Application
    Filed: May 6, 2009
    Publication date: August 27, 2009
    Inventors: Cheryl D. Senter, Johannes Wang
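    Illustrative sketch of the out-of-order load test described in 20090217001: a load may bypass older stores only if there is no address collision (an older store to the same address) and no write pending (an older store whose address is not yet calculated). The StoreEntry shape and can_load_bypass name are assumptions.

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class StoreEntry:
          addr: Optional[int]     # None => address not yet calculated (write pending)

      def can_load_bypass(load_addr, older_stores):
          for st in older_stores:
              if st.addr is None:
                  return False            # write pending: must wait
              if st.addr == load_addr:
                  return False            # address collision with an older store
          return True

      if __name__ == "__main__":
          stores = [StoreEntry(0x100), StoreEntry(None)]
          print(can_load_bypass(0x200, stores))               # False: a store address is unknown
          print(can_load_bypass(0x200, [StoreEntry(0x100)]))  # True: safe to load early
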
  • Publication number: 20090210656
    Abstract: A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction.
    Type: Application
    Filed: February 20, 2008
    Publication date: August 20, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David S. Hutton, Khary J. Alexander, Fadi Y. Busaba, Bruce C. Giamei, John G. Rell, JR., Eric M. Schwarz, Chung-Lung Kevin Shum
  • Publication number: 20090210667
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine the dependency chain depth of all the instructions in the issue group, (3) schedule the instructions in order from the longest dependency chain depth to the shortest dependency chain depth, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Application
    Filed: February 19, 2008
    Publication date: August 20, 2009
    Inventor: David A. Luick
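    Illustrative sketch of the priority rule in 20090210667: compute each instruction's dependency-chain depth within the issue group and schedule the longest chains first. The instruction tuples and helper names are invented for the example.

      def chain_depth(idx, group, memo=None):
          """Depth = 1 + max depth over in-group consumers of this instruction's result."""
          memo = {} if memo is None else memo
          if idx in memo:
              return memo[idx]
          dest, _srcs = group[idx]
          consumers = [j for j, (_d, srcs) in enumerate(group) if dest in srcs]
          memo[idx] = 1 + max((chain_depth(j, group, memo) for j in consumers), default=0)
          return memo[idx]

      def schedule(group):
          # longest dependency chain first, per the abstract's priority rule
          return sorted(range(len(group)), key=lambda i: chain_depth(i, group), reverse=True)

      if __name__ == "__main__":
          # each instruction is (destination register, source registers)
          group = [("r1", ()), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ())]
          print(schedule(group))   # [0, 1, 2, 3]: instruction 0 heads the longest chain
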
  • Publication number: 20090210665
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receive an issue group of instructions, reorder the issue group of instructions using instruction type priority, and execute the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Application
    Filed: February 19, 2008
    Publication date: August 20, 2009
    Inventors: Jeffrey P. Bradford, David A. Luick
  • Publication number: 20090210666
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: receive an issue group of instructions; determine if at least one load instruction is in the issue group and, if so, schedule the at least one load instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; determine if there is an issue conflict for one of the plurality of execution pipelines and resolve the issue conflict by scheduling the at least one load instruction in a different execution pipeline; and schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Application
    Filed: February 19, 2008
    Publication date: August 20, 2009
    Inventor: David A. Luick
  • Publication number: 20090204792
    Abstract: Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times. Separate processor cores may be morphed to appear differently for different applications. For example, two processor cores each capable of executing N-wide issue groups of instructions may be morphed to appear as a single processor core capable of executing 2N-wide issue groups.
    Type: Application
    Filed: February 13, 2008
    Publication date: August 13, 2009
    Inventor: David A. Luick
  • Patent number: 7571301
    Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: August 4, 2009
    Assignee: Intel Corporation
    Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
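    Illustrative sketch of a post/wait structure built on a single counter, one of the variants named in 7571301 for parallelizing DOACROSS loops: iteration i may not enter its dependent region until iteration i-1 has posted. The class and thread names are assumptions, not the patented implementation.

      import threading

      class PostWait:
          def __init__(self):
              self._done = 0                      # single counter of completed iterations
              self._cv = threading.Condition()

          def wait_for(self, iteration):
              with self._cv:
                  self._cv.wait_for(lambda: self._done >= iteration)

          def post(self, iteration):
              with self._cv:
                  self._done = max(self._done, iteration)
                  self._cv.notify_all()

      def doacross_body(i, pw, out):
          pw.wait_for(i)              # wait until iteration i-1 has posted
          out.append(i)               # dependent work must run in order
          pw.post(i + 1)

      if __name__ == "__main__":
          pw, out = PostWait(), []
          pw.post(1)                  # iteration 1 has no predecessor
          threads = [threading.Thread(target=doacross_body, args=(i, pw, out))
                     for i in range(1, 6)]
          for t in reversed(threads): # start out of order on purpose
              t.start()
          for t in threads:
              t.join()
          print(out)                  # [1, 2, 3, 4, 5] despite the reversed start order
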
  • Publication number: 20090182987
    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit capable of being clocked at multiple different rates relative to a multithreaded issue unit, such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate, increasing overall instruction throughput.
    Type: Application
    Filed: January 11, 2008
    Publication date: July 16, 2009
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Publication number: 20090177868
    Abstract: An apparatus, system, and method are disclosed for discontiguous multiple issue of instructions. An assignment unit assigns a plurality of instruction blocks to a plurality of issue units. Each of the plurality of issue units comprises a renaming map that maps each architecturally visible register address to a rename register. Each issue unit maps each architecturally visible register in the decoded instruction to a register placeholder if the renaming map entry for that architecturally visible register is invalid, and otherwise maps it to a rename register if the rename register entry is valid. Each issue unit further receives predecessor mapping information from the renaming map of the issue unit's predecessor issue unit in response to the assignment unit identifying a relationship with the predecessor issue unit and the final mapping information being available from the predecessor issue unit.
    Type: Application
    Filed: January 3, 2008
    Publication date: July 9, 2009
    Inventor: Russell Lee Lewis
  • Publication number: 20090172359
    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
    Type: Application
    Filed: December 31, 2007
    Publication date: July 2, 2009
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Gene Shen, Sean Lie
  • Publication number: 20090164758
    Abstract: A mechanism for performing locked operations in a processing unit. A dispatch unit may dispatch a plurality of instructions including a locked instruction and a plurality of non-locked instructions. One or more of the non-locked instructions may be dispatched before and after the locked instruction. An execution unit may execute the plurality of instructions including the non-locked and locked instruction. A retirement unit may retire the locked instruction after execution of the locked instruction. During retirement, the processing unit may begin enforcing a previously obtained exclusive ownership of a cache line accessed by the locked instruction. Furthermore, the processing unit may stall the retirement of the one or more non-locked instructions dispatched after the locked instruction until after the writeback operation for the locked instruction is completed.
    Type: Application
    Filed: December 20, 2007
    Publication date: June 25, 2009
    Inventor: Michael J. Haertel
  • Publication number: 20090113181
    Abstract: A method and apparatus for executing instructions in a processor are provided. In one embodiment of the invention, the method includes receiving a plurality of instructions. The plurality of instructions includes first instructions in a first thread and second instructions in a second thread. The method further includes forming a common issue group including an instruction of a first instruction type and an instruction of a second instruction type. The method also includes issuing the common issue group to a first execution unit and a second execution unit. The instruction of the first instruction type is issued to the first execution unit and the instruction of the second instruction type is issued to the second execution unit.
    Type: Application
    Filed: October 24, 2007
    Publication date: April 30, 2009
    Inventors: Miguel Comparan, Brent Francis Hilgart, Brian Lee Koehler, Eric Oliver Mejdrich, Adam James Muff, Alfred Thomas Watson, III
  • Publication number: 20090106534
    Abstract: A system and computer-implementable method for implementing software-supported thread assist within a data processing system, wherein the data processing system supports processing instructions within at least a first thread and a second thread. An instruction dispatch unit (IDU) places the first thread into a sleep mode. The IDU separates an instruction stream for the second thread into at least a first independent instruction stream and a second independent instruction stream. The first independent instruction stream is processed utilizing facilities allocated to the first thread and the second independent instruction stream is processed utilizing facilities allocated to the second thread.
    Type: Application
    Filed: October 23, 2007
    Publication date: April 23, 2009
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Publication number: 20090089551
    Abstract: Provided are a method and apparatus for avoiding bank conflict. A first instruction, one of the access instructions predicted to cause the bank conflict, is replaced with a second instruction by moving its execute timing earlier so that the access instructions do not cause the bank conflict. A load/store unit that is scheduled to access the bank according to the first instruction then accesses the bank and reads out data at the execute timing of the second instruction, after which the read data is supplied to the load/store unit at the execute timing of the first instruction. Accordingly, even though access instructions predicted to cause the bank conflict are allocated to the load/store units, the bank conflict can be prevented, avoiding the performance degradation that would otherwise result from its occurrence.
    Type: Application
    Filed: February 27, 2008
    Publication date: April 2, 2009
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyoung-june Min, Chan-min Park, Suk-jin Kim, Won-jong Lee, Kwon-taek Kwon, Hee-seok Kim
  • Patent number: 7509483
    Abstract: A computing architecture and software techniques are described which modify the basic sequential instruction fetching mechanism of a processor by separating a program's control flow from its functional execution flow. A compiled sequential HLL program's static control structures are analyzed and a separate program based on its own unique instructions is created that primarily generates addresses for the selection of functional execution instructions. The original program is now represented by an instruction fetch program and a set of function/logic execution instructions. This basic split allows multiple instruction addresses to be generated in parallel to access multiple instruction memories. These multiple instruction memories contain only the function/logic instructions of the program and no control structure operations such as branches or calls. All the original program's control instructions are split from the original program and used to create the instruction addressing program.
    Type: Grant
    Filed: February 22, 2007
    Date of Patent: March 24, 2009
    Assignee: Renesky Tap III, Limited Liability Company
    Inventor: Gerald George Pechanek
  • Patent number: 7509482
    Abstract: A memory device stores entries waiting to be processed. Row numbers of matrix information correspond to storage positions within the memory device, column numbers correspond to positions within the order of the entries, and every matrix element corresponding to the storage position and the position within the order of the entry stored in this storage position has a predetermined value. An operation between the first vector information indicating storage positions of processable entries and each column of the matrix information is performed and the second vector information indicating positions within the order of the processable entries is generated. Then, a position to be processed is selected from among the positions of processable entries indicated by the second vector information, an element having the predetermined value in the column corresponding to the selected position is obtained, and an entry in the storage position corresponding to the element is processed.
    Type: Grant
    Filed: June 16, 2006
    Date of Patent: March 24, 2009
    Assignee: Fujitsu Limited
    Inventors: Takuji Takahashi, Masahiro Kuramoto
  • Publication number: 20090070559
    Abstract: A data processing circuit comprises a register file (14) having read ports and write ports. A plurality of functional units (21a-c) is coupled to receive operand data from a same combination of read ports. Each functional unit is coupled to a respective one of the write ports for writing a respective result. An instruction issue slot has outputs (11) for supplying register selection information to said combination of read ports and to the respective ones of the write ports. The output of the issue slot also supplies an operation code. The functional units (21a-c) in the plurality are arranged to respond to at least one value of the operation code by each executing a respective operation using the same operands from said same combination, each functional unit producing a respective result at a respective one of the write ports.
    Type: Application
    Filed: September 21, 2005
    Publication date: March 12, 2009
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
    Inventor: Antonius Adrianus Maria Van Wel
  • Publication number: 20090049279
    Abstract: The present invention provides a network multithreaded processor, such as a network processor, including a thread interleaver that implements fine-grained thread decisions to avoid underutilization of instruction execution resources in spite of large communication latencies. In an upper pipeline, an instruction unit determines an instruction fetch sequence responsive to an instruction queue depth on a per thread basis. In a lower pipeline, a thread interleaver determines a thread interleave sequence responsive to thread conditions including thread latency conditions. The thread interleaver selects threads using a two-level round robin arbitration. Thread latency signals are active responsive to thread latencies such as thread stalls, cache misses, and interlocks. During the subsequent one or more clock cycles, the thread is ineligible for arbitration. In one embodiment, other thread conditions affect selection decisions such as local priority, global stalls, and late stalls.
    Type: Application
    Filed: April 14, 2008
    Publication date: February 19, 2009
    Inventors: Donald E. Steiss, Earl T. Cohen, John J. Williams
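    Illustrative sketch of a two-level round-robin interleave decision in the spirit of 20090049279: threads stalled by a latency event (cache miss, interlock) are ineligible, high-priority eligible threads are arbitrated first, and the remaining threads are arbitrated second. The class fields and method names are assumptions for the example.

      class Interleaver:
          def __init__(self, n_threads):
              self.n = n_threads
              self.rr = {"high": -1, "low": -1}   # last pick per arbitration level

          def select(self, high_priority, stalled):
              """high_priority, stalled: sets of thread ids; returns chosen tid or None."""
              for level in ("high", "low"):
                  pool = [t for t in range(self.n)
                          if (t in high_priority) == (level == "high") and t not in stalled]
                  if not pool:
                      continue
                  start = self.rr[level]
                  # resume arbitration just after the previously chosen thread
                  ordered = [t for t in pool if t > start] + [t for t in pool if t <= start]
                  choice = ordered[0]
                  self.rr[level] = choice
                  return choice
              return None

      if __name__ == "__main__":
          il = Interleaver(4)
          print(il.select(high_priority={0, 2}, stalled={0}))     # 2: only eligible high-priority thread
          print(il.select(high_priority={0, 2}, stalled={0, 2}))  # 1: falls through to the second level
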
  • Publication number: 20090031114
    Abstract: To guarantee response time while strictly maintaining the priority specified by software, a processor (1) is provided which is a multithread processor having a thread multiplexer (10) and an issue information buffer (ISINF). An instruction code, and issue information (isid) added to the instruction code for instructions issued at and after the next operating cycle, are supplied to the thread multiplexer. The issue information is valid from the second and subsequent instruction flows, and is saved temporarily in the issue information buffer. This issue information is, for example, the position of an operating cycle which can issue a high priority instruction, i.e., information identifying a slot. According to the issue information, the thread multiplexer issues a low priority instruction at an operating cycle at which a high priority instruction is not issued.
    Type: Application
    Filed: September 30, 2008
    Publication date: January 29, 2009
    Inventor: FUMIO ARAKAWA
  • Patent number: 7475223
    Abstract: An improved method, apparatus, and computer instructions for grouping instructions. A set of instructions is received for placement into an instruction cache in the data processing system. Instructions in the set of instructions are grouped into a dispatch grouping of instructions prior to the set of instructions being placed in the instruction cache.
    Type: Grant
    Filed: February 3, 2005
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Brian R. Konigsburg, Hung Qui Le, David Stephen Levitan, John Wesley Ward, III
  • Publication number: 20090006816
    Abstract: A VLIW processor has a hierarchy of functional unit clusters that communicate through explicit control in the instruction stream and store data in register files at each level of the hierarchy. Explicit instructions transfer values between sub-clusters through a cluster-level switch network. Transfer instructions issue in dedicated instruction issue slots in parallel with instructions that perform computation in functional units. The switch network can perform permutations on the data being moved. The switch network also enables operands to be broadcast among the sub-clusters, the global register file and memory.
    Type: Application
    Filed: June 27, 2007
    Publication date: January 1, 2009
    Inventors: David J. Hoyle, Amitabh Menon
  • Patent number: 7472257
    Abstract: Processor (100) has a plurality of registers (120) for storing instructions for execution by the plurality of execution units (160). The plurality of registers (120) are coupled to the plurality of execution units (160) via distribution means (140). Distribution means (140) have a plurality of dispatch units (144) coupled to the plurality of execution units (160) and a reroutable network, e.g. a data communication bus (142), coupling the plurality of registers (120) to the plurality of dispatch units (144). The data communication bus (142) is controlled by control unit (148). Dispatch units (144) are arranged to detect dedicated instructions in the instruction flow, which signal the beginning of an inactive period of an execution unit (160a, 160b, 160c, 160d) in the plurality of execution units (160).
    Type: Grant
    Filed: November 20, 2002
    Date of Patent: December 30, 2008
    Assignee: NXP B.V.
    Inventor: Francesco Pessolano
  • Publication number: 20080313433
    Abstract: A processor is disclosed including several features allowing the processor to simultaneously execute instructions of multiple conditional execution instruction groups. Each conditional execution instruction group includes a conditional execution instruction and a code block specified by the conditional execution instruction. In one embodiment, the processor includes multiple registers for storing marking data pertaining to a number of instructions in each of multiple execution pipeline stages. In another embodiment, the processor includes write enable logic and an execution unit. The write enable logic produces write enable signals dependent upon received attributes, and the execution unit saves results of instructions of conditional execution instruction groups dependent upon the write enable signals.
    Type: Application
    Filed: August 21, 2008
    Publication date: December 18, 2008
    Applicant: VeriSilicon Holdings (Cayman Islands) Co. Ltd.
    Inventors: Hung Nguyen, Shannon Wichman
  • Patent number: 7457938
    Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: November 25, 2008
    Assignee: Intel Corporation
    Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton
  • Publication number: 20080288745
    Abstract: A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values.
    Type: Application
    Filed: July 11, 2008
    Publication date: November 20, 2008
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
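    Illustrative sketch of the hazard check described in 20080288745: given a vector of memory addresses, produce a predicate marking the leading elements that touch distinct addresses and so can be processed in parallel, with the first repeated address starting the next batch. This simplified rule and the function name are assumptions, not the claimed instruction semantics.

      def conflict_predicate(addresses):
          seen = set()
          predicate = []
          safe = True
          for a in addresses:
              if a in seen:
                  safe = False        # conflict with an earlier element in the vector
              predicate.append(safe)
              seen.add(a)
          return predicate

      if __name__ == "__main__":
          addrs = [0x10, 0x20, 0x10, 0x30]
          print(conflict_predicate(addrs))   # [True, True, False, False]
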
  • Patent number: 7454597
    Abstract: A processor core and method of executing instructions, both of which utilize schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the shared cache.
    Type: Grant
    Filed: January 2, 2007
    Date of Patent: November 18, 2008
    Assignee: International Business Machines Corporation
    Inventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
  • Patent number: 7447879
    Abstract: A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: November 4, 2008
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7447887
    Abstract: To guarantee response time while strictly maintaining the priority specified by software, a processor (1) is provided which is a multithread processor having a thread multiplexer (10) and an issue information buffer (ISINF). An instruction code, and issue information (isid) added to the instruction code for instructions issued at and after the next operating cycle, are supplied to the thread multiplexer. The issue information is valid from the second and subsequent instruction flows, and is saved temporarily in the issue information buffer. This issue information is, for example, the position of an operating cycle which can issue a high priority instruction, i.e., information identifying a slot. According to the issue information, the thread multiplexer issues a low priority instruction at an operating cycle at which a high priority instruction is not issued.
    Type: Grant
    Filed: July 7, 2006
    Date of Patent: November 4, 2008
    Assignee: Hitachi, Ltd.
    Inventor: Fumio Arakawa
  • Publication number: 20080263330
    Abstract: A processor has an interface portion and an internal environment. The interface portion comprises at least one port. The internal environment comprises an execution unit arranged to execute instructions in dependence on a first timing signal and to transfer data between the interior portion and the at least one port in dependence on the first timing signal; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions and the thread scheduler being arranged to schedule the threads in dependence on the first timing signal. The port is arranged to transfer data between the port and an external environment in dependence on a second timing signal, and to alter a ready signal in dependence on the second timing signal to indicate a transfer of data with the external environment. The thread scheduler is configured to schedule one or more associated threads for execution in dependence on the ready signal.
    Type: Application
    Filed: April 17, 2007
    Publication date: October 23, 2008
    Inventors: Michael David May, Peter Hedinger, Alastair Dixon
  • Patent number: 7441098
    Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.
    Type: Grant
    Filed: May 6, 2005
    Date of Patent: October 21, 2008
    Assignee: Broadcom Corporation
    Inventor: Sophie Wilson
  • Patent number: 7437544
    Abstract: A data processing apparatus and method are provided for executing a sequence of instructions including at least one multiple iteration instruction. The data processing apparatus comprises an instruction store for storing the sequence of instructions, and a processing unit for executing the sequence of instructions, the processing unit comprising at least a first processing path and a second processing path to enable at least two instructions of the sequence to be executed in parallel. When executing instructions in parallel, the first processing path executes an instruction which is earlier in the sequence than the instruction executing in the second processing path. The processing unit is operable when executing a multiple iteration instruction to allow a first iteration of the multiple iteration instruction to be executed in either the first processing path or the second processing path, but to cause all remaining iterations of the multiple iteration instruction to be executed in the first processing path.
    Type: Grant
    Filed: April 29, 2005
    Date of Patent: October 14, 2008
    Assignee: ARM Limited
    Inventors: Ann Sekli Chin, David James Williamson
  • Patent number: 7430651
    Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue has a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: September 30, 2008
    Assignee: Seiko-Epson Corporation
    Inventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
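    Illustrative sketch of the tag-monitoring idea in 7430651: tags sit in a queue in program order, each instruction's information lives in a register file location addressed by its tag, and reading the file through the queue's slots recovers program order. The class below is a simplified software model, not the patented circuit.

      from collections import deque

      class TagMonitor:
          def __init__(self, n_tags):
              self.queue = deque(range(n_tags))   # tags held in program order
              self.regfile = {}                   # tag -> per-instruction information

          def assign(self, info):
              tag = self.queue[len(self.regfile)] # next free tag, still program-ordered
              self.regfile[tag] = info
              return tag

          def complete_oldest(self):
              tag = self.queue.popleft()          # advance the queue on completion
              self.queue.append(tag)              # recycle the freed tag
              return self.regfile.pop(tag)

          def in_program_order(self):
              return [self.regfile[t] for t in self.queue if t in self.regfile]

      if __name__ == "__main__":
          tm = TagMonitor(4)
          for op in ("ld", "add", "st"):
              tm.assign({"op": op})
          print(tm.in_program_order())            # entries read back in program order
          print(tm.complete_oldest())             # the oldest ('ld') completes first
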
  • Patent number: 7430653
    Abstract: A pipelined instruction dispatch or grouping circuit allows instruction dispatch decisions to be made over multiple processor cycles. In one embodiment, the grouping circuit performs resource allocation and data dependency checks on an instruction group, based on a state vector which includes representation of source and destination registers of instructions within said instruction group and corresponding state vectors for instruction groups of a number of preceding processor cycles.
    Type: Grant
    Filed: August 2, 1999
    Date of Patent: September 30, 2008
    Assignee: Sun Microsystems, Inc.
    Inventor: Marc Tremblay
  • Patent number: 7430643
    Abstract: The present invention provides a method and apparatus for increased efficiency for translation lookaside buffers by collapsing redundant translation table entries into a single translation table entry (TTE). In the present invention, each thread of a multithreaded processor is provided with multiple context registers. Each of these context registers is compared independently to the context of the TTE. If any of the contexts match (and the other match conditions are satisfied), then the translation is allowed to proceed. Two applications attempting to share one page but that still keep separate pages can then employ three total contexts. One context is for one application's private use; one of the contexts is for the other application's private use; and a third context is for the shared page. In one embodiment of the invention, two contexts are implemented per thread. However, the teachings of the present invention can be extended to a higher number of contexts per thread.
    Type: Grant
    Filed: December 30, 2004
    Date of Patent: September 30, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Paul J. Jordan, William J. Kucharski, Roman M. Zajcew, Ashley N. Saulsbury, Quinn A. Jacobson
  • Publication number: 20080222336
    Abstract: To allow arithmetic circuits of sharable resources to be used by priority with a simple procedure. In a data processing system including central processing units and a plurality of arithmetic circuits, wherein the central processing units are able to supply a command to one arithmetic circuit based on one fetched instruction and supply a command to another arithmetic circuit based on another fetched instruction, a memory circuit is provided which stores first information indicating which arithmetic circuit is executing a command, and second information indicating which central processing unit has reserved the arithmetic circuit for execution of the next command. When an arithmetic circuit is already executing a command, reserving it for execution of the next command using the second information of the memory circuit makes it possible, after that execution, to assign operation commands quickly to the arithmetic circuits and cause them to execute the commands.
    Type: Application
    Filed: January 14, 2008
    Publication date: September 11, 2008
    Inventors: Yoshikazu KIYOSHIGE, Shunichi Iwata, Kesami Hagiwara, Akihiko Tomita
  • Patent number: 7421571
    Abstract: A multi-threaded processor is provided. The multi-threaded processor includes a first instruction fetch unit and a second instruction fetch unit. A multi-thread scheduler unit is coupled to the first instruction fetch unit and the second instruction fetch unit. An execution unit, which executes a first active thread and a second active thread, is coupled to the scheduler unit. The multi-threaded processor also includes a register file coupled to the execution unit. The register file switches one of the first active thread and the second active thread with a first inactive thread.
    Type: Grant
    Filed: August 25, 2006
    Date of Patent: September 2, 2008
    Assignee: Intel Corporation
    Inventor: Ken Shoemaker
  • Patent number: RE41012
    Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: November 24, 2009
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand
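    Illustrative sketch of register file indexing (RFI) as described in RE41012: selected operands are accessed through an index that is automatically advanced after each RFI instruction, so a block of registers can be streamed through ordinary instructions without vector hardware or repeat/loop constructs. The RFIState model and rfi_add function are assumptions for the example.

      class RFIState:
          def __init__(self, start, stride, count):
              self.index, self.stride, self.count = start, stride, count

          def next_reg(self):
              reg = self.index
              self.index = (self.index + self.stride) % self.count  # wrap within the block
              return reg

      def rfi_add(regs, dst_state, src_state, imm):
          # An "RFI" add: the register numbers come from the index state and the
          # state auto-increments; a non-RFI instruction would simply name registers.
          src = src_state.next_reg()
          dst = dst_state.next_reg()
          regs[dst] = regs[src] + imm

      if __name__ == "__main__":
          regs = list(range(8))                       # r0..r7
          src = RFIState(start=0, stride=1, count=8)  # walk r0, r1, r2, ...
          dst = RFIState(start=4, stride=1, count=8)  # write r4, r5, r6, ...
          for _ in range(3):
              rfi_add(regs, dst, src, imm=10)
          print(regs)                                 # r4..r6 now hold r0..r2 + 10
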