Of Multiple Instructions Simultaneously Patents (Class 712/206)

Methods and apparatuses for reducing step loads of processors

Patent number: 7992017

Abstract: Methods and apparatuses for reducing step loads of processors are disclosed. Method embodiments comprise examining a number of instructions to be processed by a processor to determine their types, calculating the power consumed in an execution period based on those types, and limiting execution to a subset of the instructions to control the quantity of power for the execution period. Some embodiments may also create artificial activity to provide a minimum power floor for the processor. Apparatus embodiments comprise instruction type determination logic to determine types of instructions in an incoming instruction stream, a power calculator to calculate power consumption associated with processing a number of instructions in an execution period, and instruction throttling logic to control the power consumption by limiting the number of instructions to be processed in the execution period.

Type: Grant

Filed: September 11, 2007

Date of Patent: August 2, 2011

Assignee: Intel Corporation

Inventors: Kevin Safford, Rohit Bhatia, Chris Bostak, Richard Blumberg, Blaine Stackhouse, Steve Undy
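
A minimal Python sketch of the throttling step described in the abstract above, under stated assumptions: the instruction classes, the per-type costs in `POWER_COST`, the `budget` and `floor` parameters, and the `"filler"` operation that stands in for artificial activity are all hypothetical, not details from the patent. It admits the longest in-order prefix of an instruction window whose estimated power fits the budget and defers the rest.

```python
# Hypothetical per-type power costs in arbitrary units (not values from the patent).
POWER_COST = {"int": 1, "load": 2, "store": 2, "fp": 4, "simd": 6}


def instruction_type(insn: str) -> str:
    """Classify an instruction by its mnemonic (toy encoding, assumed)."""
    op = insn.split()[0]
    if op.startswith("v"):
        return "simd"
    if op.startswith("f"):
        return "fp"
    if op == "ld":
        return "load"
    if op == "st":
        return "store"
    return "int"


def throttle(window: list[str], budget: int, floor: int = 0) -> tuple[list[str], int]:
    """Admit the longest in-order prefix of `window` whose summed cost fits `budget`.

    If the admitted work costs less than `floor`, pad with hypothetical "filler"
    operations (integer cost) to model artificial activity; assumes floor <= budget.
    """
    issued: list[str] = []
    power = 0
    for insn in window:
        cost = POWER_COST[instruction_type(insn)]
        if power + cost > budget:
            break  # defer this and all younger instructions to a later period
        issued.append(insn)
        power += cost
    while power < floor:
        issued.append("filler")
        power += POWER_COST["int"]
    return issued, power


window = ["add r1, r2", "vmul v0, v1", "fadd f0, f1", "ld r3, [r4]"]
print(throttle(window, budget=8))  # (['add r1, r2', 'vmul v0, v1'], 7)
```

Stopping at the first instruction that would exceed the budget, rather than skipping ahead to cheaper ones, keeps issue in program order; a real design could choose differently.
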
Methods and apparatus storing expanded width instructions in a VLIW memory for deferred execution

Patent number: 7962723

Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.

Type: Grant

Filed: July 9, 2009

Date of Patent: June 14, 2011

Inventors: Gerald George Pechanek, Stamatis Vassiliadis
VECTOR PROCESSING APPARATUS AND METHOD

Publication number: 20110107063

Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.

Type: Application

Filed: August 2, 2010

Publication date: May 5, 2011

Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
Preparing instruction groups for a processor having multiple issue ports

Patent number: 7934203

Abstract: During program code conversion, such as in a dynamic binary translator, automatic code generation provides target code 21 executable by a target processor 13. Multiple instruction ports 610 disperse a group of instructions to functional units 620 of the processor 13. Disclosed is a mechanism of preparing an instruction group 606 using a plurality of pools 700 having a hierarchical structure 711-715. Each pool represents a different overlapping subset of the issue ports 610. Placing an instruction 600 into a particular pool 700 also reduces vacancies in any one or more subsidiary pools in the hierarchy. In a preferred embodiment, a counter value 702 is associated with each pool 700 to track vacancies. A valid instruction group 606 is formed by picking the placed instructions 600 from the pools 700. The instruction groups are generated accurately and automatically. Decoding errors and stalls are minimized or completely avoided.

Type: Grant

Filed: May 27, 2005

Date of Patent: April 26, 2011

Assignee: International Business Machines Corporation

Inventors: William O. Lovett, David Haikney, Matthew Evans
Processor and method for executing a program loop within an instruction word

Patent number: 7913069

Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. Instruction words (48) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In a particular example, the series of operations are included in a single instruction word (48). The micro-loop (100) in combination with the ability of the computers (12) to send instruction words (48) to a neighboring computer (12) provides a powerful tool for allowing a computer (12) to utilize the resources of a neighboring computer (12).

Type: Grant

Filed: May 26, 2006

Date of Patent: March 22, 2011

Assignee: VNS Portfolio LLC

Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
PROCESSOR FOR EXECUTING INSTRUCTION STREAM AT LOW COST, METHOD FOR THE EXECUTION, AND PROGRAM FOR THE EXECUTION

Publication number: 20110010527

Abstract: A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams.

Type: Application

Filed: February 3, 2009

Publication date: January 13, 2011

Inventor: Shohei Nomoto
Methods and apparatus for implementing complex parallel instructions using control logic

Patent number: 7870367

Abstract: Methods and apparatus are provided for implementing complex parallel instructions on a processor having a supported instruction set. With complex parallel instructions, an operation code, control logic, and input data are passed to a processor core. The operation code identifies the instruction used to process the input data and the control logic identifies the state of the instruction. An intervening instruction can be executed by a processor core even before execution of a complex parallel instruction is complete.

Type: Grant

Filed: June 17, 2003

Date of Patent: January 11, 2011

Assignee: Altera Corporation

Inventor: Chris Robinson
HANDLING AND PROCESSING OF MASSIVE NUMBERS OF PROCESSING INSTRUCTIONS IN REAL TIME

Publication number: 20110004788

Abstract: A system is designed for processing instructions in real time during a session. This system comprises: a preloader for obtaining reference data relating to the instructions, the reference data indicating the current values of each specified resource account data file, and the preloader being arranged to read the reference data for a plurality of received instructions in parallel from a master database; an enriched instruction queue for queuing the instructions together with their respective preloaded reference data; an execution engine for determining sequentially whether each received instruction can be executed under the present values of the relevant resource account files and, for each executable instruction, generating an updating command; and an updater, responsive to the updating command from the execution engine, for updating the master database with the results of each executable instruction, the operation of the plurality of updaters being decoupled from the operation of the execution engine.

Type: Application

Filed: February 27, 2009

Publication date: January 6, 2011

Applicant: EUROCLEAR SA/NV

Inventors: Henri Petit, Jean-Francois Collin, Nicolas Marechal, Christine Deloge
DYNAMIC ALLOCATION OF RESOURCES IN A THREADED, HETEROGENEOUS PROCESSOR

Publication number: 20100299499

Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.

Type: Application

Filed: September 30, 2009

Publication date: November 25, 2010

Inventors: Robert T. Golla, Gregory F. Grohoski
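
The anti-starvation rule in the abstract (every active thread can always allocate at least a predetermined quota of a shared structure's entries) can be sketched in Python as below. This is a hypothetical single-structure model, not the patented circuitry: the class and parameter names are assumptions, and the rule shown is one simple way to honour the quota, namely letting a thread above its quota take only entries not still owed to other threads.

```python
class SharedResource:
    """Hypothetical model of one dynamically shared pipeline structure."""

    def __init__(self, size: int, quota: int, active_threads: set[int]):
        assert quota * len(active_threads) <= size, "quotas must fit in the structure"
        self.size = size
        self.quota = quota
        self.held = {tid: 0 for tid in active_threads}  # entries held per active thread

    def free(self) -> int:
        return self.size - sum(self.held.values())

    def owed_to_others(self, tid: int) -> int:
        # Entries that must stay free so every other active thread can reach its quota.
        return sum(max(0, self.quota - n) for t, n in self.held.items() if t != tid)

    def try_allocate(self, tid: int) -> bool:
        """Allocate one entry for `tid` if doing so cannot starve another thread."""
        below_quota = self.held[tid] < self.quota
        if self.free() > 0 and (below_quota or self.free() > self.owed_to_others(tid)):
            self.held[tid] += 1
            return True
        return False

    def release(self, tid: int) -> None:
        if self.held[tid] > 0:
            self.held[tid] -= 1


rob = SharedResource(size=8, quota=2, active_threads={0, 1})
granted = sum(rob.try_allocate(0) for _ in range(10))
print(granted)  # 6: thread 0 cannot take the last 2 entries reserved for thread 1
```

Because free entries never drop below the total still owed to below-quota threads, a thread under its quota always finds an entry, while a busy thread can still use everything else dynamically.
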
System and method for processing thread groups in a SIMD architecture

Patent number: 7836276

Abstract: A SIMD processor efficiently utilizes its hardware resources to achieve higher data processing throughput. The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processor at a fraction of the rate of the data processing side and by providing multiple execution pipelines, each with multiple data paths. As a result, higher data processing throughput is achieved while an instruction is fetched and issued once per clock. This configuration also allows a large group of threads to be clustered and executed together through the SIMD processor so that greater memory efficiency can be achieved for certain types of operations like texture memory accesses performed in connection with graphics processing.

Type: Grant

Filed: December 2, 2005

Date of Patent: November 16, 2010

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John Erik Lindholm
System and method of executing program threads in a multi-threaded processor

Patent number: 7814487

Abstract: A multithreaded processor device is disclosed and includes a first program thread and second program thread. The second program thread is execution linked to the first program thread in a lock step manner. As such, when the first program thread experiences a stall event, the second program thread is instructed to perform a no operation instruction in order to keep the second program thread execution linked to the first program thread. Also, the second program thread performs a no operation instruction during each clock cycle that the first program thread is stalled due to the stall event. When the first program thread performs a first successful operation after the stall event, the second program thread restarts normal execution.

Type: Grant

Filed: April 26, 2005

Date of Patent: October 12, 2010

Assignee: QUALCOMM Incorporated

Inventors: Lucian Codrescu, Erich Plondke, Muhammad Ahmed, William C. Anderson
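
A small Python sketch of the lock-step behaviour described above: whenever the first thread stalls, the linked second thread performs a NOP instead of advancing, then resumes normal execution when the first thread proceeds. The stall trace, the function name, and the string instructions are hypothetical inputs chosen for illustration.

```python
def lockstep_trace(primary_stalled: list[bool], secondary_program: list[str]) -> list[str]:
    """Return the operation the secondary thread performs in each cycle.

    `primary_stalled[c]` is True when the primary thread is stalled in cycle c
    (a hypothetical trace input). During those cycles the secondary thread
    performs a NOP instead of advancing, keeping the two threads in lock step.
    """
    trace: list[str] = []
    pc = 0  # index of the secondary thread's next real instruction
    for stalled in primary_stalled:
        if stalled or pc >= len(secondary_program):
            trace.append("nop")  # stalled partner, or program exhausted
        else:
            trace.append(secondary_program[pc])
            pc += 1
    return trace


print(lockstep_trace([False, True, True, False, False], ["a", "b", "c"]))
# ['a', 'nop', 'nop', 'b', 'c']
```
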
Parallel generating of bundles of data objects

Patent number: 7810084

Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which are dynamically controlled by using a bundle-specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.

Type: Grant

Filed: June 1, 2006

Date of Patent: October 5, 2010

Assignee: SAP AG

Inventor: Karsten S. Egetoft
GENERAL PURPOSE EMBEDDED PROCESSOR

Publication number: 20100228954

Abstract: The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events—such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events—to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads—though, in some embodiments, they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from.

Type: Application

Filed: February 4, 2010

Publication date: September 9, 2010

Applicant: SHARP KABUSHIKI KAISHA CORPORATION

Inventors: Steven Frank, Shigeki Imai
Apparatus for compressing instruction word for parallel processing VLIW computer and method for the same

Patent number: 7774581

Abstract: An apparatus and a method are provided for a parallel processing very long instruction word (VLIW) computer. The apparatus includes: an index code generation unit sequentially generating, for each instruction word group to be executed in the VLIW computer, an index code associated with the number of no operation (NOP) instruction words between effective instruction words; an instruction compression unit sequentially deleting, from each instruction word group, the NOP instruction words that correspond to the index code; and an instruction word conversion unit converting the effective instruction words corresponding to the NOP instruction words so that they include the index code.

Type: Grant

Filed: August 14, 2007

Date of Patent: August 10, 2010

Assignee: Samsung Electronics Co., Ltd.

Inventors: Chang-Woo Baek, Hong-Seok Kim, Hee Seok Kim, Jeongwook Kim
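
A minimal Python sketch of NOP compression with an index code, assuming (as an illustration, not the patent's encoding) that each effective instruction carries the count of NOPs that preceded it within its group and that groups have a fixed width, so trailing NOPs can be restored by padding.

```python
NOP = "nop"


def compress(group: list[str]) -> list[tuple[int, str]]:
    """Drop NOPs, tagging each effective instruction with the number of NOPs
    seen since the previous effective instruction (the hypothetical index code)."""
    packed: list[tuple[int, str]] = []
    pending = 0
    for word in group:
        if word == NOP:
            pending += 1
        else:
            packed.append((pending, word))
            pending = 0
    return packed


def decompress(packed: list[tuple[int, str]], width: int) -> list[str]:
    """Re-expand to a fixed-width group, padding the trailing slots with NOPs."""
    group: list[str] = []
    for nops, word in packed:
        group.extend([NOP] * nops)
        group.append(word)
    group.extend([NOP] * (width - len(group)))
    return group


group = ["add", "nop", "nop", "mul", "nop", "ld", "nop", "nop"]
print(compress(group))  # [(0, 'add'), (2, 'mul'), (1, 'ld')]
```

Here eight stored slots shrink to three tagged instructions, and `decompress(compress(group), 8)` restores the original group.
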
Cache control device and control method

Publication number: 20100169577

Abstract: In order to control access requests to a cache shared among a plurality of threads, a storage unit is included that stores a flag associated with each of the threads. When a thread begins executing an atomic instruction, a defined value is written to its flag in the storage unit; when the atomic instruction completes, a different defined value is written, so the flags indicate whether or not each thread is executing an atomic instruction. When a thread issues an access request, the flag values in the storage unit are referenced to judge whether a different thread is executing an atomic instruction. If another thread is executing an atomic instruction, the access request is kept on standby. This makes it possible to realize the exclusive control processing required for atomic instructions with a simple configuration.

Type: Application

Filed: December 17, 2009

Publication date: July 1, 2010

Applicant: Fujitsu Limited

Inventor: Naohiro Kiyota
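
The flag scheme above can be modelled in a few lines of Python. This is a hypothetical, single-threaded software model of the storage unit (class and method names are assumptions): each thread's flag holds one of two defined values, and a request is held on standby whenever a different thread's flag says it is inside an atomic instruction.

```python
class AtomicAccessGate:
    """Hypothetical model of the per-thread flag storage guarding a shared cache."""

    IDLE, IN_ATOMIC = 0, 1  # the two "defined values" written to a thread's flag

    def __init__(self, n_threads: int):
        self.flags = [self.IDLE] * n_threads

    def may_access(self, tid: int) -> bool:
        """True unless some other thread is currently executing an atomic instruction."""
        return all(f == self.IDLE for t, f in enumerate(self.flags) if t != tid)

    def enter_atomic(self, tid: int) -> bool:
        if not self.may_access(tid):
            return False  # keep the request on standby; the caller retries later
        self.flags[tid] = self.IN_ATOMIC
        return True

    def exit_atomic(self, tid: int) -> None:
        self.flags[tid] = self.IDLE


gate = AtomicAccessGate(2)
gate.enter_atomic(0)
print(gate.may_access(1))  # False until thread 0 finishes its atomic instruction
```
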
Apparatus and method for implementing a hash algorithm word buffer

Patent number: 7720219

Abstract: An apparatus and method for implementing a hash algorithm word buffer. In one embodiment, a cryptographic unit may include hash logic configured to compute a hash value of a data block according to a hash algorithm, where the hash algorithm includes a plurality of iterations, and where the data block includes a plurality of data words. The cryptographic unit may further include a word buffer comprising a plurality of data word positions and configured to store the data block during computing by the hash logic, where subsequent to the hash logic computing one of the iterations of the hash algorithm, the word buffer is further configured to linearly shift the data block by one or more data word positions according to the hash algorithm. The hash algorithm may be dynamically selectable from a plurality of hash algorithms.

Type: Grant

Filed: October 19, 2004

Date of Patent: May 18, 2010

Assignee: Oracle America, Inc.

Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
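
The abstract does not name a hash algorithm, so SHA-1's message schedule is used here purely as an assumed example of a word buffer that shifts linearly by one word position per iteration. In this Python sketch (function names are hypothetical) a 16-word buffer holds W[t-16..t-1]; each round computes the new word from fixed buffer positions and appends it, which drops the oldest word.

```python
from collections import deque

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_schedule_shifting(block_words: list[int]) -> list[int]:
    """Return SHA-1 schedule words W[0..79] using a 16-word shifting buffer."""
    assert len(block_words) == 16
    buf = deque(block_words, maxlen=16)  # at round t, buf[i] holds W[t-16+i]
    schedule = list(block_words)         # W[0..15] are the block words themselves
    for _ in range(16, 80):
        # W[t] = ROTL1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16])
        new = rotl32(buf[13] ^ buf[8] ^ buf[2] ^ buf[0], 1)
        buf.append(new)  # maxlen=16 discards buf[0]: a one-word linear shift
        schedule.append(new)
    return schedule


# First block of the padded message "abc": 0x61626380, fourteen zero words, length 0x18.
abc_block = [0x61626380] + [0] * 14 + [0x18]
w = sha1_schedule_shifting(abc_block)
print(hex(w[16]))  # 0xc2c4c700
```

A hardware buffer would shift registers rather than a `deque`, but the indexing is the same: only four fixed tap positions are read each iteration.
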
Scheduling compatible threads in a simultaneous multi-threading processor using cycle per instruction value occurred during identified time interval

Patent number: 7698707

Abstract: Identifying compatible threads in a Simultaneous Multithreading (SMT) processor environment is provided by calculating a performance metric, such as cycles per instruction (CPI), that occurs when two threads are running on the SMT processor. The CPI achieved while both threads were executing on the SMT processor is determined. If that CPI is better than the compatibility threshold, information indicating the compatibility is recorded. When a thread is about to complete, the scheduler looks at the run queue to which the completing thread belongs to dispatch another thread. The scheduler identifies a thread that is (1) compatible with the thread that is still running on the SMT processor (i.e., the thread that is not about to complete), and (2) ready to execute. The CPI data is continually updated so that threads that are compatible with one another are continually identified.

Type: Grant

Filed: February 25, 2008

Date of Patent: April 13, 2010

Assignee: International Business Machines Corporation

Inventors: Jos Manuel Accapadi, Andrew Dunshea, Dirk Michel, Mysore Sathyanarayana Srinivas
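
A Python sketch of the scheduling rule in the abstract, under assumptions: thread identifiers are strings, "better than the threshold" means a lower CPI, and when no compatible ready thread exists the sketch falls back to plain queue order (the abstract does not say what happens in that case). The class and method names are hypothetical.

```python
from typing import Optional


class CompatibilityScheduler:
    """Hypothetical scheduler that remembers which thread pairs co-run well on an SMT core."""

    def __init__(self, cpi_threshold: float):
        self.cpi_threshold = cpi_threshold
        self.compatible: set[frozenset[str]] = set()

    def record_pair(self, a: str, b: str, measured_cpi: float) -> None:
        """Update compatibility from the CPI measured while `a` and `b` ran together."""
        pair = frozenset((a, b))
        if measured_cpi < self.cpi_threshold:  # lower CPI is better
            self.compatible.add(pair)
        else:
            self.compatible.discard(pair)  # data is refreshed, so pairs can drop out

    def pick_next(self, still_running: str, ready_queue: list[str]) -> Optional[str]:
        """Choose a ready thread compatible with the thread that keeps running."""
        for tid in ready_queue:
            if frozenset((tid, still_running)) in self.compatible:
                return tid
        # Fallback (an assumption, not from the abstract): plain queue order.
        return ready_queue[0] if ready_queue else None


sched = CompatibilityScheduler(cpi_threshold=1.5)
sched.record_pair("A", "B", measured_cpi=1.2)
sched.record_pair("A", "C", measured_cpi=2.0)
print(sched.pick_next("A", ["C", "B"]))  # B
```
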
Data processing apparatus of high speed process using memory of low speed and low power consumption

Patent number: 7694109

Abstract: When fetching an instruction from a plurality of memory banks, a first pipeline cycle corresponding to selection of a memory bank and a second pipeline cycle corresponding to instruction readout are generated to carry out a pipeline process. Only the selected memory bank can be precharged to allow reduction of power consumption. Since the first and second pipeline cycles are effected in parallel, the throughput of the instruction memory can be improved.

Type: Grant

Filed: December 4, 2007

Date of Patent: April 6, 2010

Assignee: Renesas Technology Corp.

Inventors: Toyohiko Yoshida, Akira Yamada, Hisakazu Sato
System and method for improving the page crossing performance of a data prefetcher

Patent number: 7689774

Abstract: A system and method for improving the page crossing performance of a data prefetcher is presented. A prefetch engine tracks times at which a data stream terminates due to a page boundary. When a certain percentage of data streams terminate at page boundaries, the prefetch engine sets an aggressive profile flag. In turn, when the data prefetch engine receives a real address that corresponds to the beginning/end of a new page, and the aggressive profile flag is set, the prefetch engine uses an aggressive startup profile to generate and schedule prefetches on the assumption that the real address is highly likely to be the continuation of a long data stream. As a result, the system and method minimize latency when crossing real page boundaries when a program is predominantly accessing long streams.

Type: Grant

Filed: April 6, 2007

Date of Patent: March 30, 2010

Assignee: International Business Machines Corporation

Inventors: Francis Patrick O'Connell, Jeffrey A. Stuecheli
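
A simplified Python model of the aggressive-profile flag described above. The page size (4 KiB), line size (128 bytes), threshold, prefetch depths, and class name are all assumptions for illustration, and only ascending streams (a new stream at the beginning of a page) are modelled.

```python
class PageCrossingPrefetcher:
    """Hypothetical model of an aggressive-startup flag for page-crossing streams."""

    def __init__(self, page_size: int = 4096, line_size: int = 128, threshold: float = 0.5,
                 normal_depth: int = 2, aggressive_depth: int = 8):
        self.page_size = page_size
        self.line_size = line_size
        self.threshold = threshold
        self.normal_depth = normal_depth
        self.aggressive_depth = aggressive_depth
        self.streams_ended = 0
        self.ended_at_boundary = 0
        self.aggressive = False  # the "aggressive profile flag"

    def stream_terminated(self, last_addr: int) -> None:
        """Record a stream ending; count it if it ended on the last line of a page."""
        self.streams_ended += 1
        if last_addr % self.page_size >= self.page_size - self.line_size:
            self.ended_at_boundary += 1
        self.aggressive = self.ended_at_boundary / self.streams_ended >= self.threshold

    def start_stream(self, addr: int) -> list[int]:
        """Return the line addresses to prefetch for a new stream starting at `addr`."""
        at_page_start = addr % self.page_size < self.line_size
        depth = self.aggressive_depth if (self.aggressive and at_page_start) else self.normal_depth
        base = addr - addr % self.line_size
        return [base + self.line_size * (i + 1) for i in range(depth)]


pf = PageCrossingPrefetcher()
pf.stream_terminated(3 * 4096 - 8)     # a stream ran off the end of a page
print(len(pf.start_stream(3 * 4096)))  # 8: new page start treated as a continuation
```

A new stream in the middle of a page still gets the normal depth; only page-start addresses benefit from the flag.
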
Method and Apparatus for Reducing Latency Associated with Executing Multiple Instruction Groups

Publication number: 20100064118

Abstract: A method and apparatus for reducing latency in computer processors. The method incorporates a special instruction set that provides an indication of whether a particular instruction is capable of being executed nearly simultaneously with a preceding instruction in the same group. In such a situation, multiple instructions may be executed at a rate faster than expected. A simple apparatus for accomplishing this method is illustrated.

Type: Application

Filed: September 10, 2008

Publication date: March 11, 2010

Applicant: VNS PORTFOLIO LLC

Inventor: Charles H. Moore
VLIW optional fetch packet header extends instruction set space

Patent number: 7673119

Abstract: This invention is useful in a very long instruction word data processor that fetches a predetermined plural number of instructions each operation cycle. A predetermined one of these instructions is used as a special header. This special header has a unique encoding different from any normal instruction. When decoded this special header instructs decode hardware to decode this fetch packet in a special way. In one embodiment a bit field in the header signals the decode hardware whether to decode each instruction word normally or in an alternative way. The header may include extension opcode bits corresponding to each of the other instruction slots. In another embodiment another bit field signals whether to decode an instruction field as one normal length instruction or as two half-length instructions.

Type: Grant

Filed: May 8, 2006

Date of Patent: March 2, 2010

Assignee: Texas Instruments Incorporated

Inventors: Michael D. Asal, Eric J. Stotzer, Todd T. Hahn
Data processing apparatus with parallel operating functional units

Patent number: 7664929

Abstract: A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.

Type: Grant

Filed: September 17, 2003

Date of Patent: February 16, 2010

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Carlos Antonio Alba Pinto, Ramanathan Sethuraman, Srinivasan Balakrishnan, Harm Johannes Antonius Maria Peters, Rafael Peset Llopis
THREAD COMPLETION RATE CONTROLLED SCHEDULING

Publication number: 20100031006

Abstract: A method, processor and processing system provide management of per-thread pipeline resource allocation in a simultaneous multi-threaded (SMT) processor by counting indications of instruction completion for each of the threads. The indication may be the commit phase of the pipeline, which indicates results of the pipeline instruction execution are ready for write-back. The completion counts are used in a relative or absolute form to control the pipeline resource allocation. The decode or fetch rates of instructions for the threads can be controlled from the relative or absolute completion counts, providing control of scheduling instructions among the threads for execution by execution pipeline(s). Alternatively, or in combination, the thread priority registers in any thread priority management scheme can be controlled by comparison and/or scaling of the completion counts.

Type: Application

Filed: August 4, 2008

Publication date: February 4, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Wael R. El-essawy, Lixin Zhang
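
The abstract says fetch or decode rates can be controlled from relative completion counts but does not fix a policy. The Python sketch below assumes one hypothetical policy: a thread whose share of recently completed instructions falls below `low_share` is allowed to fetch only on even cycles, halving its fetch rate. The function name, threshold, and halving rule are all illustrative assumptions.

```python
def fetch_gate(completions: dict[int, int], cycle: int, low_share: float = 0.25) -> set[int]:
    """Return the set of threads allowed to fetch in `cycle`.

    Hypothetical policy: a thread whose share of recently completed (committed)
    instructions is below `low_share` may fetch only on even cycles, halving
    its fetch rate; other threads fetch every cycle.
    """
    total = sum(completions.values())
    allowed: set[int] = set()
    for tid, done in completions.items():
        share = done / total if total else 1 / len(completions)
        if share >= low_share or cycle % 2 == 0:
            allowed.add(tid)
    return allowed


recent = {0: 90, 1: 10}  # completions per thread in the last sampling interval
print(fetch_gate(recent, cycle=1))  # {0}
```

Throttling the thread that completes little keeps it from filling shared pipeline resources with work that is not retiring; the opposite direction (boosting lagging threads for fairness) would be an equally valid reading of the abstract.
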
PREDICATION SUPPORT IN AN OUT-OF-ORDER PROCESSOR BY SELECTIVELY EXECUTING AMBIGUOUSLY RENAMED WRITE OPERATIONS

Publication number: 20090287908

Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.

Type: Application

Filed: May 19, 2008

Publication date: November 19, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
Methods and Apparatus storing expanded width instructions in a VLIW memory for deferred execution

Publication number: 20090276576

Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.

Type: Application

Filed: July 9, 2009

Publication date: November 5, 2009

Applicant: Altera Corporation

Inventors: Gerald George Pechanek, Stamatis Vassiliadis
PROCESSOR

Publication number: 20090228687

Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution of the group of instructions exists in the group of instructions and supplies the group of instructions to the instruction decoding unit by controlling the instruction buffer so that the instructions of the group of instructions are sequentially supplied when the factor exists and all the instructions of the group of instructions are simultaneously supplied when the factor does not exist.

Type: Application

Filed: March 9, 2006

Publication date: September 10, 2009

Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.

Inventor: Tetsu Hosoki
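
A Python sketch of the issue-control decision described above. The abstract does not specify the obstruction factors, so two common ones are assumed here for illustration: a read-after-write dependency inside the group, or more instructions for a functional-unit type than there are units. The `Insn` type, unit names, and register names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    unit: str                   # functional unit needed, e.g. "alu" or "mem" (hypothetical)
    dst: Optional[str] = None   # destination register, if any
    srcs: tuple[str, ...] = ()  # source registers


def has_obstruction(group: list[Insn], units: dict[str, int]) -> bool:
    """Detect two example obstruction factors: an intra-group read-after-write
    dependency, or more instructions for a unit type than there are units."""
    written: set[str] = set()
    demand: Counter = Counter()
    for insn in group:
        if any(src in written for src in insn.srcs):
            return True
        demand[insn.unit] += 1
        if demand[insn.unit] > units.get(insn.unit, 0):
            return True
        if insn.dst is not None:
            written.add(insn.dst)
    return False


def issue_plan(group: list[Insn], units: dict[str, int]) -> list[list[Insn]]:
    """Supply the whole group at once if nothing obstructs it, else one instruction per step."""
    if has_obstruction(group, units):
        return [[insn] for insn in group]
    return [list(group)]


units = {"alu": 2, "mem": 1}
independent = [Insn("alu", "r1", ("r2",)), Insn("mem", "r3", ("r4",))]
dependent = [Insn("alu", "r1", ("r2",)), Insn("alu", "r5", ("r1",))]
print(len(issue_plan(independent, units)), len(issue_plan(dependent, units)))  # 1 2
```
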
Software source transfer selects instruction word sizes

Patent number: 7581082

Abstract: This invention employs a 16-bit instruction set that has a subset of the functionality of the 32-bit instruction set. In this invention 16-bit instructions and 32-bit instructions can coexist in the same fetch packet. In the prior architecture 32-bit instructions may not span a 32-bit boundary. The 16-bit instruction set is implemented with a special fetch packet header that signals whether the fetch packet includes some 16-bit instructions. This fetch packet header also has special bits that tell the hardware how to interpret a particular 16-bit instruction. These bits essentially allow overlays on the whole or part of the 16-bit instruction space. This makes the opcode space larger, permitting more instructions than a pure 16-bit opcode space would allow.

Type: Grant

Filed: May 8, 2006

Date of Patent: August 25, 2009

Assignee: Texas Instruments Incorporated

Inventors: Todd T. Hahn, Eric J. Stotzer, Michael D. Asal
METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR AN IMPLICIT PREDICTED RETURN FROM A PREDICTED SUBROUTINE

Publication number: 20090210661

Abstract: A method, system and computer program product for performing an implicit predicted return from a predicted subroutine are provided. The system includes a branch history table/branch target buffer (BHT/BTB) to hold branch information, including a target address of a predicted subroutine and a branch type. The system also includes instruction buffers, and instruction fetch controls to perform a method including fetching a branch instruction at a branch address and a return-point instruction. The method also includes receiving the target address and the branch type, and fetching a fixed number of instructions in response to the branch type. The method further includes referencing the return-point instruction within the instruction buffers such that the return-point instruction is available upon completing the fetching of the fixed number of instructions absent a re-fetch of the return-point instruction.

Type: Application

Filed: February 20, 2008

Publication date: August 20, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Khary J. Alexander, James J. Bonanno, Brian R. Prasky, Anthony Saporito, Robert J. Sonnelitter, III, Charles F. Webb
Computer with two execution modes

Publication number: 20090204785

Abstract: A computer. A processor pipeline alternately executes instructions coded for first and second different computer architectures or coded to implement first and second different processing conventions. A memory stores instructions for execution by the processor pipeline, the memory being divided into pages for management by a virtual memory manager, a single address space of the memory having first and second pages. A memory unit fetches instructions from the memory for execution by the pipeline, and fetches stored indicator elements associated with respective memory pages of the single address space from which the instructions are to be fetched. Each indicator element is designed to store an indication of which of the two different computer architectures and/or execution conventions the processor pipeline is to use when executing the instruction data of the associated page.

Type: Application

Filed: October 31, 2007

Publication date: August 13, 2009

Inventors: John S. Yates, JR., David L. Reese, Korbin S. Van Dyke, T. R. Ramesh, Paul H. Hohensee
Fetching all or portion of instructions in memory line up to branch instruction based on branch prediction and size indicator stored in branch target buffer indexed by fetch address

Patent number: 7552314

Abstract: The invention provides a method and apparatus for branch prediction in a processor. A fetch-block branch target buffer, which stores information about a control-transfer instruction for a “block” of instruction memory, is used in an early stage of pipeline processing before the instruction is decoded. The block of instruction memory is represented by a block entry in the fetch-block branch target buffer. The block entry represents one recorded control-transfer instruction (such as a branch instruction) and a set of sequentially preceding instructions, up to a fixed maximum length N. Indexing into the fetch-block branch target buffer yields an answer as to whether the block entry represents memory that contains a previously executed control-transfer instruction, a length value representing the amount of memory that contains the instructions represented by the block, and an indicator for the type of control-transfer instruction that terminates the block, its target and outcome.

Type: Grant

Filed: October 17, 2005

Date of Patent: June 23, 2009

Assignee: STMicroelectronics, Inc.

Inventors: Anatoly Gelman, Russell Schnapp
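
A Python sketch of how a fetch-block BTB lookup could shape a fetch, following the title above: on a hit for a predicted-taken branch, fetch only up to the branch using the stored length and redirect to the target; otherwise fetch the rest of the memory line. The 32-byte line, the `BlockEntry` fields, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEntry:
    length: int            # bytes from the block's fetch address through the branch
    kind: str              # type of terminating control transfer, e.g. "cond", "call"
    target: int            # predicted target address
    predicted_taken: bool  # predicted outcome


def plan_fetch(addr: int, btb: dict[int, BlockEntry], line_size: int = 32) -> tuple[int, int]:
    """Return (bytes to fetch from this line, next fetch address).

    On a BTB hit for a predicted-taken branch inside the current line, fetch only
    up to and including the branch and redirect to its target; otherwise fetch
    the rest of the line and continue sequentially.
    """
    line_end = (addr // line_size + 1) * line_size
    entry = btb.get(addr)  # the BTB is indexed by fetch address
    if entry is not None and entry.predicted_taken and addr + entry.length <= line_end:
        return entry.length, entry.target
    return line_end - addr, line_end


btb = {0x100: BlockEntry(length=12, kind="cond", target=0x400, predicted_taken=True)}
print(plan_fetch(0x100, btb))  # (12, 1024)
```
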
Continuously providing instructions to a programmable device

Patent number: 7546451

Abstract: A system and method for enabling a programmable device to execute instructions without interruption. An instruction space for storing instructions from a host application is bifurcated to define a program segment and a hold segment. At startup, instructions are loaded into the hold segment, and the programmable device begins executing those instructions. While the hold segment instructions are executed, the program segment is loaded with instructions. Once the program segment is filled, control is shifted to it and instructions from this segment are executed by the programmable device. When the program segment has been executed, control is shifted back to the hold segment, and instructions are taken from it while the program segment is reloaded with a fresh set of instructions from the host application. Once the program segment is reloaded, control is redirected and execution of instructions from the program segment is continued.

Type: Grant

Filed: June 19, 2002

Date of Patent: June 9, 2009

Assignee: Finisar Corporation

Inventors: Chris Cicchetti, Jean-François Dubé, Thomas Andrew Myers, An Huynh, Geoffrey T. Hibbert
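The hold/program handoff described above can be sketched as a cycle-level simulation in Python. It assumes, for illustration only, that the device executes one instruction per cycle, that the host writes one instruction word into the program segment per cycle, and that the hold segment is a non-empty loop; the function name `simulate` is hypothetical.

```python
from typing import List, Tuple

def simulate(batches: List[List[str]], hold: List[str]) -> List[Tuple[str, str]]:
    """Return the (segment, instruction) executed in each cycle."""
    trace: List[Tuple[str, str]] = []
    hold_pc = 0
    for batch in batches:
        loaded: List[str] = []
        # The device keeps looping through the hold segment while the host
        # fills the program segment, so no cycle goes without an instruction.
        while len(loaded) < len(batch):
            trace.append(("hold", hold[hold_pc]))
            hold_pc = (hold_pc + 1) % len(hold)
            loaded.append(batch[len(loaded)])   # host writes one word this cycle
        # Program segment is full: control shifts to it and it runs to the end,
        # after which control returns to the hold segment for the next reload.
        for instr in loaded:
            trace.append(("program", instr))
    return trace
```

Every cycle in the returned trace carries an instruction, which is the property the patent is after: execution never stalls while new instructions arrive from the host.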
Methods and apparatus for meta-architecture defined programmable instruction fetch functions supporting assembled variable length instruction processors

Patent number: 7509483

Abstract: A computing architecture and software techniques are described which modify the basic sequential instruction fetching mechanism of a processor by separating a program's control flow from its functional execution flow. A compiled sequential HLL program's static control structures are analyzed and a separate program based on its own unique instructions is created that primarily generates addresses for the selection of functional execution instructions. The original program is now represented by an instruction fetch program and a set of function/logic execution instructions. This basic split allows multiple instruction addresses to be generated in parallel to access multiple instruction memories. These multiple instruction memories contain only the function/logic instructions of the program and no control structure operations such as branches or calls. All the original program's control instructions are split from the original program and used to create the instruction addressing program.

Type: Grant

Filed: February 22, 2007

Date of Patent: March 24, 2009

Assignee: Renesky Tap III, Limited Liability Company

Inventor: Gerald George Pechanek
Methods and apparatus for updating of a branch history table

Patent number: 7500088

Abstract: Methods and apparatus are provided for enhanced instruction handling in processing environments. If branch misprediction occurs during instruction processing, a branch history table may be updated based upon the number of instructions to be fetched. The branch history table may be updated in accordance with a first mode if at least two instructions are available, and may be updated in accordance with a second mode if fewer than two instructions are available. A compiler can assist the processing by aligning instructions for processing. The instructions can be aligned across multiple instruction fetch groups so that instructions are available for fetching and the branch history table is updated prior to performing a branching operation.

Type: Grant

Filed: July 8, 2004

Date of Patent: March 3, 2009

Assignee: Sony Computer Entertainment Inc.

Inventor: Masaki Osawa
Method and apparatus for sharing instruction memory among a plurality of processors

Patent number: 7500066

Abstract: A multiprocessing apparatus includes a memory and a plurality (M) of processors coupled to share the memory. Access to the memory is time-division multiplexed among the plurality of processors. In one embodiment, a selected processor retrieves M words of instruction forming K instructions during a given clock cycle. The selected processor executes M-K NOP instructions if K<M.

Type: Grant

Filed: April 30, 2005

Date of Patent: March 3, 2009

Assignee: Tellabs Operations, Inc.

Inventors: Thayl D. Zohner, Lawrence D. Weizeorick, Keith M. Ellens
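One reading of the abstract above, sketched in Python: with round-robin multiplexing each processor reaches the memory once every M cycles, so the M words it fetches must cover M execution slots. If those words form only K instructions (some being multi-word), the remaining M-K slots are filled with NOPs. Appending the NOPs after the K instructions, and the helper names `owner` and `slot_issue`, are assumptions for illustration.

```python
from typing import List, Tuple

def owner(cycle: int, m: int) -> int:
    """Round-robin time-division multiplexing: processor granted memory this cycle."""
    return cycle % m

def slot_issue(window: List[Tuple[str, int]], m: int) -> List[str]:
    """Expand one fetch of M words into M execution slots.

    `window` lists the (mnemonic, length_in_words) instructions formed by the
    M fetched words, so K = len(window).
    """
    if sum(length for _, length in window) != m:
        raise ValueError("the K instructions must occupy exactly M words")
    k = len(window)
    return [name for name, _ in window] + ["NOP"] * (m - k)
```

Padding keeps every processor in lockstep with its memory slot: it always consumes exactly M cycles per fetch, whatever mix of one-word and multi-word instructions it received.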
High-performance, superscalar-based computer system with out-of-order instruction execution

Patent number: 7487333

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.

Type: Grant

Filed: November 5, 2003

Date of Patent: February 3, 2009

Assignee: Seiko Epson Corporation

Inventors: Le-Trong Nguyen, Derek J Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H Trang
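The temporary-register scheme in the abstract above is the familiar idea of completing out of order but retiring in order. A minimal Python sketch follows; it assumes each instruction's result is already known (a stand-in value) so that only the commit ordering is modeled, and the name `execute` is hypothetical.

```python
from typing import Dict, List, Tuple

def execute(program: List[Tuple[str, int]], completion_order: List[int]):
    """`program` holds (destination register, result) per instruction; results
    land in temporary registers on completion and are committed to the
    architectural register file strictly in program order."""
    temp: Dict[int, int] = {}      # temporary data registers, keyed by instruction index
    arch: Dict[str, int] = {}      # architectural register file
    retired: List[int] = []
    head = 0                       # oldest instruction not yet retired
    for idx in completion_order:
        temp[idx] = program[idx][1]          # a functional unit writes its result
        # Retire every instruction whose predecessors have all completed.
        while head in temp:
            dest, _ = program[head]
            arch[dest] = temp.pop(head)
            retired.append(head)
            head += 1
    return retired, arch
```

In the test below, instruction 2 finishes first but its write to `r1` is held until instructions 0 and 1 retire, so the final `r1` is 30 rather than being overwritten by the older result.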
Systems for loading unaligned words and methods of operating the same

Patent number: 7480783

Abstract: Disclosed are systems for loading an unaligned word from a specified unaligned word address in a memory, the unaligned word comprising a plurality of indexed portions crossing a word boundary, a method of operating the system comprising: loading a first aligned word commencing at an aligned word address rounded from the specified unaligned word address; identifying an index representing the location of the unaligned word address relative to the aligned word address; loading a second aligned word commencing at an aligned word address rounded from a second unaligned word address; and combining indexed portions of the first and second aligned words using the identified index to construct the unaligned word.

Type: Grant

Filed: August 19, 2004

Date of Patent: January 20, 2009

Assignees: STMicroelectronics Limited, Hewlett-Packard Company

Inventors: Mark O. Homewood, Paolo Faraboschi
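The steps listed in the abstract above map directly onto a short Python sketch. It assumes a 4-byte word and takes the "second unaligned word address" to be the address of the unaligned word's last byte; both are illustrative choices, not details confirmed by the patent.

```python
WORD = 4  # bytes per word; an assumption, the abstract does not fix a width

def align_down(addr: int) -> int:
    """Round an address down to the nearest aligned word address."""
    return addr - addr % WORD

def load_aligned(mem: bytes, addr: int) -> bytes:
    """Model of the only load the memory supports: a whole aligned word."""
    if addr % WORD:
        raise ValueError("aligned loads only")
    return mem[addr:addr + WORD]

def load_unaligned(mem: bytes, addr: int) -> bytes:
    first_addr = align_down(addr)                # first aligned word
    index = addr - first_addr                    # position of the unaligned word within it
    second_addr = align_down(addr + WORD - 1)    # aligned word holding the last byte
    first = load_aligned(mem, first_addr)
    second = load_aligned(mem, second_addr)
    # Combine the indexed portions: WORD bytes starting at `index` in the pair.
    return (first + second)[index:index + WORD]
```

Rounding from the last byte's address means an already-aligned request loads the same word twice and the combine step simply returns it, so no special case or out-of-range load is needed at the end of memory.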
SCHEDULING THREADS IN A PROCESSOR

Publication number: 20080301409

Abstract: The invention provides a processor for executing threads, each thread comprising a sequence of instructions, said instructions defining operations and at least some of those instructions defining a memory access operation. The processor comprises: a plurality of instruction buffers, each for holding at least one instruction of a thread associated with that buffer; an instruction issue stage for issuing instructions from the instruction buffers; and a memory access stage connected to a memory and arranged to receive instructions issued by the instruction issue stage. The memory access stage comprises: detecting logic adapted to detect whether a memory access operation is defined in each issued instruction; and instruction fetch logic adapted to instigate an instruction fetch to fetch an instruction of a thread when no memory access operation is detected.

Type: Application

Filed: May 30, 2007

Publication date: December 4, 2008

Inventor: Michael David MAY
Computer processing system employing an instruction schedule cache

Patent number: 7454597

Abstract: A processor core and method of executing instructions, both of which utilize schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the cache.

Type: Grant

Filed: January 2, 2007

Date of Patent: November 18, 2008

Assignee: International Business Machines Corporation

Inventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
Multiple parallel pipeline processor having self-repairing capability

Patent number: 7454654

Abstract: A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first in the event that a failure is detected in the first pipeline. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent one another in an array. A pipeline failure causes data to be shifted one position within the array of pipelines, to by-pass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, selection logic controlling the selection between a primary and alternate source of pipeline data is integrated with other pipeline operand selection logic.

Type: Grant

Filed: September 13, 2006

Date of Patent: November 18, 2008

Assignee: International Business Machines Corporation

Inventor: David A. Luick
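The shift-by-one repair described above can be sketched in Python as a routing table. It assumes N primary pipelines numbered 0 to N-1 followed by one shared spare at position N; the function name `lane_sources` is hypothetical.

```python
from typing import List, Optional

def lane_sources(n_primary: int, failed: Optional[int]) -> List[Optional[int]]:
    """For each physical pipeline (the last is the shared spare), return the
    logical lane whose data it receives, or None if it is idle or bypassed."""
    sources: List[Optional[int]] = []
    for p in range(n_primary + 1):
        if failed is None:
            sources.append(p if p < n_primary else None)  # no failure: spare stays idle
        elif p < failed:
            sources.append(p)          # primary source, unchanged
        elif p == failed:
            sources.append(None)       # failing pipeline is bypassed
        else:
            sources.append(p - 1)      # alternate source: data shifted one position
    return sources
```

Each physical pipeline only ever takes data from lane p or lane p-1, which is why the selection can be a simple two-input choice folded into the existing operand-selection logic.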
System and Method for using a Local Condition Code Register for Accelerating Conditional Instruction Execution in a Pipeline Processor

Publication number: 20080276072

Abstract: A method of executing a conditional instruction within a pipeline processor having a plurality of pipelines, the processor having a first condition code register associated with a first pipeline and a second condition code register associated with a second pipeline is disclosed. The method saves a most recent condition code value to either the first condition code register or the second condition code register. The method further sets an indicator indicating whether the second condition code register has the most recent condition code value and retrieves the most recent condition code value from either the first or second condition code register based on the indicator. The method uses the most recent condition code value to determine if the conditional instruction should be executed.

Type: Application

Filed: May 3, 2007

Publication date: November 6, 2008

Inventor: Bohuslav Rychlik
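The method in the abstract above can be sketched in Python with two condition code registers and an indicator bit. The flag encoding (`ZERO`), the class name, and the `if_equal` predicate are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List

ZERO = 0x1  # hypothetical bit for a "zero" flag in the condition code

class ConditionCodeFile:
    """Per-pipeline condition code registers plus a most-recent indicator."""

    def __init__(self) -> None:
        self.cc: List[int] = [0, 0]      # [first pipeline's CC, second pipeline's CC]
        self.second_is_newest = False    # indicator: does the second register hold the latest value?

    def save(self, pipeline: int, value: int) -> None:
        """A flag-setting instruction in `pipeline` (0 or 1) saves its result locally."""
        self.cc[pipeline] = value
        self.second_is_newest = pipeline == 1

    def most_recent(self) -> int:
        return self.cc[1] if self.second_is_newest else self.cc[0]

    def should_execute(self, condition: Callable[[int], bool]) -> bool:
        """Evaluate a conditional instruction's predicate against the newest CC value."""
        return condition(self.most_recent())

def if_equal(flags: int) -> bool:
    return bool(flags & ZERO)
```

Each pipeline writes only its own register, so no cross-pipeline forwarding is needed on the write side; the indicator bit alone steers the read to whichever register is current.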
Multiple thread instruction fetch from different cache levels

Publication number: 20080270758

Abstract: A data processing apparatus is provided wherein processing circuitry executes multiple program threads including at least one high priority thread and at least one lower priority thread. Instructions required by the threads are retrieved from a cache memory hierarchy comprising multiple cache levels. The cache memory hierarchy includes a bypass path for omitting a predetermined level of the cache memory hierarchy when performing a lookup procedure for a required instruction and for bypassing said predetermined level of the cache memory hierarchy when returning said required instruction to said processing circuitry. The bypass path is used by default when the requested instruction is for a lower priority thread.

Type: Application

Filed: April 27, 2007

Publication date: October 30, 2008

Applicant: ARM Limited

Inventors: Emre Ozer, Stuart David Biles
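A Python sketch of the bypass path described above follows. It assumes a two-level instruction cache hierarchy whose first level is the predetermined level to bypass, and it models each level as a set of line addresses; the class name and level labels are hypothetical.

```python
from typing import Dict, List, Set

class InstructionCacheHierarchy:
    def __init__(self, levels: List[str], bypass_level: str) -> None:
        self.levels = levels                          # ordered nearest to farthest
        self.bypass_level = bypass_level              # predetermined level to omit
        self.lines: Dict[str, Set[int]] = {lv: set() for lv in levels}

    def fetch(self, addr: int, high_priority: bool) -> List[str]:
        """Look up `addr` and return the levels probed, in order.

        Lower-priority threads use the bypass path by default, so the
        predetermined level is neither probed nor filled for them.
        """
        path = [lv for lv in self.levels if high_priority or lv != self.bypass_level]
        probed: List[str] = []
        for level in path:
            probed.append(level)
            if addr in self.lines[level]:
                break                                 # hit: stop the lookup here
        # Return path: install the line in each probed level
        # (on a full miss the data is assumed to come from main memory).
        for level in probed:
            self.lines[level].add(addr)
        return probed
```

Keeping low-priority fetches out of the bypassed level leaves that level's capacity for the high-priority thread.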
SCALABLE PROCESSING ARCHITECTURE

Publication number: 20080244230

Abstract: A computation node according to various embodiments of the invention includes at least one input port capable of being coupled to at least one first other computation node, a first store coupled to the input port(s) to store input data, a second store to receive and store instructions, an instruction wakeup unit to match the input data to the instructions, at least one execution unit to execute the instructions, using the input data to produce output data, and at least one output port capable of being coupled to at least one second other computation node. The node may also include a router to direct the output data from the output port(s) to the second other node. A system according to various embodiments of the invention includes an external instruction sequencer to fetch a group of instructions, and one or more interconnected, preselected computational nodes.

Type: Application

Filed: June 10, 2008

Publication date: October 2, 2008

Applicant: Board of Regents, The University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler, Karthikeyan Sankaralingam, Ramadass Nagarajan
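The wakeup-and-fire behavior of a computation node described above can be sketched in Python. It assumes two-operand instructions, a hypothetical opcode table `OPS`, and operands that arrive tagged with a target instruction id and operand slot; none of these names come from the application.

```python
import operator
from typing import Callable, Dict, Optional

# Hypothetical opcode table for the sketch.
OPS: Dict[str, Callable[[int, int], int]] = {"add": operator.add, "mul": operator.mul}

class ComputationNode:
    def __init__(self) -> None:
        self.instructions: Dict[int, str] = {}         # second store: instruction id -> opcode
        self.operands: Dict[int, Dict[int, int]] = {}  # first store: buffered input data

    def load_instruction(self, inst_id: int, opcode: str) -> None:
        self.instructions[inst_id] = opcode
        self.operands[inst_id] = {}

    def receive(self, inst_id: int, slot: int, value: int) -> Optional[int]:
        """Input port: buffer an operand, then let the wakeup logic check
        whether the target instruction now has both operands."""
        self.operands[inst_id][slot] = value
        ready = self.operands[inst_id]
        if 0 in ready and 1 in ready:
            # Wakeup: all inputs matched, so the execution unit fires.
            return OPS[self.instructions[inst_id]](ready[0], ready[1])
        return None
```

The value returned by `receive` stands in for the output data that the node's router would forward to another node.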
Data processor

Patent number: 7424598

Abstract: The data processor, which executes instructions realized by wired logic using a pipeline system, includes a plurality of instruction registers and the same number of arithmetic operation units. A plurality of instructions read into the instruction registers together in one machine cycle are processed in parallel by the plurality of arithmetic operation units.

Type: Grant

Filed: May 14, 2001

Date of Patent: September 9, 2008

Assignee: Renesas Technology Corp.

Inventors: Takashi Hotta, Shigeya Tanaka, Hideo Maejima
Handling cache miss in an instruction crossing a cache line boundary

Patent number: 7404042

Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.

Type: Grant

Filed: May 18, 2005

Date of Patent: July 22, 2008

Assignee: QUALCOMM Incorporated

Inventors: Brian Michael Stempel, Jeffrey Todd Bridges, Rodney Wayne Smith, Thomas Andrew Sartorius
Inter-cluster communication module using the memory access network

Patent number: 7404048

Abstract: An inter-cluster communication module using the memory access network is provided, including a plurality of clusters, a memory subsystem, a controller and a switch device. When some clusters issue a load instruction and some clusters issue a store instruction of an identical memory address concurrently, the controller controls the switch device which connects the clusters and the memory banks of the memory subsystem, so that the data item is transmitted from the cluster issuing the store instruction to the cluster issuing the load instruction through the switch device, thereby achieving data exchange between the clusters. Herein, the data item is selectively stored in the memory module depending on the address. Furthermore, the data item is also transmitted between the memory and the clusters over the switch device.

Type: Grant

Filed: October 11, 2005

Date of Patent: July 22, 2008

Assignee: Industrial Technology Research Institute

Inventors: Tay-Jyi Lin, Pi-Chen Hsiao, Chih-Wei Liu, Chein-Wei Jen, I-Tao Liao, Po-Han Huang
Apparatus and method for adjusting instruction thread priority in a multi-thread processor

Patent number: 7401207

Abstract: Each instruction thread in an SMT processor is associated with a software assigned base input processing priority. Unless some predefined event or circumstance occurs with an instruction being processed or to be processed, the base input processing priorities of the respective threads are used to determine the interleave frequency between the threads according to some instruction interleave rule. However, upon the occurrence of some predefined event or circumstance in the processor related to a particular instruction thread, the base input processing priority of one or more instruction threads is adjusted to produce one or more adjusted priority values. The instruction interleave rule is then enforced according to the adjusted priority value or values together with any base input processing priority values that have not been subject to adjustment.

Type: Grant

Filed: April 25, 2003

Date of Patent: July 15, 2008

Assignee: International Business Machines Corporation

Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
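The abstract above does not spell out the interleave rule. The Python sketch below assumes a hypothetical rule under which each thread is selected in proportion to its priority, implemented here with smooth weighted round-robin, and models the "predefined event" as a long-latency stall that lowers the affected thread's priority by a fixed penalty. Both the rule and the penalty are illustrative assumptions.

```python
from typing import Dict, List, Set

def adjusted_priorities(base: Dict[str, int], stalled: Set[str], penalty: int = 2) -> Dict[str, int]:
    """Lower the priority of threads hit by the (hypothetical) stall event;
    every other thread keeps its software-assigned base priority."""
    return {t: max(1, p - penalty) if t in stalled else p for t, p in base.items()}

def interleave(priorities: Dict[str, int], cycles: int) -> List[str]:
    """Hypothetical interleave rule: smooth weighted round-robin, so over
    sum(priorities) cycles each thread is picked `priority` times."""
    credit = {t: 0 for t in priorities}
    total = sum(priorities.values())
    order: List[str] = []
    for _ in range(cycles):
        for t, p in priorities.items():
            credit[t] += p
        pick = max(credit, key=credit.get)   # thread with the most accumulated credit
        credit[pick] -= total
        order.append(pick)
    return order
```

With base priorities 3 and 1, thread A gets three of every four slots; once A stalls, its adjusted priority drops to 1 and the rule is enforced on the adjusted value, giving the two threads equal shares until the event clears.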
Method and apparatus for randomizing instruction thread interleaving in a multi-thread processor

Patent number: 7401208

Abstract: A processor interleaves instructions according to a priority rule which determines the frequency with which instructions from each respective thread are selected and added to an interleaved stream of instructions to be processed in the data processor. The frequency with which each thread is selected according to the rule may be based on the priorities assigned to the instruction threads. A randomization is inserted into the interleaving process so that the selection of an instruction thread during any particular clock cycle is not based solely on the priority rule, but is also based in part on a random or pseudo random element. This randomization is inserted into the instruction thread selection process so as to vary the order in which instructions are selected from the various instruction threads while preserving the overall frequency of thread selection (i.e. how often threads are selected) set by the priority rule.

Type: Grant

Filed: April 25, 2003

Date of Patent: July 15, 2008

Assignee: International Business Machines Corporation

Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
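One way to add a pseudo-random element while preserving the selection frequencies set by the priority rule is to build each window of selections from the priorities and shuffle it with a seeded generator. This window-shuffle technique is assumed here for illustration and is not claimed to be the patented mechanism; the function name is hypothetical.

```python
import random
from typing import Dict, List

def randomized_interleave(priorities: Dict[str, int], windows: int, seed: int = 0) -> List[str]:
    """Each window of sum(priorities) cycles contains every thread exactly
    `priority` times (the frequency the rule sets); a pseudo-random shuffle
    varies the order of selection within the window."""
    rng = random.Random(seed)
    order: List[str] = []
    for _ in range(windows):
        window = [t for t, p in priorities.items() for _ in range(p)]
        rng.shuffle(window)
        order.extend(window)
    return order
```

The per-window counts are fixed by construction, so only the cycle-to-cycle order changes from one run or seed to another.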
Multi-cluster processor for processing instructions of one or more instruction threads

Patent number: 7398374

Abstract: The invention provides a processor that processes bundles of instructions preferentially through clusters or execution units according to thread characteristics. The cluster architectures of the invention preferably include capability to process “multi-threaded” instructions. Selectively, the architecture either (a) processes singly-threaded instructions through a single cluster to avoid bypassing and to increase throughput, or (b) processes singly-threaded instructions through multiple processes to increase “per thread” performance. The architecture may be “configurable” to operate in one of two modes: in a “wide” mode of operation, the processor's internal clusters collectively process bundled instructions of one thread of a program at the same time; in a “throughput” mode of operation, those clusters independently process instruction bundles of separate program threads. Clusters are often implemented on a common die, with a core and register file per cluster.

Type: Grant

Filed: February 27, 2002

Date of Patent: July 8, 2008

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Eric DeLano
Mechanism for irrevocable transactions

Publication number: 20080162881

Abstract: A method and apparatus for designating and handling irrevocable transactions is herein described. In response to detecting an irrevocable event, such as an I/O operation, a user-defined irrevocable designation, or a dynamic failure profile, a transaction is designated as irrevocable. In response to designating a transaction as irrevocable, Single Owner Read Locks (SORLs) are acquired for previous and subsequent reads in the irrevocably designated transaction to ensure the transaction is able to complete without modification to locations read from, while permitting remote resources to load from those locations to continue execution.

Type: Application

Filed: December 28, 2006

Publication date: July 3, 2008

Inventors: Adam Welc, Bratin Saha, Ali-Reza Adl-Tabatabai
Register file indexing methods and apparatus for providing indirect control of register addressing in a VLIW processor

Patent number: RE41012

Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.

Type: Grant

Filed: June 3, 2004

Date of Patent: November 24, 2009

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand
