Of Multiple Instructions Simultaneously Patents (Class 712/206)
  • Patent number: 7992017
    Abstract: Methods and apparatuses for reducing step loads of processors are disclosed. Method embodiments comprise examining a number of instructions to be processed by a processor to determine the types of instructions involved, calculating power consumption in an execution period based on the types of instructions, and limiting execution to a subset of those instructions to control the amount of power consumed in the execution period. Some embodiments may also create artificial activity to provide a minimum power floor for the processor. Apparatus embodiments comprise instruction type determination logic to determine types of instructions in an incoming instruction stream, a power calculator to calculate power consumption associated with processing a number of instructions in an execution period, and instruction throttling logic to control the power consumption by limiting the number of instructions to be processed in the execution period.
    Type: Grant
    Filed: September 11, 2007
    Date of Patent: August 2, 2011
    Assignee: Intel Corporation
    Inventors: Kevin Safford, Rohit Bhatia, Chris Bostak, Richard Blumberg, Blaine Stackhouse, Steve Undy
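    A minimal sketch of the throttling loop implied by the abstract above: classify each pending instruction, accumulate an estimated per-type energy cost for the execution period, and stop issuing once a power budget is reached. The type set, cost weights, and budget are illustrative assumptions, not details from the patent.
    ```c
    #include <stddef.h>

    enum insn_type { INSN_ALU, INSN_FPU, INSN_LOAD, INSN_STORE };

    /* Hypothetical relative energy cost per instruction type. */
    static const unsigned cost[] = { 1, 4, 2, 2 };

    /* Return how many of the n pending instructions may issue in this
     * execution period without the estimated energy exceeding 'budget'.
     * Instructions beyond the returned count are held for a later period. */
    size_t throttle_count(const enum insn_type *pending, size_t n, unsigned budget)
    {
        unsigned spent = 0;
        for (size_t i = 0; i < n; i++) {
            if (spent + cost[pending[i]] > budget)
                return i;            /* issue only the first i instructions */
            spent += cost[pending[i]];
        }
        return n;                    /* the whole group fits under the budget */
    }
    ```
    The minimum power floor mentioned in the abstract could be approximated by issuing dummy work whenever the accumulated cost for a period falls below a lower threshold.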
  • Patent number: 7962723
    Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.
    Type: Grant
    Filed: July 9, 2009
    Date of Patent: June 14, 2011
    Inventors: Gerald George Pechanek, Stamatis Vassiliadis
  • Publication number: 20110107063
    Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining a vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.
    Type: Application
    Filed: August 2, 2010
    Publication date: May 5, 2011
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
  • Patent number: 7934203
    Abstract: During program code conversion, such as in a dynamic binary translator, automatic code generation provides target code 21 executable by a target processor 13. Multiple instruction ports 610 disperse a group of instructions to functional units 620 of the processor 13. Disclosed is a mechanism of preparing an instruction group 606 using a plurality of pools 700 having a hierarchical structure 711-715. Each pool represents a different overlapping subset of the issue ports 610. Placing an instruction 600 into a particular pool 700 also reduces vacancies in any one or more subsidiary pools in the hierarchy. In a preferred embodiment, a counter value 702 is associated with each pool 700 to track vacancies. A valid instruction group 606 is formed by picking the placed instructions 600 from the pools 700. The instruction groups are generated accurately and automatically. Decoding errors and stalls are minimized or completely avoided.
    Type: Grant
    Filed: May 27, 2005
    Date of Patent: April 26, 2011
    Assignee: International Business Machines Corporation
    Inventors: William O. Lovett, David Haikney, Matthew Evans
  • Patent number: 7913069
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. Instruction words (48) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In a particular example, the series of operations are included in a single instruction word (48). The micro-loop (100) in combination with the ability of the computers (12) to send instruction words (48) to a neighboring computer (12) provides a powerful tool for allowing a computer (12) to utilize the resources of a neighboring computer (12).
    Type: Grant
    Filed: May 26, 2006
    Date of Patent: March 22, 2011
    Assignee: VNS Portfolio LLC
    Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
  • Publication number: 20110010527
    Abstract: A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams.
    Type: Application
    Filed: February 3, 2009
    Publication date: January 13, 2011
    Inventor: Shohei Nomoto
  • Patent number: 7870367
    Abstract: Methods and apparatus are provided for implementing complex parallel instructions on a processor having a supported instruction set. Complex parallel instructions provide that an operation code, control logic, and input data is passed to a processor core. The operation code identifies the instruction used to process the input data and the control logic identifies the state of the instruction. An intervening instruction can be executed by a processor core even before execution of a complex parallel instruction is complete.
    Type: Grant
    Filed: June 17, 2003
    Date of Patent: January 11, 2011
    Assignee: Altera Corporation
    Inventor: Chris Robinson
  • Publication number: 20110004788
    Abstract: A system is designed for processing instructions in real time during a session. This system comprises: a preloader for obtaining reference data relating to the instructions, the reference data indicating the current values of each specified resource account data file, and the preloader being arranged to read the reference data for a plurality of received instructions in parallel from a master database; an enriched instruction queue for queuing the instructions together with their respective preloaded reference data; an execution engine for determining sequentially whether each received instruction can be executed under the present values of the relevant resource account files and for each executable instruction to generate an updating command; and an updater, responsive to the updating command from the execution engine, for updating the master database with the results of each executable instruction, the operation of the updater being decoupled from the operation of the execution engine.
    Type: Application
    Filed: February 27, 2009
    Publication date: January 6, 2011
    Applicant: EUROCLEAR SA/NV
    Inventors: Henri Petit, Jean-Francois Collin, Nicolas Marechal, Christine Deloge
  • Publication number: 20100299499
    Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
    Type: Application
    Filed: September 30, 2009
    Publication date: November 25, 2010
    Inventors: Robert T. Golla, Gregory F. Grohoski
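    The anti-starvation rule described above can be reduced to a simple admission check: a thread may take a free entry of a shared resource only if enough free entries remain for every other active thread to still reach its guaranteed quota. A toy allocator sketch under that assumption; the structure, thread count, and quota bookkeeping are invented for illustration.
    ```c
    #include <stdbool.h>

    #define MAX_THREADS 4

    struct shared_resource {
        int  total;                   /* total entries in the structure        */
        int  used[MAX_THREADS];       /* entries currently held per thread     */
        int  quota;                   /* guaranteed minimum per active thread  */
        bool active[MAX_THREADS];
    };

    /* Allow thread t to allocate one more entry only if every other active
     * thread could still reach its quota from the remaining free entries. */
    bool may_allocate(const struct shared_resource *r, int t)
    {
        int free_entries = r->total;
        int reserved = 0;
        for (int i = 0; i < MAX_THREADS; i++) {
            free_entries -= r->used[i];
            if (i != t && r->active[i] && r->used[i] < r->quota)
                reserved += r->quota - r->used[i];
        }
        return free_entries - 1 >= reserved;   /* one for t, rest stays reserved */
    }
    ```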
  • Patent number: 7836276
    Abstract: A SIMD processor efficiently utilizes its hardware resources to achieve higher data processing throughput. The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processor at a fraction of the rate of the data processing side and by providing multiple execution pipelines, each with multiple data paths. As a result, higher data processing throughput is achieved while an instruction is fetched and issued once per clock. This configuration also allows a large group of threads to be clustered and executed together through the SIMD processor so that greater memory efficiency can be achieved for certain types of operations like texture memory accesses performed in connection with graphics processing.
    Type: Grant
    Filed: December 2, 2005
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm
  • Patent number: 7814487
    Abstract: A multithreaded processor device is disclosed and includes a first program thread and second program thread. The second program thread is execution linked to the first program thread in a lock step manner. As such, when the first program thread experiences a stall event, the second program thread is instructed to perform a no operation instruction in order to keep the second program thread execution linked to the first program thread. Also, the second program thread performs a no operation instruction during each clock cycle that the first program thread is stalled due to the stall event. When the first program thread performs a first successful operation after the stall event, the second program thread restarts normal execution.
    Type: Grant
    Filed: April 26, 2005
    Date of Patent: October 12, 2010
    Assignee: QUALCOMM Incorporated
    Inventors: Lucian Codrescu, Erich Plondke, Muhammad Ahmed, William C. Anderson
  • Patent number: 7810084
    Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which is dynamically controlled by using a bundle specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.
    Type: Grant
    Filed: June 1, 2006
    Date of Patent: October 5, 2010
    Assignee: SAP AG
    Inventor: Karsten S. Egetoft
  • Publication number: 20100228954
    Abstract: The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events—such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events—to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads—though, in some embodiments, they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from.
    Type: Application
    Filed: February 4, 2010
    Publication date: September 9, 2010
    Applicant: SHARP KABUSHIKI KAISHA CORPORATION
    Inventors: Steven Frank, Shigeki Imai
  • Patent number: 7774581
    Abstract: An apparatus and a method are provided for a parallel processing very long instruction word (VLIW) computer. The apparatus includes: an index code generation unit sequentially generating an index code, which is associated with the number of no operation (NOP) instruction words between effective instruction words, with respect to each of the instruction word groups to be executed in a VLIW computer; an instruction compression unit sequentially deleting the NOP instruction words which correspond to the index code with respect to each of the instruction word groups; and an instruction word conversion unit converting the effective instruction words to include the index code, the effective instruction words corresponding to the NOP instruction words.
    Type: Grant
    Filed: August 14, 2007
    Date of Patent: August 10, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Chang-Woo Baek, Hong-Seok Kim, Hee Seok Kim, Jeongwook Kim
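    The compression step described above amounts to run-length encoding of NOP slots: each effective (non-NOP) instruction carries an index code giving the number of NOPs that preceded it, so the NOPs themselves need not be stored. A rough encoder sketch; the NOP encoding and the 4-bit index field position are assumptions for illustration, not the patent's actual format.
    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define NOP        0x00000000u   /* assumed NOP encoding                 */
    #define IDX_SHIFT  28            /* assumed free 4-bit index-code field  */

    /* Compress one instruction-word group: drop NOP words and tag each kept
     * effective instruction with the count of NOPs that preceded it (at most
     * 15 with a 4-bit field).  Returns the number of words written to 'out'. */
    size_t compress_group(const uint32_t *in, size_t n, uint32_t *out)
    {
        size_t kept = 0;
        uint32_t nops = 0;
        for (size_t i = 0; i < n; i++) {
            if (in[i] == NOP) {
                nops++;                               /* just count it */
            } else {
                out[kept++] = in[i] | (nops << IDX_SHIFT);
                nops = 0;
            }
        }
        return kept;
    }
    ```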
  • Publication number: 20100169577
    Abstract: In order to control access requests to a cache shared between a plurality of threads, a storage unit is included that stores a flag provided in association with each of the threads. When a thread enters execution of an atomic instruction, a defined value is written to that thread's flag in the storage unit. When the atomic instruction is completed, a different defined value is written, thereby indicating whether or not each thread is executing an atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread other than that thread is executing an atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing an atomic instruction, the access request is kept on standby. This makes it possible to realize the exclusive control processing necessary for processing atomic instructions with a simple configuration.
    Type: Application
    Filed: December 17, 2009
    Publication date: July 1, 2010
    Applicant: Fujitsu Limited
    Inventor: Naohiro Kiyota
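    The exclusive control described above needs only a per-thread flag: set it when a thread starts an atomic instruction, clear it on completion, and hold any other thread's access request while a foreign flag is set. A behavioral sketch assuming a fixed two-thread configuration; the thread count and function names are illustrative.
    ```c
    #include <stdbool.h>

    #define NTHREADS 2

    static bool atomic_in_flight[NTHREADS];   /* one flag per thread */

    void atomic_begin(int tid) { atomic_in_flight[tid] = true;  }
    void atomic_end(int tid)   { atomic_in_flight[tid] = false; }

    /* An access request from 'tid' is kept on standby while any *other*
     * thread is in the middle of an atomic instruction. */
    bool access_must_wait(int tid)
    {
        for (int i = 0; i < NTHREADS; i++)
            if (i != tid && atomic_in_flight[i])
                return true;
        return false;
    }
    ```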
  • Patent number: 7720219
    Abstract: An apparatus and method for implementing a hash algorithm word buffer. In one embodiment, a cryptographic unit may include hash logic configured to compute a hash value of a data block according to a hash algorithm, where the hash algorithm includes a plurality of iterations, and where the data block includes a plurality of data words. The cryptographic unit may further include a word buffer comprising a plurality of data word positions and configured to store the data block during computing by the hash logic, where subsequent to the hash logic computing one of the iterations of the hash algorithm, the word buffer is further configured to linearly shift the data block by one or more data word positions according to the hash algorithm. The hash algorithm may be dynamically selectable from a plurality of hash algorithms.
    Type: Grant
    Filed: October 19, 2004
    Date of Patent: May 18, 2010
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
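    The word buffer described above behaves like the message schedule of a block hash: after each iteration the stored block shifts by one word position and a newly derived word enters at the end. A sketch of that shift step; the 16-word size matches common block hashes, but the mixing expression below is a placeholder rather than any real algorithm's schedule.
    ```c
    #include <stdint.h>

    #define WORDS 16                     /* the buffer holds one data block */

    /* After one hash iteration, shift the block left by one word position
     * and insert a newly derived word at the end. */
    void word_buffer_shift(uint32_t w[WORDS])
    {
        uint32_t derived = w[0] ^ (w[9] << 3) ^ (w[14] >> 5);  /* placeholder mix */
        for (int i = 0; i < WORDS - 1; i++)
            w[i] = w[i + 1];             /* linear shift by one position */
        w[WORDS - 1] = derived;
    }
    ```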
  • Patent number: 7698707
    Abstract: Identifying compatible threads in a Simultaneous Multithreading (SMT) processor environment is provided by calculating a performance metric, such as cycles per instruction (CPI), that occurs when two threads are running on the SMT processor. The CPI that is achieved when both threads were executing on the SMT processor is determined. If the CPI that was achieved is better than the compatibility threshold, then information indicating the compatibility is recorded. When a thread is about to complete, the scheduler looks at the run queue to which the completing thread belongs to dispatch another thread. The scheduler identifies a thread that is (1) compatible with the thread that is still running on the SMT processor (i.e., the thread that is not about to complete), and (2) ready to execute. The CPI data is continually updated so that threads that are compatible with one another are continually identified.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: April 13, 2010
    Assignee: International Business Machines Corporation
    Inventors: Jos Manuel Accapadi, Andrew Dunshea, Dirk Michel, Mysore Sathyanarayana Srinivas
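    The scheduling idea above comes down to remembering, per thread pair, the CPI observed while the two ran together on the core, and preferring a ready thread whose recorded pairing with the still-running thread beats the compatibility threshold. A toy selection routine under those assumptions; the table layout and names are invented.
    ```c
    #define NTHREADS 8

    /* pair_cpi[a][b]: cycles per instruction last measured while threads a
     * and b shared the SMT core; 0.0 means never measured together. */
    static double pair_cpi[NTHREADS][NTHREADS];

    /* Pick a ready thread compatible with the thread still running (lower
     * CPI is better).  Returns -1 if no compatible ready thread exists. */
    int pick_compatible(int running, const int *ready, int nready, double threshold)
    {
        int best = -1;
        double best_cpi = threshold;
        for (int i = 0; i < nready; i++) {
            double c = pair_cpi[running][ready[i]];
            if (c > 0.0 && c < best_cpi) {
                best_cpi = c;
                best = ready[i];
            }
        }
        return best;
    }
    ```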
  • Patent number: 7694109
    Abstract: When fetching an instruction from a plurality of memory banks, a first pipeline cycle corresponding to selection of a memory bank and a second pipeline cycle corresponding to instruction readout are generated to carry out a pipeline process. Only the selected memory bank can be precharged to allow reduction of power consumption. Since the first and second pipeline cycles are effected in parallel, the throughput of the instruction memory can be improved.
    Type: Grant
    Filed: December 4, 2007
    Date of Patent: April 6, 2010
    Assignee: Renesas Technology Corp.
    Inventors: Toyohiko Yoshida, Akira Yamada, Hisakazu Sato
  • Patent number: 7689774
    Abstract: A system and method for improving the page crossing performance of a data prefetcher is presented. A prefetch engine tracks times at which a data stream terminates due to a page boundary. When a certain percentage of data streams terminate at page boundaries, the prefetch engine sets an aggressive profile flag. In turn, when the data prefetch engine receives a real address that corresponds to the beginning/end of a new page, and the aggressive profile flag is set, the prefetch engine uses an aggressive startup profile to generate and schedule prefetches on the assumption that the real address is highly likely to be the continuation of a long data stream. As a result, the system and method minimize latency when crossing real page boundaries when a program is predominantly accessing long streams.
    Type: Grant
    Filed: April 6, 2007
    Date of Patent: March 30, 2010
    Assignee: International Business Machines Corporation
    Inventors: Francis Patrick O'Connell, Jeffrey A. Stuecheli
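    The heuristic above is essentially a ratio test: count how many data streams terminate exactly at a page boundary, and once that fraction crosses a threshold, treat new page-start addresses as likely continuations of long streams. A sketch with invented counters, an arbitrary sample size, and a one-half threshold.
    ```c
    #include <stdbool.h>

    static unsigned streams_ended;
    static unsigned streams_ended_at_page;

    /* Called by the prefetch engine whenever a data stream terminates. */
    void stream_terminated(bool at_page_boundary)
    {
        streams_ended++;
        if (at_page_boundary)
            streams_ended_at_page++;
    }

    /* Aggressive profile flag: set once at least half of the observed
     * stream terminations happened at page boundaries (the sample size
     * and threshold here are illustrative). */
    bool aggressive_profile(void)
    {
        return streams_ended >= 16 &&
               2 * streams_ended_at_page >= streams_ended;
    }
    ```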
  • Publication number: 20100064118
    Abstract: A method and apparatus for reducing latency in computer processors. The method incorporates a special instruction set that provides an indication of whether a particular instruction is capable of being executed nearly simultaneously with a preceding instruction in the same group. In such a situation, multiple instructions may be executed at a rate faster than expected. A simple apparatus for accomplishing this method is illustrated.
    Type: Application
    Filed: September 10, 2008
    Publication date: March 11, 2010
    Applicant: VNS PORTFOLIO LLC
    Inventor: Charles H. Moore
  • Patent number: 7673119
    Abstract: This invention is useful in a very long instruction word data processor that fetches a predetermined plural number of instructions each operation cycle. A predetermined one of these instructions is used as a special header. This special header has a unique encoding different from any normal instruction. When decoded this special header instructs decode hardware to decode this fetch packet in a special way. In one embodiment a bit field in the header signals the decode hardware whether to decode each instruction word normally or in an alternative way. The header may include extension opcode bits corresponding to each of the other instruction slots. In another embodiment another bit field signals whether to decode an instruction field as one normal length instruction or as two half-length instructions.
    Type: Grant
    Filed: May 8, 2006
    Date of Patent: March 2, 2010
    Assignee: Texas Instruments Incorporated
    Inventors: Michael D. Asal, Eric J. Stotzer, Todd T. Hahn
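    The header mechanism above can be pictured as a per-slot bit mask: the decoder reads the header word first, then decodes each remaining slot either normally or in the alternative way its header bit selects. A sketch with a wholly invented packet layout (eight slots, the low header bits as the mode mask); the real TI encoding is not reproduced here.
    ```c
    #include <stdint.h>

    #define SLOTS 8

    /* Decode one fetch packet whose first word is the special header.
     * Bit (i-1) of an assumed mode field selects normal or alternative
     * decoding for slot i. */
    void decode_packet(const uint32_t packet[SLOTS],
                       void (*decode_normal)(uint32_t),
                       void (*decode_alt)(uint32_t))
    {
        uint32_t mode_bits = packet[0] & 0x7Fu;        /* header bit field */
        for (int slot = 1; slot < SLOTS; slot++) {
            if (mode_bits & (1u << (slot - 1)))
                decode_alt(packet[slot]);              /* alternative decode */
            else
                decode_normal(packet[slot]);           /* normal decode      */
        }
    }
    ```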
  • Patent number: 7664929
    Abstract: A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.
    Type: Grant
    Filed: September 17, 2003
    Date of Patent: February 16, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Carlos Antonio Alba Pinto, Ramanathan Sethuraman, Srinivasan Balakrishnan, Harm Johannes Antonius Maria Peters, Rafael Peset Llopis
  • Publication number: 20100031006
    Abstract: A method, processor and processing system provide management of per-thread pipeline resource allocation in a simultaneous multi-threaded (SMT) processor by counting indications of instruction completion for each of the threads. The indication may be the commit phase of the pipeline, which indicates results of the pipeline instruction execution are ready for write-back. The completion counts are used in a relative or absolute form to control the pipeline resource allocation. The decode or fetch rates of instructions for the threads can be controlled from the relative or absolute completion counts, providing control of scheduling instructions among the threads for execution by execution pipeline(s). Alternatively, or in combination, the thread priority registers in any thread priority management scheme can be controlled by comparison and/or scaling of the completion counts.
    Type: Application
    Filed: August 4, 2008
    Publication date: February 4, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Wael R. El-essawy, Lixin Zhang
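    The feedback loop above uses per-thread completion (commit) counts to steer front-end bandwidth. One simple proportional policy, sketched below with invented names, divides a fixed per-cycle fetch budget according to the counts; as the abstract notes, the same counts could equally drive thread-priority registers instead.
    ```c
    #define NTHREADS 2

    /* commits[t]: instructions committed per thread over the current
     * sampling window. */
    static unsigned commits[NTHREADS];

    /* Split a fixed fetch budget between two threads in proportion to
     * their completion counts (purely illustrative policy). */
    void fetch_shares(unsigned budget, unsigned share[NTHREADS])
    {
        unsigned total = commits[0] + commits[1];
        if (total == 0) {
            share[0] = budget / 2;            /* no data yet: split evenly */
            share[1] = budget - share[0];
            return;
        }
        share[0] = budget * commits[0] / total;
        share[1] = budget - share[0];
    }
    ```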
  • Publication number: 20090287908
    Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.
    Type: Application
    Filed: May 19, 2008
    Publication date: November 19, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
  • Publication number: 20090276576
    Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.
    Type: Application
    Filed: July 9, 2009
    Publication date: November 5, 2009
    Applicant: Altera Corporation
    Inventors: Gerald George Pechanek, Stamatis Vassiliadis
  • Publication number: 20090228687
    Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution of the group of instructions exists in the group of instructions and supplies the group of instructions to the instruction decoding unit by controlling the instruction buffer so that the instructions of the group of instructions are sequentially supplied when the factor exists and all the instructions of the group of instructions are simultaneously supplied when the factor does not exist.
    Type: Application
    Filed: March 9, 2006
    Publication date: September 10, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventor: Tetsu Hosoki
  • Patent number: 7581082
    Abstract: This invention employs a 16-bit instruction set that has a subset of the functionality of the 32-bit instruction set. In this invention 16-bit instructions and 32-bit instructions can coexist in the same fetch packet. In the prior architecture 32-bit instructions may not span a 32-bit boundary. The 16-bit instruction set is implemented with a special fetch packet header that signals whether the fetch packet includes some 16-bit instructions. This fetch packet header also has special bits that tell the hardware how to interpret a particular 16-bit instruction. These bits essentially allow overlays on the whole or part of the 16-bit instruction space. This makes the opcode space larger permitting more instructions than with a pure 16-bit opcode space.
    Type: Grant
    Filed: May 8, 2006
    Date of Patent: August 25, 2009
    Assignee: Texas Instruments Incorporated
    Inventors: Todd T. Hahn, Eric J. Stotzer, Michael D. Asal
  • Publication number: 20090210661
    Abstract: A method, system and computer program product for performing an implicit predicted return from a predicted subroutine are provided. The system includes a branch history table/branch target buffer (BHT/BTB) to hold branch information, including a target address of a predicted subroutine and a branch type. The system also includes instruction buffers, and instruction fetch controls to perform a method including fetching a branch instruction at a branch address and a return-point instruction. The method also includes receiving the target address and the branch type, and fetching a fixed number of instructions in response to the branch type. The method further includes referencing the return-point instruction within the instruction buffers such that the return-point instruction is available upon completing the fetching of the fixed number of instructions absent a re-fetch of the return-point instruction.
    Type: Application
    Filed: February 20, 2008
    Publication date: August 20, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Khary J. Alexander, James J. Bonanno, Brian R. Prasky, Anthony Saporito, Robert J. Sonnelitter, III, Charles F. Webb
  • Publication number: 20090204785
    Abstract: A computer. A processor pipeline alternately executes instructions coded for first and second different computer architectures or coded to implement first and second different processing conventions. A memory stores instructions for execution by the processor pipeline, the memory being divided into pages for management by a virtual memory manager, a single address space of the memory having first and second pages. A memory unit fetches instructions from the memory for execution by the pipeline, and fetches stored indicator elements associated with respective memory pages of the single address space from which the instructions are to be fetched. Each indicator element is designed to store an indication of which of two different computer architectures and/or execution conventions under which instruction data of the associated page are to be executed by the processor pipeline.
    Type: Application
    Filed: October 31, 2007
    Publication date: August 13, 2009
    Inventors: John S. Yates, JR., David L. Reese, Korbin S. Van Dyke, T. R. Ramesh, Paul H. Hohensee
  • Patent number: 7552314
    Abstract: The invention provides a method and apparatus for branch prediction in a processor. A fetch-block branch target buffer is used in an early stage of pipeline processing before the instruction is decoded, which stores information about a control transfer instruction for a “block” of instruction memory. The block of instruction memory is represented by a block entry in the fetch-block branch target buffer. The block entry represents one recorded control-transfer instruction (such as a branch instruction) and a set of sequentially preceding instructions, up to a fixed maximum length N. Indexing into the fetch-block branch target buffer yields an answer whether the block entry represents memory that contains a previously executed control-transfer instruction, a length value representing the amount of memory that contains the instructions represented by the block, and an indicator for the type of control-transfer instruction that terminates the block, its target and outcome.
    Type: Grant
    Filed: October 17, 2005
    Date of Patent: June 23, 2009
    Assignee: STMicroelectronics, Inc.
    Inventors: Anatoly Gelman, Russell Schnapp
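    A fetch-block entry, as described above, records enough about a block of instruction memory to tell the front end, before decode, whether the block ends in a control transfer, how long it is, and where it goes. A sketch of one possible entry layout and lookup; the field names, table size, and direct-mapped indexing are assumptions for illustration.
    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define BTB_ENTRIES 512

    struct fetch_block_entry {
        uint32_t tag;          /* identifies the block of instruction memory  */
        uint8_t  length;       /* instructions up to and including the branch */
        uint8_t  branch_type;  /* kind of control transfer ending the block   */
        uint32_t target;       /* predicted target of that control transfer   */
        bool     taken;        /* last recorded outcome                       */
        bool     valid;
    };

    static struct fetch_block_entry btb[BTB_ENTRIES];

    /* Early-pipeline lookup: before any decoding, ask whether this fetch
     * address falls in a block known to end in a control-transfer
     * instruction. */
    const struct fetch_block_entry *btb_lookup(uint32_t fetch_addr)
    {
        const struct fetch_block_entry *e = &btb[(fetch_addr >> 2) % BTB_ENTRIES];
        return (e->valid && e->tag == fetch_addr >> 2) ? e : NULL;
    }
    ```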
  • Patent number: 7546451
    Abstract: A system and method for enabling a programmable device to execute instructions without interruption. An instruction space for storing instructions from a host application is bifurcated to define a program segment and a hold segment. At startup, instructions are loaded into the hold segment, and the programmable device begins executing those instructions. While the hold segment instructions are executed, the program segment is loaded with instructions. Once the program segment is filled, control is shifted to it and instructions from this segment are executed by the programmable device. When the program segment has been executed, control is shifted back to the hold segment, and instructions are taken from it while the program segment is reloaded with a fresh set of instructions from the host application. Once the program segment is reloaded, control is redirected and execution of instructions from the program segment is continued.
    Type: Grant
    Filed: June 19, 2002
    Date of Patent: June 9, 2009
    Assignee: Finisar Corporation
    Inventors: Chris Cicchetti, Jean-François Dubé, Thomas Andrew Myers, An Huynh, Geoffrey T. Hibbert
  • Patent number: 7509483
    Abstract: A computing architecture and software techniques are described which modifies the basic sequential instruction fetching mechanism of a processor by separating a program's control flow from its functional execution flow. A compiled sequential HLL program's static control structures are analyzed and a separate program based on its own unique instructions is created that primarily generates addresses for the selection of functional execution instructions. The original program is now represented by an instruction fetch program and a set of function/logic execution instructions. This basic split allows multiple instruction addresses to be generated in parallel to access multiple instruction memories. These multiple instruction memories contain only the function/logic instructions of the program and no control structure operations such as branches or calls. All the original program's control instructions are split from the original program and used to create the instruction addressing program.
    Type: Grant
    Filed: February 22, 2007
    Date of Patent: March 24, 2009
    Assignee: Renesky Tap III, Limited Liability Company
    Inventor: Gerald George Pechanek
  • Patent number: 7500088
    Abstract: Methods and apparatus are provided for enhanced instruction handling in processing environments. If branch misprediction occurs during instruction processing, a branch history table may be updated based upon the number of instructions to be fetched. The branch history table may be updated in accordance with a first mode if at least two instructions are available, and may be updated in accordance with a second mode if less than two instructions are available. A compiler can assist the processing by aligning instructions for processing. The instructions can be aligned across multiple instruction fetch groups so that instructions are available for fetching and the branch history table is updated prior to performing a branching operation.
    Type: Grant
    Filed: July 8, 2004
    Date of Patent: March 3, 2009
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Masaki Osawa
  • Patent number: 7500066
    Abstract: A multiprocessing apparatus includes a memory and a plurality (M) of processors coupled to share the memory. Access to the memory is time-division multiplexed among the plurality of processors. In one embodiment, a selected processor retrieves M words of instruction forming K instructions during a given clock cycle. The selected processor executes M−K NOP instructions if K<M.
    Type: Grant
    Filed: April 30, 2005
    Date of Patent: March 3, 2009
    Assignee: Tellabs Operations, Inc.
    Inventors: Thayl D. Zohner, Lawrence D. Weizeorick, Keith M. Ellens
  • Patent number: 7487333
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: November 5, 2003
    Date of Patent: February 3, 2009
    Assignee: Seiko Epson Corporation
    Inventors: Le-Trong Nguyen, Derek J Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H Trang
  • Patent number: 7480783
    Abstract: Disclosed are systems for loading an unaligned word from a specified unaligned word address in a memory, the unaligned word comprising a plurality of indexed portions crossing a word boundary, a method of operating the system comprising: loading a first aligned word commencing at an aligned word address rounded from the specified unaligned word address; identifying an index representing the location of the unaligned word address relative to the aligned word address; loading a second aligned word commencing at an aligned word address rounded from a second unaligned word address; and combining indexed portions of the first and second aligned words using the identified index to construct the unaligned word.
    Type: Grant
    Filed: August 19, 2004
    Date of Patent: January 20, 2009
    Assignees: STMicroelectronics Limited, Hewlett-Packard Company
    Inventors: Mark O. Homewood, Paolo Faraboschi
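    The recipe in the abstract above is the familiar two-load-and-merge sequence: round the address down to load the first aligned word, load the following aligned word, and splice the pieces selected by the low address bits. A C sketch assuming 32-bit words and little-endian byte order; the splicing convention is an assumption, not taken from the patent.
    ```c
    #include <stdint.h>
    #include <string.h>

    /* Load a 32-bit word from an unaligned byte address by combining two
     * aligned loads.  'mem' must cover at least addr rounded down plus 8
     * bytes; little-endian byte order is assumed for the splice. */
    uint32_t load_unaligned(const uint8_t *mem, uint32_t addr)
    {
        uint32_t aligned = addr & ~3u;       /* round down to a word boundary */
        uint32_t index   = addr & 3u;        /* position within the word      */
        uint32_t lo, hi;

        memcpy(&lo, mem + aligned, 4);       /* first aligned word            */
        memcpy(&hi, mem + aligned + 4, 4);   /* second aligned word           */

        if (index == 0)
            return lo;                       /* address was already aligned   */
        return (lo >> (8 * index)) | (hi << (8 * (4 - index)));
    }
    ```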
  • Publication number: 20080301409
    Abstract: The invention provides a processor for executing threads, each thread comprising a sequence of instructions, said instructions defining operations and at least some of those instructions defining a memory access operation. The processor comprises: a plurality of instruction buffers, each for holding at least one instruction of a thread associated with that buffer; an instruction issue stage for issuing instructions from the instruction buffers; and a memory access stage connected to a memory and arranged to receive instructions issued by the instruction issue stage. The memory access stage comprises: detecting logic adapted to detect whether a memory access operation is defined in each issued instruction; and instruction fetch logic adapted to instigate an instruction fetch to fetch an instruction of a thread when no memory access operation is detected.
    Type: Application
    Filed: May 30, 2007
    Publication date: December 4, 2008
    Inventor: Michael David MAY
  • Patent number: 7454597
    Abstract: A processor core and method of executing instructions, both of which utilizes schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the cache.
    Type: Grant
    Filed: January 2, 2007
    Date of Patent: November 18, 2008
    Assignee: International Business Machines Corporation
    Inventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
  • Patent number: 7454654
    Abstract: A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first in the event that a failure is detected in the first pipeline. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent one another in an array. A pipeline failure causes data to be shifted one position within the array of pipelines, to by-pass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, selection logic controlling the selection between a primary and alternate source of pipeline data is integrated with other pipeline operand selection logic.
    Type: Grant
    Filed: September 13, 2006
    Date of Patent: November 18, 2008
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Publication number: 20080276072
    Abstract: A method of executing a conditional instruction within a pipeline processor having a plurality of pipelines, the processor having a first condition code register associated with a first pipeline and a second condition code register associated with a second pipeline is disclosed. The method saves a most recent condition code value to either the first condition code register or the second condition code register. The method further sets an indicator indicating whether the second condition code register has the most recent condition code value and retrieves the most recent condition code value from either the first or second condition code register based on the indicator. The method uses the most recent condition code value to determine if the conditional instruction should be executed.
    Type: Application
    Filed: May 3, 2007
    Publication date: November 6, 2008
    Inventor: Bohuslav Rychlik
  • Publication number: 20080270758
    Abstract: A data processing apparatus is provided wherein processing circuitry executes multiple program threads including at least one high priority thread and at least one lower priority thread. Instructions required by the threads are retrieved from a cache memory hierarchy comprising multiple cache levels. The cache memory hierarchy includes a bypass path for omitting a predetermined level of the cache memory hierarchy when performing a lookup procedure for a required instruction and for bypassing said predetermined level of the cache memory hierarchy when returning said required instruction to said processing circuitry. The bypass path is used by default when the requested instruction is for a lower priority thread.
    Type: Application
    Filed: April 27, 2007
    Publication date: October 30, 2008
    Applicant: ARM Limited
    Inventors: Emre Ozer, Stuart David Biles
  • Publication number: 20080244230
    Abstract: A computation node according to various embodiments of the invention includes at least one input port capable of being coupled to at least one first other computation node, a first store coupled to the input port(s) to store input data, a second store to receive and store instructions, an instruction wakeup unit to match the input data to the instructions, at least one execution unit to execute the instructions, using the input data to produce output data, and at least one output port capable of being coupled to at least one second other computation node. The node may also include a router to direct the output data from the output port(s) to the second other node. A system according to various embodiments of the invention includes an external instruction sequencer to fetch a group of instructions, and one or more interconnected, preselected computational nodes.
    Type: Application
    Filed: June 10, 2008
    Publication date: October 2, 2008
    Applicant: Board of Regents, The University of Texas System
    Inventors: Douglas C. Burger, Stephen W. Keckler, Karthikeyan Sankaralingam, Ramadass Nagarajan
  • Patent number: 7424598
    Abstract: The data processor, which executes instructions realized by wired logic using a pipeline system, includes a plurality of instruction registers and the same number of arithmetic operation units. A plurality of instructions read into the instruction registers in one machine cycle at a time are processed in parallel by the plurality of arithmetic operation units.
    Type: Grant
    Filed: May 14, 2001
    Date of Patent: September 9, 2008
    Assignee: Renesas Technology Corp.
    Inventors: Takashi Hotta, Shigeya Tanaka, Hideo Maejima
  • Patent number: 7404042
    Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary-crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.
    Type: Grant
    Filed: May 18, 2005
    Date of Patent: July 22, 2008
    Assignee: QUALCOMM Incorporated
    Inventors: Brian Michael Stempel, Jeffrey Todd Bridges, Rodney Wayne Smith, Thomas Andrew Sartorius
  • Patent number: 7404048
    Abstract: An inter-cluster communication module using the memory access network is provided, including a plurality of clusters, a memory subsystem, a controller and a switch device. When some clusters issue a load instruction and some clusters issue a store instruction of an identical memory address concurrently, the controller controls the switch device which connects the clusters and the memory banks of the memory subsystem, so that the data item is transmitted from the cluster issuing the store instruction to the cluster issuing the load instruction through the switch device, thereby achieving data exchange between the clusters. Herein, the data item is selectively stored in the memory module depending on the address. Furthermore, the data item is also transmitted between the memory and the clusters over the switch device.
    Type: Grant
    Filed: October 11, 2005
    Date of Patent: July 22, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Tay-Jyi Lin, Pi-Chen Hsiao, Chih-Wei Liu, Chein-Wei Jen, I-Tao Liao, Po-Han Huang
  • Patent number: 7401207
    Abstract: Each instruction thread in an SMT processor is associated with a software-assigned base input processing priority. Unless some predefined event or circumstance occurs with an instruction being processed or to be processed, the base input processing priorities of the respective threads are used to determine the interleave frequency between the threads according to some instruction interleave rule. However, upon the occurrence of some predefined event or circumstance in the processor related to a particular instruction thread, the base input processing priority of one or more instruction threads is adjusted to produce one or more adjusted priority values. The instruction interleave rule is then enforced according to the adjusted priority value or values together with any base input processing priority values that have not been subject to adjustment.
    Type: Grant
    Filed: April 25, 2003
    Date of Patent: July 15, 2008
    Assignee: International Business Machines Corporation
    Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
  • Patent number: 7401208
    Abstract: A processor interleaves instructions according to a priority rule which determines the frequency with which instructions from each respective thread are selected and added to an interleaved stream of instructions to be processed in the data processor. The frequency with which each thread is selected according to the rule may be based on the priorities assigned to the instruction threads. A randomization is inserted into the interleaving process so that the selection of an instruction thread during any particular clock cycle is not based solely by the priority rule, but is also based in part on a random or pseudo random element. This randomization is inserted into the instruction thread selection process so as to vary the order in which instructions are selected from the various instruction threads while preserving the overall frequency of thread selection (i.e. how often threads are selected) set by the priority rule.
    Type: Grant
    Filed: April 25, 2003
    Date of Patent: July 15, 2008
    Assignee: International Business Machines Corporation
    Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
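    The policy above is weighted random selection: each thread's priority fixes its long-run share of issue slots, while the per-cycle pick is randomized so the exact interleave order varies. A sketch using the C library's rand() as the pseudo-random element; the mapping from priority to weight is omitted and the weights themselves are assumed inputs.
    ```c
    #include <stdlib.h>

    /* Pick the thread to select this cycle.  Each thread's weight sets how
     * often it is chosen on average; the per-cycle choice is pseudo-random,
     * so the interleave pattern varies while the overall selection
     * frequencies set by the priority rule are preserved. */
    int select_thread(const unsigned *weight, int nthreads)
    {
        unsigned total = 0;
        for (int i = 0; i < nthreads; i++)
            total += weight[i];
        if (total == 0)
            return 0;                            /* degenerate case: no weights */

        unsigned r = (unsigned)rand() % total;   /* pseudo-random element */
        for (int i = 0; i < nthreads; i++) {
            if (r < weight[i])
                return i;
            r -= weight[i];
        }
        return nthreads - 1;                     /* not reached */
    }
    ```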
  • Patent number: 7398374
    Abstract: The invention provides a processor that processes bundles of instructions preferentially through clusters or execution units according to thread characteristics. The cluster architectures of the invention preferably include capability to process “multi-threaded” instructions. Selectively, the architecture either (a) processes singly-threaded instructions through a single cluster to avoid bypassing and to increase throughput, or (b) processes singly-threaded instructions through multiple clusters to increase “per thread” performance. The architecture may be “configurable” to operate in one of two modes: in a “wide” mode of operation, the processor's internal clusters collectively process bundled instructions of one thread of a program at the same time; in a “throughput” mode of operation, those clusters independently process instruction bundles of separate program threads. Clusters are often implemented on a common die, with a core and register file per cluster.
    Type: Grant
    Filed: February 27, 2002
    Date of Patent: July 8, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Eric DeLano
  • Publication number: 20080162881
    Abstract: A method and apparatus for designating and handling irrevocable transaction is herein described. In response to detecting an irrevocable event, such as an I/O operation, a user-defined irrevocable designation, and a dynamic failure profile, a transaction is designated as irrevocable. In response to designating a transaction as irrevocable, Single Owner Read Locks (SORLs) are acquired for previous and subsequent reads in the irrevocably designated transaction to ensure the transaction is able to complete without modification to locations read from, while permitting remote resources to load from those locations to continue execution.
    Type: Application
    Filed: December 28, 2006
    Publication date: July 3, 2008
    Inventors: Adam Welc, Bratin Saha, Ali-Reza Adl-Tabatabai
  • Patent number: RE41012
    Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: November 24, 2009
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand