Of Multiple Instructions Simultaneously Patents (Class 712/206)
-
Patent number: 7992017
Abstract: Methods and apparatuses for reducing step loads of processors are disclosed. Method embodiments comprise examining a number of instructions to be processed by a processor to determine the types of instructions that it has, calculating power consumption in an execution period based on the types of instructions, and limiting the execution to a subset of instructions of the number to control the quantity of power for the execution period. Some embodiments may also create artificial activity to provide a minimum power floor for the processor. Apparatus embodiments comprise instruction type determination logic to determine types of instructions in an incoming instruction stream, a power calculator to calculate power consumption associated with processing a number of instructions in an execution period, and instruction throttling logic to control the power consumption by limiting the number of instructions to be processed in the execution period.
Type: Grant
Filed: September 11, 2007
Date of Patent: August 2, 2011
Assignee: Intel Corporation
Inventors: Kevin Safford, Rohit Bhatia, Chris Bostak, Richard Blumberg, Blaine Stackhouse, Steve Undy
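The throttling idea in this abstract can be illustrated with a small sketch. It is not the patented implementation: the per-type power costs, the in-order cut-off rule, and the names `POWER_COST` and `throttle` are all assumptions made for illustration.

```python
# Hypothetical per-type power costs (arbitrary units); not taken from the patent.
POWER_COST = {"alu": 1.0, "load": 2.0, "fp": 3.0}


def throttle(instr_types, budget, floor=0.0):
    """Pick the in-order prefix of instructions that fits the power budget.

    Returns (issued, estimated_power, artificial_activity), where
    artificial_activity is the extra power needed to reach the floor.
    """
    issued = []
    power = 0.0
    for kind in instr_types:
        cost = POWER_COST[kind]
        if power + cost > budget:
            break  # stop issuing for this execution period
        issued.append(kind)
        power += cost
    # Assumed model of "artificial activity": top up to the minimum floor.
    artificial = max(0.0, floor - power)
    return issued, power, artificial
```

Calling `throttle(["fp", "fp", "alu", "load"], budget=7.0)` issues the first three instructions and defers the load to a later period.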
-
Patent number: 7962723
Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.
Type: Grant
Filed: July 9, 2009
Date of Patent: June 14, 2011
Inventors: Gerald George Pechanek, Stamatis Vassiliadis
-
Publication number: 20110107063
Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.
Type: Application
Filed: August 2, 2010
Publication date: May 5, 2011
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
-
Patent number: 7934203
Abstract: During program code conversion, such as in a dynamic binary translator, automatic code generation provides target code 21 executable by a target processor 13. Multiple instruction ports 610 disperse a group of instructions to functional units 620 of the processor 13. Disclosed is a mechanism of preparing an instruction group 606 using a plurality of pools 700 having a hierarchical structure 711-715. Each pool represents a different overlapping subset of the issue ports 610. Placing an instruction 600 into a particular pool 700 also reduces vacancies in any one or more subsidiary pools in the hierarchy. In a preferred embodiment, a counter value 702 is associated with each pool 700 to track vacancies. A valid instruction group 606 is formed by picking the placed instructions 600 from the pools 700. The instruction groups are generated accurately and automatically. Decoding errors and stalls are minimized or completely avoided.
Type: Grant
Filed: May 27, 2005
Date of Patent: April 26, 2011
Assignee: International Business Machines Corporation
Inventors: William O. Lovett, David Haikney, Matthew Evans
-
Patent number: 7913069
Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. Instruction words (48) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In a particular example, the series of operations are included in a single instruction word (48). The micro-loop (100) in combination with the ability of the computers (12) to send instruction words (48) to a neighboring computer (12) provides a powerful tool for allowing a computer (12) to utilize the resources of a neighboring computer (12).
Type: Grant
Filed: May 26, 2006
Date of Patent: March 22, 2011
Assignee: VNS Portfolio LLC
Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
-
Publication number: 20110010527
Abstract: A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams.
Type: Application
Filed: February 3, 2009
Publication date: January 13, 2011
Inventor: Shohei Nomoto
-
Patent number: 7870367
Abstract: Methods and apparatus are provided for implementing complex parallel instructions on a processor having a supported instruction set. Complex parallel instructions provide that an operation code, control logic, and input data is passed to a processor core. The operation code identifies the instruction used to process the input data and the control logic identifies the state of the instruction. An intervening instruction can be executed by a processor core even before execution of a complex parallel instruction is complete.
Type: Grant
Filed: June 17, 2003
Date of Patent: January 11, 2011
Assignee: Altera Corporation
Inventor: Chris Robinson
-
Publication number: 20110004788
Abstract: A system is designed for processing instructions in real time during a session. This system comprises: a preloader for obtaining reference data relating to the instructions, the reference data indicating the current values of each specified resource account data file, and the preloader being arranged to read the reference data for a plurality of received instructions in parallel from a master database; an enriched instruction queue for queuing the instructions together with their respective preloaded reference data; an execution engine for determining sequentially whether each received instruction can be executed under the present values of the relevant resource account files and for each executable instruction to generate an updating command; and an updater, responsive to the updating command from the execution engine, for updating the master database with the results of each executable instruction, the operation of the plurality of updaters being decoupled from the operation of the execution engine.
Type: Application
Filed: February 27, 2009
Publication date: January 6, 2011
Applicant: EUROCLEAR SA/NV
Inventors: Henri Petit, Jean-Francois Collin, Nicolas Marechal, Christine Deloge
-
Publication number: 20100299499
Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
Type: Application
Filed: September 30, 2009
Publication date: November 25, 2010
Inventors: Robert T. Golla, Gregory F. Grohoski
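One way to picture the quota guarantee is an admission check that keeps enough free entries in reserve for every other active thread that has not yet reached its quota. The rule below is an assumed policy for illustration only; the abstract does not say how the circuitry enforces the quota, and `can_allocate` is a hypothetical name.

```python
def can_allocate(thread, used, capacity, quota):
    """Return True if `thread` may take one more entry of a shared resource.

    used     -- dict mapping each active thread to the entries it holds
    capacity -- total entries in the shared resource
    quota    -- minimum entries every active thread must be able to reach
    """
    free = capacity - sum(used.values())
    # Entries that must stay available so other threads can reach their quota.
    reserved = sum(max(0, quota - held)
                   for other, held in used.items() if other != thread)
    return free > reserved
```

With eight entries, a quota of two, and thread A holding six entries while B holds none, A is refused further entries and B is admitted.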
-
Patent number: 7836276
Abstract: A SIMD processor efficiently utilizes its hardware resources to achieve higher data processing throughput. The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processor at a fraction of the rate of the data processing side and by providing multiple execution pipelines, each with multiple data paths. As a result, higher data processing throughput is achieved while an instruction is fetched and issued once per clock. This configuration also allows a large group of threads to be clustered and executed together through the SIMD processor so that greater memory efficiency can be achieved for certain types of operations like texture memory accesses performed in connection with graphics processing.
Type: Grant
Filed: December 2, 2005
Date of Patent: November 16, 2010
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John Erik Lindholm
-
Patent number: 7814487
Abstract: A multithreaded processor device is disclosed and includes a first program thread and second program thread. The second program thread is execution linked to the first program thread in a lock step manner. As such, when the first program thread experiences a stall event, the second program thread is instructed to perform a no operation instruction in order to keep the second program thread execution linked to the first program thread. Also, the second program thread performs a no operation instruction during each clock cycle that the first program thread is stalled due to the stall event. When the first program thread performs a first successful operation after the stall event, the second program thread restarts normal execution.
Type: Grant
Filed: April 26, 2005
Date of Patent: October 12, 2010
Assignee: QUALCOMM Incorporated
Inventors: Lucian Codrescu, Erich Plondke, Muhammad Ahmed, William C. Anderson
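The lock-step behaviour can be sketched as a cycle-by-cycle trace. This is a simplified software model and not the hardware described in the patent; the function name `lockstep_trace` and the representation of stalls as a list of booleans are assumptions.

```python
def lockstep_trace(first_stalled, second_ops):
    """Model what the second thread executes on each clock cycle.

    first_stalled -- one boolean per cycle, True while the first thread stalls
    second_ops    -- the second thread's instructions in program order
    """
    pending = iter(second_ops)
    trace = []
    for stalled in first_stalled:
        if stalled:
            trace.append("nop")  # stay linked to the stalled first thread
        else:
            # Normal execution resumes; pad with nops if the program ran out.
            trace.append(next(pending, "nop"))
    return trace
```

For example, a two-cycle stall between two instructions produces the trace `a, nop, nop, b` for the second thread.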
-
Patent number: 7810084
Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which is dynamically controlled by using a bundle specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.
Type: Grant
Filed: June 1, 2006
Date of Patent: October 5, 2010
Assignee: SAP AG
Inventor: Karsten S. Egetoft
-
Publication number: 20100228954
Abstract: The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events, such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events, to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads, though in some embodiments they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from.
Type: Application
Filed: February 4, 2010
Publication date: September 9, 2010
Applicants: SHARP KABUSHIKI KAISHA CORPORATION
Inventors: Steven Frank, Shigeki Imai
-
Patent number: 7774581
Abstract: An apparatus and a method are provided for a parallel processing very long instruction word (VLIW) computer. The apparatus includes: an index code generation unit sequentially generating an index code, which is associated with a number of no operation (NOP) instruction words between effective instruction words, with respect to each of instruction word groups to be executed in a VLIW computer; an instruction compression unit sequentially deleting the NOP instruction word which corresponds to the index code with respect to each of instruction word groups; and an instruction word conversion unit converting the effective instruction words to include the index code, the effective instruction words corresponding to the NOP instruction words.
Type: Grant
Filed: August 14, 2007
Date of Patent: August 10, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventors: Chang-Woo Baek, Hong-Seok Kim, Hee Seok Kim, Jeongwook Kim
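A rough sketch of NOP compression with an index code follows. It assumes the index code is simply the count of NOPs that precede each effective instruction word and that the group width is known when expanding; the patent's actual encoding may differ, and `compress` and `decompress` are hypothetical names.

```python
NOP = "nop"


def compress(group):
    """Drop NOPs from one instruction word group.

    Each effective word is returned as (nops_before, word), where
    nops_before plays the role of the index code.
    """
    packed = []
    run = 0
    for word in group:
        if word == NOP:
            run += 1
        else:
            packed.append((run, word))
            run = 0
    return packed


def decompress(packed, width):
    """Rebuild the full group; trailing NOPs are restored from the width."""
    group = []
    for run, word in packed:
        group.extend([NOP] * run)
        group.append(word)
    group.extend([NOP] * (width - len(group)))
    return group
```

Round-tripping a group such as `["add", "nop", "nop", "mul"]` stores only two words, `(0, "add")` and `(2, "mul")`, and restores the original four slots on expansion.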
-
Publication number: 20100169577
Abstract: In order to control an access request to the cache shared between a plurality of threads, a storage unit for storing a flag provided in association with each of the threads is included. If the threads enter the execution of an atomic instruction, a defined value is written to the flags stored in the storage unit. Furthermore, if the atomic instruction is completed, a defined value different from the above defined value is written, thereby displaying whether or not the threads are executing the atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread different from the certain thread is executing the atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing the atomic instruction, the access request is kept standby. This makes it possible to realize the exclusive control processing necessary for processing the atomic instruction according to simple configuration.
Type: Application
Filed: December 17, 2009
Publication date: July 1, 2010
Applicant: Fujitsu Limited
Inventor: Naohiro Kiyota
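The per-thread flag scheme can be modelled in a few lines. This is a software analogy under assumed names (`AtomicFlags`, `may_access`); the abstract describes a hardware storage unit, and the flag values True/False stand in for its two "defined values".

```python
class AtomicFlags:
    """One flag per thread, set while that thread executes an atomic instruction."""

    def __init__(self, num_threads):
        self.flags = [False] * num_threads

    def enter_atomic(self, thread):
        self.flags[thread] = True

    def complete_atomic(self, thread):
        self.flags[thread] = False

    def may_access(self, thread):
        """A cache access proceeds only if no *other* thread is in an atomic."""
        return not any(flag for other, flag in enumerate(self.flags)
                       if other != thread)
```

A request for which `may_access` returns False would be kept on standby and retried once the other thread completes its atomic instruction.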
-
Patent number: 7720219
Abstract: An apparatus and method for implementing a hash algorithm word buffer. In one embodiment, a cryptographic unit may include hash logic configured to compute a hash value of a data block according to a hash algorithm, where the hash algorithm includes a plurality of iterations, and where the data block includes a plurality of data words. The cryptographic unit may further include a word buffer comprising a plurality of data word positions and configured to store the data block during computing by the hash logic, where subsequent to the hash logic computing one of the iterations of the hash algorithm, the word buffer is further configured to linearly shift the data block by one or more data word positions according to the hash algorithm. The hash algorithm may be dynamically selectable from a plurality of hash algorithms.
Type: Grant
Filed: October 19, 2004
Date of Patent: May 18, 2010
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
-
Patent number: 7698707
Abstract: Identifying compatible threads in a Simultaneous Multithreading (SMT) processor environment is provided by calculating a performance metric, such as cycles per instruction (CPI), that occurs when two threads are running on the SMT processor. The CPI that is achieved when both threads were executing on the SMT processor is determined. If the CPI that was achieved is better than the compatibility threshold, then information indicating the compatibility is recorded. When a thread is about to complete, the scheduler looks at the run queue to which the completing thread belongs to dispatch another thread. The scheduler identifies a thread that is (1) compatible with the thread that is still running on the SMT processor (i.e., the thread that is not about to complete), and (2) ready to execute. The CPI data is continually updated so that threads that are compatible with one another are continually identified.
Type: Grant
Filed: February 25, 2008
Date of Patent: April 13, 2010
Assignee: International Business Machines Corporation
Inventors: Jos Manuel Accapadi, Andrew Dunshea, Dirk Michel, Mysore Sathyanarayana Srinivas
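The compatibility bookkeeping and the scheduler's selection step might look like the sketch below. It assumes a lower CPI is "better" and that the latest measurement simply overwrites the previous one; `CompatibilityTable` and `pick_next` are hypothetical names, not the patent's.

```python
class CompatibilityTable:
    """Records the CPI observed when two threads share the SMT processor."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.cpi = {}

    def record(self, a, b, cpi):
        # Latest measurement wins, so the data is continually updated.
        self.cpi[frozenset((a, b))] = cpi

    def compatible(self, a, b):
        cpi = self.cpi.get(frozenset((a, b)))
        return cpi is not None and cpi < self.threshold


def pick_next(run_queue, ready, running, table):
    """Return the first queued thread that is ready and compatible with the
    thread still running, or None if there is no such thread."""
    for thread in run_queue:
        if thread in ready and table.compatible(thread, running):
            return thread
    return None
```

A real scheduler would need a fallback when `pick_next` returns None, for instance dispatching the first ready thread; the abstract does not cover that case.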
-
Patent number: 7694109
Abstract: When fetching an instruction from a plurality of memory banks, a first pipeline cycle corresponding to selection of a memory bank and a second pipeline cycle corresponding to instruction readout are generated to carry out a pipeline process. Only the selected memory bank can be precharged to allow reduction of power consumption. Since the first and second pipeline cycles are effected in parallel, the throughput of the instruction memory can be improved.
Type: Grant
Filed: December 4, 2007
Date of Patent: April 6, 2010
Assignee: Renesas Technology Corp.
Inventors: Toyohiko Yoshida, Akira Yamada, Hisakazu Sato
-
Patent number: 7689774
Abstract: A system and method for improving the page crossing performance of a data prefetcher is presented. A prefetch engine tracks times at which a data stream terminates due to a page boundary. When a certain percentage of data streams terminate at page boundaries, the prefetch engine sets an aggressive profile flag. In turn, when the data prefetch engine receives a real address that corresponds to the beginning/end of a new page, and the aggressive profile flag is set, the prefetch engine uses an aggressive startup profile to generate and schedule prefetches on the assumption that the real address is highly likely to be the continuation of a long data stream. As a result, the system and method minimize latency when crossing real page boundaries when a program is predominately accessing long streams.
Type: Grant
Filed: April 6, 2007
Date of Patent: March 30, 2010
Assignee: International Business Machines Corporation
Inventors: Francis Patrick O'Connell, Jeffrey A. Stuecheli
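The flag-setting logic can be sketched as a simple ratio tracker. The 50% threshold, the 4096-byte page, the 128-byte line, and the name `StreamTracker` are all assumptions chosen for illustration; the abstract gives no specific values.

```python
class StreamTracker:
    """Tracks how often data streams end at a page boundary."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed fraction that sets the flag
        self.total = 0
        self.at_boundary = 0

    def stream_ended(self, at_page_boundary):
        self.total += 1
        if at_page_boundary:
            self.at_boundary += 1

    @property
    def aggressive(self):
        """The aggressive profile flag."""
        return (self.total > 0
                and self.at_boundary / self.total >= self.threshold)

    def startup_profile(self, address, page_size=4096, line_size=128):
        """Choose a startup profile for a newly seen real address."""
        offset = address % page_size
        at_edge = offset < line_size or offset >= page_size - line_size
        return "aggressive" if self.aggressive and at_edge else "normal"
```

An address in the middle of a page always gets the normal profile, whatever the flag says, because it is unlikely to continue a stream that was cut off at the previous page boundary.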
-
Publication number: 20100064118
Abstract: A method and apparatus for reducing latency in computer processors. The method incorporates a special instruction set that provides an indication of whether a particular instruction is capable of being executed nearly simultaneously with a preceding instruction in the same group. In such a situation, multiple instructions may be executed at a rate faster than expected. A simple apparatus for accomplishing this method is illustrated.
Type: Application
Filed: September 10, 2008
Publication date: March 11, 2010
Applicant: VNS PORTFOLIO LLC
Inventor: Charles H. Moore
-
Patent number: 7673119
Abstract: This invention is useful in a very long instruction word data processor that fetches a predetermined plural number of instructions each operation cycle. A predetermined one of these instructions is used as a special header. This special header has a unique encoding different from any normal instruction. When decoded this special header instructs decode hardware to decode this fetch packet in a special way. In one embodiment a bit field in the header signals the decode hardware whether to decode each instruction word normally or in an alternative way. The header may include extension opcode bits corresponding to each of the other instruction slots. In another embodiment another bit field signals whether to decode an instruction field as one normal length instruction or as two half-length instructions.
Type: Grant
Filed: May 8, 2006
Date of Patent: March 2, 2010
Assignee: Texas Instruments Incorporated
Inventors: Michael D. Asal, Eric J. Stotzer, Todd T. Hahn
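The half-length embodiment can be illustrated with a toy decoder. The layout here is assumed, not taken from the patent: one header bit per slot (bit i for slot i), 32-bit slots, and the low half decoded first. The header is passed separately for simplicity, whereas the patent places it in one of the instruction slots.

```python
def decode_fetch_packet(header, words):
    """Split a fetch packet into full-length and half-length instructions.

    header -- integer whose bit i says slot i holds two half-length instructions
    words  -- the 32-bit instruction slots (header slot excluded)
    """
    decoded = []
    for slot, word in enumerate(words):
        if (header >> slot) & 1:
            decoded.append(("half", word & 0xFFFF))          # low half first
            decoded.append(("half", (word >> 16) & 0xFFFF))  # then high half
        else:
            decoded.append(("full", word & 0xFFFFFFFF))
    return decoded
```

With header `0b10`, the first slot is decoded as one normal-length instruction and the second as two half-length instructions.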
-
Patent number: 7664929
Abstract: A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.
Type: Grant
Filed: September 17, 2003
Date of Patent: February 16, 2010
Assignee: Koninklijke Philips Electronics N.V.
Inventors: Carlos Antonio Alba Pinto, Ramanathan Sethuraman, Srinivasan Balakrishnan, Harm Johannes Antonius Maria Peters, Rafael Peset Llopis
-
Publication number: 20100031006
Abstract: A method, processor and processing system provide management of per-thread pipeline resource allocation in a simultaneous multi-threaded (SMT) processor by counting indications of instruction completion for each of the threads. The indication may be the commit phase of the pipeline, which indicates results of the pipeline instruction execution are ready for write-back. The completion counts are used in a relative or absolute form to control the pipeline resource allocation. The decode or fetch rates of instructions for the threads can be controlled from the relative or absolute completion counts, providing control of scheduling instructions among the threads for execution by execution pipeline(s). Alternatively, or in combination, the thread priority registers in any thread priority management scheme can be controlled by comparison and/or scaling of the completion counts.
Type: Application
Filed: August 4, 2008
Publication date: February 4, 2010
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Wael R. El-essawy, Lixin Zhang
-
Publication number: 20090287908
Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.
Type: Application
Filed: May 19, 2008
Publication date: November 19, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
-
Publication number: 20090276576
Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.
Type: Application
Filed: July 9, 2009
Publication date: November 5, 2009
Applicant: Altera Corporation
Inventors: Gerald George Pechanek, Stamatis Vassiliadis
-
Publication number: 20090228687
Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution of the group of instructions exists in the group of instructions and supplies the group of instructions to the instruction decoding unit by controlling the instruction buffer so that the instructions of the group of instructions are sequentially supplied when the factor exists and all the instructions of the group of instructions are simultaneously supplied when the factor does not exist.
Type: Application
Filed: March 9, 2006
Publication date: September 10, 2009
Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
Inventor: Tetsu Hosoki
-
Patent number: 7581082
Abstract: This invention employs a 16-bit instruction set that has a subset of the functionality of the 32-bit instruction set. In this invention 16-bit instructions and 32-bit instructions can coexist in the same fetch packet. In the prior architecture 32-bit instructions may not span a 32-bit boundary. The 16-bit instruction set is implemented with a special fetch packet header that signals whether the fetch packet includes some 16-bit instructions. This fetch packet header also has special bits that tell the hardware how to interpret a particular 16-bit instruction. These bits essentially allow overlays on the whole or part of the 16-bit instruction space. This makes the opcode space larger permitting more instructions than with a pure 16-bit opcode space.
Type: Grant
Filed: May 8, 2006
Date of Patent: August 25, 2009
Assignee: Texas Instruments Incorporated
Inventors: Todd T. Hahn, Eric J. Stotzer, Michael D. Asal
-
Publication number: 20090210661
Abstract: A method, system and computer program product for performing an implicit predicted return from a predicted subroutine are provided. The system includes a branch history table/branch target buffer (BHT/BTB) to hold branch information, including a target address of a predicted subroutine and a branch type. The system also includes instruction buffers, and instruction fetch controls to perform a method including fetching a branch instruction at a branch address and a return-point instruction. The method also includes receiving the target address and the branch type, and fetching a fixed number of instructions in response to the branch type. The method further includes referencing the return-point instruction within the instruction buffers such that the return-point instruction is available upon completing the fetching of the fixed number of instructions absent a re-fetch of the return-point instruction.
Type: Application
Filed: February 20, 2008
Publication date: August 20, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Khary J. Alexander, James J. Bonanno, Brian R. Prasky, Anthony Saporito, Robert J. Sonnelitter, III, Charles F. Webb
-
Publication number: 20090204785
Abstract: A computer. A processor pipeline alternately executes instructions coded for first and second different computer architectures or coded to implement first and second different processing conventions. A memory stores instructions for execution by the processor pipeline, the memory being divided into pages for management by a virtual memory manager, a single address space of the memory having first and second pages. A memory unit fetches instructions from the memory for execution by the pipeline, and fetches stored indicator elements associated with respective memory pages of the single address space from which the instructions are to be fetched. Each indicator element is designed to store an indication of which of two different computer architectures and/or execution conventions under which instruction data of the associated page are to be executed by the processor pipeline.
Type: Application
Filed: October 31, 2007
Publication date: August 13, 2009
Inventors: John S. Yates, JR., David L. Reese, Korbin S. Van Dyke, T. R. Ramesh, Paul H. Hohensee
-
Patent number: 7552314
Abstract: The invention provides a method and apparatus for branch prediction in a processor. A fetch-block branch target buffer is used in an early stage of pipeline processing before the instruction is decoded, which stores information about a control transfer instruction for a “block” of instruction memory. The block of instruction memory is represented by a block entry in the fetch-block branch target buffer. The block entry represents one recorded control-transfer instruction (such as a branch instruction) and a set of sequentially preceding instructions, up to a fixed maximum length N. Indexing into the fetch-block branch target buffer yields an answer whether the block entry represents memory that contains a previously executed control-transfer instruction, a length value representing the amount of memory that contains the instructions represented by the block, and an indicator for the type of control-transfer instruction that terminates the block, its target and outcome.
Type: Grant
Filed: October 17, 2005
Date of Patent: June 23, 2009
Assignee: STMicroelectronics, Inc.
Inventors: Anatoly Gelman, Russell Schnapp
-
Patent number: 7546451
Abstract: A system and method for enabling a programmable device to execute instructions without interruption. An instruction space for storing instructions from a host application is bifurcated to define a program segment and a hold segment. At startup, instructions are loaded into the hold segment, and the programmable device begins executing those instructions. While the hold segment instructions are executed, the program segment is loaded with instructions. Once the program segment is filled, control is shifted to it and instructions from this segment are executed by the programmable device. When the program segment has been executed, control is shifted back to the hold segment, and instructions are taken from it while the program segment is reloaded with a fresh set of instructions from the host application. Once the program segment is reloaded, control is redirected and execution of instructions from the program segment is continued.
Type: Grant
Filed: June 19, 2002
Date of Patent: June 9, 2009
Assignee: Finisar Corporation
Inventors: Chris Cicchetti, Jean-François Dubé, Thomas Andrew Myers, An Huynh, Geoffrey T. Hibbert
-
Patent number: 7509483
Abstract: A computing architecture and software techniques are described which modify the basic sequential instruction fetching mechanism of a processor by separating a program's control flow from its functional execution flow. A compiled sequential HLL program's static control structures are analyzed and a separate program based on its own unique instructions is created that primarily generates addresses for the selection of functional execution instructions. The original program is now represented by an instruction fetch program and a set of function/logic execution instructions. This basic split allows multiple instruction addresses to be generated in parallel to access multiple instruction memories. These multiple instruction memories contain only the function/logic instructions of the program and no control structure operations such as branches or calls. All the original program's control instructions are split from the original program and used to create the instruction addressing program.
Type: Grant
Filed: February 22, 2007
Date of Patent: March 24, 2009
Assignee: Renesky Tap III, Limited Liability Company
Inventor: Gerald George Pechanek
-
Patent number: 7500088
Abstract: Methods and apparatus are provided for enhanced instruction handling in processing environments. If branch misprediction occurs during instruction processing, a branch history table may be updated based upon the number of instructions to be fetched. The branch history table may be updated in accordance with a first mode if at least two instructions are available, and may be updated in accordance with a second mode if less than two instructions are available. A compiler can assist the processing by aligning instructions for processing. The instructions can be aligned across multiple instruction fetch groups so that instructions are available for fetching and the branch history table is updated prior to performing a branching operation.
Type: Grant
Filed: July 8, 2004
Date of Patent: March 3, 2009
Assignee: Sony Computer Entertainment Inc.
Inventor: Masaki Osawa
-
Patent number: 7500066Abstract: A multiprocessing apparatus includes a memory and a plurality (M) of processors coupled to share the memory. Access to the memory is time-division multiplexed among the plurality of processors. In one embodiment, a selected processor retrieves M words of instruction forming K instructions during a given clock cycle. The selected processor executes M−K NOP instructions if K<M.Type: GrantFiled: April 30, 2005Date of Patent: March 3, 2009Assignee: Tellabs Operations, Inc.Inventors: Thayl D. Zohner, Lawrence D. Weizeorick, Keith M. Ellens
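As a rough illustration of the NOP padding described in this abstract, the sketch below models one fetch cycle in Python. The round-robin choice of the selected processor, the function names, and the representation of an instruction as a (name, word count) pair are assumptions made for the example; the abstract only states that memory access is time-division multiplexed and that M minus K NOPs are executed when K is less than M.

```python
def selected_processor(cycle: int, m: int) -> int:
    """Assumed round-robin time-division multiplexing: one processor per cycle."""
    return cycle % m


def issue_slots(fetched: list[tuple[str, int]], m: int) -> list[str]:
    """Pad the K fetched instructions with M - K NOPs.

    `fetched` holds (name, word_count) pairs; the word counts must add up
    to the M words retrieved in the cycle (hypothetical encoding).
    """
    total_words = sum(words for _, words in fetched)
    if total_words != m:
        raise ValueError(f"expected {m} instruction words, got {total_words}")
    k = len(fetched)
    return [name for name, _ in fetched] + ["NOP"] * (m - k)


if __name__ == "__main__":
    # M = 4 words fetched, forming K = 2 instructions, so 2 NOPs are added.
    print(issue_slots([("add", 1), ("load_imm", 3)], 4))
```

With M single-word instructions no padding is needed, so every processor always occupies exactly M issue slots between its memory turns.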
-
Patent number: 7487333Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.Type: GrantFiled: November 5, 2003Date of Patent: February 3, 2009Assignee: Seiko Epson CorporationInventors: Le-Trong Nguyen, Derek J Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H Trang
-
Patent number: 7480783Abstract: Disclosed are systems for loading an unaligned word from a specified unaligned word address in a memory, the unaligned word comprising a plurality of indexed portions crossing a word boundary, a method of operating the system comprising: loading a first aligned word commencing at an aligned word address rounded from the specified unaligned word address; identifying an index representing the location of the unaligned word address relative to the aligned word address; loading a second aligned word commencing at an aligned word address rounded from a second unaligned word address; and combining indexed portions of the first and second aligned words using the identified index to construct the unaligned word.Type: GrantFiled: August 19, 2004Date of Patent: January 20, 2009Assignees: STMicroelectronics Limited, Hewlett-Packard CompanyInventors: Mark O. Homewood, Paolo Faraboschi
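The steps in this abstract can be sketched in software as follows. This is a minimal Python model, not the patented hardware: the 4-byte word size, the byte-addressed `bytes` memory, the helper names, and the choice of the word's last byte as the "second unaligned word address" are all assumptions made for illustration.

```python
WORD_BYTES = 4  # assumed word size; the abstract does not specify one


def load_aligned(memory: bytes, addr: int) -> bytes:
    """Load one word from an address that must be word-aligned."""
    if addr % WORD_BYTES != 0:
        raise ValueError("address is not word-aligned")
    return memory[addr:addr + WORD_BYTES]


def load_unaligned(memory: bytes, addr: int) -> bytes:
    """Build an unaligned word from two aligned loads and an index."""
    base = addr - (addr % WORD_BYTES)      # round down to an aligned address
    index = addr - base                    # offset of the word within `base`
    first = load_aligned(memory, base)
    if index == 0:
        return first                       # already aligned, one load suffices
    # Assumed second address: the last byte of the unaligned word, rounded down.
    second_base = (addr + WORD_BYTES - 1) // WORD_BYTES * WORD_BYTES
    second = load_aligned(memory, second_base)
    # Combine the tail of the first word with the head of the second.
    return first[index:] + second[:index]


if __name__ == "__main__":
    mem = bytes(range(16))
    print(list(load_unaligned(mem, 5)))
```

Here the index selects how many byte portions come from each aligned word: `WORD_BYTES - index` from the first and `index` from the second.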
-
Publication number: 20080301409Abstract: The invention provides a processor for executing threads, each thread comprising a sequence of instructions, said instructions defining operations and at least some of those instructions defining a memory access operation. The processor comprises: a plurality of instruction buffers, each for holding at least one instruction of a thread associated with that buffer; an instruction issue stage for issuing instructions from the instruction buffers; and a memory access stage connected to a memory and arranged to receive instructions issued by the instruction issue stage. The memory access stage comprises: detecting logic adapted to detect whether a memory access operation is defined in each issued instruction; and instruction fetch logic adapted to instigate an instruction fetch to fetch an instruction of a thread when no memory access operation is detected.Type: ApplicationFiled: May 30, 2007Publication date: December 4, 2008Inventor: Michael David MAY
-
Patent number: 7454597Abstract: A processor core and method of executing instructions, both of which utilize schedules, are presented. Each of the schedules includes a sequence of instructions, an address of a first of the instructions in the schedule, an order vector of an original order of the instructions in the schedule, a rename map of registers for each register in the schedule, and a list of register names used in the schedule. The schedule exploits instruction-level parallelism in executing out-of-order instructions. The processor core includes a schedule cache that is configured to store schedules, a shared cache configured to store both I-side and D-side cache data, and an execution resource for requesting a schedule to be executed from the schedule cache. The processor core further includes a scheduler disposed between the schedule cache and the cache.Type: GrantFiled: January 2, 2007Date of Patent: November 18, 2008Assignee: International Business Machines CorporationInventors: Krishnan K. Kailas, Ravi Nair, Sumedh W. Sathaye, Wolfram Sauer, John-David Wellman
-
Patent number: 7454654Abstract: A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first in the event that a failure is detected in the first pipeline. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent one another in an array. A pipeline failure causes data to be shifted one position within the array of pipelines, to by-pass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, selection logic controlling the selection between a primary and alternate source of pipeline data is integrated with other pipeline operand selection logic.Type: GrantFiled: September 13, 2006Date of Patent: November 18, 2008Assignee: International Business Machines CorporationInventor: David A. Luick
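The shift-by-one bypass described in this abstract can be modeled with a small routing function. The sketch below is in Python and assumes a single spare pipeline at the end of the array and a single failure; the function name and the lane/pipeline numbering are hypothetical.

```python
from typing import Optional


def route_lanes(num_lanes: int, failed: Optional[int] = None) -> dict[int, int]:
    """Map each data lane to a pipeline index.

    Pipelines are numbered 0..num_lanes, with the last one acting as the
    assumed shared spare. Lanes below the failed pipeline keep their primary
    pipeline; lanes at or above it shift one position toward the spare.
    """
    if failed is not None and not 0 <= failed <= num_lanes:
        raise ValueError("failed pipeline index out of range")
    routing = {}
    for lane in range(num_lanes):
        if failed is None or lane < failed:
            routing[lane] = lane        # primary source
        else:
            routing[lane] = lane + 1    # shifted past the failing pipeline
    return routing


if __name__ == "__main__":
    print(route_lanes(4))            # no failure: identity routing
    print(route_lanes(4, failed=1))  # pipeline 1 is bypassed
```

Because the shift is by exactly one position, pipeline p only ever receives lane p or lane p - 1, which matches the abstract's point that each pipeline has just two sources of data.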
-
Publication number: 20080276072Abstract: A method of executing a conditional instruction within a pipeline processor having a plurality of pipelines, the processor having a first condition code register associated with a first pipeline and a second condition code register associated with a second pipeline is disclosed. The method saves a most recent condition code value to either the first condition code register or the second condition code register. The method further sets an indicator indicating whether the second condition code register has the most recent condition code value and retrieves the most recent condition code value from either the first or second condition code register based on the indicator. The method uses the most recent condition code value to determine if the conditional instruction should be executed.Type: ApplicationFiled: May 3, 2007Publication date: November 6, 2008Inventor: Bohuslav Rychlik
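A minimal Python sketch of the indicator-based lookup in this abstract is shown below. The class name, the two-element register list, and the bit-mask test used to decide whether the conditional instruction executes are assumptions for illustration; the abstract does not say how the condition is evaluated.

```python
class ConditionCodes:
    """Two per-pipeline condition code registers plus a 'most recent' indicator."""

    def __init__(self) -> None:
        self.registers = [0, 0]          # [first pipeline, second pipeline]
        self.latest_in_second = False    # indicator described in the abstract

    def save(self, pipeline: int, value: int) -> None:
        """Save a new condition code from pipeline 0 or 1 and update the indicator."""
        if pipeline not in (0, 1):
            raise ValueError("pipeline must be 0 or 1")
        self.registers[pipeline] = value
        self.latest_in_second = (pipeline == 1)

    def latest(self) -> int:
        """Retrieve the most recent condition code based on the indicator."""
        return self.registers[1] if self.latest_in_second else self.registers[0]

    def should_execute(self, condition_mask: int) -> bool:
        """Assumed condition test: execute if any masked flag bit is set."""
        return bool(self.latest() & condition_mask)


if __name__ == "__main__":
    cc = ConditionCodes()
    cc.save(0, 0b0100)
    cc.save(1, 0b0001)
    print(cc.latest(), cc.should_execute(0b0001))
```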
-
Publication number: 20080270758Abstract: A data processing apparatus is provided wherein processing circuitry executes multiple program threads including at least one high priority thread and at least one lower priority thread. Instructions required by the threads are retrieved from a cache memory hierarchy comprising multiple cache levels. The cache memory hierarchy includes a bypass path for omitting a predetermined level of the cache memory hierarchy when performing a lookup procedure for a required instruction and for bypassing said predetermined level of the cache memory hierarchy when returning said required instruction to said processing circuitry. The bypass path is used by default when the requested instruction is for a lower priority thread.Type: ApplicationFiled: April 27, 2007Publication date: October 30, 2008Applicant: ARM LimitedInventors: Emre Ozer, Stuart David Biles
-
Publication number: 20080244230Abstract: A computation node according to various embodiments of the invention includes at least one input port capable of being coupled to at least one first other computation node, a first store coupled to the input port(s) to store input data, a second store to receive and store instructions, an instruction wakeup unit to match the input data to the instructions, at least one execution unit to execute the instructions, using the input data to produce output data, and at least one output port capable of being coupled to at least one second other computation node. The node may also include a router to direct the output data from the output port(s) to the second other node. A system according to various embodiments of the invention includes an external instruction sequencer to fetch a group of instructions, and one or more interconnected, preselected computational nodes.Type: ApplicationFiled: June 10, 2008Publication date: October 2, 2008Applicant: Board of Regents, The University of Texas SystemInventors: Douglas C. Burger, Stephen W. Keckler, Karthikevan Sankaralingam, Ramadass Nagarajan
-
Patent number: 7424598Abstract: The data processor, which executes instructions realized by wired logic in a pipelined manner, includes a plurality of instruction registers and the same number of arithmetic operation units. A plurality of instructions read into the instruction registers in one machine cycle are processed in parallel by the plurality of arithmetic operation units.Type: GrantFiled: May 14, 2001Date of Patent: September 9, 2008Assignee: Renesas Technology Corp.Inventors: Takashi Hotta, Shigeya Tanaka, Hideo Maejima
-
Patent number: 7404042Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.Type: GrantFiled: May 18, 2005Date of Patent: July 22, 2008Assignee: QUALCOMM IncorporatedInventors: Brian Michael Stempel, Jeffrey Todd Bridges, Rodney Wayne Smith, Thomas Andrew Sartorius
-
Patent number: 7404048Abstract: An inter-cluster communication module using the memory access network is provided, including a plurality of clusters, a memory subsystem, a controller and a switch device. When some clusters issue a load instruction and some clusters issue a store instruction of an identical memory address concurrently, the controller controls the switch device which connects the clusters and the memory banks of the memory subsystem, so that the data item is transmitted from the cluster issuing the store instruction to the cluster issuing the load instruction through the switch device, thereby achieving data exchange between the clusters. Herein, the data item is selectively stored in the memory module depending on the address. Furthermore, the data item is also transmitted between the memory and the clusters over the switch device.Type: GrantFiled: October 11, 2005Date of Patent: July 22, 2008Assignee: Industrial Technology Research InstituteInventors: Tay-Jyi Lin, Pi-Chen Hsiao, Chih-Wei Liu, Chein-Wei Jen, I-Tao Liao, Po-Han Huang
-
Patent number: 7401207Abstract: Each instruction thread in a SMT processor is associated with a software assigned base input processing priority. Unless some predefined event or circumstance occurs with an instruction being processed or to be processed, the base input processing priorities of the respective threads are used to determine the interleave frequency between the threads according to some instruction interleave rule. However, upon the occurrence of some predefined event or circumstance in the processor related to a particular instruction thread, the base input processing priority of one or more instruction threads is adjusted to produce one or more adjusted priority values. The instruction interleave rule is then enforced according to the adjusted priority value or values together with any base input processing priority values that have not been subject to adjustment.Type: GrantFiled: April 25, 2003Date of Patent: July 15, 2008Assignee: International Business Machines CorporationInventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
-
Patent number: 7401208Abstract: A processor interleaves instructions according to a priority rule which determines the frequency with which instructions from each respective thread are selected and added to an interleaved stream of instructions to be processed in the data processor. The frequency with which each thread is selected according to the rule may be based on the priorities assigned to the instruction threads. A randomization is inserted into the interleaving process so that the selection of an instruction thread during any particular clock cycle is not based solely on the priority rule, but is also based in part on a random or pseudo random element. This randomization is inserted into the instruction thread selection process so as to vary the order in which instructions are selected from the various instruction threads while preserving the overall frequency of thread selection (i.e. how often threads are selected) set by the priority rule.Type: GrantFiled: April 25, 2003Date of Patent: July 15, 2008Assignee: International Business Machines CorporationInventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
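One way to picture randomized interleaving that still preserves the selection frequency is to shuffle a fixed-size window of issue slots. The Python sketch below does this; the window-shuffle technique, the per-thread slot counts standing in for priorities, and the function name are assumptions for illustration and are not taken from the patent.

```python
import random
from collections import Counter


def interleave_window(slots_per_thread: dict[str, int], rng: random.Random) -> list[str]:
    """Return one window of thread selections in pseudo-random order.

    `slots_per_thread` is an assumed stand-in for the priority rule: it gives
    how many cycles each thread gets per window. Shuffling changes the order
    of selection but not how often each thread is selected.
    """
    window = [thread for thread, count in slots_per_thread.items() for _ in range(count)]
    rng.shuffle(window)
    return window


if __name__ == "__main__":
    rng = random.Random(2003)  # seeded only to make the demo repeatable
    window = interleave_window({"T0": 3, "T1": 1}, rng)
    print(window, Counter(window))
```

Each window always contains three selections of `T0` and one of `T1`, so the overall frequency set by the rule is unchanged while the cycle on which `T1` is chosen varies from window to window.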
-
Patent number: 7398374Abstract: The invention provides a processor that processes bundles of instructions preferentially through clusters or execution units according to thread characteristics. The cluster architectures of the invention preferably include capability to process “multi-threaded” instructions. Selectively, the architecture either (a) processes singly-threaded instructions through a single cluster to avoid bypassing and to increase throughput, or (b) processes singly-threaded instructions through multiple processes to increase “per thread” performance. The architecture may be “configurable” to operate in one of two modes: in a “wide” mode of operation, the processor's internal clusters collectively process bundled instructions of one thread of a program at the same time; in a “throughput” mode of operation, those clusters independently process instruction bundles of separate program threads. Clusters are often implemented on a common die, with a core and register file per cluster.Type: GrantFiled: February 27, 2002Date of Patent: July 8, 2008Assignee: Hewlett-Packard Development Company, L.P.Inventor: Eric DeLano
-
Publication number: 20080162881Abstract: A method and apparatus for designating and handling irrevocable transactions is herein described. In response to detecting an irrevocable event, such as an I/O operation, a user-defined irrevocable designation, and a dynamic failure profile, a transaction is designated as irrevocable. In response to designating a transaction as irrevocable, Single Owner Read Locks (SORLs) are acquired for previous and subsequent reads in the irrevocably designated transaction to ensure the transaction is able to complete without modification to locations read from, while permitting remote resources to load from those locations to continue execution.Type: ApplicationFiled: December 28, 2006Publication date: July 3, 2008Inventors: Adam Welc, Bratin Saha, Ali-Reza Adl-Tabatabai
-
Patent number: RE41012Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.Type: GrantFiled: June 3, 2004Date of Patent: November 24, 2009Assignee: Altera CorporationInventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand