Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 6598152
    Abstract: Enables a processor to quickly recover reliable use of a multi-cycle index used in a branch prediction mechanism for certain types of flush events occurring in the processor pipeline, whether the flush event occurs for a non-branch instruction or for a branch instruction contained in the same dispatch group. A GHV (global history vector) value is used in the generation of a multi-cycle index required for locating a prediction in a GBHT (global branch history table) for the instruction associated with the GHV value. The GHV value is captured in a BIQ (branch information queue) element representing each branch instruction selected for execution of a program. The BIQ element also captures an associated GHV count when the GHV value is captured.
    Type: Grant
    Filed: November 8, 1999
    Date of Patent: July 22, 2003
    Assignee: International Business Machines Corporation
    Inventor: Balaram Sinharoy
  • Publication number: 20030120901
    Abstract: A multithreaded processor includes an instruction decoder for decoding retrieved instructions to determine an instruction type for each of the retrieved instructions, an integer unit coupled to the instruction decoder for processing integer type instructions, and a vector unit coupled to the instruction decoder for processing vector type instructions. A reduction unit is preferably associated with the vector unit and receives parallel data elements processed in the vector unit. The reduction unit generates a serial output from the parallel data elements. The processor may be configured to execute at least control code, digital signal processor (DSP) code, Java code and network processing code, and is therefore well-suited for use in a convergence device. The processor is preferably configured to utilize token triggered threading in conjunction with instruction pipelining.
    Type: Application
    Filed: October 11, 2002
    Publication date: June 26, 2003
    Inventors: Erdem Hokenek, Mayan Moudgill, C. John Glossner
  • Publication number: 20030120900
    Abstract: A program memory controller unit includes apparatus for the execution of a software pipeline procedure in response to a predetermined instruction. The apparatus provides a prolog, a kernel, and an epilog state for the execution of the software pipeline procedure. In addition, in response to a predetermined condition, the software pipeline loop procedure can be terminated early. A second software pipeline loop procedure can be initiated prior to the completion of the first software pipeline loop procedure.
    Type: Application
    Filed: August 21, 2002
    Publication date: June 26, 2003
    Inventors: Eric J. Stotzer, Steven D. Krueger, Timothy Anderson
  • Patent number: 6578138
    Abstract: An exemplary processor or trace cache according to the present invention includes a cache unit, which includes a data array that stores traces. The processor or trace cache also includes a control block connected to the cache unit, the control block unrolling loops when building the traces. In one exemplary method of unrolling loops, the processor or trace cache unrolls loops until the trace is a minimum length. In another exemplary embodiment, the processor or trace cache unrolls only those loops in which the head of the loop is the trace head. In a third exemplary embodiment, the processor or trace cache unrolls loops based on a predicted number of iterations of the loop when executed.
    Type: Grant
    Filed: December 30, 1999
    Date of Patent: June 10, 2003
    Assignee: Intel Corporation
    Inventors: Alan Beecher Kyker, Robert Franklin Krick
  • Patent number: 6571332
    Abstract: A method and apparatus for combined transaction reordering and buffer management. The apparatus may include a buffer, a first generator circuit and a second generator circuit. The buffer is configured to store memory transaction responses received from a memory controller in a plurality of addressable locations. The first generator circuit is configured to generate a first memory transaction request encoded with a first tag corresponding to an address in the buffer in response to receiving a first memory request. The second generator circuit is configured to generate a second tag using the size of said first memory request added to the first tag. The first generator circuit may be further configured to generate a second memory transaction request encoded with the second tag corresponding to a second address in the buffer in response to receiving a second memory request successive to the first memory request.
    Type: Grant
    Filed: April 11, 2000
    Date of Patent: May 27, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Paul C. Miranda, Larry D. Hewitt, Stephen C. Ennis
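    Illustrative sketch (not taken from the patent): the tag arithmetic in the abstract above, where each tag is a buffer address and the next tag is the previous tag plus the previous request's size, can be modelled in a few lines. The class name `TagAllocator`, the 16-slot buffer, and the wraparound behaviour are assumptions made only for this example.

```python
class TagAllocator:
    """Hands out buffer-offset tags: next tag = previous tag + previous request size."""

    def __init__(self, buffer_slots):
        self.buffer_slots = buffer_slots  # assumed buffer capacity, in slots
        self.next_tag = 0

    def allocate(self, size):
        tag = self.next_tag
        # Wrapping around the end of the buffer is an assumption of this sketch.
        self.next_tag = (tag + size) % self.buffer_slots
        return tag


alloc = TagAllocator(buffer_slots=16)
tags = [alloc.allocate(size) for size in (4, 2, 8)]

# Responses may come back in any order; the tag says where each one lands.
buffer = [None] * 16
for tag, payload in [(tags[2], "C" * 8), (tags[0], "A" * 4), (tags[1], "B" * 2)]:
    buffer[tag:tag + len(payload)] = payload

# Reading the buffer sequentially recovers the original request order.
in_order = "".join(buffer[:14])
```

    Because the tag doubles as the storage address, reordering and buffer allocation fall out of the same addition, which is the combination the abstract describes.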
  • Patent number: 6567831
    Abstract: A method optimizes function evaluations performed by a VLIW processor through enhanced parallelism by evaluating the function by table approximation using decomposition into a Taylor series.
    Type: Grant
    Filed: April 20, 2000
    Date of Patent: May 20, 2003
    Assignee: Elbrus International Limited
    Inventor: Vadim E. Loginov
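    Illustrative sketch (not taken from the patent): the abstract above gives no detail on the function or the table layout, so the following only shows the general idea of combining a lookup table with a short Taylor expansion. The choice of `exp`, the 1/16 table spacing, the input range [0, 1], and the second-order truncation are all assumptions.

```python
import math

STEP = 1.0 / 16                                   # assumed table spacing
TABLE = [math.exp(i * STEP) for i in range(17)]   # exp at grid points in [0, 1]


def exp_approx(x):
    """Approximate exp(x) for 0 <= x <= 1 from the nearest table entry."""
    i = round(x / STEP)          # index of the nearest grid point
    d = x - i * STEP             # small residual, |d| <= STEP / 2
    # exp(x) = exp(x0) * exp(d), with exp(d) ~ 1 + d + d^2/2 (Taylor, 2nd order).
    linear = d
    quadratic = 0.5 * d * d      # independent of `linear`, so both can be computed side by side
    return TABLE[i] * (1.0 + linear + quadratic)


worst_error = max(abs(exp_approx(k / 1000) - math.exp(k / 1000)) for k in range(1001))
```

    The table lookup and the two correction terms do not depend on one another, which is the kind of independence a VLIW schedule could exploit; how the patented method actually arranges the work is not stated in the abstract.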
  • Patent number: 6567895
    Abstract: A microprocessor and method for operating this microprocessor are disclosed. The microprocessor contains multiple execution units that receive instructions from an instruction pipeline. A loop cache memory is connected in communication with the instruction pipeline, such that it may both store instructions from the instruction pipeline and issue instructions to be executed by the execution units. A loop cache controller controls instruction flow. In operation, the loop cache controller is preferably signaled by a software instruction to begin building a software pipelined loop of a specified size into the loop cache memory. The loop cache controller then begins accumulating instructions from the instruction pipeline into the loop cache memory; these instructions may also remain in the pipeline for execution. When the kernel of the software pipelined loop is built into the loop cache memory, the controller preferably stalls the instruction pipeline and executes the loop using the cached instructions.
    Type: Grant
    Filed: May 14, 2001
    Date of Patent: May 20, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Richard H. Scales
  • Publication number: 20030093655
    Abstract: An embedded processor system having a single-chip embedded microprocessor with analog and digital electrical interfaces to external systems. A novel processor core uses pipelined execution of multiple independent or dependent concurrent threads, together with supervisory control for monitoring and controlling the processor thread state and access to other components. The pipeline enables simultaneous execution of multiple threads by selectively avoiding memory or peripheral access conflicts through the types of pipeline stages chosen and the use of dual and tri-port memory techniques. The single processor core executes one or multiple instruction streams on multiple data streams in various combinations under the control of single or multiple threads.
    Type: Application
    Filed: April 26, 2001
    Publication date: May 15, 2003
    Applicant: Eleven Engineering Inc.
    Inventors: Jason Gosior, Colin Broughton, Phillip Jacobsen, John Sobota
  • Patent number: 6564303
    Abstract: The present invention relates to a data processing system comprising a processor provided with two memory access units operating in parallel; two separate memories respectively associated with the two access units; and circuitry for, when the address of a datum to be written into a memory is in a predetermined address range, writing the datum into both memories at the same time at the same address.
    Type: Grant
    Filed: December 21, 1998
    Date of Patent: May 13, 2003
    Assignee: STMicroelectronics S.A.
    Inventors: Didier Fuin, Joël Curtet, Fabrice Devaux
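    Illustrative sketch (not taken from the patent): the write-mirroring rule in the abstract above can be modelled with two dictionaries standing in for the two memories. The range bounds, the function name `write`, and the use of dictionaries are assumptions for this example.

```python
SHARED_RANGE = range(0x0000, 0x0100)   # assumed "predetermined address range"


def write(memories, unit, address, value):
    """Write via access unit `unit` (0 or 1); mirror the write inside the shared range."""
    if address in SHARED_RANGE:
        for memory in memories:        # same datum, same address, both memories
            memory[address] = value
    else:
        memories[unit][address] = value


memories = [{}, {}]
write(memories, 0, 0x0010, 7)   # inside the range: lands in both memories
write(memories, 1, 0x0200, 9)   # outside the range: lands only in memory 1
```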
  • Patent number: 6560775
    Abstract: A method and system for preparing branch instructions of a computer program, for compiling and execution in a computer system, in which each transfer instruction is split into two instructions: a control transfer preparation instruction and a control transfer instruction, wherein the control transfer preparation instruction contains the transfer address and is placed by the compiler several instructions ahead of the control transfer instruction, so that the number of clock cycles in the pipeline between transfer condition generation and the transfer itself is reduced.
    Type: Grant
    Filed: December 24, 1998
    Date of Patent: May 6, 2003
    Assignee: Elbrus International Limited
    Inventors: Alexander M. Artymov, Boris A. Babaian, Feodor A. Gruzdov, Alexey P. Lizorkin, Yuli K. Sakhin, Evgeny Z. Stolyarsky
  • Publication number: 20030079112
    Abstract: A computing system is described in which individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. The system includes storage for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags indicative of the pipeline to which they should be dispatched. The pipeline identification tags are supplied to a system which controls a crossbar switch, enabling the tags to be used to control the switch and supply the appropriate instructions simultaneously to the differing pipelines.
    Type: Application
    Filed: July 3, 2002
    Publication date: April 24, 2003
    Applicant: Intergraph Corporation
    Inventors: Howard G. Sachs, Siamak Arya
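    Illustrative sketch (not taken from the patent): a software model of tag-driven routing, in which each instruction carries the number of the pipeline it should go to and a crossbar-like function places it there. The function name `route`, the tuple layout, and the conflict check are assumptions for this example.

```python
def route(group, num_pipelines):
    """Place each (instruction, pipeline_tag) pair on the pipeline its tag names."""
    slots = [None] * num_pipelines
    for instruction, pipeline_tag in group:
        if slots[pipeline_tag] is not None:
            # Two instructions tagged for the same pipeline cannot issue together.
            raise ValueError(f"pipeline {pipeline_tag} tagged twice in one group")
        slots[pipeline_tag] = instruction
    return slots


issued = route([("add", 2), ("load", 0)], num_pipelines=4)
```

    Pipelines with no tagged instruction simply stay idle (`None`) for that group.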
  • Publication number: 20030074543
    Abstract: A processing engine 10 for executing instructions in parallel comprises an instruction buffer 600 for holding at least two instructions, with the first instruction 602 in a first position and the second instruction 604 in a second position. A first decoder 612 provides decoding of the first instruction and generates first control signals. The first control signals include first resource control signals, first address generation control signals, and a first validity signal indicative of the validity of the first instruction in the first position. A second decoder 614 provides decoding of the second instruction and generates second control signals. The second control signals include second resource control signals, second address generation control signals, and a second validity signal indicative of the validity of the second instruction in the second position.
    Type: Application
    Filed: October 1, 1999
    Publication date: April 17, 2003
    Inventors: Karim Djafarian, Gilbert Laurenti, Vincent Gillett
  • Patent number: 6550000
    Abstract: In a processor, a plurality of instructions in a program are executed in parallel using a plurality of functional units within the processor. Determination of which functional unit is to be used to execute each instruction is made when the program is produced prior to execution. The priority of access to the PSW (Program Status Word) among the plurality of functional units is predetermined for the case where the contents of a PSW storage register in the processor are to be accessed simultaneously by a plurality of instructions during parallel execution. Execution control can thus be provided for a program that reliably avoids a PSW access conflict among a plurality of instructions during parallel execution using a plurality of functional units in the processor.
    Type: Grant
    Filed: July 28, 1999
    Date of Patent: April 15, 2003
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventors: Isao Minematsu, Akira Yamada
  • Publication number: 20030070060
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Application
    Filed: October 30, 2002
    Publication date: April 10, 2003
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20030065905
    Abstract: A parallel computation processor capable of high-speed loop operation. When instruction decoders decode the VLOOP instruction, which triggers loop operation, an instruction buffer starts storing normal instructions. The instruction buffer dispatches a VLIW instruction composed of n normal instructions to execution units each time n instructions are stored therein. The execution units concurrently execute the instructions. After all instructions comprised in a loop have been stored in the buffer and dispatched once as VLIW instructions to be executed, the loop is executed repeatedly.
    Type: Application
    Filed: September 26, 2002
    Publication date: April 3, 2003
    Applicant: NEC Corporation
    Inventor: Daiji Ishii
  • Patent number: 6542983
    Abstract: In a computer system having a central processing unit (CPU) execution pipeline and a floating point unit (FPU) execution pipeline, the CPU execution pipeline including a CPU decoder pipestage and the FPU execution pipeline including an FPU decoder pipestage, the method including the steps of, (a) sending a first instruction to the CPU decoder pipestage, (b) sending the first instruction to the FPU decoder pipestage, (c) generating a signal indicating that the first instruction has been accepted by the CPU decoder pipestage, (d) generating a signal indicating that the first instruction has been accepted by the FPU decoder pipestage, (e) sending a second instruction to the CPU decoder pipestage in response to step (d), and (f) sending a second instruction to the FPU decoder pipestage in response to step (c). A corresponding apparatus is also provided.
    Type: Grant
    Filed: October 1, 1999
    Date of Patent: April 1, 2003
    Assignee: Hitachi, Ltd.
    Inventors: Margaret Rose Gearty, Chih-Jui Peng
  • Patent number: 6539469
    Abstract: A processor comprises an instruction cache that stores a cache line of instructions and an execution engine for executing the instructions, along with a buffer to store a plurality of entries. A first logic circuit divides the cache line into instruction bundles, each of which gets written into an entry of the buffer. A second logic circuit reads out a number of consecutive instruction bundles from the buffer for dispersal to the execution engine to optimize speculative fetching and maximize instruction supply to the execution resources of the processor.
    Type: Grant
    Filed: October 12, 1999
    Date of Patent: March 25, 2003
    Assignee: Intel Corporation
    Inventor: Jesse Pan
  • Patent number: 6539266
    Abstract: A computer system for detecting alteration of programs in which a plurality of check program portions are read from a storage medium which carries computer programs including the check program portions. Each check program portion is executed to detect alteration of at least one other check program portion.
    Type: Grant
    Filed: April 3, 2000
    Date of Patent: March 25, 2003
    Assignees: Konami Co., Ltd., Konami Computer Entertainment Tokyo Co., Ltd.
    Inventor: Hirotaka Ishikawa
  • Patent number: 6523107
    Abstract: A circuit is provided to provide instruction streams to a processing device: embodiments of the circuit are appropriate for use with RISC CPUs, whereas other embodiments are useable with other processing devices, such as small processing devices used in a field programmable array. The circuit receives an external instruction stream which provides a first set of instruction values, and has a memory which contains a second set of instruction values. Two or more outputs provide instruction streams to the processing device. The circuit has a control input in the form of a mask which causes a selection means to allocate bits from the first and second sets of instruction values to different instruction streams to the processing device.
    Type: Grant
    Filed: December 11, 1998
    Date of Patent: February 18, 2003
    Assignee: Elixent Limited
    Inventors: Anthony Stansfield, Alan David Marshall, Jean Vuillemin
  • Publication number: 20030033505
    Abstract: A computing system has first and second instruction storing circuits, each instruction storing circuit storing N instructions for parallel output. An instruction dispatch circuit, coupled to the first instruction storing circuit, dispatches L instructions stored in the first instruction storing circuit, wherein L is less than or equal to N. An instruction loading circuit, coupled to the instruction dispatch circuit and to the first and second instruction storing circuits, loads L instructions from the second instruction storing circuit into the first instruction storing circuit after the L instructions are dispatched from the first instruction storing circuit and before further instructions are dispatched from the first instruction storing circuit.
    Type: Application
    Filed: May 24, 2001
    Publication date: February 13, 2003
    Inventors: Chandra Joshi, Paul Rodman, Peter Hsu, Monica R. Nofal
  • Publication number: 20030014612
    Abstract: A processor improves throughput efficiency and exploits increased parallelism by introducing multithreading to an existing and mature processor core. The multithreading is implemented in two steps including vertical multithreading and horizontal multithreading. The processor core is retrofitted to support multiple machine states. System embodiments that exploit retrofitting of an existing processor core advantageously leverage hundreds of man-years of hardware and software development by extending the lifetime of a proven processor pipeline generation. A processor implements N-bit flip-flop global substitution. To implement multiple machine states, the processor converts 1-bit flip-flops in storage cells of the stalling vertical thread to an N-bit global flip-flop where N is the number of vertical threads.
    Type: Application
    Filed: May 11, 1999
    Publication date: January 16, 2003
    Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
  • Publication number: 20020199084
    Abstract: A microprocessor capable of processing at least two program instructions at the same time and capable of issuing the two program instructions to two symmetrical multifunctional program execution units. The microprocessor includes a plurality of registers which store a plurality of operands and an instruction issue control which controls issuance of program instructions to the two symmetrical multifunctional program execution units. The instruction issue control issues the two program instructions (e.g. first and second) without decoding them in order to determine the processing functions required to be performed in response to the two program instructions.
    Type: Application
    Filed: March 6, 2000
    Publication date: December 26, 2002
    Inventors: Jack Choquette, Norman K. Yeung
  • Publication number: 20020188828
    Abstract: A super scalar processor includes execution units for data processing on integers, an execution unit for multiplication, an execution unit for data loading/storing, and an electric power and clock controller for supplying electric power and a clock signal to them. One of the execution units for data processing on integers includes an emulator for emulating instruction codes intended for the execution unit for multiplication as instruction codes executable by the integer execution unit. While the execution unit for multiplication is powered down or off, an instruction analyzing and distributing unit redirects the issuance of those instruction codes to the integer execution unit, where they are emulated so as to achieve the given jobs. When an instruction code causes the execution unit for multiplication to recover from the idling state, the execution unit becomes enabled after a time lag automatically given thereto, so that the super scalar processor is improved in operability.
    Type: Application
    Filed: May 24, 2002
    Publication date: December 12, 2002
    Inventor: Hideki Sugimoto
  • Publication number: 20020161986
    Abstract: An instruction processing method for checking an arrangement of basic instructions in a very long instruction word (VLIW) instruction, suitable for language processing systems, an assembler and a compiler, used for processors which execute variable length VLIW instructions designed based on variable length VLIW architecture.
    Type: Application
    Filed: January 24, 2002
    Publication date: October 31, 2002
    Applicant: Fujitsu Limited
    Inventors: Teruhiko Kamigata, Hideo Miyake
  • Patent number: 6463524
    Abstract: A superscalar processor and method are disclosed for efficiently executing a store instruction. The store instruction is stored in an issue queue within the processor. A first part of the store instruction is issued from the issue queue to a first one of different execution units in response to a first operand becoming available. A second part of the store instruction is issued from the issue queue to a second one of the different execution units in response to a second operand becoming available. The store instruction is completed in response to executing the first part of the store instruction by the first one of the execution units and the second part of the store instruction by the second one of the execution units.
    Type: Grant
    Filed: August 26, 1999
    Date of Patent: October 8, 2002
    Assignee: International Business Machines Corporation
    Inventors: Maureen Delaney, Hung Qui Le, Dung Quoc Nguyen, Robert McDonald, David W. Victor
  • Patent number: 6456891
    Abstract: A system and method for transparent handling of extended register states. A set of additional registers, or an extended register file, is added to the base architecture of a microprocessor. The extended register file includes two dedicated registers and a plurality of general-use registers. The extended register file is mapped to a region in main memory. One dedicated register of the extended register file stores the physical base address of the memory region. Another dedicated register of the extended register file is used to store bits to indicate the status of the extended register file. A set of extended instructions is implemented for transferring data to and from the extended register file.
    Type: Grant
    Filed: October 27, 1999
    Date of Patent: September 24, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Uwe Kranich, David S. Christie
  • Publication number: 20020124159
    Abstract: A data processing apparatus executes a program. A number of operations have to be executed at data dependent points in time. This is implemented by executing a data independent series of instructions at data independent points in time. The series of instructions includes instructions whose completion is dependent on data dependent conditions. Using the conditions it is selected which of the executed instructions cause the operations to be executed.
    Type: Application
    Filed: November 26, 2001
    Publication date: September 5, 2002
    Inventors: Marco Jan Gerrit Bekooji, Albert Van Der Werf, Natalino Giorgio Busa
  • Patent number: 6446190
    Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI), allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture, allowing its generalized use across the instruction set architecture without specialized vector instructions and without being limited to use with repeat or loop instructions.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: September 3, 2002
    Assignee: Bops, Inc.
    Inventors: Edwin F. Barry, Gerald G. Pechanek, Patrick R. Marchand
  • Patent number: 6434693
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address-collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: August 13, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
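    Illustrative sketch (not taken from the patent): the two conditions named in the abstract above, address collision and write pending, can be expressed as a single check over the older stores still in flight. The `Store` type, the function name, and the exact-address comparison (real hardware would compare overlapping byte ranges) are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Store:
    # None models a store whose address has not been calculated yet (write pending).
    address: Optional[int]


def can_issue_load_early(load_address, older_stores):
    """Return True if a load may bypass all older, not-yet-completed stores."""
    for store in older_stores:
        if store.address is None:
            return False          # write pending: the target is still unknown
        if store.address == load_address:
            return False          # address collision with an older store
    return True
```

    A load that fails the check simply waits until the offending store has resolved; it is never reordered ahead of it.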
  • Patent number: 6430664
    Abstract: A DSP (10) accesses internal memory using physical addresses and has an internal MMU (19) which allows the DSP (10) to work with a large virtual address space mapped to an external memory (20). The MMU (19) performs the translation between a virtual address and the physical address associated with the external memory (20). The MMU (19) includes a translation lookaside buffer (28) and walking table logic (32) for translating virtual addresses to physical addresses.
    Type: Grant
    Filed: November 5, 1999
    Date of Patent: August 6, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Gérard Chauvel, Serge Lasserre, Dominique Benoît Jacques d'Inverno
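    Illustrative sketch (not taken from the patent): a generic model of translation through a lookaside buffer with a table walk on a miss. The 4 KiB page size, the single-level page table, and the dictionary-based TLB are assumptions; the abstract does not specify any of them.

```python
PAGE_BITS = 12                      # assumed 4 KiB pages
OFFSET_MASK = (1 << PAGE_BITS) - 1


def translate(virtual_address, tlb, page_table):
    """Translate a virtual address, filling the TLB from the page table on a miss."""
    page = virtual_address >> PAGE_BITS
    offset = virtual_address & OFFSET_MASK
    if page not in tlb:
        # TLB miss: walk the table. A missing entry raises KeyError,
        # which stands in for a translation fault in this sketch.
        tlb[page] = page_table[page]
    return (tlb[page] << PAGE_BITS) | offset


page_table = {0x5: 0x80}
tlb = {}
physical = translate(0x5123, tlb, page_table)
```

    After the first access the mapping is cached in `tlb`, so later accesses to the same page skip the walk.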
  • Patent number: 6430683
    Abstract: A system for time-ordered execution of load instructions. More specifically, the system enables just-in-time delivery of data requested by a load instruction. The system consists of a processor, an L1 data cache with corresponding L1 cache controller, and an instruction processor. The instruction processor manipulates an architected time dependency bit field of a load instruction to create a Distance of Dependency (DoD) bit field. The DoD bit field holds a relative dependency value which is utilized to order the load instruction in a Relative Time-Ordered Queue (RTOQ) of the L1 cache controller. The load instruction is sent from RTOQ to the L1 data cache at a particular time so that the data requested is loaded from the L1 data cache at the time specified by the DoD bit field. In the preferred embodiment, an acknowledgement is sent to the processing unit when the time specified is available in the RTOQ.
    Type: Grant
    Filed: June 25, 1999
    Date of Patent: August 6, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Lakshminarayanan Baba Arimilli, John Steven Dodson, Jerry Don Lewis
  • Publication number: 20020103990
    Abstract: An architecture and method are presented for a computer processor supporting interleaved execution of multiple concurrently-active threads, and capable of independently allocating a portion of the total processor execution time to each of the threads. Compared to existing architectures, in which the portion of processor time allocated to each thread is fixed, the processor architecture described herein is believed to offer higher performance for applications such as communications protocol processing, in which the workload of individual threads may vary, and in which the workload requires real time facilities.
    Type: Application
    Filed: February 1, 2001
    Publication date: August 1, 2002
    Inventor: Hanan Potash
  • Patent number: 6424870
    Abstract: A parallel processor system has a plurality of nodes interconnected by a network for communication under control of a network interface controller of each node. The network interface controller includes a message reception controller for receiving a message from another node and judging illustratively the status of message reception and the need to return an acknowledge message; an acknowledge generating unit for generating an acknowledge message transmission request based on predetermined information in the message and the reception status when the return of an acknowledge message is judged to be necessary; and a message transmission controller for receiving the acknowledge message transmission request and generating and returning an acknowledge message correspondingly. At the receiving node, the network interface controller can return an acknowledge message without processor intervention.
    Type: Grant
    Filed: August 7, 1998
    Date of Patent: July 23, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Hiromitsu Maeda, Patrick Hamilton
  • Patent number: 6421751
    Abstract: A computer system includes a pipelined communication link on which pipelined transactions are identified by a tag. A finite number of tags are available. The computer system detects when all the available tags have been assigned to outstanding transactions, no tags are free and the condition has persisted for a predetermined amount of time.
    Type: Grant
    Filed: June 11, 1999
    Date of Patent: July 16, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dale E. Gulick
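    Illustrative sketch (not taken from the patent): the detection condition in the abstract above, all tags outstanding for a predetermined time, can be modelled as a counter that runs only while the free pool is empty. The class name `TagPool`, the cycle-based timing, and the method names are assumptions for this example.

```python
class TagPool:
    """Finite pool of transaction tags with an exhaustion watchdog."""

    def __init__(self, num_tags, timeout_cycles):
        self.free = set(range(num_tags))
        self.timeout_cycles = timeout_cycles   # assumed "predetermined amount of time"
        self.exhausted_cycles = 0

    def acquire(self):
        """Return a free tag, or None when every tag is outstanding."""
        return self.free.pop() if self.free else None

    def release(self, tag):
        self.free.add(tag)

    def tick(self):
        """Advance one cycle; return True once exhaustion has lasted the full timeout."""
        if self.free:
            self.exhausted_cycles = 0
        else:
            self.exhausted_cycles += 1
        return self.exhausted_cycles >= self.timeout_cycles


pool = TagPool(num_tags=2, timeout_cycles=3)
first, second = pool.acquire(), pool.acquire()
flags = [pool.tick() for _ in range(3)]
pool.release(first)
recovered = pool.tick()
```

    Releasing any tag resets the counter, so only a persistent shortage trips the detector.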
  • Patent number: 6418527
    Abstract: A system for instructing a data processor, the system including an instruction root having an operation selection field for selecting an operation to be performed by said data processor and an instruction prefix. The instruction prefix has a field selected from the group of a conditional execution field for selecting a condition under which a data processor will perform said selected operation, an operand length modification field for modifying the selected operation so as to be performed on an operand having a different length, an instruction group field for selecting a length of an instruction group that includes the instruction root, and a prefix length selection field for selecting a length of said instruction prefix. A data processor system responsive to this instruction system is also disclosed. An instruction system for statically grouping instructions without using an instruction prefix is also disclosed.
    Type: Grant
    Filed: October 13, 1998
    Date of Patent: July 9, 2002
    Assignee: Motorola, Inc.
    Inventors: Zvika Rozenshein, Jacob Tokar, Uri Dayan, Joe Paul Gergen
  • Publication number: 20020087834
    Abstract: For use in a data processor comprising an instruction execution pipeline comprising N processing stages, a system and method of encoding constant operands is disclosed. The system comprises a constant generator unit that is capable of generating both short constant operands and long constant operands. The constant generator unit extracts the bits of a short constant operand from an instruction syllable and right justifies the bits in an output syllable. For long constant operands, the constant generator unit extracts K low order bits from an instruction syllable and T high order bits from an extension syllable. The right justified K low order bits and the T high order bits are combined to represent the long constant operand in one output syllable. In response to the status of op code bits located within a constant generation instruction, the constant generator unit enables and disables multiplexers to automatically generate the appropriate short or long constant operand.
    Type: Application
    Filed: December 29, 2000
    Publication date: July 4, 2002
    Applicant: STMicroelectronics, Inc.
    Inventors: Paolo Faraboschi, Alexander J. Starr, Anthony X. Jarvis, Geoffrey M. Brown, Mark Owen Homewood, Gary L. Vondran
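    Illustrative sketch (not part of the patent text): the short and long constant cases in the abstract above amount to bit-field extraction followed by concatenation. The function names, the field positions (`shift`, `ext_shift`) and the example widths are assumptions for illustration; the abstract only states that K low-order bits come from the instruction syllable and T high-order bits from the extension syllable.

```python
def short_constant(syllable, shift, width):
    """Extract `width` bits starting at bit `shift` and right-justify them."""
    return (syllable >> shift) & ((1 << width) - 1)


def long_constant(syllable, extension, shift, k, ext_shift, t):
    """Combine K low-order bits from the instruction syllable with
    T high-order bits from the extension syllable into one value."""
    low = (syllable >> shift) & ((1 << k) - 1)         # right-justified K bits
    high = (extension >> ext_shift) & ((1 << t) - 1)   # T bits from the extension
    return (high << k) | low
```

    For example, with the assumed widths K = 9 and T = 23, `long_constant(0x1FF0, 0x7FFFFF, 4, 9, 0, 23)` yields the 32-bit value `0xFFFFFFFF`.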
  • Publication number: 20020087833
    Abstract: A method and system provides for efficient dispersal of instructions to be executed by a processor using a distributed methodology of a centralized scheduling structure. The method and system include mapping instructions received from at least two instruction groups during a first stage followed by remapping, merging, and distributing instructions to a plurality of functional units during a second stage. The use of a first and second stage allows an increased number of instructions to be executed by a processor operating at a given clock rate.
    Type: Application
    Filed: December 28, 2000
    Publication date: July 4, 2002
    Inventors: James S. Burns, Kin-Kee Sit, Sailesh Kottapalli, Kenneth D. Shoemaker
  • Publication number: 20020087835
    Abstract: A method and apparatus for improving dispersal performance of instruction threads is described. In one embodiment, the dispersal logic determines whether the instructions supplied to it include any NOP instructions. When a NOP instruction is detected, the dispersal logic places the NOP into a no-op port for execution. All other instructions are distributed to the proper execution pipes in a normal manner. Because the NOP instructions do not use the execution resources of other instructions, all instruction threads can be executed in one cycle.
    Type: Application
    Filed: December 29, 2000
    Publication date: July 4, 2002
    Inventors: Sailesh Kottapalli, Udo Walterscheidt, Andrew Sun, Thomas Yeh, Kinkee Sit
  • Patent number: 6415376
    Abstract: An apparatus and method for issue grouping of instructions in a VLIW processor is disclosed. There can be one, two, or three issue groups (but no greater than three issue groups) in each VLIW packet. In one embodiment, a template in the VLIW packet comprises two issue group end markers where each issue group end marker comprises three bits. The three bits in the first issue group end marker identify the instruction which is the last instruction in the first issue group. Likewise, the three bits in the second issue group end marker identify the instruction which is the last instruction in the second issue group. Any instructions in the VLIW packet falling outside the two expressly defined first and second issue groups are placed in a third issue group. As such, three issue groups can be identified by use of the two issue group end markers. In one embodiment, the template of the VLIW packet includes a chaining bit.
    Type: Grant
    Filed: June 16, 2000
    Date of Patent: July 2, 2002
    Assignee: Conexant Systems, Inc.
    Inventors: Moataz A. Mohamed, Chien-Wei Li, John R. Spence
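    Illustrative sketch (not part of the patent text): two end markers are enough to split a packet into at most three issue groups, as the abstract above describes. The function name and the convention that an empty group is simply dropped (so that one or two groups can also be expressed) are assumptions made for this example; the patent's actual encoding of the one-group and two-group cases is not given in the abstract.

```python
def split_issue_groups(instructions, end1, end2):
    """Split a VLIW packet into issue groups using two 3-bit end markers.

    end1 and end2 are the indices of the last instruction in the first and
    second issue groups; whatever follows end2 forms the third group.
    """
    n = len(instructions)
    # A 3-bit marker can address at most 8 instruction slots.
    if n > 8 or not (0 <= end1 <= end2 < n):
        raise ValueError("markers must satisfy 0 <= end1 <= end2 < len(packet) <= 8")
    groups = [
        instructions[: end1 + 1],
        instructions[end1 + 1 : end2 + 1],
        instructions[end2 + 1 :],
    ]
    # Assumed convention: empty groups are omitted from the result.
    return [g for g in groups if g]
```

    With a six-instruction packet and markers 1 and 3, the result is three groups of two instructions each; with both markers pointing at the last instruction, the whole packet forms a single group.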
  • Publication number: 20020083303
    Abstract: A program-controlled unit is described having a plurality of instruction-execution units for simultaneously executing successive instructions of a program that is to be executed. The program-controlled unit described has a number of particular features allowing the number of access operations to a program memory storing the program that is to be executed to be reduced.
    Type: Application
    Filed: September 4, 2001
    Publication date: June 27, 2002
    Inventors: Raimund Leitner, Christian Panis
  • Patent number: 6412061
    Abstract: A method of dynamically adjusting a multiple stage pipeline to execute one of a set of instructions, wherein each stage has a latency and performs a selected data operation. An instruction to be executed is received and a number of stages of the pipeline is selected to execute the instruction as needed to perform a corresponding data operation. Unnecessary stages are bypassed to reduce latency and the instruction is executed with the selected stages.
    Type: Grant
    Filed: January 14, 1998
    Date of Patent: June 25, 2002
    Assignee: Cirrus Logic, Inc.
    Inventor: Thomas Anthony Dye
  • Patent number: 6408377
    Abstract: A microprocessor having M parallel pipelines and N arithmetic logic units, where N is less than M. A single instruction fetch stage fetches multi-stage instructions, and a single instruction decoder provides a parallel set of three instructions to the three pipelines. The two ALUs are dynamically connected to two of the pipelines having instructions requiring an ALU, while the third pipeline executes an instruction in parallel that does not require an ALU. The third pipeline may have a move unit connected to it.
    Type: Grant
    Filed: April 26, 2001
    Date of Patent: June 18, 2002
    Assignee: Rise Technology Company
    Inventor: Kenneth K. Munson
  • Patent number: 6408375
    Abstract: A system and method for performing register renaming of source registers in a processor having a variable advance instruction window for storing a group of instructions to be executed by the processor, wherein a new instruction is added to the variable advance instruction window when a location becomes available. A tag is assigned to each instruction in the variable advance instruction window. The tag of each instruction to leave the window is assigned to the next new instruction to be added to it. The results of instructions executed by the processor are stored in a temp buffer according to their corresponding tags to avoid output and anti-dependencies. The temp buffer therefore permits the processor to execute instructions out of order and in parallel. Data dependency checks for input dependencies are performed only for each new instruction added to the variable advance instruction window and register renaming is performed to avoid input dependencies.
    Type: Grant
    Filed: April 5, 2001
    Date of Patent: June 18, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Trevor A. Deosaran, Sanjiv Garg, Kevin R. Iadonato
  • Patent number: 6408376
    Abstract: Disclosed is a method, apparatus, and an instruction set architecture (ISA) for an application specific signal processor (ASSP) tailored to digital signal processing (DSP) applications. The instruction set architecture implemented with the ASSP is adapted to DSP algorithmic structures. In one embodiment, a single DSP instruction includes a pair of sub-instructions: a primary DSP sub-instruction and a shadow DSP sub-instruction. Both the primary and the shadow DSP sub-instructions are dyadic DSP instructions performing two operations in one instruction cycle. The DSP operations, in one embodiment, include a multiply instruction (MULT), an addition instruction (ADD), a minimize/maximize instruction (MIN/MAX), and a no operation instruction (NOP).
    Type: Grant
    Filed: August 30, 2000
    Date of Patent: June 18, 2002
    Assignee: Intel Corporation
    Inventors: Kumar Ganapathy, Ruban Kanapathipillai
  • Patent number: 6405267
    Abstract: A system and method for increasing effective bus bandwidth in communicating with a graphics device. Graphics commands and associated parameters are written into a contiguous region of system memory and transmitted in a weakly ordered fashion over a bus to a graphics device. The graphics device reorders the incoming data into the same order in which the data was written into the contiguous region of system memory, thereby allowing the use of order dependent encoded commands with the weakly ordered bus interface.
    Type: Grant
    Filed: January 22, 1999
    Date of Patent: June 11, 2002
    Assignee: S3 Graphics Co., Ltd.
    Inventors: Randy X. Zhao, Chien-Te Ho, Steve Fong
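    Illustrative sketch (not part of the patent text): one way to restore write order on the receiving side is to buffer each arriving word under its offset in the contiguous region and release words only when the next expected offset is present. The function name, the `(offset, word)` arrival format and the word-granular offsets are assumptions for this example, not the mechanism claimed in the patent.

```python
def reorder(arrivals, start=0):
    """Return words in the order they were written to the contiguous region.

    `arrivals` is an iterable of (offset, word) pairs that may arrive in any
    order, as on a weakly ordered bus.
    """
    pending = {}          # words that arrived ahead of their turn
    next_offset = start   # offset the consumer is waiting for
    ordered = []
    for offset, word in arrivals:
        pending[offset] = word
        # Drain every word that is now contiguous with what was already released.
        while next_offset in pending:
            ordered.append(pending.pop(next_offset))
            next_offset += 1
    return ordered
```

    Words that arrive early simply wait in `pending`, so order-dependent command decoding downstream only ever sees the original sequence.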
  • Publication number: 20020069345
    Abstract: In one exemplary embodiment, the disclosed VLIW processor comprises a number of threads where each thread includes a processing unit. For example, there can be two threads, where each of the two threads has its own processing unit. According to this exemplary embodiment, a number of VLIW packets are divided into a number of issue groups. As an example, two VLIW packets are divided into two issue groups each. The first issue group in the first VLIW packet is provided to a first thread for execution in the first thread processing unit during a first clock cycle. Concurrently, the first issue group in the second VLIW packet is provided to a second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the first clock cycle. Moreover, the second issue group in the first VLIW packet is provided to the first thread for execution in the first thread processing unit during a second clock cycle.
    Type: Application
    Filed: December 5, 2000
    Publication date: June 6, 2002
    Applicant: Conexant Systems, Inc.
    Inventors: Moataz A. Mohamed, John R. Spence
  • Patent number: 6397319
    Abstract: A 32-bit instruction 50 is composed of a 4-bit format field 51, a 4-bit operation field 52, and two 12-bit operation fields 59 and 60. The 4-bit operation field 52 can only include (1) an operation code “cc” that indicates a branch operation which uses a stored value of the implicitly indicated constant register 36 as the branch address, or (2) a constant “const”. The content of the 4-bit operation field 52 is specified by a format code provided in the format field 51.
    Type: Grant
    Filed: June 20, 2000
    Date of Patent: May 28, 2002
    Assignee: Matsushita Electric Ind. Co., Ltd.
    Inventors: Shuichi Takayama, Nobuo Higaki
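    Illustrative sketch (not part of the patent text): the 32-bit word in the abstract above divides into a 4-bit format field, a 4-bit operation field and two 12-bit operation fields. The bit positions chosen here (format field in the most significant bits) and the field names are assumptions; the abstract gives only the field widths.

```python
def decode(word):
    """Split a 32-bit instruction word into its four fields (assumed layout)."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "format": (word >> 28) & 0xF,     # 4-bit format field (51)
        "op4": (word >> 24) & 0xF,        # 4-bit operation field (52)
        "op12_a": (word >> 12) & 0xFFF,   # first 12-bit operation field (59)
        "op12_b": word & 0xFFF,           # second 12-bit operation field (60)
    }
```

    Under this layout, `decode(0x1A123456)` gives format `0x1`, a 4-bit field of `0xA`, and 12-bit fields `0x123` and `0x456`; a real decoder would then use the format code to decide whether the 4-bit field holds a branch operation code or a constant.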
  • Patent number: 6393549
    Abstract: An instruction alignment unit is provided which is capable of routing variable byte length instructions simultaneously to a plurality of decode units which form fixed issue positions within a superscalar microprocessor. The instruction alignment unit may be implemented with a relatively small number of cascaded levels of logic gates, thus accommodating very high frequencies of operation. In one embodiment, the superscalar microprocessor includes an instruction cache for storing a plurality of variable byte-length instructions and a predecode unit for generating predecode tags which identify the location of the start byte of each variable byte-length instruction. An instruction alignment unit is configured to channel a plurality of the variable byte-length instructions simultaneously to predetermined issue positions depending upon the locations of their corresponding start bytes in a cache line.
    Type: Grant
    Filed: December 21, 1999
    Date of Patent: May 21, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Thang Tran, David B. Witt
  • Publication number: 20020056037
    Abstract: A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references. Instructions for switching and branching based on executing contexts are also disclosed.
    Type: Application
    Filed: January 12, 2001
    Publication date: May 9, 2002
    Inventors: Gilbert Wolrich, Matthew J. Adiletta, William Wheeler
  • Patent number: 6385719
    Abstract: A transfer tag is generated by the Instruction Fetch Unit and passed to the decode unit in the instruction pipeline with each group of instructions fetched during a branch prediction by a fetcher. Individual instructions within the fetched group for the branch pipeline are assigned a concatenated version (group tag concatenated with instruction lane) of the transfer tag which is used to match on requests to flush any newer instructions. All potential instruction or Internal Operation latches in the decode pipeline must perform a match and if a match is encountered, all valid bits associated with newer instructions or internal operations upstream from the match are cleared. The transfer tag representing the next instruction to be processed in the branch pipeline is passed to the Instruction Dispatch Unit. The Instruction Dispatch Unit queries the branch pipeline to compare its transfer tag with transfer tags of instructions in the branch pipeline.
    Type: Grant
    Filed: June 30, 1999
    Date of Patent: May 7, 2002
    Assignee: International Business Machines Corporation
    Inventors: John Edward Derrick, Brian R. Konigsburg, Lee Evan Eisen, David Stephen Levitan