Superscalar Patents (Class 712/23)
-
Patent number: 6499099Abstract: A central processing unit having an extension instruction comprises a memory address, an offset and a fixed length instruction of varying immediate data. The central processing unit comprises a general register, a special register, a register file constituted as an inner register, a function block for executing the calculation function; an instruction register for memorizing the instruction, a control block for generating/outputting a control signal to the instruction register and a plurality of status flags, in which the special register enables access by a programmer and includes an extension data field for memorizing extension data or an extension register having the extension data field as one element and an extension flag for changing its status when the instruction memorizing the extension data in the extension register is executed and having one or a plurality of bits that is accessible to a programmer.Type: GrantFiled: January 27, 1999Date of Patent: December 24, 2002Assignee: Asia Design Co., Ltd.Inventor: Kyung Youn Cho
-
Patent number: 6487652Abstract: Methods and apparatus for speculatively locking an object are disclosed. According to one aspect of the present invention, a method for acquiring use of an object using a current thread includes a determination of whether a first bit included in the object is set to indicate that the object is speculatively owned by a speculative owner thread. When the object is speculatively owned, the speculative owner thread is allowed to use the object without locking the object. The method also includes checking a stored identifier that is associated with the object and identifies the speculative owner thread, as well as determining whether the stored identifier identifies the current thread. When the stored identifier identifies the current thread, the current thread already has use of the object; i.e., the current thread is the speculative owner thread.Type: GrantFiled: September 30, 1999Date of Patent: November 26, 2002Assignee: Sun Microsystems, Inc.Inventors: Benedict A. Gomes, Lars Bak, David P. Stoutamire
-
Patent number: 6484251Abstract: A processor including a register, an execution unit, a temporary result buffer, and a commit function circuit. The register includes at least one register bit and may include one or more sticky bits. The execution unit is suitable for executing a set of computer instructions. The temporary result buffer is configured to receive, from the execution unit, register bit modification information provided by the instructions. The temporary result buffer is suitable for storing the modification information in set/clear pairs of bits corresponding to respective register bits of the register. The commit function circuit is configured to receive the set/clear pairs of bits from the temporary result buffer when the instruction is committed. The commit function circuit is suitable for generating an updated bit in response to receiving the set/clear pairs of bits. The updated bit is then committed to the corresponding register bit of the register.Type: GrantFiled: October 14, 1999Date of Patent: November 19, 2002Assignee: International Business Machines CorporationInventors: Robert Greg McDonald, Peichun Peter Liu, Christopher Hans Olson
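The set/clear pairing described in this abstract can be illustrated with a short sketch. It assumes one set bit and one clear bit per register bit, that a clear request overrides a set request on ordinary bits, and that a sticky bit can be set but not cleared. None of these policies is stated in the abstract, and the names `commit` and `sticky_mask` are hypothetical.

```python
def commit(reg: int, set_bits: int, clear_bits: int, sticky_mask: int = 0) -> int:
    """Return the register value after committing one instruction's set/clear pairs.

    Assumptions (not from the abstract): clear wins over set on ordinary bits,
    and bits covered by sticky_mask ignore clear requests.
    """
    effective_clear = clear_bits & ~sticky_mask   # sticky bits cannot be cleared
    return (reg | set_bits) & ~effective_clear


# Two instructions buffer their modifications and are committed in order.
reg = 0b0101
reg = commit(reg, set_bits=0b0010, clear_bits=0b0001)
reg = commit(reg, set_bits=0b1000, clear_bits=0b0100, sticky_mask=0b0100)
```

Keeping the modifications as pairs until commit means the register itself is never touched by an instruction that is later discarded.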
-
Patent number: 6484254Abstract: According to one aspect of the invention, a method is provided in which store addresses of store instructions dispatched during a last predetermined number of cycles are maintained in a first data structure of a first processor. It is determined whether a load address of a first load instruction matches one of the store addresses in the first data structure. The first load instruction is replayed if the load address of the first load instruction matches one of the store addresses in the first data structure.Type: GrantFiled: December 30, 1999Date of Patent: November 19, 2002Assignee: Intel CorporationInventors: Muntaquim F. Chowdhury, Douglas M. Carmean
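A minimal sketch of the replay check in this abstract follows. The class name `RecentStores`, the `window` parameter, and the exact-match address comparison are assumptions made for illustration; the abstract only says that store addresses from the last predetermined number of cycles are kept and compared against the load address.

```python
from collections import deque


class RecentStores:
    """Store addresses dispatched during the last `window` cycles (hypothetical model)."""

    def __init__(self, window: int):
        self.window = window
        self.entries = deque()  # (cycle, address) pairs, oldest first

    def record_store(self, cycle: int, address: int) -> None:
        # Assumes stores are recorded in non-decreasing cycle order.
        self.entries.append((cycle, address))

    def must_replay(self, cycle: int, load_address: int) -> bool:
        # Drop stores that have aged out of the window, then look for a match.
        while self.entries and self.entries[0][0] <= cycle - self.window:
            self.entries.popleft()
        return any(addr == load_address for _, addr in self.entries)


stores = RecentStores(window=4)
stores.record_store(cycle=10, address=0x100)
hit = stores.must_replay(cycle=12, load_address=0x100)      # store still in window
miss = stores.must_replay(cycle=12, load_address=0x200)     # no matching store
expired = stores.must_replay(cycle=14, load_address=0x100)  # store has aged out
```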
-
Patent number: 6477562Abstract: A multi-streaming processor has multiple streams for processing multiple threads, and an instruction scheduler including a priority record of priority codes for one or more of the streams. The priority codes determine in some embodiments relative access to resources as well as which stream has access at any point in time. In other embodiments priorities are determined dynamically and altered on-the-fly, which may be done by various criteria, such as on-chip processing statistics, by executing one or more priority algorithms, by input from off-chip, according to stream loading, or by combinations of these and other methods. In one embodiment a special code is used for disabling a stream, and streams may be enabled and disabled dynamically by various methods, such as by on-chip events, processing statistics, input from off-chip, and by processor interrupts. Some specific applications are taught, including for IP-routers and digital signal processors.Type: GrantFiled: December 16, 1998Date of Patent: November 5, 2002Assignee: Clearwater Networks, Inc.Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar
-
Patent number: 6463525Abstract: Where it is desired to perform a double precision operation using single precision operands, first and second single precision operands are loaded into first and second respective rows of a re-order buffer, and third and fourth single precision operands are loaded into third and fourth respective rows of the re-order buffer. A first merge instruction copies the first and second single precision operands from respective first and second rows of the re-order buffer into first and second portions of a fifth row of the re-order buffer, thereby concatenating the first and second single precision operands to represent a first double precision operand. A second merge instruction copies the third and fourth single precision operands from respective third and fourth rows of the re-order buffer into first and second portions of a sixth row of the re-order buffer, thereby concatenating the third and fourth single precision operands to represent a second double precision operand.Type: GrantFiled: August 16, 1999Date of Patent: October 8, 2002Assignee: Sun Microsystems, Inc.Inventor: J. Arjun Prabhu
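The merge step in this abstract amounts to concatenating two 32-bit values into one 64-bit row. The sketch below assumes that the first operand lands in the upper half and the second in the lower half; the abstract says only "first and second portions", so that ordering and the name `merge_singles` are illustrative.

```python
import struct


def merge_singles(upper: int, lower: int) -> int:
    """Concatenate two 32-bit values into one 64-bit value (upper half first, by assumption)."""
    mask = 0xFFFFFFFF
    return ((upper & mask) << 32) | (lower & mask)


# Split the IEEE 754 double 1.5 into two 32-bit halves, then merge them back.
upper, lower = struct.unpack(">II", struct.pack(">d", 1.5))
merged = merge_singles(upper, lower)
value = struct.unpack(">d", struct.pack(">Q", merged))[0]
```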
-
Publication number: 20020144083Abstract: Speculative pre-computation and multithreading (SP) allows a processor to use spare hardware contexts to spawn speculative threads to very effectively pre-fetch data well in advance of the main thread. The burden of spawning threads may fall on the main thread via basic triggers. The speculative threads may also spawn other speculative threads via chaining triggers.Type: ApplicationFiled: March 30, 2001Publication date: October 3, 2002Inventors: Hong Wang, Jamison Collins, John P. Shen, Bryan Black, Perry H. Wang, Edward T. Grochowski, Ralph M. Kling
-
Publication number: 20020138717Abstract: A processor includes logic for tagging a thread identifier (TID) for usage with processor blocks that are not stalled. Pertinent non-stalling blocks include caches, translation look-aside buffers (TLB), a load buffer asynchronous interface, an external memory management unit (MMU) interface, and others. A processor includes a cache that is segregated into a plurality of N cache parts. Cache segregation avoids interference, “pollution”, or “cross-talk” between threads. One technique for cache segregation utilizes logic for storing and communicating thread identification (TID) bits. The cache utilizes cache indexing logic. For example, the TID bits can be inserted at the most significant bits of the cache index.Type: ApplicationFiled: May 23, 2002Publication date: September 26, 2002Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
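Placing the TID bits at the most significant end of the cache index, as this abstract suggests, could look like the sketch below. The bit widths (64-byte lines, an 8-bit index, 2 TID bits) and the function name `cache_index` are assumptions chosen for the example, not values from the application.

```python
def cache_index(address: int, tid: int, *, line_bits: int = 6,
                index_bits: int = 8, tid_bits: int = 2) -> int:
    """Form a set index whose top tid_bits come from the thread identifier."""
    if not 0 <= tid < (1 << tid_bits):
        raise ValueError("tid out of range")
    addr_index_bits = index_bits - tid_bits            # index bits left for the address
    low = (address >> line_bits) & ((1 << addr_index_bits) - 1)
    return (tid << addr_index_bits) | low


same_address_thread0 = cache_index(0x1234, tid=0)
same_address_thread3 = cache_index(0x1234, tid=3)
```

With these widths each thread is confined to its own 64 sets out of 256, so the same address maps to different cache parts for different threads and the threads cannot evict each other's lines.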
-
Patent number: 6453344Abstract: A multiprocessor system having a total number of available CPUs partitioned into one or more smaller pools of CPUs called servers where the number of CPUs available to a server is reduced below the total number of available CPUs. Software licensing costs are thereby reduced because the number of CPUs available to run the operating system or ISV software has been reduced to the number of CPUs in the pool of the server rather than the total number of available CPUs in the multiprocessor system. In order to enforce the isolation of CPUs required by software licensing, separate identification codes, CPUIDs, that contain unique system serial numbers are assigned to each server in the multiprocessing system. The multiprocessor system has multiple CPUIDs, one for each server (each pool of CPUs that can execute operating systems and ISV software).Type: GrantFiled: March 31, 1999Date of Patent: September 17, 2002Assignee: Amdahl CorporationInventors: Robert Scott Ellsworth, Jonathan Russell Nolting, Keith Joseph Philipp
-
Patent number: 6442670Abstract: A data processing system comprises a plurality of nodes and a serial data bus interconnecting the nodes in series in a closed loop, for passing address and data information. At least one processing node includes a processor, a printed circuit board and a memory which is partitioned into a plurality of sections, including a first section for directly sharable memory located on the printed circuit board, and a second section for block sharable memory. A local bus connects the processor, block sharable memory and printed circuit board, for transferring data in parallel from the processor to the directly sharable memory on the printed circuit board, and for transferring data from the block sharable memory to the printed circuit board.Type: GrantFiled: July 2, 2001Date of Patent: August 27, 2002Assignee: Sun Microsystems, Inc.Inventors: John D. Acton, Michael D. Derbish, Gavin G. Gibson, Jack M. Hardy, Jr., Hugh M. Humphreys, Steven P. Kent, Steven E. Schelong, Ricardo Yong, William B. DeRolf
-
Patent number: 6438677Abstract: One embodiment of the present invention provides a system that supports space and time dimensional program execution by facilitating accesses to different versions of a memory element. The system supports a head thread that executes program instructions and a speculative thread that executes program instructions in advance of the head thread. The head thread accesses a primary version of the memory element, and the speculative thread accesses a space-time dimensioned version of the memory element. During a reference to the memory element by the head thread, the system accesses the primary version of the memory element. During a reference to the memory element by the speculative thread, the speculative thread accesses a pointer associated with the primary version of the memory element, and accesses a version of the memory element through the pointer. Note that the pointer points to the space-time dimensioned version of the memory element if the space-time dimensioned version of the memory element exists.Type: GrantFiled: October 20, 1999Date of Patent: August 20, 2002Assignee: Sun Microsystems, Inc.Inventors: Shailender Chaudhry, Marc Tremblay
-
Patent number: 6438680Abstract: When a decision circuit (217) incorporated in a control circuit (21) in an instruction decode unit (2) in a microprocessor (1) decides that an integer operation unit (4) can not execute a following sub instruction, the decision circuit (217) controls each of selectors (211, 214, and 215) and an exchange circuit (216) so that a memory access unit (3) that has already executed a preceding sub instruction can execute the following sub instruction.Type: GrantFiled: June 3, 1999Date of Patent: August 20, 2002Assignee: Mitsubishi Denki Kabushiki KaishaInventors: Akira Yamada, Isao Minematsu
-
Patent number: 6434693Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address-collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.Type: GrantFiled: November 12, 1999Date of Patent: August 13, 2002Assignee: Seiko Epson CorporationInventors: Cheryl D. Senter, Johannes Wang
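The two conditions this abstract places on out-of-order loads can be written as a small predicate. The `Store` record and the function name are hypothetical, and the sketch compares whole addresses for equality, whereas real hardware would compare address ranges.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Store:
    address: Optional[int]  # None while the store address has not yet been calculated


def can_issue_load_early(load_address: int, older_stores: Sequence[Store]) -> bool:
    """Allow the load to bypass older stores only if neither hazard is present."""
    for store in older_stores:
        if store.address is None:           # write pending: address still unknown
            return False
        if store.address == load_address:   # address collision with an older write
            return False
    return True


safe = can_issue_load_early(0x40, [Store(0x80), Store(0xC0)])
collision = can_issue_load_early(0x40, [Store(0x40)])
pending = can_issue_load_early(0x40, [Store(None)])
```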
-
Patent number: 6434689Abstract: An apparatus is described that comprises a data processing unit and at least one coprocessor. The data processing unit comprises a register file having registers, a memory, a plurality of execution units, a coprocessor interface for coupling the at least one coprocessor with the data processing unit, and a pipeline configuration for processing instructions having a fetch stage for fetching an instruction from the memory, a decode stage for decoding an operational code from the instruction, an execution stage for activating one of the execution units, and a write-back stage for writing back from the execution unit. The data processing unit comprises read- and write-lines coupling the register file with the coprocessor for exchanging operands, at least one control line indicating that the coprocessor is busy, and a plurality of control lines from the decode stage for controlling the coprocessor which are operated upon detection of a coprocessor instruction.Type: GrantFiled: November 9, 1998Date of Patent: August 13, 2002Assignee: Infineon Technologies North America Corp.Inventors: Rod G. Fleck, Roger D. Arnold, Bruce Holmer, Danielle G. Lemay
-
Publication number: 20020103990Abstract: An architecture and method are presented for a computer processor supporting interleaved execution of multiple concurrently-active threads, and capable of independently allocating a portion of the total processor execution time to each of the threads. Compared to existing architectures, in which the portion of processor time allocated to each thread is fixed, the processor architecture described herein is believed to offer higher performance for applications such as communications protocol processing, in which the workload of individual threads may vary, and in which the workload requires real time facilities.Type: ApplicationFiled: February 1, 2001Publication date: August 1, 2002Inventor: Hanan Potash
-
Patent number: 6427201Abstract: Routine processing for routine data, non-routine processing for routine data and general non-routine processing are to be processed efficiently. To this end, a main CPU 20 has a CPU core 21, having a parallel computational mechanism, a command cache 22 and a data cache 23, as ordinary cache units, and a scratch-pad memory SPR 24 which is an internal high-speed memory capable of performing direct memory accessing (DMA) suited for routine processing. A floating decimal point vector processor (VPE) 30 has an internal high-speed memory (VU-MEM) capable of DMA processing and is tightly connected to the main CPU to form a co-processor. The VPE 40 has a high-speed internal memory 44 (VU-MEM) capable of DMA processing. The DMA controller (DMAC) 14 controls DMA transfer between the main memory 50 and the SPR 24, between the main memory 50 and the (VU-MEM) 34 and between the (VU-MEM) 44 and the SPR 24.Type: GrantFiled: August 18, 1998Date of Patent: July 30, 2002Assignee: Sony Computer Entertainment Inc.Inventor: Akio Ohba
-
Patent number: 6415354Abstract: When a search key is supplied to a content addressable memory (CAM), the CAM signals indicate which CAM entries have matched the key. These signals are provided to a weight array to select the entry of the highest priority. Each entry's priority is indicated by a weight in the weight array. The weight array processing is pipelined. In pipeline stage 0, the most significant bits (bits 0) of the weights are examined, and the highest priorities are selected based on the most significant bits. At pipeline stage 1, the next most significant bits (bits 1) are examined, and so on.Type: GrantFiled: July 15, 1999Date of Patent: July 2, 2002Assignee: Applied Micro Circuits CorporationInventors: Alexander Joffe, Oran Uzrad-Nali, Simon H. Milner
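The bit-serial selection in this abstract can be modeled in software, with one loop iteration standing in for each pipeline stage. The sketch assumes that a numerically larger weight means higher priority, which the abstract does not state, and the function name `select_highest_priority` is hypothetical.

```python
from typing import List, Sequence


def select_highest_priority(weights: Sequence[int], matched: Sequence[bool],
                            width: int) -> List[int]:
    """Return the indices of matching entries that share the highest weight.

    Each iteration models one pipeline stage examining one weight bit,
    starting from the most significant bit.
    """
    candidates = [i for i, hit in enumerate(matched) if hit]
    for stage in range(width):
        shift = width - 1 - stage
        with_one = [i for i in candidates if (weights[i] >> shift) & 1]
        if with_one:                 # entries with a 0 in this bit drop out
            candidates = with_one
    return candidates


weights = [0b0101, 0b1100, 0b1010, 0b1100]
matched = [True, False, True, True]      # CAM match signals for entries 0..3
winners = select_highest_priority(weights, matched, width=4)
```

Entry 1 carries the same top weight as entry 3 but did not match the key, so only entry 3 survives; entries that tie on every bit would all remain in the result.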
-
Patent number: 6412062Abstract: The present invention is a method and apparatus to inject an external event to a first pipeline stage in a pipeline chain. A target instruction address corresponding to an instruction is specified. The external event is asserted when there is a match between the target instruction address and a pipeline instruction pointer corresponding to a second pipeline stage. The second pipeline stage is earlier than the first pipeline stage in the pipeline chain. The external event is unmasked via a delivery path between a signal representing the asserted external event and the first pipeline stage.Type: GrantFiled: June 30, 1999Date of Patent: June 25, 2002Assignee: Intel CorporationInventors: Yan Xu, Steven J. Tu
-
Patent number: 6412061Abstract: A method of dynamically adjusting a multiple stage pipeline to execute one of a set of instructions, wherein each stage has a latency and performs a selected data operation. An instruction to be executed is received and a number of stages of the pipeline is selected to execute the instruction as needed to perform a corresponding data operation. Unnecessary stages are bypassed to reduce latency and the instruction is executed with the selected stages.Type: GrantFiled: January 14, 1998Date of Patent: June 25, 2002Assignee: Cirrus Logic, Inc.Inventor: Thomas Anthony Dye
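The latency benefit of bypassing stages can be shown with a toy model. The stage names and per-stage latencies in `PIPELINE` are invented for illustration and do not come from the patent; only the idea of selecting the needed stages and skipping the rest is taken from the abstract.

```python
from typing import List, Set, Tuple

# Hypothetical pipeline: (stage name, latency in cycles).
PIPELINE = [("fetch", 1), ("decode", 1), ("address", 1),
            ("memory", 2), ("execute", 1), ("writeback", 1)]


def plan(needed: Set[str]) -> Tuple[List[str], int]:
    """Select the stages an instruction needs, in pipeline order, and sum their latency."""
    selected = [(name, latency) for name, latency in PIPELINE if name in needed]
    return [name for name, _ in selected], sum(latency for _, latency in selected)


all_stages, full_latency = plan({name for name, _ in PIPELINE})
alu_stages, alu_latency = plan({"fetch", "decode", "execute", "writeback"})
```

In this model a register-only operation skips the address and memory stages and finishes in 4 cycles instead of 7.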
-
Patent number: 6408377Abstract: A microprocessor having M parallel pipelines and N arithmetic logic units, where N is less than M. A single instruction fetch stage fetches multi-stage instructions, and a single instruction decoder provides a parallel set of three instructions to the three pipelines. The two ALUs are dynamically connected to two of the pipelines having instructions requiring an ALU, while the third pipeline executes an instruction in parallel that does not require an ALU. The third pipeline may have a move unit connected to it.Type: GrantFiled: April 26, 2001Date of Patent: June 18, 2002Assignee: Rise Technology CompanyInventor: Kenneth K. Munson
-
Patent number: 6408375Abstract: A system and method for performing register renaming of source registers in a processor having a variable advance instruction window for storing a group of instructions to be executed by the processor, wherein a new instruction is added to the variable advance instruction window when a location becomes available. A tag is assigned to each instruction in the variable advance instruction window. The tag of each instruction to leave the window is assigned to the next new instruction to be added to it. The results of instructions executed by the processor are stored in a temp buffer according to their corresponding tags to avoid output and anti-dependencies. The temp buffer therefore permits the processor to execute instructions out of order and in parallel. Data dependency checks for input dependencies are performed only for each new instruction added to the variable advance instruction window and register renaming is performed to avoid input dependencies.Type: GrantFiled: April 5, 2001Date of Patent: June 18, 2002Assignee: Seiko Epson CorporationInventors: Trevor A. Deosaran, Sanjiv Garg, Kevin R. Iadonato
-
Patent number: 6405304Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.Type: GrantFiled: August 24, 1998Date of Patent: June 11, 2002Assignee: Compaq Information Technologies Group, L.P.Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
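The interplay of the register list, the valid vector, and the silo can be sketched as follows. The class `RenameMap`, its method names, and the use of an instruction number as the silo key are hypothetical. The sketch also assumes that a physical register holding a valid assignment in some saved snapshot is not reallocated before that snapshot could be restored.

```python
class RenameMap:
    """Toy model: register list + valid vector, with per-instruction snapshots."""

    def __init__(self, num_physical: int):
        self.assigned = [None] * num_physical  # logical register held by each physical register
        self.valid = [False] * num_physical    # valid vector, one bit per physical register
        self.silo = {}                         # instruction number -> saved valid vector

    def map_instruction(self, inum: int, logical: str, physical: int) -> None:
        self.silo[inum] = list(self.valid)     # save the vector as it stood before this mapping
        for p, name in enumerate(self.assigned):
            if name == logical and self.valid[p]:
                self.valid[p] = False          # the older assignment is superseded
        self.assigned[physical] = logical
        self.valid[physical] = True

    def lookup(self, logical: str):
        for p, name in enumerate(self.assigned):
            if name == logical and self.valid[p]:
                return p
        return None

    def recover(self, inum: int) -> None:
        # Undo instruction inum and everything mapped after it.
        self.valid = list(self.silo[inum])


rmap = RenameMap(num_physical=4)
rmap.map_instruction(0, "r1", physical=0)
rmap.map_instruction(1, "r1", physical=1)   # speculative, down a predicted branch
speculative = rmap.lookup("r1")
rmap.recover(1)                             # the branch turned out to be mispredicted
restored = rmap.lookup("r1")
```

Because only the valid vector is copied, each snapshot costs one bit per physical register rather than a full copy of the register list.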
-
Patent number: 6397319Abstract: A 32-bit instruction 50 is composed of a 4-bit format field 51, a 4-bit operation field 52, and two 12-bit operation fields 59 and 60. The 4-bit operation field 52 can only include (1) an operation code “cc” that indicates a branch operation which uses a stored value of the implicitly indicated constant register 36 as the branch address, or (2) a constant “const”. The content of the 4-bit operation field 52 is specified by a format code provided in the format field 51.Type: GrantFiled: June 20, 2000Date of Patent: May 28, 2002Assignee: Matsushita Electric Ind. Co., Ltd.Inventors: Shuichi Takayama, Nobuo Higaki
-
Publication number: 20020056034Abstract: A data processing system including a memory system and a plurality of peripheral components. A processor is coupled to the memory and peripheral components. A plurality of pipeline stages are implemented within the processor where each stage is configured to perform specific operations according to instructions then associated with that stage. A snapshot register is associated with at least some of the pipeline stages where the snapshot register is configured to store data describing the state of execution of the instruction then associated with that stage.Type: ApplicationFiled: October 1, 1999Publication date: May 9, 2002Inventors: Margaret Gearty, Chih-Jui Peng
-
Patent number: 6385715Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a plurality of replay queues or replay queue sections coupled to the checker for temporarily storing one or more instructions for replay. In one embodiment, thread-specific replay queue sections may each be used to store a long latency instruction for each thread until the long latency instruction is ready to be executed (e.g., data for a load instruction has been retrieved from external memory). By storing the long latency instruction and its dependents in a replay queue section for one thread which has stalled, execution resources are made available for improving the speed of execution of other threads which have not stalled.Type: GrantFiled: May 4, 2001Date of Patent: May 7, 2002Assignee: Intel CorporationInventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
-
Patent number: 6385719Abstract: A transfer tag is generated by the Instruction Fetch Unit and passed to the decode unit in the instruction pipeline with each group of instructions fetched during a branch prediction by a fetcher. Individual instructions within the fetched group for the branch pipeline are assigned a concatenated version (group tag concatenated with instruction lane) of the transfer tag which is used to match on requests to flush any newer instructions. All potential instruction or Internal Operation latches in the decode pipeline must perform a match and if a match is encountered, all valid bits associated with newer instructions or internal operations upstream from the match are cleared. The transfer tag representing the next instruction to be processed in the branch pipeline is passed to the Instruction Dispatch Unit. The Instruction Dispatch Unit queries the branch pipeline to compare its transfer tag with transfer tags of instructions in the branch pipeline.Type: GrantFiled: June 30, 1999Date of Patent: May 7, 2002Assignee: International Business Machines CorporationInventors: John Edward Derrick, Brian R. Konigsburg, Lee Evan Eisen, David Stephen Levitan
-
Publication number: 20020053014Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue having a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.Type: ApplicationFiled: January 3, 2002Publication date: May 2, 2002Inventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
-
Patent number: 6381689Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.Type: GrantFiled: March 13, 2001Date of Patent: April 30, 2002Assignee: Advanced Micro Devices, Inc.Inventors: David B. Witt, Thang M. Tran
-
Patent number: 6378060Abstract: The present invention provides a cross-bar circuit that implements a switch of a broadband processor. In an exemplary embodiment, the present invention provides a cross-bar circuit that, in response to partially-decoded instruction information and in response to datapath information, (1) allows any bit from a 2n-bit (e.g. 256-bit) input source word to be switched into any bit position of a 2m-bit (e.g. 128-bit) output destination word and (2) provides the ability to set-to-zero any bit in said 2m-bit output destination word. The cross-bar circuit includes: (1) a switch circuit which includes 2m 2n:1 multiplexor circuits, where each of the 2n:1 multiplexor circuits (a) has a unique n-bit (e.g.Type: GrantFiled: February 11, 2000Date of Patent: April 23, 2002Assignee: Microunity Systems Engineering, Inc.Inventors: Craig Hansen, Bruce Bateman, John Moussouris
-
Patent number: 6360309Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue having a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.Type: GrantFiled: May 19, 2000Date of Patent: March 19, 2002Assignee: Seiko Epson CorporationInventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
-
Publication number: 20020029328Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.Type: ApplicationFiled: May 10, 2001Publication date: March 7, 2002Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6353881Abstract: A system is provided that facilitates space and time dimensional execution of computer programs through selective versioning of memory elements located in a system heap. The system includes a head thread that executes program instructions and a speculative thread that simultaneously executes program instructions in advance of the head thread with respect to the time dimension of sequential execution of the program. The collapsing of the time dimensions is facilitated by expanding the heap into two space-time dimensions, a primary dimension (dimension zero), in which the head thread operates, and a space-time dimension (dimension one), in which the speculative thread operates. In general, each dimension contains its own version of an object and objects created by the thread operating in the dimension. The head thread generally accesses a primary version of a memory element and the speculative thread generally accesses a corresponding space-time dimensioned version of the memory element.Type: GrantFiled: May 17, 1999Date of Patent: March 5, 2002Assignee: Sun Microsystems, Inc.Inventors: Shailender Chaudhry, Marc Tremblay
-
Patent number: 6351804Abstract: A control bit vector storage is provided. The present control bit vector storage (preferably included within a functional unit) stores control bits indicative of a particular instruction. The control bits are divided into multiple control vectors, each vector indicative of one instruction operation. The control bits control dataflow elements within the functional unit to cause the instruction operation to be performed. Additionally, the present control bit vector storage allows complex instructions (or instructions which produce multiple results) to be divided into simpler operations. The hardware included within the functional unit may be reduced to that employed to perform the simpler operations. In one embodiment, the control bit vector storage comprises a plurality of vector storages. Each vector storage comprises a pair of individual vector storages and a shared vector storage. The shared vector storage stores control bits common to both control vectors.Type: GrantFiled: October 10, 2000Date of Patent: February 26, 2002Assignee: Advanced Micro Devices, Inc.Inventor: Marty L. Pflum
-
Publication number: 20020016903Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.Type: ApplicationFiled: May 8, 2001Publication date: February 7, 2002Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6343359Abstract: An apparatus is presented for expediting the execution of dependent micro instructions in a pipeline microprocessor having design characteristics—complexity, power, and timing—that are not significantly impacted by the number of stages in the microprocessor's pipeline. In contrast to conventional result distribution schemes where an intermediate result is distributed to multiple pipeline stages, the present invention provides a cache for storage of multiple intermediate results. The cache is accessed by a dependent micro instruction to retrieve required operands. The apparatus includes a result forwarding cache, result update logic, and operand configuration logic. The result forwarding cache stores the intermediate results. The result update logic receives the intermediate results as they are generated and enters the intermediate results into the result forwarding cache.Type: GrantFiled: May 18, 1999Date of Patent: January 29, 2002Assignee: IP-First, L.L.C.Inventors: Gerard M. Col, G. Glenn Henry
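The result forwarding idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name, the capacity, the oldest-first eviction policy, and the fallback to a register file are all hypothetical and are not taken from the patent.

```python
from collections import OrderedDict


class ResultForwardingCache:
    """Toy model: holds recent intermediate results keyed by destination register."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # assumed size, not from the abstract
        self.entries = OrderedDict()      # register name -> most recent result

    def update(self, dest_reg, value):
        # Models the "result update logic": enter a result as soon as it is generated.
        self.entries.pop(dest_reg, None)
        self.entries[dest_reg] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # assumed policy: evict the oldest entry

    def read_operand(self, reg, register_file):
        # Models the "operand configuration logic": a dependent micro instruction
        # takes the in-flight result if present, else the architected value.
        if reg in self.entries:
            return self.entries[reg]
        return register_file[reg]
```

A dependent micro instruction would call `read_operand` for each source register, so a single lookup replaces per-stage result distribution in this model.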
-
Patent number: 6341347Abstract: A processor includes thread switching control logic that performs a fast thread-switching operation in response to an L1 cache miss stall. The fast thread-switching operation implements one or more of several thread-switching methods. A first thread-switching operation is “oblivious” thread-switching every N cycles, in which the individual flip-flops locally determine a thread-switch without notification of stalling. The oblivious technique avoids usage of an extra global interconnection between threads for thread selection. A second thread-switching operation is “semi-oblivious” thread-switching for use with an existing “pipeline stall” signal (if any). The pipeline stall signal operates in two capacities, first as a notification of a pipeline stall, and second as a thread select signal between threads so that, again, usage of an extra global interconnection between threads for thread selection is avoided.Type: GrantFiled: May 11, 1999Date of Patent: January 22, 2002Assignee: Sun Microsystems, Inc.Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
-
Patent number: 6341343Abstract: Three parallel instruction processing pipelines of a microprocessor share two data memory ports for obtaining operands and writing back results. Since a significant proportion of the instructions of a typical computer program do not require reading operands from the memory, the probability is high that at least one of any three program instructions to be executed at the same time need not fetch an operand from memory. The two memory ports are thus connected at any given time with the two of the three pipelines which are processing instructions that require memory access, the pipeline without access to the memory processing an instruction that does not need it. To do so, the added third pipeline need not have all the same resources as the other two pipelines, so its stages are made to have a reduced capability in order to save space and reduce power consumption.Type: GrantFiled: April 26, 2001Date of Patent: January 22, 2002Assignee: Rise Technology CompanyInventor: Kenneth K. Munson
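The port-sharing scheme described here can be sketched as a simple per-cycle assignment. The function name, the lowest-index-first priority, and the handling of the case where all three instructions need memory (reported as stalled) are assumptions made for illustration; the abstract does not specify them.

```python
def assign_memory_ports(needs_memory, num_ports=2):
    """Assign memory ports to the pipelines whose current instruction needs one.

    needs_memory: list of booleans, one per pipeline.
    Returns (port_of, stalled): a dict mapping pipeline index to port index,
    and a list of pipelines that wanted a port but did not get one.
    """
    port_of = {}
    stalled = []
    next_port = 0
    for pipe, needs in enumerate(needs_memory):
        if not needs:
            continue  # this pipeline runs an instruction with no memory operand
        if next_port < num_ports:
            port_of[pipe] = next_port
            next_port += 1
        else:
            stalled.append(pipe)  # assumed behavior when demand exceeds ports
    return port_of, stalled
```

With three pipelines and two ports, the common case in this model is that at least one entry of `needs_memory` is false and `stalled` comes back empty.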
-
Publication number: 20020007450Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.Type: ApplicationFiled: March 13, 2001Publication date: January 17, 2002Inventors: David B. Witt, Thang M. Tran
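The line-oriented allocation in this abstract can be modeled roughly as follows. The class name, the line width of three, and the use of `None` to mark unused locations are illustrative assumptions, not details from the application.

```python
class LineReorderBuffer:
    """Toy model: one fixed-width line is allocated per dispatch group."""

    def __init__(self, line_width=3):
        self.line_width = line_width  # assumed maximum concurrently dispatchable
        self.lines = []

    def dispatch(self, instructions):
        # A line is allocated whenever one or more instructions are dispatched.
        if not instructions:
            return None
        if len(instructions) > self.line_width:
            raise ValueError("more instructions than a line can hold")
        padding = [None] * (self.line_width - len(instructions))
        self.lines.append(list(instructions) + padding)
        return len(self.lines) - 1  # index of the allocated line

    def unused_locations(self):
        # Locations left empty because fewer than line_width instructions dispatched.
        return sum(slot is None for line in self.lines for slot in line)
```

In this model, dispatching fuller groups leaves fewer `None` slots per line, which mirrors the abstract's point about unused locations shrinking as concurrent dispatch increases.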
-
Patent number: 6339822Abstract: A microprocessor configured to cache basic blocks of instructions is disclosed. The microprocessor may comprise decoding logic, a basic block cache, and a branch prediction unit. The decoding logic is coupled to receive and decode variable-length instructions into padded instructions that have one of a predetermined number of predetermined lengths. The decoding logic is further configured to form basic blocks of instructions from the padded and decoded instructions. Basic blocks are natural divisions in instruction streams resulting from branch instructions. The start of a basic block is a target of a branch, and the end is another branch instruction. The basic block cache is configured to store the basic blocks in a plurality of storage locations, wherein each storage location is configured to store an address tag, a link bit, and at least a portion of one basic block. The link bit indicates whether the basic block stored in said storage location extends into another storage location.Type: GrantFiled: October 2, 1998Date of Patent: January 15, 2002Assignee: Advanced Micro Devices, Inc.Inventor: Paul K. Miller
-
Patent number: 6336160Abstract: A method and system for dividing computer processor registers into sectors and storing frequently used data in the most significant unused sectors. The method includes sector renaming that is performed on each individual sector (i.e., on a sector-by-sector basis) rather than renaming an entire processor register. A register is divided into sectors such that the smallest accessible unit for an instruction in each register can be uniquely addressed and renamed. A register file is divided into sectors so that each process register can be uniquely addressed and renamed. The most significant sectors of the processor registers are used to hold pre-assigned values therein. Data previously loaded into processor register sectors is stored in the most significant sectors of the processor registers for possible future referencing and use. The method also includes establishing a sign-extend memory that includes at least one sign-extend bit in a sector status table.Type: GrantFiled: June 19, 1998Date of Patent: January 1, 2002Assignee: International Business Machines CorporationInventors: Richard James Eickemeyer, Nadeem Malik, Alan Vicha Pita, Avijit Saha
-
Patent number: 6336182Abstract: A method and system for aligning internal operations (IOPs) for dispatch are disclosed. The method and system comprise conditionally asserting a predecode based on a particular dispatch slot that an instruction is going to be placed. The method and system further include using the information related to the predecode to expand an instruction into at least one dummy operation and an IOP operation whenever the instruction would not be supported in the particular dispatch slot.Type: GrantFiled: March 5, 1999Date of Patent: January 1, 2002Assignee: International Business Machines CorporationInventors: John Edward Derrick, Lee Evan Eisen, Paul Joseph Jordan, Robert William Hay
-
Patent number: 6336178Abstract: An internal RISC-type instruction structure furnishes a fixed bit-length template including a plurality of defined bit fields for a plurality of operation (Op) formats. One format includes an instruction-type bit field, two source-operand bit fields and one destination-operand bit field for designating a register-to-register operation. Another format is a load-store format that includes an instruction-type bit field, an identifier of a source or destination register for the respective load or store operation, and bit fields for specifying the segment, base and index parameters of an address.Type: GrantFiled: September 11, 1998Date of Patent: January 1, 2002Assignee: Advanced Micro Devices, Inc.Inventor: John G. Favor
-
Patent number: 6332187Abstract: A processor is configured to generate lookahead values using a cumulative constant. The processor classifies operations to a particular register (e.g. the stack pointer register, or ESP in an embodiment employing the x86 instruction set architecture) as either accelerated or non-accelerated. For example, instructions which are defined to increment/decrement the particular register by an explicit or implicit constant value may be accelerated operations. Upon the occurrence of a non-accelerated operation, the processor may begin accumulating the cumulative effect of accelerated operations to the result of the non-accelerated operation as a cumulative offset. The result of the non-accelerated operation (upon execution thereof) may then be added to the cumulative offset values corresponding to each accelerated operation to generate the particular register value corresponding to that accelerated operation. Accordingly, dependencies upon the register due to the accelerated operations may be alleviated.Type: GrantFiled: March 8, 2001Date of Patent: December 18, 2001Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
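The cumulative-offset arithmetic in this abstract can be shown in two steps. This is a sketch only: the function names are hypothetical, and the example constants (4-byte pushes and pops on a stack pointer) are assumed for illustration.

```python
from itertools import accumulate


def cumulative_offsets(constants):
    """Running sum of the constants applied by successive accelerated operations.

    This can be computed before the non-accelerated result is known.
    """
    return list(accumulate(constants))


def resolve(base_value, offsets):
    """Add the non-accelerated result, once executed, to each cumulative offset."""
    return [base_value + offset for offset in offsets]
```

For example, two pushes followed by a pop give offsets of -4, -8 and -4; once the non-accelerated operation produces 0x1000, `resolve` yields the register value seen by each accelerated operation without waiting on the previous one.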
-
Patent number: 6330661Abstract: A register content inheriting system realizes register content inheritance with simple hardware in a multithreaded multiprocessor. Thread execution units and a physical common register are provided. Using a register mapping table, each register number referenced by a program is mapped to a location in the physical common register. The mapping in the register mapping table is updated only as required for inheriting register content. Upon inheriting the content of a register, the content of the register mapping table is copied.Type: GrantFiled: April 26, 1999Date of Patent: December 11, 2001Assignee: NEC CorporationInventor: Sunao Torii
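Inheritance by copying a mapping table, rather than copying register values, can be sketched as below. The class and method names are hypothetical, and the caller-supplied physical index stands in for an allocation policy that the abstract does not describe.

```python
class ThreadUnit:
    """Toy thread execution unit with its own mapping into shared physical registers."""

    def __init__(self, physical_registers):
        self.physical = physical_registers  # shared list, the "physical common register"
        self.mapping = {}                   # logical register name -> physical index

    def write(self, logical, value, phys_index):
        # Assumption: the caller chooses a free physical location.
        self.physical[phys_index] = value
        self.mapping[logical] = phys_index

    def read(self, logical):
        return self.physical[self.mapping[logical]]

    def inherit_from(self, parent):
        # Inheriting register content copies only the mapping table.
        self.mapping = dict(parent.mapping)
```

After `inherit_from`, the child reads the same physical locations as the parent; a later write by the child to a different physical location changes only the child's mapping.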
-
Patent number: 6330660Abstract: An application specific signal processor (ASSP) performs vectorized and nonvectorized operations. Nonvectorized operations may be performed using a saturated multiplication and accumulation operation. The ASSP includes a serial interface, a buffer memory, a core processor for performing digital signal processing which includes a reduced instruction set computer (RISC) processor and four signal processing units. The four signal processing units execute the digital signal processing algorithms in parallel including the execution of the saturated multiplication and accumulation operation. The ASSP is utilized in telecommunication interface devices such as a gateway. The ASSP is well suited to handling voice and data compression/decompression in telecommunication systems where a packetized network is used to transceive packetized data and voice.Type: GrantFiled: October 25, 1999Date of Patent: December 11, 2001Assignee: VxTel, Inc.Inventors: Kumar Ganapathy, Ruban Kanapathipillai
-
Patent number: 6330657Abstract: An apparatus and method are presented for increasing the throughput within a single-channel of a pipeline microprocessor. Back-to-back pairs of micro instructions are evaluated to determine if they can be combined for execution in parallel. If so, then they are combined and issued for concurrent execution. The apparatus includes a micro instruction queue that buffers and orders micro instructions for sequential execution by the pipeline microprocessor. Within the micro instruction queue, a second micro instruction is ordered to execute immediately following execution of a first micro instruction. Pairing logic is coupled to the micro instruction queue. The pairing logic combines the first and second micro instructions so that the first and second micro instructions are executed in parallel by the pipeline microprocessor.Type: GrantFiled: May 18, 1999Date of Patent: December 11, 2001Assignee: IP-First, L.L.C.Inventors: Gerard M. Col, G. Glenn Henry
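The pairing of back-to-back micro instructions can be sketched as a pass over the queue. The abstract does not state the pairing criterion, so the rule used here (the second micro instruction neither reads nor overwrites the first one's destination) is purely an assumption, as are the `MicroOp` fields.

```python
from collections import namedtuple

# Hypothetical micro instruction record: a name, one destination, and source registers.
MicroOp = namedtuple("MicroOp", ["name", "dest", "sources"])


def can_pair(first, second):
    # Assumed criterion: no read-after-write or write-after-write hazard.
    return first.dest not in second.sources and first.dest != second.dest


def issue_groups(queue):
    """Walk the queue in order, combining adjacent micro instructions when allowed."""
    groups = []
    i = 0
    while i < len(queue):
        if i + 1 < len(queue) and can_pair(queue[i], queue[i + 1]):
            groups.append((queue[i], queue[i + 1]))  # issued together
            i += 2
        else:
            groups.append((queue[i],))               # issued alone
            i += 1
    return groups
```

Each tuple in the returned list represents one issue slot, so a fully pairable queue of four micro instructions would issue in two slots instead of four in this model.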
-
Patent number: 6324639Abstract: A processor can decode short instructions with a word length equal to one unit field and long instructions with a word length equal to two unit fields. An opcode of each kind of instruction is arranged into the first unit field assigned to the instruction. The number of instructions to be executed by the processor in parallel is s. When the ratio of short to long instructions is s-1:1, the s-1 short instructions are assigned to the first unit field to the s-1th unit field in the parallel execution code, and the long instruction is assigned to the sth unit field to the (s+k−1)th unit field in the same parallel execution code.Type: GrantFiled: March 29, 1999Date of Patent: November 27, 2001Assignee: Matsushita Electric Industrial Co., Ltd.Inventors: Taketo Heishi, Tetsuya Tanaka, Nobuo Higaki, Shuishi Takayama, Kensuke Odani
-
Publication number: 20010042189Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.Type: ApplicationFiled: February 20, 2001Publication date: November 15, 2001Inventors: Boris A. Babaian, Yuli Kh Sakhin, Vladimir Yu Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
-
Patent number: 6313766Abstract: A method and apparatus to accelerate variable length decode is disclosed. The system includes a logic device to receive a bit stream of variable length encoded information. The logic device outputs a fixed length value corresponding to a variable length code received as part of the bit stream of the variable length encoded information. The system also includes a processor to receive the fixed length value. The processor performs a write of a coefficient to a system memory device, the coefficient corresponding to the fixed length value received from the logic device.Type: GrantFiled: July 1, 1998Date of Patent: November 6, 2001Assignee: Intel CorporationInventors: Brian K. Langendorf, Brian Tucker
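The mapping from variable length codes to fixed length values can be sketched in software. The code table below is a made-up prefix-free code chosen for illustration, and the function name is hypothetical; the patent describes a hardware logic device, not this routine.

```python
# Hypothetical prefix-free code table: bit pattern -> fixed length value.
CODE_TABLE = {"0": 0, "10": 1, "110": 2, "111": 3}


def decode_variable_length(bits, table=CODE_TABLE):
    """Turn a string of '0'/'1' characters into a list of fixed length values."""
    values = []
    current = ""
    for bit in bits:
        current += bit
        if current in table:       # a complete code word has been received
            values.append(table[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return values
```

In the arrangement the abstract describes, a processor would then take each fixed length value and write the corresponding coefficient to memory; here the returned list plays that role.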
-
Patent number: 6311261Abstract: The invention involves new microarchitecture apparatus and methods for superscalar microprocessors that support multi-instruction issue, decoupled dataflow scheduling, out-of-order execution, register renaming, multi-level speculative execution, and precise interrupts. These are the Distributed Instruction Queue (DIQ) and the Modified Reorder Buffer (MRB). The DIQ is a new distributed instruction shelving technique that is an alternative to the reservation station (RS) technique and offers a more efficient (improved performance/cost) implementation. The Modified Reorder Buffer (MRB) is an improved reorder buffer (RB) result shelving technique that eliminates the slow and expensive prioritized associative lookup, shared global buses, and dummy branch entries (to reduce entry usage). The MRB has an associative key unit which uses a unique associative key.Type: GrantFiled: September 15, 1997Date of Patent: October 30, 2001Assignee: Georgia Tech Research CorporationInventors: Joseph I. Chamdani, Cecil O. Alford