Abstract: A system implemented in hardware includes a main processing core decoding instructions for out of order execution. The instructions include template based user defined instructions. A user execution block executes the template based user defined instructions. An interface is positioned between the main processing core and the user execution block. A computer readable medium includes executable instructions to describe a processing core supporting execution of a proprietary instruction set and decoding of customized instructions that adhere to a specified pattern. The specified pattern includes a source, a destination and a latency period. A user execution block is connected to the processing core to execute the customized instructions.
Abstract: A number of coherence domains are maintained among the multitude of processing cores disposed in a microprocessor. A cache coherency manager defines the coherency relationships such that coherence traffic flows only among the processing cores that are defined as having a coherency relationship. The data defining the coherency relationships between the processing cores is optionally stored in a programmable register. For each source of a coherent request, the processing core targets of the request are identified in the programmable register. In response to a coherent request, an intervention message is forwarded only to the cores that are defined to be in the same coherence domain as the requesting core. If a cache hit occurs in response to a coherent read request and the coherence state of the cache line resulting in the hit satisfies a condition, the requested data is made available to the requesting core from that cache line.
Abstract: The present invention provides a clock ratio controller for dynamic voltage and frequency scaled digital systems, and applications thereof. In an embodiment, a digital system is provided that includes a first digital circuit that operates at a first rate determined by a first clock signal and a second digital circuit that operates at a second rate determined by a second clock signal. The first digital circuit is coupled to the second digital circuit by a bus that is used for communications between the first digital circuit and the second digital circuit. A clock ratio controller is used to adjust the frequency of the first clock signal and/or the second clock signal in response to a power management signal without causing a loss of synchronization between the first digital circuit and the second digital circuit.
Abstract: A processing system is provided consisting of an interrupt pin, multiple registers, a stack pointer, and an automatic interrupt system. The multiple registers store a number of processor states values. When the system detects an interrupt on the interrupt pin the system prepares to enter an exception mode where the automatic interrupt system causes an interrupt vector to be fetched, the stack pointer to be updated, and the processor state values to be read in parallel from the registers and stored in memory locations based on the updated stack pointer, prior to the execution of an interrupt service routine.
Type:
Application
Filed:
July 30, 2010
Publication date:
February 2, 2012
Applicant:
MIPS Technologies, Inc.
Inventors:
Erik K. Norden, David Yiu-Man Lau, James H. Robinson
Abstract: A system and methods that facilitate the design process and minimize the time and effort required to complete the design and fabrication of an integrated circuits (IC) are described. The system and method utilize a plurality of repositories, rules engines and design and verification tools to analyze the workload and automatically produce a hardened GDSII description or other representation of the device. The system and method securely maintains synthesizable RTL on a server in a data center while providing designers access to portions of the mechanism by way of a network portal.
Type:
Grant
Filed:
March 9, 2007
Date of Patent:
January 24, 2012
Assignee:
MIPS Technologies, Inc.
Inventors:
Soumya Banerjee, Todd Michael Bezenek, Clement Tse
Abstract: A method for processing multiple data packets includes receiving from a network interface, at a packet management unit (PMU), a data packet for processing, and determining whether to store or drop the data packet. The method further includes queueing the data packet in a queue within a queueing system and preloading a portion of the data packet into a context from a pool of contexts in a stream processing unit (SPU). The queued data packet referenced by the context is processed by the SPU and outputted to the network interface.
Abstract: A system for processing data packets in a data packet network has at least one input port for receiving data packets, at least one output port for sending out data packets, a processor for processing packet data, and a packet predictor for predicting a future packet based on a received packet, such that at least some processing for the predicted packet may be accomplished before the predicted packet actually arrives at the system. The system is used in preferred embodiments in Internet routers.
Abstract: A conditional move instruction implemented in a processor by forming and processing two decoded instructions, and applications thereof. In an embodiment, the conditional move instruction specifies a first source operand, a second source operand, and a third operand that is both a source and a destination. If the value of the second operand is not equal to a specified value, the first decoded instruction moves the third operand to a completion buffer register. If the value of the second operand is equal to the specified value, the second decoded instruction moves the value of the first operand to the completion buffer. When the decoded instruction that performed the move graduates, the contents of the completion buffer register is transferred to a register file register specified by the third operand.
Abstract: A fetch director in a multithreaded microprocessor that concurrently executes instructions of N threads is disclosed. The N threads request to fetch instructions from an instruction cache. In a given selection cycle, some of the threads may not be requesting to fetch instructions. The fetch director includes a circuit for selecting one of threads in a round-robin fashion to provide its fetch address to the instruction cache. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the threads is selected next. The first addend is an N-bit vector where each bit is false if the corresponding thread is requesting to fetch instructions from the instruction cache. The second addend is a 1-hot vector indicating the last selected thread.
Type:
Grant
Filed:
December 30, 2008
Date of Patent:
December 13, 2011
Assignee:
MIPS Technologies, Inc.
Inventors:
Soumya Banerjee, Michael Gottlieb Jensen
Abstract: A microprocessor coupled to a system memory by a bus includes an instruction decode unit that decodes an instruction that specifies a data stream in the system memory and a stream prefetch priority. The microprocessor also includes a load/store unit that generates load/store requests to transfer data between the system memory and the microprocessor. The microprocessor also includes a stream prefetch unit that generates a plurality of prefetch requests to prefetch the data stream from the system memory into the microprocessor. The prefetch requests specify the stream prefetch priority. The microprocessor also includes a bus interface unit (BIU) that generates transaction requests on the bus to transfer data between the system memory and the microprocessor in response to the load/store requests and the prefetch requests. The BIU prioritizes the bus transaction requests for the prefetch requests relative to the bus transaction requests for the load/store requests based on the stream prefetch priority.
Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.
Type:
Grant
Filed:
June 8, 2009
Date of Patent:
December 6, 2011
Assignee:
MIPS Technologies, Inc.
Inventors:
Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
Abstract: Power management control software including power management policies is provided with those policies divided into observation code and response code. When predetermined execution points within the operating system 2 are reached or an application specific task is executed, registered observation code is triggered and executed to store event data within an event data store. The operating system subsequently executes the response code to read the event data from the event data store and generates power control predictions and control signals therefrom.
Abstract: The present invention provides a clock ratio controller for dynamic voltage and frequency scaled digital systems, and applications thereof. In an embodiment, a digital system is provided that includes a first digital circuit that operates at a first rate determined by a first clock signal and a second digital circuit that operates at a second rate determined by a second clock signal. The first digital circuit is coupled to the second digital circuit by a bus that is used for communications between the first digital circuit and the second digital circuit. A clock ratio controller is used to adjust the frequency of the first clock signal and/or the second clock signal in response to a power management signal without causing a loss of synchronization between the first digital circuit and the second digital circuit.
Abstract: A method and apparatus is described for insuring coherency between memories in a multi-agent system where the agents are interconnected by one or more fabrics. A global arbiter is used to segment coherency into three phases: request; snoop; and response, and to apply global ordering to the requests. A bus interface having request, snoop, and response logic is provided for each agent. A bus interface having request, snoop and response logic is provided for the global arbiter, and a bus interface is provided to couple the global arbiter to each type of fabric it is responsible for. Global ordering and arbitration logic tags incoming requests from the multiple agents and insures that snoops are responded to according to the global order, without regard to latency differences in the fabrics.
Abstract: A coprocessor interface unit for interfacing a coprocessor to an out-of-order execution pipeline, and applications thereof. In an embodiment, the coprocessor interface unit includes an in-order instruction queue, a coprocessor load data queue, and a coprocessor store data queue. Instructions are written into the in-order instruction queue by an instruction dispatch unit. Instructions exit the in-order instruction queue and enter the coprocessor. In the coprocessor, the instructions operate on data read from the coprocessor load data queue. Data is written back, for example, to memory or a register file by inserting the data into the out-of-order execution pipeline, either directly or via the coprocessor store data queue, which writes back the data.
Abstract: Floating-point processors capable of performing multiply-add (Madd) operations and incorporating improved intermediate result handling capability. The floating-point processor includes a multiplier unit coupled to an adder unit. In a specific operating mode, the intermediate result from the multiplier unit is processed (i.e., rounded but not normalized or denormalized) into representations that are more accurate and easily managed in the adder unit. By processing the intermediate result in such manner, accuracy is improved, circuit complexity is reduced, operating speed may be increased.
Type:
Grant
Filed:
December 3, 2007
Date of Patent:
September 20, 2011
Assignee:
MIPS Technologies, Inc.
Inventors:
Ying-wai Ho, John L. Kelley, XingYu Jiang
Abstract: A processor-based method, system and apparatus to comprise a method, system and apparatus to access a memory location in an on-chip memory based on a virtual processing element identification associated with an instruction. The system comprises multiple virtual processing elements, an access list and a comparator coupled to the memory and the access list. In response to an instruction from a virtual processing element to access a memory location in the memory, the comparator compares a first virtual processing identification associated with the instruction to a second virtual processing identification stored in the access list and grants access to the virtual processing element that generated the instruction to read from or write to the memory location if the first virtual processing element identification is equal to the second virtual processing element identification. The data in the memory is allocated and de-allocated by software.
Abstract: A system, apparatus and method for managing input/output requests in a multi-processor system is disclosed. An IO coherence unit includes an IO request handler, a variable size transaction table, and an IO response handler. The size of the transaction table varies according to the number of pending IO requests. The IO request handler stores information about pending IO requests in the transaction table to establish an order among related requests and to permit out-of-order handling of unrelated requests. The IO response handler tracks responses to the IO requests and updates the information in the transaction table. The IO coherence unit returns responses to requesting devices in compliance with device ordering requirements.
Abstract: An apparatus for selecting one of a plurality of transaction queues from which to transmit a transaction out of a port of a switch. The apparatus includes a group indicator, for each of the queues, for indicating which one of a plurality of groups of the queues the queue belongs to. The apparatus also includes a group priority indicator, for each group of the plurality of groups, for indicating a priority of the group, the priority indicating a priority for transmitting transactions of the queues of the group relative to other groups of the plurality of groups. The apparatus includes selection logic, coupled to the group indicators and the priority indicators, configured to select a queue of the queues, for transmitting out of the port a transaction thereof, based on the group indicators and the group priority indicators.
Abstract: A method of controlling the exclusivity mode of a level-two cache includes generating level-two cache exclusivity control information at a processor in response to an exclusivity mode indicator, and utilizing the level-two cache exclusivity control information to configure the exclusivity mode of the level-two cache.