Performance data access

Info

Publication number: 20050223275
Type: Application
Filed: Mar 4, 2005
Publication Date: Oct 6, 2005
Inventors: Robert Jardine (Cupertino, CA), James Smullen (Carmel, CA), Graham Stott (Dublin, CA), John Friedenbach (Santa Clara, CA)
Application Number: 11/071,944

Abstract

Performance data access is described. In an embodiment, events are processed with non-synchronized processor elements of a logical processor in a redundant processor system. Performance data associated with execution of the processor events is stored in one or more accumulators corresponding to a respective processor element. The performance data from each of the non-synchronized processor elements is exchanged via a logical synchronization unit such that each processor element includes the performance data from each of the processor elements. Each processor element then conforms the performance data to generate synchronized performance data which is then communicated to a performance monitoring application that requests the performance data from the logical processor.

Description

RELATED APPLICATION

This application claims priority to U.S. Provisional Application Ser. No. 60/557,812 filed Mar. 30, 2004, entitled “Nonstop Advanced Architecture”, the disclosure of which is incorporated by reference herein.

TECHNICAL FIELD

This invention relates to performance data access.

BACKGROUND

Multiple redundant processor systems are implemented as fault-tolerant systems to prevent downtime and system outages, and to avoid data corruption. A multiple redundant processor system provides continuous application availability and maintains data integrity, such as for stock exchange systems, credit and debit card systems, electronic funds transfer systems, travel reservation systems, and the like. In these systems, data processing computations can be performed on multiple, independent processing elements of a processor system.

Processors in a multiple redundant processor system can be loosely synchronized in a loose lock-step implementation such that processor instructions are executed at slightly different times. This loosely synchronized implementation provides that each of the processors can execute the same instruction set faster than a typical tight lock-step configuration because the processors are not restricted to synchronized code execution. The performance of a multiple redundant processor system can be monitored to determine optimizations for software processing and for hardware configurations, such as for cache management and configuration to optimize cache hit rates.

When performance data is requested, such as the processing time for a processor event, the loosely-synchronized processor elements all execute the same instruction set in response to the request, but may all return a different performance response because the performance data is likely asymmetric (e.g., different in each of the multiple processor elements). The different data responses will appear as an error to the performance monitoring application that has requested the data.

BRIEF DESCRIPTION OF THE DRAWINGS

The same numbers are used throughout the drawings to reference like features and components:

FIG. 1 illustrates an exemplary redundant processor system in which an embodiment of performance data access can be implemented.

FIG. 2 further illustrates various components of the exemplary redundant processor system shown in FIG. 1.

FIG. 3 illustrates various components of an exemplary redundant processor system in which an embodiment of performance data access can be implemented.

FIG. 4 illustrates various components of an exemplary redundant processor system in which an embodiment of performance data access can be implemented.

FIG. 5 is a flow diagram that illustrates an embodiment of a method for performance data access.

DETAILED DESCRIPTION

The following describes embodiments of performance data access. Performance monitoring is implemented to obtain system performance data from loosely-synchronized processor elements. Examples of performance data for a redundant processor system include time intervals for performing instruction sequences and counts of various processor events.

Although embodiments of performance data access may be implemented in various redundant processor systems, performance data access is described with reference to the following processing environment.

FIG. 1 illustrates an example of a redundant processor system 100 in which embodiment(s) of performance data access can be implemented. The redundant processor system 100 includes a processor complex 102 which has processor groups 104(1-3). Each processor group 104 includes any number of processor elements which are each a microprocessor that executes, or processes, computer executable instructions. For example, processor group 104(1) includes processor elements 106(1-N), processor group 104(2) includes processor elements 108(1-N), and processor group 104(3) includes processor elements 110(1-N).

Processor elements, one each from the processor groups 104(1-3), are implemented together as a logical processor 112(1-N). For example, a first logical processor 112(1) includes processor element 106(1) from processor group 104(1), processor element 108(1) from processor group 104(2), and processor element 110(1) from processor group 104(3). Similarly, logical processor 112(2) includes processor elements 106(2), 108(2), and 110(2), while logical processor 112(3) includes processor elements 106(3), 108(3), and 110(3). In an alternate embodiment, a logical processor 112 may be implemented to include only two processor elements 106. For example, a processor complex may be implemented with two processor groups such that each logical processor includes two processor elements, one from each of the two processor groups.
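For illustration only, the grouping described above can be sketched in Python. The names ProcessorElement and build_logical_processors are hypothetical and do not appear in the described system, and the sketch assumes that every processor group has the same number of processor elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorElement:
    """Hypothetical record for one processor element."""
    group: int  # index of the processor group, e.g. 1 for group 104(1)
    slot: int   # position within the group, e.g. 2 for element 106(2)


def build_logical_processors(num_groups, elements_per_group):
    """Return one logical processor per slot, each with one element from every group."""
    return [
        [ProcessorElement(group=g, slot=s) for g in range(1, num_groups + 1)]
        for s in range(1, elements_per_group + 1)
    ]


# Triplex example: three groups, four elements per group, four logical processors.
logical = build_logical_processors(num_groups=3, elements_per_group=4)
```

Passing num_groups=2 gives the duplex arrangement in which each logical processor has two processor elements.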

In the example shown in FIG. 1, the three processor elements combine to implement a logical processor 112 and cooperate to perform the computations of the logical processor 112. Logical computations for an input/output operation or an interprocessor communication are executed separately three times in a logical processor 112, once each in the three processor elements of the logical processor 112. Additionally, the three processor elements in a logical processor 112 can coordinate and synchronize with each other to exchange data, replicate input data, and vote on input/output operations and communication outputs.

Each processor group 104(1-3) has an associated memory component 114(1-3), respectively. A memory component 114 can be implemented as any one or more memory components, examples of which include random access memory (RAM), DRAM, SRAM, a disk drive, and the like. Although the memory components 114(1-3) are illustrated as independent components, each processor group 104 can include a respective memory component 114 as an integrated component in an alternate embodiment.

In this example, processor complex 102 is a triplex redundant processor system having triple modular redundancy in that each logical processor 112 includes three redundant processor elements. To maintain data integrity, a faulty processor element can be replaced and reintegrated into the system while the redundant processor system 100 remains on-line without a loss of processing capability. Similarly, in an alternate embodiment, a duplex redundant processor system has dual modular redundancy in that each logical processor includes two redundant processor elements.

The processor elements of a logical processor 112 are loosely synchronized in a loose lock-step implementation such that instructions may be executed, or processed, in each of the processor elements at a slightly different time. This implementation provides that the logical processors can execute instructions faster than a typical tight lock-step configuration because the processor elements and logical processors 112 are not restricted to synchronized code execution. This implementation also provides for non-deterministic execution among the processor elements in a logical processor, such as non-deterministic branch prediction, cache replacement algorithms, and the like. The individual processor elements can also perform independent error recovery without losing synchronization with the other processor elements.

FIG. 2 further illustrates various components 200 of the redundant processor system 100 shown in FIG. 1. The processor elements 106(1-N) of processor group 104(1) are shown, one each of a respective logical processor 112(1-N). Each processor element 106(1-N) is associated with a respective memory region 202(1-N) of the memory component 114(1) for data storage. The memory component 114(1) associated with processor group 104(1) is partitioned among the processor elements 106(1-N) of the processor group 104(1). In an alternate embodiment, each memory region 202(1-N) can be implemented as an independent, separate memory for data storage. Although not shown, the processor elements 108(1-N) of processor group 104(2) are each associated with a respective partitioned memory region of the memory component 114(2). Similarly, the processor elements 110(1-N) of processor group 104(3) are each associated with a respective partitioned memory region of the memory component 114(3).

Each of the logical processors 112(1-N) corresponds to one or more respective logical synchronization units 204(1-N). A logical synchronization unit 204 performs various rendezvous operations for an associated logical processor 112 to achieve agreements on data synchronization between the processor elements that cooperate to form a logical processor 112. For example, input/output operations and/or interprocessor communications can be communicated from each processor element of a logical processor 112 to an associated logical synchronization unit 204 to compare and vote on the input/output operations and/or interprocessor communications generated by the processor elements. Logical synchronization units and rendezvous operations are described in greater detail in U.S. patent application Ser. No. ______, which is Attorney Docket No. 200316143-1 entitled “Method and System of Executing User Programs on Non-Deterministic Processors” filed Jan. 25, 2005, to Bernick et al., the disclosure of which is incorporated by reference herein for the purpose of implementing performance data access.

A rendezvous operation may further be implemented by a logical synchronization unit 204 to exchange state information and/or data among the processor elements of a logical processor 112 to synchronize operations and responses of the processor elements. For example, a rendezvous operation may be implemented such that the processor elements deterministically respond to incoming asynchronous interrupts, to accommodate varying processing rates of the processor elements, to exchange software state information when performing operations that are distributed across the processor elements, and the like.

FIG. 3 illustrates various components of an exemplary redundant processor system 300 in which an embodiment of performance data access can be implemented. The redundant processor system 300 includes multiple logical processors and associated logical synchronization units as described with reference to the redundant processor system 100 shown in FIGS. 1 and 2. For illustration, however, only one logical processor 302 and one associated logical synchronization unit 304 are shown in FIG. 3. The logical synchronization unit 304 may be implemented as described with reference to the logical synchronization units 204 shown in FIG. 2.

In this example, logical processor 302 includes processor elements 306(1-3) which are each a microprocessor that executes, or processes, computer executable instructions. The redundant processor system 300 includes the memory components 114(1-3) that are each associated with a respective processor group 104(1-3) as shown in FIG. 1. Each of the processor elements 306(1-3) is one of the processor elements in a respective processor group, and each processor element 306 is associated with a partitioned memory region 308 in a respective memory component 114(1-3). For example, processor element 306(1) corresponds to memory region 308(1) in memory component 114(1), processor element 306(2) corresponds to memory region 308(2) in memory component 114(2), and processor element 306(3) corresponds to memory region 308(3) in memory component 114(3).

The memory regions 308(1-3) form a logical memory 310 that corresponds to logical processor 302. The processor elements 306(1-3) of the logical processor 302 each correspond to a respective partitioned memory region 308(1-3) of the logical memory 310. In practice, a logical processor 302 can communicate with a corresponding logical memory 310 via an input/output bridge memory controller (not shown).

The memory components 114(1-3) each include an instantiation of performance monitoring logic 312(1-3) that corresponds to a respective processor element 306(1-3) of the logical processor 302. Each of the processor elements 306(1-3) can execute the performance monitoring logic 312 to implement performance data access. In this example, the performance monitoring logic 312(1-3) is maintained by the memory components 114(1-3) as a software application.

As used herein, the term “logic” (e.g., the performance monitoring logic 312) can also refer to hardware, firmware, software, or any combination thereof that may be implemented to perform the logical operations associated with performance data access. Logic may also include any supporting circuitry utilized to complete a given task including supportive non-logical operations. For example, logic may also include analog circuitry, memory components, input/output (I/O) circuitry, interface circuitry, power providing/regulating circuitry, and the like.

Each of the processor elements 306(1-3) of logical processor 302 includes a high-frequency clock 314, a cache memory 316, and one or more accumulators 318. For illustration, only the clock 314, cache memory 316, and accumulator(s) 318 for processor element 306(1) are shown. The description of the processor element components, however, applies to each processor element 306(1-3). The one or more accumulators 318 of a processor element 306 can be implemented as memory to store, update, and/or maintain performance data corresponding to a respective processor element 306.

The performance monitoring logic 312(1-3) implements performance data access such that system performance data can be obtained from the non-synchronized processor elements 306(1-3) of the logical processor 302. The performance of the processor elements 306(1-3) can be monitored for time durations to execute processor events, such as a procedure, and for any number of other operational features, such as cache hit rates, interrupt handling, and the like. While the non-synchronized processor elements 306(1-3) all execute the same instruction set (e.g., a processor event or procedure), each may return a different performance response and the corresponding performance data is likely asymmetric (e.g., different in each of the multiple processor elements 306).

The different performance data responses from each of the processor elements 306(1-3) may appear as an error when the data is compared by the logical synchronization unit 304, such as when an output operation of the performance data response is performed. The different performance data responses may also appear as an error if the performance monitoring logic 312 makes a decision based on that data and branches in two (or three) different directions, causing different action sequences that can be detected by the logical synchronization unit 304.
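As a non-authoritative sketch of why unconformed performance data can look like a fault, the following Python models the comparison as a strict majority vote. The vote function and the majority rule are assumptions made for illustration; the description does not specify how the logical synchronization unit compares outputs. Times are expressed as integer nanoseconds.

```python
from collections import Counter


def vote(outputs):
    """Hypothetical comparison: return (majority value, indices that disagree).

    Raises ValueError when no value is produced by more than half of the
    processor elements.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among processor element outputs")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters
```

Under this assumed rule, three identical outputs pass the vote, whereas three raw timings such as 6300, 6400, and 5900 nanoseconds have no majority and are rejected even though no processor element is faulty.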

In an embodiment of performance data access, the performance data requested by the performance monitoring logic 312 can be exchanged via a rendezvous operation with the logical synchronization unit 304 such that the performance monitoring logic 312 receives consistent data from the processor elements 306(1-3). For example, a procedure may take 6.3 microseconds for processor element 306(1) to execute, 6.4 microseconds for processor element 306(2) to execute, and 5.9 microseconds for processor element 306(3) to execute. The time duration for each processor element 306 to execute the procedure can be stored in an accumulator 318 for each respective processor element 306(1-3).

When the performance data for each of the processor elements 306(1-3) is requested by the performance monitoring logic 312, the logical synchronization unit 304 exchanges the performance data of each of the processor elements such that each processor element has a copy of the individual performance measurements of all three processor elements. For example, processor element 306(1) will have its own 6.3 microseconds to execute the procedure, the 6.4 microseconds for processor element 306(2) to execute the procedure, and the 5.9 microseconds for processor element 306(3) to execute the procedure.
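The exchange can be sketched as follows. The LogicalSyncUnit class and its submit and rendezvous methods are hypothetical names chosen for illustration and are not the interface of the logical synchronization unit 304. The sketch assumes that values are returned in ascending order of processor element identifier so that every processor element sees the same ordering.

```python
class LogicalSyncUnit:
    """Hypothetical model of a rendezvous that exchanges one value per element."""

    def __init__(self, num_elements):
        self.num_elements = num_elements
        self._slots = {}

    def submit(self, element_id, value):
        """Record the local measurement of one processor element."""
        self._slots[element_id] = value

    def rendezvous(self):
        """Return, for each element, the full tuple of values in element order."""
        if len(self._slots) != self.num_elements:
            raise RuntimeError("not all processor elements have arrived")
        ids = sorted(self._slots)
        ordered = tuple(self._slots[i] for i in ids)
        self._slots = {}  # ready for the next exchange
        return {i: ordered for i in ids}


lsu = LogicalSyncUnit(num_elements=3)
lsu.submit(1, 6300)  # 6.3 microseconds, in nanoseconds
lsu.submit(3, 5900)
lsu.submit(2, 6400)
views = lsu.rendezvous()
```

After the rendezvous, each entry of views holds the same tuple (6300, 6400, 5900) regardless of the order in which the processor elements arrived.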

Each of the processor elements 306(1-3) then conforms, or synchronizes, the performance data. In this example, the 6.3 microseconds, 6.4 microseconds, and 5.9 microseconds can be averaged as 6.2 microseconds to execute the procedure. The averaging operation is deterministic, and all three processor elements 306(1-3) will arrive at the same answer of 6.2 microseconds. The average 6.2 microseconds is then returned to the performance monitoring logic 312 as the synchronized performance data.
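A minimal sketch of the averaging step, assuming the measurements are held as integer nanoseconds, is shown below. The function name conform_average is hypothetical. Integer arithmetic is used here so that the result does not depend on floating-point rounding or summation order; that choice is an assumption of the sketch rather than something stated in the description.

```python
def conform_average(values_ns):
    """Average integer nanosecond measurements using floor division."""
    return sum(values_ns) // len(values_ns)


# Each processor element holds the same exchanged tuple, in the same order.
exchanged = (6300, 6400, 5900)
results = [conform_average(exchanged) for _element in range(3)]
```

Because every processor element applies the same function to the same input, all three entries of results are 6200 nanoseconds (6.2 microseconds).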

Other conforming operations or algorithms can be implemented to synchronize the performance data from the multiple processor elements 306(1-3). For example, the processor elements 306(1-3) can select a performance measurement from any one of the processor elements 306(1-3), such as the minimum performance measurement, the middle performance measurement, or the maximum performance measurement corresponding to a particular processor element 306. Alternatively, the processor elements 306(1-3) can discard the performance data value that is the farthest from the other two, and then average the two remaining performance data values (e.g., for a system with triple modular redundancy), or any other form of a deterministic algorithm can be implemented.
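The alternative conforming operations listed above can be sketched in the same style. All function names are hypothetical, values are integer nanoseconds, and the tie-breaking rule in conform_drop_outlier (drop the smallest value when both ends are equally far from the middle) is an assumption added so that the result stays deterministic.

```python
def conform_min(values):
    """Select the minimum measurement."""
    return min(values)


def conform_middle(values):
    """Select the middle measurement of an odd number of values."""
    return sorted(values)[len(values) // 2]


def conform_max(values):
    """Select the maximum measurement."""
    return max(values)


def conform_drop_outlier(values):
    """For exactly three values, drop the farthest one and average the other two."""
    low, mid, high = sorted(values)
    # After sorting, only an end value can be the farthest from the other two.
    # Assumed tie-break: when both gaps are equal, the smallest value is dropped.
    keep = (low, mid) if (high - mid) > (mid - low) else (mid, high)
    return sum(keep) // 2


exchanged = (6300, 6400, 5900)
selected = {
    "min": conform_min(exchanged),
    "middle": conform_middle(exchanged),
    "max": conform_max(exchanged),
    "drop_outlier": conform_drop_outlier(exchanged),
}
```

For the example values, 5900 is farthest from the other two, so the drop-outlier result is the average of 6300 and 6400, or 6350 nanoseconds.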

Alternatively, each processor element 306(1-3) can replicate the performance measurements from the other processor elements 306. For example, prior to the logical synchronization unit 304 exchange of data, processor element 306(1) will have value A, processor element 306(2) will have value B, and processor element 306(3) will have value C. After the data exchange, each processor element 306(1-3) will have all three values A, B, and C which are replicated as if each processor element generated the performance data three times rather than just the one time.

In an implementation, the time duration of a processor event can be determined by obtaining a first time from a clock 314 of the respective processor element 306 at the beginning of a processor event, and subtracting the first time from an accumulator 318 of the processor element 306. A second time can be obtained from the clock 314 after the processor event has been executed by the processor element. The second time is then added to the accumulator 318 such that a time difference between the first time and the second time is the time duration of the processor event. The time duration is maintained in the accumulator 318 as the performance data.
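The subtract-then-add technique can be sketched as follows. The Accumulator class and its begin and end methods are hypothetical names, and the standard library function time.perf_counter_ns stands in for the high-frequency clock 314 as an assumption of the sketch.

```python
import time


class Accumulator:
    """Hypothetical accumulator that measures durations by subtract-then-add."""

    def __init__(self, clock=time.perf_counter_ns):
        self.value = 0       # accumulated duration in nanoseconds
        self._clock = clock  # callable returning the current time in nanoseconds

    def begin(self):
        """Subtract the first time at the beginning of a processor event."""
        self.value -= self._clock()

    def end(self):
        """Add the second time after the processor event has executed."""
        self.value += self._clock()


# Usage with a fake clock so that the readings are predictable.
ticks = iter([1000, 7300])
acc = Accumulator(clock=lambda: next(ticks))
acc.begin()  # accumulator is now -1000
acc.end()    # accumulator is now 6300, the duration of the event
```

One reason such a scheme may be attractive is that no separate start time has to be stored; the accumulator itself carries the pending start time between the two clock readings.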

For multiple performance data requests, alternate embodiments of performance data access can be implemented if it is not practicable to conform each individual performance data measurement of the processor elements 306(1-3). For example, the processor time required to accomplish each individual exchange and conforming operation may not be available within the implementation constraints of a redundant processor system.

In another embodiment of performance data access, the performance data is accumulated, or aggregated, in the accumulators 318 for the respective processor elements 306(1-3). For example, time durations for multiple executions of a repeated processor event can be stored and updated as the performance data in the accumulators 318 of each respective processor element 306(1-3). A procedure may be executed as a processor event multiple times by each of the processor elements 306(1-3). For a procedure that is executed ten-thousand times, and which takes on average 3 microseconds to execute, the accumulated time duration would be approximately 30 milliseconds. An accumulator 318 for processor element 306(1) can have stored performance data of 31.5 milliseconds, an accumulator 318 for processor element 306(2) can have stored performance data of 32.3 milliseconds, and an accumulator 318 for processor element 306(3) can have stored performance data of 29.7 milliseconds.

When the performance data for each of the processor elements 306(1-3) is requested by the performance monitoring logic 312, the logical synchronization unit 304 exchanges the data, and the processor elements synchronize the data by applying an average (or other conforming operation) to the per-element values. In this example, dividing each accumulated time by the ten thousand executions gives an average of 3.15 microseconds for processor element 306(1), 3.23 microseconds for processor element 306(2), and 2.97 microseconds for processor element 306(3), which can be averaged, or conformed, to approximately 3.12 microseconds to execute the procedure each time. The approximate 3.12 microseconds is then returned to the performance monitoring logic 312 as the synchronized performance data.
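The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, accumulated totals are given in integer nanoseconds, and the sketch assumes that every processor element executed the procedure the same number of times. Exact rational arithmetic from the standard fractions module is used so that each processor element would compute an identical value.

```python
from fractions import Fraction


def per_event_average_ns(total_ns, count):
    """Average duration of one execution, as an exact fraction of nanoseconds."""
    return Fraction(total_ns, count)


def conform_aggregated(totals_ns, count):
    """Average the per-element averages obtained from the accumulated totals."""
    per_element = [per_event_average_ns(t, count) for t in totals_ns]
    return sum(per_element) / len(per_element)


totals = (31_500_000, 32_300_000, 29_700_000)  # 31.5 ms, 32.3 ms, 29.7 ms
conformed_ns = conform_aggregated(totals, count=10_000)
conformed_us = round(float(conformed_ns) / 1000, 2)
```

Here conformed_ns is exactly 9350/3 nanoseconds, which rounds to 3.12 microseconds, in agreement with the approximate figure given above.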

This embodiment of performance data access avoids the extensive processing overhead of exchanging and conforming the performance data for each individual measurement, and provides performance data obtained for multiple processor events over a duration of time. The asymmetric performance data is maintained by the accumulators 318 in each respective processor element 306 such that the performance monitoring logic 312 cannot directly access the performance data. Rather, the performance monitoring logic interfaces with the processor elements 306(1-3) of the logical processor 302 via application program interfaces (APIs) for performance data access.

In an implementation of performance data access, code (e.g., software) executing in each of the processor elements 306(1-3) interfaces with an array of the accumulators 318. The performance monitoring logic 312 calls the code via APIs to register and have accumulator(s) allocated, and to request the performance data stored in the accumulator(s). The code communicates the requested performance data to the logical synchronization unit 304, and the performance data is conformed, or synchronized. In an embodiment, the code can be implemented as millicode which is software running as the lowest-level software in the operating system.
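One possible shape for such an interface is sketched below. The AccumulatorArray class and its register, add, and read methods are entirely hypothetical; the description does not name the APIs, and the ownership check is an assumption added for illustration. In a full system the value returned by read would be handed to the logical synchronization unit for exchange and conforming rather than returned directly to the caller.

```python
class AccumulatorArray:
    """Hypothetical per-element array of accumulators managed by low-level code."""

    def __init__(self, size):
        self._values = [0] * size
        self._owners = [None] * size

    def register(self, client):
        """Allocate the first free accumulator to `client` and return its index."""
        for idx, owner in enumerate(self._owners):
            if owner is None:
                self._owners[idx] = client
                return idx
        raise RuntimeError("no free accumulators")

    def add(self, idx, delta):
        """Update an accumulator, e.g. with a duration or an event count."""
        self._values[idx] += delta

    def read(self, idx, client):
        """Return the local value of an accumulator to its registered client."""
        if self._owners[idx] != client:
            raise PermissionError("accumulator not registered to this client")
        return self._values[idx]


array = AccumulatorArray(size=2)
handle = array.register("perfmon")
array.add(handle, 6300)
local_value = array.read(handle, "perfmon")
```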

FIG. 4 illustrates various components of an exemplary redundant processor system 400 in which an alternate embodiment of performance data access can be implemented. As described above with reference to the exemplary redundant processor system 300 shown in FIG. 3, logical processor 302 includes processor elements 306(1-3) which are each a microprocessor that executes processor events as computer executable instructions. The redundant processor system 400 includes the memory components 114(1-3) that are each associated with a respective processor group 104(1-3) as shown in FIG. 1. Further, each processor element 306 is associated with a partitioned memory region 308 in a respective memory component 114(1-3).

Each of the processor elements 306(1-3) of logical processor 302 includes a high-frequency clock 314, a cache memory 316, and one or more accumulators 318. For illustration, only the clock 314, cache memory 316, and accumulator(s) 318 for processor element 306(1) are shown. The description of the processor element components, however, applies to each processor element 306(1-3). The one or more accumulators 318 of a processor element 306 can be implemented as memory to store, update, and/or maintain performance data corresponding to the respective processor element 306.

The exemplary redundant processor system 400 includes a remote computing device 402 configured for communication with components of the redundant processor system via a communication network 404. The remote computing device 402 includes a performance monitoring application 406 which implements performance data access as described above with reference to FIG. 3. Performance data can be requested by the performance monitoring application 406 and obtained from the non-synchronized processor elements 306(1-3) of the logical processor 302.

The performance of the processor elements 306(1-3) can be monitored for time durations to execute processor events, such as a procedure, and for any number of other operational features, such as cache hit rates, interrupt handling, and the like. While the non-synchronized processor elements 306(1-3) all execute the same instruction set (e.g., a processor event), each may return a different performance response and the corresponding performance data is likely asymmetric (e.g., different in each of the multiple processor elements 306). The different performance data responses from each of the processor elements 306(1-3) may appear as an error to the performance monitoring application 406 when the performance data responses are compared (or “voted”) by the logical synchronization unit 304.

The performance data requested by the performance monitoring application 406 can be exchanged via a rendezvous operation with the logical synchronization unit 304 and synchronized in each of the processor elements 306(1-3) such that the performance monitoring application 406 receives consistent data from each of the processor elements 306(1-3). The performance monitoring application 406 calls code (e.g., software) executed by each of the processor elements 306(1-3) via APIs to register and have accumulator(s) allocated, and to request that the performance data be stored in the accumulator(s). The code communicates the requested performance data to the logical synchronization unit 304 which exchanges the performance data. The performance data is conformed, or synchronized, in the processor elements 306(1-3) before being returned to the remote computing device 402 and to the performance monitoring application 406 via the communication network 404.

Methods for performance data access, such as exemplary method 500 described with reference to FIG. 5, may be described in the general context of computer executable instructions. Generally, computer executable instructions include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. The methods may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.

FIG. 5 illustrates an embodiment of a method 500 for performance data access. The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined in any order to implement the method. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.

At block 502, processor events are processed with non-synchronized processor elements of a logical processor in a redundant processor system. For example, each processor element 306(1-3) of logical processor 302 (FIG. 3) executes the same set of computer executable instructions, such as for a procedure or processor event. At block 504, time duration(s) of processor events are determined. For example, time durations for multiple executions of a repeated processor event can be determined.

In an embodiment of performance data access to determine a time duration of a processor event, a first time is obtained from a clock of a processor element at block 504(A). For example, a time is obtained from clock 314 of processor element 306(1) at the beginning of a processor event. At block 504(B), the first time is subtracted from a time stored in an accumulator of the processor element. For example, the time obtained from clock 314 is subtracted from accumulator 318 for the respective processor element 306(1).

If the time stored in the accumulator is initially zero, then the time obtained from clock 314 will be subtracted from zero and the accumulator will initially have a negative time. At block 504(C), a second time is obtained from the clock of the processor element after the processor event has been executed. At block 504(D), the second time is added to the accumulator such that a time difference between the first time and the second time is the time duration of the processor event. To accumulate multiple time durations for multiple executions of a repeated processor event or procedure, the method blocks 504(A-D) can be repeated to accumulate the performance data of processor elements 306(1-3). Each beginning time of a processor event is subtracted from the accumulator at block 504(B) and each time after the processor event has executed is added to the accumulator at block 504(D) such that a sum of all the time differences is accumulated.
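The accumulation performed by repeating blocks 504(A-D) can be written out as a short derivation. The symbols are notation introduced here for illustration: s_i and e_i are the clock readings at the beginning and end of the i-th execution, and A_n is the accumulator value after n executions, with the accumulator initially zero.

```latex
A_0 = 0, \qquad
A_i = A_{i-1} - s_i + e_i
\quad\Longrightarrow\quad
A_n = \sum_{i=1}^{n} \left( e_i - s_i \right)
```

Between blocks 504(B) and 504(D) of the i-th execution the accumulator holds A_{i-1} - s_i, which may be negative, but once the second time is added the accumulator again holds the sum of the completed durations.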

At block 506, performance data associated with execution of the processor event(s) is stored in one or more accumulators corresponding to a respective processor element. For example, each processor element 306(1-3) includes one or more accumulators 318 to store, update, and maintain performance data associated with a respective processor element 306. Storing the performance data includes storing time duration(s) of a processor event as the performance data. For example, processor element 306(1) stores a first time duration of a processor event in an accumulator 318 of the processor element 306(1), processor element 306(2) stores a second time duration of the processor event in an accumulator 318 of the processor element 306(2), and processor element 306(3) stores a third time duration of the processor event in an accumulator 318 of the processor element 306(3). Performance data may also include counts of a repeated processor event, such as cache hits or misses, for example.

At block 508, the performance data from each of the non-synchronized processor elements is conformed as synchronized performance data. Conforming the performance data includes computing an average of the time durations from each of the non-synchronized processor elements to generate the synchronized performance data. The logical synchronization unit 304 exchanges the performance data from each of the processor elements 306(1-3), and each of the processor elements conforms the performance data to generate the synchronized performance data.

At block 510, the synchronized performance data is communicated to a performance monitoring application or logic that requests the performance data from the logical processor (e.g., the performance data stored in the one or more accumulators of the non-synchronized processor elements). For example, the logical synchronization unit 304 communicates the synchronized performance data to the performance monitoring logic 312 (FIG. 3) and/or to a performance monitoring application 406 in a remote computing device 402 (FIG. 4).
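The blocks of method 500 can be tied together in one simplified sketch. The function run_method_500 and its input format are hypothetical, the exchange is modeled as a shared tuple rather than an actual rendezvous, and averaging with integer floor division is only one of the conforming operations described above.

```python
def run_method_500(clock_readings_per_element):
    """Simulate blocks 502-510 for one logical processor.

    clock_readings_per_element: for each processor element, a list of
    (start, end) clock pairs in nanoseconds, one pair per processor event.
    """
    # Blocks 502-506: each element accumulates its own durations.
    accumulators = []
    for readings in clock_readings_per_element:
        acc = 0
        for start, end in readings:
            acc -= start  # block 504(B): subtract the first time
            acc += end    # block 504(D): add the second time
        accumulators.append(acc)

    # Block 508: exchange so every element holds all values, then conform.
    shared = tuple(accumulators)
    conformed = [sum(shared) // len(shared) for _ in accumulators]

    # Block 510: every element must report the same synchronized value.
    if len(set(conformed)) != 1:
        raise RuntimeError("processor elements disagree after conforming")
    return conformed[0]


synchronized_ns = run_method_500([[(0, 6300)], [(100, 6500)], [(50, 5950)]])
```

With the readings shown, the three processor elements measure 6300, 6400, and 5900 nanoseconds, and the synchronized value returned is 6200 nanoseconds.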

Although embodiments of performance data access have been described in language specific to structural features and/or methods, it is to be understood that the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary implementations of performance data access.

Claims

1. A redundant processor system, comprising:

non-synchronized processor elements of a logical processor, each processor element configured to process events and store performance data associated with execution of the processor events, each processor element including one or more accumulators configured to maintain the performance data corresponding to a respective processor element;

performance monitoring logic configured to request the performance data from the logical processor; and

a logical synchronization unit configured to exchange the performance data from each of the non-synchronized processor elements and return synchronized performance data to the performance monitoring logic, the synchronized performance data being generated by the processor elements.

2. A redundant processor system as recited in claim 1, wherein each of the processor elements is further configured to conform the performance data exchanged from each of the processor elements to generate the synchronized performance data.

3. A redundant processor system as recited in claim 2, wherein each of the processor elements is further configured to average the performance data exchanged from each of the processor elements to generate the synchronized performance data.

4. A redundant processor system as recited in claim 2, wherein each of the processor elements is further configured to conform the performance data exchanged from each of the processor elements based on a deterministic algorithm to generate the synchronized performance data.

5. A redundant processor system as recited in claim 2, wherein each of the processor elements is further configured to select the performance data from a particular processor element to generate the synchronized performance data, the selected performance data being at least one of a minimum, a middle, or a maximum of the performance data exchanged from each of the processor elements.

6. A redundant processor system as recited in claim 1, wherein:

a first time duration of a processor event is stored as performance data in a first accumulator of a first processor element;

a second time duration of the processor event is stored as performance data in a second accumulator of a second processor element;

a third time duration of the processor event is stored as performance data in a third accumulator of a third processor element; and

the logical synchronization unit is further configured to receive the first time duration, the second time duration, and the third time duration, and exchange the time durations with each of the processor elements.

7. Non-synchronized processors of a multiple redundant processor system each configured to maintain and update performance data associated with executing processor events, the performance data stored in one or more accumulators of a respective non-synchronized processor, and the performance data from each of the non-synchronized processors being conformed as synchronized performance data after being exchanged via a logical synchronization unit in response to a request for the performance data from a performance monitoring application.

8. Non-synchronized processors as recited in claim 7, wherein each of the non-synchronized processors is further configured to conform the performance data from each of the non-synchronized processors after the performance data is exchanged via the logical synchronization unit.

9. Non-synchronized processors as recited in claim 7, wherein each of the non-synchronized processors is further configured to average the performance data from each of the non-synchronized processors after the performance data is exchanged via the logical synchronization unit.

10. Non-synchronized processors as recited in claim 7, wherein a time duration of a processor event is stored as the performance data in an accumulator of the respective non-synchronized processor.

11. Non-synchronized processors as recited in claim 7, wherein counts for multiple executions of a repeated processor event are stored as the performance data in an accumulator of the respective non-synchronized processor.

12. Non-synchronized processors as recited in claim 7, wherein time durations for multiple executions of a repeated processor event are stored and updated as the performance data in an accumulator of the respective non-synchronized processor.

13. Non-synchronized processors as recited in claim 7, wherein each non-synchronized processor includes a clock, and wherein:

a first time is obtained from the clock at a beginning of a processor event, and the first time is subtracted from an initial time stored in an accumulator of the respective non-synchronized processor;

a second time is obtained from the clock after the processor event has been executed by the non-synchronized processor; and

the second time is added to the accumulator such that a time difference between the first time and the second time is a time duration of the processor event that is maintained as the performance data in the accumulator of the respective non-synchronized processor.

14. A method, comprising:

processing events with non-synchronized processor elements of a logical processor in a redundant processor system;

storing performance data associated with execution of the processor events in one or more accumulators corresponding to a respective processor element;

exchanging the performance data such that each of the processor elements includes the performance data from each of the other non-synchronized processor elements;

conforming the performance data from each of the non-synchronized processor elements to generate synchronized performance data; and

communicating the synchronized performance data to a performance monitoring application that requests the performance data from the logical processor.

15. A method as recited in claim 14, wherein each of the processor elements conforms the performance data exchanged from each of the processor elements to generate the synchronized performance data.

16. A method as recited in claim 14, wherein conforming the performance data includes each of the processor elements averaging the performance data to generate the synchronized performance data.

17. A method as recited in claim 14, wherein conforming the performance data includes each of the processor elements using a deterministic algorithm to conform the performance data to generate the synchronized performance data.

18. A method as recited in claim 14, wherein conforming the performance data includes each of the processor elements selecting the performance data from a particular processor element, the selected performance data being at least one of a minimum, a middle, or a maximum of the performance data exchanged from each of the processor elements.

19. A method as recited in claim 14, further comprising determining a time duration of a processor event, and wherein storing the performance data includes storing the time duration of the processor event as the performance data.

20. A method as recited in claim 14, further comprising accumulating counts for multiple executions of a repeated processor event, and wherein storing the performance data includes storing the counts of the repeated processor event as the performance data.

21. A method as recited in claim 14, wherein storing the performance data includes:

storing a first time duration of a processor event in a first accumulator of a first processor element;

storing a second time duration of the processor event in a second accumulator of a second processor element;

storing a third time duration of the processor event in a third accumulator of a third processor element; and

wherein conforming the performance data includes conforming the first time duration, the second time duration, and the third time duration to generate the synchronized performance data.

22. A method as recited in claim 14, wherein communicating the synchronized performance data includes communicating the synchronized performance data to the performance monitoring application in a remote computing device configured for communication with the redundant processor system.

23. One or more computer readable media comprising computer executable instructions that, when executed, direct a performance data access system to:

process events with non-synchronized processor elements of a logical processor in a redundant processor system;

store performance data associated with execution of the processor events in one or more accumulators corresponding to a respective processor element;

conform the performance data from each of the non-synchronized processor elements to generate synchronized performance data; and

communicate the synchronized performance data to a performance monitoring application that requests the performance data from the logical processor.

24. One or more computer readable media as recited in claim 23, further comprising computer executable instructions that, when executed, direct the performance data access system to exchange the performance data such that each of the non-synchronized processor elements includes the performance data from each of the other non-synchronized processor elements.

25. One or more computer readable media as recited in claim 23, further comprising computer executable instructions that, when executed, direct the performance data access system to:

store a first time duration of a processor event in a first accumulator of a first processor element;

store a second time duration of the processor event in a second accumulator of a second processor element;

store a third time duration of the processor event in a third accumulator of a third processor element; and

conform the first time duration, the second time duration, and the third time duration to generate the synchronized performance data.