Failure data collection system apparatus and method
An apparatus for collecting dump data receives an on demand data (ODD) dump request, pauses one or more scan loops, transfers dump data to an ODD dump buffer space, unpauses the scan loops, and offloads the dump data from the ODD dump buffer space to a storage device. The apparatus may also prioritize dump data for transfer to the ODD dump buffer space, load balance dump data for transfer to the ODD dump buffer space, and schedule offloading of the dump data from the ODD dump buffer space to non-volatile storage.
1. Field of the Invention
This invention relates to systems, apparatus, and methods for recovering data and more particularly relates to systems, apparatus, and methods for collecting dump data.
2. Description of the Related Art
Dump data often includes data located in the volatile memory of a digital system (such as a storage controller) at the time of a processing error or failure. Dump data is valuable when assessing the performance of a digital system. Dump data may be directly associated with the performance of one or more hardware and/or software components of the digital system. Though the value of dump data is clear, current solutions to collecting dump data include certain shortcomings.
For example, many dump data collection solutions require restarting the digital system, a procedure commonly referred to as a warmstart. A warmstart is effective for collecting dump data because it suspends operation of the scan loops (also referred to as event loops or work dispatchers), which ensures that the volatile memory data is not altered before it can be collected. In addition to requiring time, warmstarting a digital device often results in a storage controller busy signal being transmitted to an associated host computer, which suspends system operations. Suspending system operations is more severe in systems that include multiple host computers, storage controllers, and storage devices.
SUMMARY OF THE INVENTION
The present invention has been developed in response to the present state of the art, and in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available failure data collection solutions. Accordingly, the present invention has been developed to provide an apparatus, system, and method for collecting dump data.
In one aspect of the present invention, a dump data collection system includes one or more host computers that communicate with storage controllers that in turn communicate with storage devices. Each storage controller may receive an on demand data (ODD) dump request, pause one or more storage controller scan loops, transfer dump data to an ODD dump buffer space, unpause the scan loops, and offload the dump data from the ODD dump buffer space to the storage devices.
In another aspect of the present invention, a dump data collection apparatus includes a communication module that receives an on demand data (ODD) dump request, a scan loop management module that pauses one or more scan loops in response to the ODD dump request, and a dump data transfer module that transfers dump data to an ODD dump buffer space. The scan loop management module may also unpause the scan loops to enable the scan loops to resume normal operation, and the dump data transfer module may offload the dump data from the ODD dump buffer space to storage.
In certain embodiments, the scan loop management module may pause one or more scan loops, initiate a scan loop pause timer, attempt to pause any remaining scan loops before expiration of the scan loop pause timer, and unpause all scan loops if all scan loops are not paused before expiration of the scan loop pause timer. In certain embodiments, the scan loop management module may also reattempt to pause all scan loops and reinitiate the scan loop pause timer upon expiration of a rest timer.
In certain embodiments, the apparatus also includes a dump data prioritization module that prioritizes dump data before dump data is transferred to the ODD dump buffer space. In certain embodiments, the dump data prioritization module is further configured to register and deregister prospective dump data in real-time to facilitate dump data prioritization. In certain embodiments, the apparatus includes a load balance module that load balances dump data amongst multiple processing threads that simultaneously transfer dump data to distinct segments of the ODD dump buffer space.
In certain embodiments, the dump data transfer module also transfers dump data to the ODD dump buffer space until expiration of a dump data transfer timer. In certain embodiments, the scan loop management module also restarts the scan loops to normal processing in spite of an incomplete data dump. In certain embodiments, the apparatus includes an offload schedule module that schedules the offloading of the dump data from the ODD dump buffer space to storage so as to maximize performance.
A method of the present invention is also presented for collecting dump data. The method in the disclosed embodiments substantially includes the operations necessary to carry out the functions presented above with respect to the described system and apparatus. In one embodiment, the method includes receiving an on demand data (ODD) dump request, pausing one or more scan loops, transferring dump data to an ODD dump buffer space, unpausing the scan loops, and offloading dump data from the ODD dump buffer space to storage.
Furthermore, the described features, advantages, and characteristics of the invention may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the invention may be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the invention. These features and advantages of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
Many of the functional units described in this specification have been labeled as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom VLSI circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
Modules may also be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, among different processors, and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices.
Reference throughout this specification to “one embodiment,” “an embodiment,” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
A computer readable medium may be embodied by a compact disk, digital-video disk, a magnetic tape, a Bernoulli drive, a magnetic disk, holographic disk or tape, a punch card, flash memory, magnetoresistive memory, integrated circuits, or other digital processing apparatus memory device.
Furthermore, the described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
In certain embodiments, the host computer 110 communicates and executes input/output operations corresponding to the storage device 130 via the storage controller 120. In certain embodiments, the storage controller 120 receives an on demand data (ODD) dump request. In response to the ODD dump request, the storage controller 120 may pause any storage controller scan loops to ensure that data in the volatile memory of the storage controller 120 is not altered.
In certain embodiments, the storage controller 120 transfers dump data to an ODD dump buffer space (see
In certain embodiments, the communication module 210 receives an ODD dump request. The ODD dump request may originate from a variety of sources such as a user/operator, the host computer 110, a companion storage controller (see
In certain embodiments, the dump data prioritization module 230 prioritizes the dump data. Prioritizing dump data ensures that the dump data of the highest priority is transferred to the ODD dump buffer space 260 first. The description of
The ODD dump buffer space 260 may include a selected volume of volatile memory for temporarily storing dump data. Once the dump data has been transferred to the ODD dump buffer space 260, the scan loop management module 220 may unpause and/or restart the scan loops to enable the scan loops to resume normal operations. Providing an ODD dump buffer space 260 provides an efficient location to temporarily store dump data and ensures that the dump data may be transferred to non-volatile storage by the dump data offload module 280 without alteration. The offload schedule module 270 may schedule the transfer of the dump data to storage with minimal burden to the performance of the storage controller 200. As such, the present invention provides an efficient solution to performing an on demand data (ODD) dump.
Receiving 310 an ODD dump request may include receiving an ODD dump request from a host computer or storage controller operator. Pausing 320 scan loops may include pausing one or more scan loops associated with a storage controller so that the data located in volatile memory is not altered. Prioritizing 330 dump data may include prioritizing data in a volatile memory volume according to selected prioritization instructions to ensure that the data of the highest priority is transferred to an ODD dump buffer space first. Load balancing 340 dump data may include balancing the dump data to be transferred to the ODD dump buffer space amongst any or all of the processing threads to maximize efficient data transfer.
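The load balancing 340 operation can be illustrated with a minimal sketch. It assumes a greedy largest-first assignment, which the description does not specify, and every name in it (balance, transfer_balanced, regions) is hypothetical. Each thread copies its share into its own non-overlapping segment of a single buffer.

```python
from concurrent.futures import ThreadPoolExecutor


def balance(regions, n_threads):
    """Greedy balance (an assumption): largest region goes to the least-loaded thread."""
    bins = [[] for _ in range(n_threads)]
    loads = [0] * n_threads
    by_size = sorted(regions.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, data in by_size:
        i = loads.index(min(loads))  # least-loaded thread so far
        bins[i].append((name, data))
        loads[i] += len(data)
    return bins


def transfer_balanced(regions, n_threads):
    """Copy every region into one buffer, one contiguous segment per thread."""
    bins = balance(regions, n_threads)
    # Precompute a distinct offset for each region so segments never overlap.
    offsets = {}
    cursor = 0
    for thread_bin in bins:
        for name, data in thread_bin:
            offsets[name] = cursor
            cursor += len(data)
    buffer = bytearray(cursor)

    def copy_bin(thread_bin):
        for name, data in thread_bin:
            start = offsets[name]
            # Same-length slice assignment: the buffer is never resized.
            buffer[start:start + len(data)] = data

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(copy_bin, bins))  # wait for all copies to finish
    return buffer, offsets
```

Computing the offsets before any thread starts keeps the copy phase free of shared counters, so the threads need no locking while they write.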
Transferring 350 dump data to an ODD dump buffer space may include transferring the dump data from a volatile memory volume specified for more general use to a selected volatile memory volume specified to operate as a dump data buffer, for the duration of a dump data transfer timer. In certain embodiments, providing a dump data transfer timer ensures that the storage controller will be returned to normal operating conditions within an acceptable period of time.
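A dump data transfer timer of this kind might be sketched as a deadline checked before each copy. The chunking of the dump data, the function name transfer_until_deadline, and the injectable clock parameter are assumptions made for illustration only.

```python
import time


def transfer_until_deadline(chunks, buffer, timeout_s, clock=time.monotonic):
    """Copy chunks into buffer until they run out or the transfer timer expires.

    Returns (chunks_copied, complete). complete is False when the timer
    expired first, which corresponds to an incomplete data dump.
    """
    deadline = clock() + timeout_s
    copied = 0
    for chunk in chunks:
        if clock() >= deadline:
            return copied, False  # timer expired: stop and let the caller resume
        buffer.extend(chunk)
        copied += 1
    return copied, True
```

Passing the clock as a parameter is a design choice that makes the timeout behavior testable without real waiting.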
Unpausing 360 the scan loops may include enabling the scan loops to restart or resume normal operations. Scheduling 370 offload of dump data from the buffer space to storage may include scheduling an offload of the dump data within normal input/output operations of the storage controller so that the offload has little or no effect on the performance of the storage controller. Offloading 380 dump data may include transferring the dump data from the ODD dump buffer space to a non-volatile storage volume for storage.
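The overall sequence of pausing, transferring, unpausing, and offloading can be outlined as follows. This is a simplified sketch rather than the patented implementation: the ScanLoop class, the dictionaries standing in for volatile memory and non-volatile storage, and the function names are all hypothetical.

```python
class ScanLoop:
    """Hypothetical stand-in for a storage controller scan loop."""

    def __init__(self):
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False


def collect_odd_dump(scan_loops, volatile_memory, priority_order):
    """Pause the loops, snapshot memory into a dump buffer, then unpause."""
    for loop in scan_loops:
        loop.pause()
    try:
        # bytes(...) takes a copy, so later changes to memory cannot alter the dump.
        dump_buffer = [
            (name, bytes(volatile_memory[name]))
            for name in priority_order
            if name in volatile_memory
        ]
    finally:
        # The loops resume even if the copy fails.
        for loop in scan_loops:
            loop.resume()
    return dump_buffer


def offload(dump_buffer, storage):
    """Drain the dump buffer into storage after normal operation has resumed."""
    while dump_buffer:
        name, data = dump_buffer.pop(0)
        storage[name] = data
```

Because the offload runs after the scan loops resume, the time spent paused is bounded by the memory-to-memory copy rather than by the slower write to storage.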
Receiving 410 an ODD dump request may include receiving an ODD dump request from a host computer, an operator, or a storage controller. Pausing 420 a first scan loop may include pausing the scan loop that processes reception of the ODD dump request. Initiating 430 a scan loop pause timer may include initiating a timer for attempting to pause the remaining scan loops. A scan loop pause timer ensures that the system will not unsuccessfully attempt to pause the remaining scan loops for an undesirably long period of time. Attempting 435 to pause remaining scan loops may include instructing any other scan loops to discontinue processing input/output requests so as to maintain the integrity of the dump data in volatile memory.
Determining 440 whether any remaining scan loops are busy may include determining whether the attempt to pause the remaining scan loops was successful upon the expiration of the scan loop pause timer initiated by operation 430. If at least one of the remaining scan loops is busy (possibly because the scan loop is processing an instruction of higher priority), the method 400 continues by unpausing 450 all the scan loops to enable the paused scan loops to resume normal input/output operations.
Initiating 455 a rest timer and waiting 460 for the expiration of the rest timer may include allowing the scan loops to perform normal input/output operations for a given period of time before reattempting to pause the first scan loop and so on. Once all the scan loops are successfully paused, the method 400 continues by performing 470 remaining ODD dump operations as described in
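The pause, timeout, rest, and retry cycle of method 400 might look like the following sketch. The ScanLoop interface (pause, try_pause, resume), the max_attempts bound, and the injectable sleep and clock parameters are assumptions; the description does not state how many retries are allowed or how a loop reports that it is busy.

```python
import time


class ScanLoop:
    """Hypothetical scan loop that stays busy for a number of pause attempts."""

    def __init__(self, busy_polls=0):
        self.busy_polls = busy_polls
        self.paused = False

    def pause(self):
        self.paused = True

    def try_pause(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return False  # still busy, cannot pause yet
        self.paused = True
        return True

    def resume(self):
        self.paused = False


def pause_all(scan_loops, pause_timeout_s, rest_s, max_attempts,
              sleep=time.sleep, clock=time.monotonic):
    """Return True once every loop is paused, False if all attempts fail.

    Assumes scan_loops is non-empty; the first entry plays the role of the
    loop that received the dump request.
    """
    for attempt in range(max_attempts):
        scan_loops[0].pause()                      # pause the first scan loop
        deadline = clock() + pause_timeout_s       # scan loop pause timer
        pending = list(scan_loops[1:])
        while pending and clock() < deadline:      # try the remaining loops
            pending = [l for l in pending if not l.try_pause()]
        if not pending:
            return True                            # all loops paused
        for loop in scan_loops:                    # timer expired: unpause all
            loop.resume()
        if attempt + 1 < max_attempts:
            sleep(rest_s)                          # rest timer before retrying
    return False
```

Unpausing every loop before resting means a busy loop never holds the others idle for longer than one pause timer interval.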
Determining 510 component priority may include receiving component priority instructions from a user/operator. In certain embodiments, priority instructions may include how one component is prioritized with respect to another and how dump data corresponding to each component should be prioritized. In certain embodiments, an operator may register or deregister priority information corresponding to prospective dump data in real-time to facilitate data prioritization, which enables developers to focus on different types of dump data throughout the development cycle.
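Real-time registration and deregistration of priority information could be sketched as a small thread-safe registry. The DumpRegistry class, its method names, and the convention that a lower number means a higher priority are assumptions for illustration.

```python
import threading


class DumpRegistry:
    """Hypothetical registry of prospective dump data and its priority."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # name -> priority (lower number = higher priority)

    def register(self, name, priority):
        """Add an entry, or change the priority of an existing one."""
        with self._lock:
            self._entries[name] = priority

    def deregister(self, name):
        """Remove an entry; unknown names are ignored."""
        with self._lock:
            self._entries.pop(name, None)

    def ordered(self):
        """Return names in transfer order, highest priority first."""
        with self._lock:
            return sorted(self._entries, key=lambda n: (self._entries[n], n))
```

A developer could, for example, call register("trace", 0) while the controller is running to move trace data to the front of the next dump.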
Determining 520 component buffer space minimums may include determining whether any component has been assigned more buffer space than is necessary. Determining 530 free buffer space may include determining the buffer space that is not already allocated to the component buffer space minimums. Free buffer space may be dynamically allocated to another component upon transferring the dump data to the ODD dump buffer space, according to component priority. Determining 540 component buffer space maximums may include determining the maximum amount of buffer space that each component may use to ensure that the component with the highest priority is not allocated all of the free buffer space.
As such, when dump data is transferred to the ODD dump buffer space, the data may be transferred according to the priority determined by operation 510, first with respect to component buffer space minimums and then to the free buffer space in accordance with the component priority order and the component buffer space maximums. One of skill in the art will appreciate that, in certain embodiments, this more general prioritization method 500 may be altered depending upon the source of the ODD dump request, type of ODD dump request (user-specified test cases), or the type of operations that were being performed by the storage controller.
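One way to read the prioritization method 500 is sketched below. It makes several assumptions that the description leaves open: a lower priority number means a higher priority, the maximum is modeled as a cap (max_extra) on how much free space a component may draw beyond its minimum, and reserved space that a component does not need returns to the free pool. The function name and dictionary keys are hypothetical.

```python
def allocate_buffer(components, total_buffer):
    """Grant buffer space by priority, honoring minimums and maximums.

    components: list of dicts with keys name, priority, size, minimum, max_extra.
    Returns (grants, free_left), where grants maps name -> granted space.
    """
    ordered = sorted(components, key=lambda c: c["priority"])
    if sum(c["minimum"] for c in ordered) > total_buffer:
        raise ValueError("component minimums exceed the dump buffer")

    # Step 1: every component is served from its minimum first. A component
    # that needs less than its minimum leaves the remainder in the free pool.
    grants = {c["name"]: min(c["size"], c["minimum"]) for c in ordered}
    free = total_buffer - sum(grants.values())

    # Step 2: hand out free space in priority order, capped per component so
    # the highest-priority component cannot take all of it.
    for c in ordered:
        still_needed = c["size"] - grants[c["name"]]
        extra = min(still_needed, c["max_extra"], free)
        grants[c["name"]] += extra
        free -= extra
    return grants, free
```

With a 120-unit buffer and three components that each reserve 20 units, a 100-unit top-priority component capped at 50 extra units receives 70, a 10-unit component receives 10, and the lowest-priority 60-unit component receives the remaining 40.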
The host computers 610 may communicate with the storage controllers 620 and thereby execute input/output operations with respect to the data storage devices 630. The storage controllers 620 may receive an ODD dump request from the host computers 610, an operator/user, or another storage controller 620. Similarly, the storage controllers 620 may store collected dump data in any of the data storage devices 630. In this manner, the present invention may be implemented with multiple components and over a local or distributed network.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims
1. An apparatus for collecting dump data, the apparatus comprising:
- a computer readable medium storing machine-readable instructions;
- a processor executing the machine-readable instructions, the machine-readable instructions comprising:
- a communication module receiving an on demand data (ODD) dump request from a host to transfer dump data from a storage controller memory of a storage controller distinct from the host, wherein the storage controller comprises at least two components and the ODD dump request comprises a component priority for each component;
- a scan loop management module pausing at least one scan loop performed by the storage controller on the storage controller memory in response to receiving the ODD dump request;
- a dump data transfer module allocating free buffer space within an ODD dump buffer space for each component according to the component priority by determining a component buffer space minimum and a component buffer space maximum for each component, wherein the component buffer space minimum comprises previously allocated buffer space and the component buffer space maximum is less than all free buffer space, transferring the dump data from the storage controller memory to the ODD dump buffer space according to the component priority, and upon transferring the dump data for a first component, dynamically allocating free buffer space from the first component with a higher priority to a second component with a lower priority;
- the scan loop management module further unpausing the at least one scan loop in response to the transfer of the dump data to the ODD dump buffer space; and
- a dump data offload module offloading the dump data from the ODD dump buffer space to a data storage device distinct from the host and the storage controller.
2. The apparatus of claim 1, the scan loop management module further pausing the at least one scan loop, initiating a scan loop pause timer, attempting to pause any remaining scan loops before expiration of the scan loop pause timer, and unpausing all scan loops if all scan loops are not paused before expiration of the scan loop pause timer.
3. The apparatus of claim 2, the scan loop management module further reattempting to pause all scan loops and reinitiating the scan loop pause timer upon expiration of a rest timer.
4. The apparatus of claim 1, further comprising a dump data prioritization module prioritizing dump data before dump data is transferred to the ODD dump buffer space.
5. The apparatus of claim 4, the dump data prioritization module further registering and deregistering perspective dump data in real-time to facilitate dump data prioritization.
6. The apparatus of claim 1, further comprising a load balance module balancing dump data amongst multiple processing threads that simultaneously transfer the dump data to distinct segments of the ODD dump buffer space.
7. The apparatus of claim 1, the dump data transfer module further transferring the dump data to the ODD dump buffer space until expiration of a dump data transfer timer.
8. The apparatus of claim 7, the scan loop management module further restarting the at least one scan loop to normal processing in spite of an incomplete data dump.
9. The apparatus of claim 1, further comprising an offload scheduling module scheduling offloading of the dump data from the ODD dump buffer space to storage.
10. A non-transitory computer readable medium tangibly embodying a program of machine-readable instructions executed by a digital processing apparatus to perform operations for collecting dump data, the operations comprising:
- receiving an on demand data (ODD) dump request from a host to transfer dump data from a storage controller memory of a storage controller distinct from the host, wherein the storage controller comprises at least two components and the ODD dump request comprises a component priority for each component;
- pausing at least one scan loop performed by the storage controller on the storage controller memory;
- allocating free buffer space within an ODD dump buffer space for each component according to the component priority by determining a component buffer space minimum and component buffer space maximum for each component, wherein the component buffer space minimum comprises previously allocated buffer space and the component buffer space maximum is less than all free buffer space;
- transferring the dump data from the storage controller memory to the ODD dump buffer space according to the component priority;
- upon transferring the dump data for a first component, dynamically allocating free buffer space from the first component with a higher priority to a second component with a lower priority;
- unpausing the at least one scan loop in response to the transfer of the dump data to the ODD dump buffer space; and
- offloading the dump data from the ODD dump buffer space to a data storage device distinct from the host and the storage controller.
11. The computer readable medium of claim 10, wherein pausing the at least one scan loop comprises pausing a first scan loop, initiating a scan loop pause timer, attempting to pause any remaining scan loops before expiration of the scan loop pause timer, and unpausing all scan loops if all scan loops are not paused before expiration of the scan loop pause timer.
12. The computer readable medium of claim 11, wherein pausing the at least one scan loop further comprises reattempting to pause all scan loops and reinitiating the scan loop pause timer upon expiration of a rest timer.
13. The computer readable medium of claim 10, further comprising prioritizing the dump data prior to transferring the dump data to the ODD dump buffer space.
14. The computer readable medium of claim 13, further comprising real-time registering and deregistering of perspective dump data to facilitate a proper prioritization.
15. The computer readable medium of claim 10, further comprising balancing the dump data amongst multiple processing threads configured to simultaneously transfer the dump data to distinct segments of the ODD dump buffer space.
16. The computer readable medium of claim 10, wherein transferring the dump data to the ODD dump buffer space comprises transferring the dump data to the ODD dump buffer space until expiration of a dump data transfer timer.
17. The computer readable medium of claim 16, further comprising restarting the at least one scan loop to normal processing in spite of an incomplete data dump.
18. The computer readable medium of claim 10, further comprising scheduling an offload of the dump data from the ODD dump buffer space to storage.
19. A method for collecting dump data, the method comprising:
- receiving, by use of a processor, an on demand data (ODD) dump request from a host to transfer dump data from a storage controller memory of a storage controller distinct from the host, wherein the storage controller comprises at least two components and the ODD dump request comprises a component priority for each component;
- pausing at least one scan loop performed by the storage controller on the storage controller memory;
- allocating free buffer space within an ODD dump buffer space for each component according to the component priority by determining a component buffer space minimum and a component buffer space maximum for each component, wherein the component buffer space minimum comprises previously allocated buffer space and the component buffer space maximum is less than all free buffer space;
- transferring the dump data from the storage controller memory to the ODD dump buffer space according to the component priority;
- upon transferring the dump data for a first component, dynamically allocating free buffer space from the first component with a higher priority to a second component with a lower priority;
- unpausing the at least one scan loop in response to the transfer of the dump data to the ODD dump buffer space; and
- offloading the dump data from the ODD dump buffer space to a data storage device distinct from the host and the storage controller.
20. A system for collecting dump data, the system comprising:
- a plurality of host computers configured to communicate with at least one storage controller; and
- at least one storage device configured to store data on a data bearing medium;
- the at least one storage controller configured to:
- receive an on demand data (ODD) dump request from a host to transfer dump data from a storage controller memory of a storage controller distinct from the host, wherein the storage controller comprises at least two components and the ODD dump request comprises a component priority for each component;
- pause at least one scan loop performed by the storage controller on the storage controller memory;
- allocate free buffer space within an ODD dump buffer space for each component according to the component priority by determining a component buffer space minimum and a component buffer space maximum for each component, wherein the component buffer space minimum comprises previously allocated buffer space and the component buffer space maximum is less than all free buffer space;
- transfer the dump data from the storage controller memory to the ODD dump buffer space according to the component priority;
- upon transferring the dump data for a first component, dynamically allocating free buffer space from the first component with a higher priority to a second component with a lower priority;
- unpause the at least one scan loop in response to the transfer of the dump data to the ODD dump buffer space; and
- offload the dump data from the ODD dump buffer space to the storage device, wherein the storage device is distinct from the host and the storage controller.
Type: Grant
Filed: Oct 1, 2007
Date of Patent: Aug 19, 2014
Patent Publication Number: 20090089336
Assignee: International Business Machines Corporation (Armonk, NY)
Inventors: Douglas William Dewey (Tucson, AZ), Brian David Hatfield (Tucson, AZ), Ivan Ronald Olguin, II (Tucson, AZ), William Griswold Sherman (Tucson, AZ)
Primary Examiner: Jacob F Bétit
Assistant Examiner: Amanda Willis
Application Number: 11/865,559
International Classification: G06F 11/07 (20060101); G06F 17/30 (20060101);