Abstract: A method and apparatus for distributed on-chip debug triggering is presented. A first bus includes a plurality of lines and a debugging state machine configurable to monitor the plurality of lines of the first bus. One or more nodes are configurable to detect triggering events and provide, in response to detecting one or more triggering events, signals to the debugging state machine using a first subset of the plurality of lines that is allocated to the node(s).
Abstract: Restoration devices in a cloud storage system are paired with source containers associated with a mainframe computer, and series of commands are generated based on the pairings to cause copies of data at the source containers to be stored to the restoration devices. A point-in-time copy of the copy of the data at the source containers may be stored to some restoration devices, and a second copy of the data may be stored to other restoration devices. The restoration devices may be reallocated from inactive source containers. Execution of the commands is monitored, and the commands are modified if the execution of the commands does not satisfy one or more desired conditions. For example, a cycle time associated with copying data to a restoration device may be measured, and if the cycle time exceeds a threshold, the command may be modified.
Abstract: According to one aspect, a non-transitory computer-readable recording medium stores therein a log acquisition program causing a computer to execute a process. The process includes receiving information for designating multiple functions of an application for which logs are to be acquired and designating a log detail level with respect to each of the multiple functions; and acquiring a log of the application with a corresponding log detail level with respect to each of the multiple functions designated.
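The per-function log detail levels described above can be illustrated with Python's standard `logging` module. This is only a minimal sketch, not the patented program: the function names (`checkout`, `search`), the `app.` logger prefix, and the `configure` helper are hypothetical.

```python
import logging

# Hypothetical designation: application function name -> log detail level.
DETAIL_LEVELS = {"checkout": logging.DEBUG, "search": logging.WARNING}

def configure(levels):
    """Create one logger per designated function and set its detail level."""
    loggers = {}
    for name, level in levels.items():
        logger = logging.getLogger(f"app.{name}")
        logger.setLevel(level)
        loggers[name] = logger
    return loggers

loggers = configure(DETAIL_LEVELS)
# "checkout" records debug detail; "search" records warnings and above only.
checkout_debug = loggers["checkout"].isEnabledFor(logging.DEBUG)
search_info = loggers["search"].isEnabledFor(logging.INFO)
```

Keeping the level on a per-function logger means one function can be traced in detail without raising the log volume of the others.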
Abstract: Increasing disaster resiliency in one aspect may comprise running an optimization algorithm that simultaneously solves for at least a first objective to increase a spread of a backup of virtual machines from a given site onto other sites in proportion to an amount of available space for backup at each site, a second objective to increase a number of backups at one or more of the other sites with low probability of system crash while reducing backups at one or more of the other sites with higher probability of system crash, and a third objective to minimize a violation of recovery time objectives of the virtual machines during recovery. One or more backup sites and one or more recovery sites in an event the given site crashes may be determined based on a solution of the optimization algorithm.
Type:
Grant
Filed:
April 2, 2014
Date of Patent:
September 6, 2016
Assignee:
International Business Machines Corporation
Abstract: Automated health monitoring and recovery is provided for infrastructure devices supporting server devices in a data center. Health analysis operations may be selected to be performed on an infrastructure device based on the capabilities of the infrastructure device and/or how the infrastructure device is being used to support server devices in the data center. If the infrastructure device is unhealthy, an automated recovery operation may be performed. The automated recovery operation may include recovery actions selected based on the capabilities of the infrastructure device, the failure mode of the infrastructure device, and/or how the infrastructure device is being used to support server devices in the data center.
Type:
Grant
Filed:
January 27, 2014
Date of Patent:
August 30, 2016
Assignee:
Microsoft Technology Licensing, LLC
Inventors:
Chandan Aggarwal, Asad Yaqoob, Josh David McKone, Matthew Jeremiah Eason, Akil M. Merchant
Abstract: A computer program product for prioritizing First Failure Data Capture (FFDC) data for analysis. A processor is configured to: identify, by the processor, FFDC data in response to receiving an error message, the FFDC data comprising at least one of: a computer system event which may lead to system failure; a computer system event which led to system failure; a computer system condition which may lead to system failure; a computer system condition which led to system failure; determine, by the processor, a relevancy rank for each data value in the FFDC data based on the error message received and a probability that a given data value is relevant in resolving a cause of the error message; and send, by the processor, in order of relevancy, the data values of the FFDC data to a second server.
Type:
Grant
Filed:
January 29, 2016
Date of Patent:
August 23, 2016
Assignee:
International Business Machines Corporation
Inventors:
Douglas J. Griffith, Anil Kalavakolanu, Minh Q. Pham, Richard B. Sutton
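The relevancy ranking in the FFDC abstract above can be sketched as a sort keyed on an estimated probability of relevance for the received error message. This is an illustrative sketch only: the error message key, the data value names, and the probabilities in `RELEVANCE` are invented for the example, and the abstract does not say how the probabilities are obtained.

```python
# Hypothetical table: error message -> probability that each FFDC data value
# helps resolve the cause of that message.
RELEVANCE = {
    "E_OUT_OF_MEMORY": {
        "heap_dump": 0.7,
        "thread_stacks": 0.2,
        "config_snapshot": 0.1,
    },
}

def rank_ffdc(error_message, data_values):
    """Return the data values ordered from most to least relevant."""
    probabilities = RELEVANCE.get(error_message, {})
    return sorted(
        data_values,
        key=lambda value: probabilities.get(value, 0.0),
        reverse=True,
    )

ordered = rank_ffdc(
    "E_OUT_OF_MEMORY", ["config_snapshot", "heap_dump", "thread_stacks"]
)
# The values would then be sent to the second server in this order.
```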
Abstract: A method, system, and computer program product are described. The system includes a first memory device to store programming code of the device driver, the device driver providing an interface to a data manipulation device, and a second memory device to store a test case to test the device driver, the device driver receiving version information specifying a targeted version or the device driver determining the version independently of the test case. The system also includes a third memory device to store a simulation including a version verification portion and a data manipulation portion, and a processor to execute the test case on the device driver, execution of the test case including, based on a request by the device driver, execution of the version verification portion of the simulation and, based on a result of executing the version verification portion, execution of the data manipulation portion of the simulation.
Type:
Grant
Filed:
September 30, 2014
Date of Patent:
July 12, 2016
Assignee:
International Business Machines Corporation
Abstract: A method, system, and computer program product are described. The method of testing a device driver includes executing a test case for the device driver, the device driver receiving version information specifying a targeted version of a data manipulation device to be targeted by the device driver from the test case or the device driver determining the targeted version of the data manipulation device independently of the test case. The method also includes verifying whether a version of the data manipulation device specified in a request from the device driver is a match or a non-match with the targeted version of the data manipulation device. The method further includes simulating the data manipulation device to provide output to the device driver based on the verifying, the simulating the data manipulation device being unchanged for every version of the data manipulation device.
Type:
Grant
Filed:
March 19, 2014
Date of Patent:
July 12, 2016
Assignee:
International Business Machines Corporation
Abstract: An information processing apparatus according to one aspect of the present disclosure includes a communication control portion, an error code storage portion, an acquiring portion, and a determination portion. The communication control portion communicates with a storage device based on an interface communication standard to perform data transfer therewith. The error code storage portion stores one or a plurality of selected error codes selected from a plurality of error codes defined by the interface communication standard. The acquiring portion acquires error information outputted from the storage device. The determination portion determines whether or not an error code indicated by the error information coincides with a selected error code. When the determination portion determines that the error code coincides with the selected error code, the communication control portion communicates again with the storage device to perform data transfer therewith.
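The retry-on-selected-error-code behavior described above can be sketched as follows. The specific error code values, the `send` callback, and the `max_attempts` limit are assumptions made for illustration; the abstract names neither concrete codes nor a retry limit.

```python
# Hypothetical subset of error codes chosen from those the interface
# standard defines; a transfer that reports one of these is attempted again.
SELECTED_ERROR_CODES = {0x04, 0x0B}

def transfer_with_retry(send, max_attempts=3):
    """Call send() until it succeeds; retry only on selected error codes.

    send() returns None on success or an integer error code on failure.
    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        error_code = send()
        if error_code is None:
            return attempt
        if error_code not in SELECTED_ERROR_CODES:
            raise IOError(f"non-retryable error code {error_code:#04x}")
    raise IOError("retries exhausted")

# Simulated storage device: fails once with a selected code, then succeeds.
responses = iter([0x04, None])
attempts = transfer_with_retry(lambda: next(responses))
```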
Abstract: Methods and systems for a network device are provided. The network device includes a storage protocol controller having a port for interfacing with a storage area network (SAN) based storage device, and a processor executing instructions for managing a local storage device that is configured to operate as a caching device for a computing device. The local storage device is used to store a recovery copy of an operating system of the computing device, where the recovery copy is accessible via a processor executable basic input/output system (BIOS) utility.
Abstract: Method and system for replacing a first node and a second node of a clustered storage system by a third node and a fourth node are provided. The method includes migrating all storage objects managed by the first node to the second node; replacing the first node by the third node and migrating all the storage objects managed by the first node and the second node to the third node; and replacing the second node by the fourth node and then migrating the storage objects previously managed by the second node but currently managed by the third node to the fourth node. The nodes may also be replaced by operationally connecting the third node and the fourth node to storage managed by the first node and the second node, and joining the third node and the fourth node to a same cluster as the first node and the second node.
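The three migration steps above can be traced with a small ownership map. This is a sketch of the ordering only: the node keys (`n1` to `n4`) and volume names are hypothetical, and a real cluster would move data rather than set members.

```python
def replace_nodes(ownership):
    """Replace nodes n1 and n2 with n3 and n4.

    ownership maps a node name to the set of storage objects it manages.
    """
    originally_on_n2 = set(ownership["n2"])

    # Step 1: migrate everything managed by the first node to the second node.
    ownership["n2"] |= ownership.pop("n1")

    # Step 2: the third node replaces the first and takes over all objects.
    ownership["n3"] = ownership.pop("n2")

    # Step 3: the fourth node replaces the second and receives the objects
    # that the second node originally managed.
    ownership["n4"] = ownership["n3"] & originally_on_n2
    ownership["n3"] -= ownership["n4"]
    return ownership

result = replace_nodes({"n1": {"volA"}, "n2": {"volB", "volC"}})
```

At every step each storage object has exactly one owner, which is why the nodes are swapped one at a time.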
Abstract: A method, processor, and computer system for handling interrupts within a hierarchical register structure. The method includes receiving at a root-level register an indication of an interrupt occurring at a lower level register in the register structure, using a system interrupt handler to invoke an error handler assigned to a set of registers of the structure that includes the lower level register, and using the invoked error handler to handle the interrupt and return to the system interrupt handler.
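The dispatch flow above can be sketched with a root-level bit mask in which each bit stands for one set of lower-level registers. The register names, the bit layout, and the handler behavior (clearing the pending value and recording it) are assumptions for illustration.

```python
def make_error_handler(log, group_name):
    """Build an error handler assigned to one set of lower-level registers."""
    def handler(registers):
        for reg, value in registers.items():
            if value:
                log.append((group_name, reg, value))
                registers[reg] = 0  # clear the pending interrupt
    return handler

def system_interrupt_handler(root_register, groups):
    """Invoke the handler of every group flagged in the root-level register.

    Bit i of root_register indicates a pending interrupt in groups[i].
    Returns the root register value after handling.
    """
    for i, (registers, handler) in enumerate(groups):
        if root_register & (1 << i):
            handler(registers)           # control returns here afterwards
            root_register &= ~(1 << i)   # clear the root-level indication
    return root_register

log = []
groups = [
    ({"pcie_err": 0}, make_error_handler(log, "pcie")),
    ({"mem_err": 0b10}, make_error_handler(log, "mem")),
]
remaining = system_interrupt_handler(0b10, groups)
```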
Abstract: A data processing apparatus has processing circuitry for executing program instructions and trace circuitry for generating trace data indicating processing activities of the processing circuitry. The trace circuitry may detect a lockup state of the processing circuitry in which the processing circuitry does not make forward progress of execution of the program instructions. In response to detecting the lockup state, the trace circuitry may include in the trace data a lockup identifier indicating that the lockup state has occurred.
Abstract: A directory file includes a plurality of entries, wherein an entry of the plurality of entries includes a file or directory name field, and a snapshot list field that includes a snapshot list. A clone snapshot identifier (ID) is determined for a data file. The directory file is updated to produce an updated directory file, wherein the updating includes updating the snapshot list field associated with the data file to include the clone snapshot ID in the snapshot list.
Type:
Grant
Filed:
July 28, 2014
Date of Patent:
May 31, 2016
Assignee:
International Business Machines Corporation
Inventors:
Andrew Baptist, Ilya Volvovski, Wesley Leggette
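The directory file update in the clone snapshot abstract above can be pictured with a list of entries, each holding a name field and a snapshot list field. The entry layout, the file names, and the ID values are hypothetical; the sketch returns an updated copy and leaves the original directory file unchanged.

```python
# Hypothetical directory file: one entry per file or directory name.
directory = [
    {"name": "report.txt", "snapshots": [1, 2]},
    {"name": "logs", "snapshots": [1]},
]

def add_clone_snapshot(directory, name, clone_id):
    """Return an updated directory file whose entry for `name` lists clone_id."""
    updated = []
    for entry in directory:
        snapshots = list(entry["snapshots"])
        if entry["name"] == name and clone_id not in snapshots:
            snapshots.append(clone_id)
        updated.append({"name": entry["name"], "snapshots": snapshots})
    return updated

updated_directory = add_clone_snapshot(directory, "report.txt", clone_id=7)
```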
Abstract: An asset health monitoring system (AHMS) can assign a confidence indicator to some or all of the monitored computing assets in a data center, such as computing systems or networking devices. In response to drops in the confidence indicators, the AHMS can automatically initiate testing of computing assets in order to raise confidence that the asset will perform correctly. Further, the AHMS can automatically initiate remediation procedures for computing assets that fail the confidence testing. By automatically triggering testing of assets and/or remediation procedures, the AHMS can increase reliability for the data center by preemptively identifying problems.
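A minimal sketch of the confidence loop described above follows. The numeric scale (0.0 to 1.0), the threshold, the asset names, and the rule that a passing test restores full confidence are all assumptions; the abstract does not specify them.

```python
def review_assets(confidence, threshold, run_test):
    """Test every asset whose confidence indicator fell below the threshold.

    confidence maps asset name -> score. Returns the assets that failed
    testing and therefore need remediation.
    """
    needs_remediation = []
    for asset, score in confidence.items():
        if score >= threshold:
            continue
        if run_test(asset):
            confidence[asset] = 1.0  # assumed: a passing test restores confidence
        else:
            needs_remediation.append(asset)
    return needs_remediation

confidence = {"switch-1": 0.9, "host-7": 0.4, "host-9": 0.3}
# Hypothetical test outcome: every asset passes except host-9.
failed = review_assets(confidence, 0.5, lambda asset: asset != "host-9")
```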
Abstract: Methods, systems, and computer readable media for early detection of potential flash failures using an adaptive system level algorithm based on NAND program verify are disclosed. According to one aspect, a method for early detection of potential flash failures using an adaptive system level algorithm based on NAND program verify includes performing a program verify operation after a write to a non-volatile memory, where the program verify mechanism reports a pass or fail based on an existing measurement threshold value, and dynamically adjusting the measurement threshold value used by subsequent program verify operations based on the results of previous program verify operations.
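One way to picture a dynamically adjusted program verify threshold is sketched below. The adjustment rule (average of recent measurements plus a fixed margin), the window size, and the measurement values are assumptions chosen for illustration; the abstract does not state the rule used.

```python
from collections import deque

class AdaptiveProgramVerify:
    """Pass/fail check whose threshold follows recent measurements."""

    def __init__(self, initial_threshold, margin, window=4):
        self.threshold = initial_threshold
        self.margin = margin
        self.history = deque(maxlen=window)

    def verify(self, measurement):
        """Report pass or fail against the current threshold, then adapt it."""
        passed = measurement <= self.threshold
        self.history.append(measurement)
        # Assumed rule: next threshold = recent average + margin.
        self.threshold = sum(self.history) / len(self.history) + self.margin
        return passed

verifier = AdaptiveProgramVerify(initial_threshold=10, margin=2)
results = [verifier.verify(m) for m in [3, 3, 3, 8]]
```

With a fixed threshold of 10 the final measurement of 8 would still pass; the adapted threshold of 5 flags it, which is the early-detection effect the abstract describes.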
Abstract: A directory file includes a plurality of entries, wherein an entry of the plurality of entries includes a file or directory name field, and a snapshot list field that includes a snapshot list in accordance with one of a plurality of snapshot paths of a snapshot tree. A new snapshot identifier (ID) is determined for a data file. The directory file is updated to produce an updated directory file, wherein the updating includes updating the snapshot list field associated with the data file to include the new snapshot ID in the snapshot list in accordance with the one of a plurality of snapshot paths of the snapshot tree.
Type:
Grant
Filed:
July 28, 2014
Date of Patent:
May 3, 2016
Assignee:
International Business Machines Corporation
Inventors:
Andrew Baptist, Ilya Volvovski, Wesley Leggette
Abstract: A method, system and computer program product are provided for implementing client based throttled error logging in a computer system. A log governor, controlled by a client of a log manager, prevents the flooding of the logs, identifies how many repetitive logs have been suppressed, and is tailored such that log suppression requirements are enabled to be specified for each individual log. A space required for the log governor features or log governing information is allocated in the client.
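The log governor above can be sketched as a small piece of state that the client allocates for each individual log. The class name, the `max_logs` limit, and the `allow` method are hypothetical; the sketch shows only the suppress-and-count behavior.

```python
class LogGovernor:
    """Client-held throttle for one repetitive log entry."""

    def __init__(self, max_logs):
        self.max_logs = max_logs  # suppression requirement for this log
        self.emitted = 0
        self.suppressed = 0

    def allow(self):
        """Return True if the log may be written, else count a suppression."""
        if self.emitted < self.max_logs:
            self.emitted += 1
            return True
        self.suppressed += 1
        return False

# The client keeps one governor per log, each with its own limit.
governors = {"disk_timeout": LogGovernor(max_logs=2)}
written = [
    msg for msg in ["disk_timeout"] * 5 if governors[msg].allow()
]
suppressed_count = governors["disk_timeout"].suppressed
```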
Abstract: This invention teaches how to use prediction software and algorithms to minimize the risk of failure, and to increase the likelihood of success of information technology (IT) and telecommunications system changes. The method identifies the systems, people, documents and other unanticipated consequences of system changes. The invention teaches how use of the prediction software and algorithms allows system administrators to find more advantageous ways and times to perform system changes.
Abstract: A computing device collects wear life data of a first and a second solid state drive, wherein each solid state drive includes at least one stride, and wherein wear life data is data which includes information regarding wear and deterioration of each stride of each solid state drive. Based on the collected wear life data, the computing device determines the first solid state drive contains more high usage strides than the second solid state drive, wherein a high usage stride is a stride containing high usage data. The computing device then re-allocates data from at least one high usage stride of the first solid state drive to a stride of the second solid state drive, wherein the re-allocated data includes parity data.