AUTOMATED DATA CENTER MAINTENANCE
Techniques for automated data center maintenance are described. In an example embodiment, an automated maintenance device may comprise processing circuitry and non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to cause the automated maintenance device to receive an automation command from an automation coordinator for a data center, identify an automated maintenance procedure based on the received automation command, and perform the identified automated maintenance procedure. Other embodiments are described and claimed.
This application claims priority to U.S. Provisional Patent Application No. 62/365,969, filed Jul. 22, 2016, U.S. Provisional Patent Application No. 62/376,859, filed Aug. 18, 2016, and U.S. Provisional Patent Application No. 62/427,268, filed Nov. 29, 2016, each of which is hereby incorporated by reference in its entirety.
BACKGROUND
In the course of ordinary operation of a data center, various types of maintenance are typically necessary in order to maintain desired levels of performance, stability, and reliability. Examples of such maintenance include testing, repairing, replacing, and/or reconfiguring components, installing new components, upgrading existing components, repositioning components and equipment, and other tasks of a similar nature. A large modern data center may contain great numbers of components and equipment of various types, and as a result may impose a fairly substantial maintenance burden.
Various embodiments may be generally directed to techniques for automated data center maintenance. In one embodiment, for example, an automated maintenance device may comprise processing circuitry and non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to cause the automated maintenance device to receive an automation command from an automation coordinator for a data center, identify an automated maintenance procedure based on the received automation command, and perform the identified automated maintenance procedure. Other embodiments are described and claimed.
Various embodiments may comprise one or more elements. An element may comprise any structure arranged to perform certain operations. Each element may be implemented as hardware, software, or any combination thereof, as desired for a given set of design parameters or performance constraints. Although an embodiment may be described with a limited number of elements in a certain topology by way of example, the embodiment may include more or fewer elements in alternate topologies as desired for a given implementation. It is worthy to note that any reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrases “in one embodiment,” “in some embodiments,” and “in various embodiments” in various places in the specification are not necessarily all referring to the same embodiment.
The illustrative data center 100 differs from typical data centers in many ways. For example, in the illustrative embodiment, the circuit boards (“sleds”) on which components such as CPUs, memory, and other components are placed are designed for increased thermal performance. In particular, in the illustrative embodiment, the sleds are shallower than typical boards. In other words, the sleds are shorter from the front to the back, where cooling fans are located. This decreases the length of the path that air must travel across the components on the board. Further, the components on the sled are spaced further apart than in typical circuit boards, and the components are arranged to reduce or eliminate shadowing (i.e., one component in the air flow path of another component). In the illustrative embodiment, processing components such as the processors are located on a top side of a sled while near memory, such as DIMMs, is located on a bottom side of the sled. As a result of the enhanced airflow provided by this design, the components may operate at higher frequencies and power levels than in typical systems, thereby increasing performance. Furthermore, the sleds are configured to blindly mate with power and data communication cables in each rack 102A, 102B, 102C, 102D, enhancing their ability to be quickly removed, upgraded, reinstalled, and/or replaced. Similarly, individual components located on the sleds, such as processors, accelerators, memory, and data storage drives, are configured to be easily upgraded due to their increased spacing from each other. In the illustrative embodiment, the components additionally include hardware attestation features to prove their authenticity.
Furthermore, in the illustrative embodiment, the data center 100 utilizes a single network architecture (“fabric”) that supports multiple other network architectures including Ethernet and Omni-Path. The sleds, in the illustrative embodiment, are coupled to switches via optical fibers, which provide higher bandwidth and lower latency than typical twisted pair cabling (e.g., Category 5, Category 5e, Category 6, etc.). Due to the high bandwidth, low latency interconnections and network architecture, the data center 100 may, in use, pool resources, such as memory, accelerators (e.g., graphics accelerators, FPGAs, ASICs, etc.), and data storage drives that are physically disaggregated, and provide them to compute resources (e.g., processors) on an as-needed basis, enabling the compute resources to access the pooled resources as if they were local. The illustrative data center 100 additionally receives usage information for the various resources, predicts resource usage for different types of workloads based on past resource usage, and dynamically reallocates the resources based on this information.
The racks 102A, 102B, 102C, 102D of the data center 100 may include physical design features that facilitate the automation of a variety of types of maintenance tasks. For example, data center 100 may be implemented using racks that are designed to be robotically-accessed, and to accept and house robotically-manipulable resource sleds. Furthermore, in the illustrative embodiment, the racks 102A, 102B, 102C, 102D include integrated power sources that receive a greater voltage than is typical for power sources. The increased voltage enables the power sources to provide additional power to the components on each sled, enabling the components to operate at higher than typical frequencies.
In various embodiments, dual-mode optical switches may be capable of receiving both Ethernet protocol communications carrying Internet Protocol (IP) packets and communications according to a second, high-performance computing (HPC) link-layer protocol (e.g., Intel's Omni-Path Architecture, Infiniband) via optical signaling media of an optical fabric. As reflected in
Included among the types of sleds to be accommodated by rack architecture 600 may be one or more types of sleds that feature expansion capabilities.
MPCMs 916-1 to 916-7 may be configured to provide inserted sleds with access to power sourced by respective power modules 920-1 to 920-7, each of which may draw power from an external power source 921. In various embodiments, external power source 921 may deliver alternating current (AC) power to rack 902, and power modules 920-1 to 920-7 may be configured to convert such AC power to direct current (DC) power to be sourced to inserted sleds. In some embodiments, for example, power modules 920-1 to 920-7 may be configured to convert 277-volt AC power into 12-volt DC power for provision to inserted sleds via respective MPCMs 916-1 to 916-7. The embodiments are not limited to this example.
MPCMs 916-1 to 916-7 may also be arranged to provide inserted sleds with optical signaling connectivity to a dual-mode optical switching infrastructure 914, which may be the same as—or similar to—dual-mode optical switching infrastructure 514 of
Sled 1004 may also include dual-mode optical network interface circuitry 1026. Dual-mode optical network interface circuitry 1026 may generally comprise circuitry that is capable of communicating over optical signaling media according to each of multiple link-layer protocols supported by dual-mode optical switching infrastructure 914 of
Coupling MPCM 1016 with a counterpart MPCM of a sled space in a given rack may cause optical connector 1016A to couple with an optical connector comprised in the counterpart MPCM. This may generally establish optical connectivity between optical cabling of the sled and dual-mode optical network interface circuitry 1026, via each of a set of optical channels 1025. Dual-mode optical network interface circuitry 1026 may communicate with the physical resources 1005 of sled 1004 via electrical signaling media 1028. In addition to the dimensions of the sleds and arrangement of components on the sleds to provide improved cooling and enable operation at a relatively higher thermal envelope (e.g., 250 W), as described above with reference to
As shown in
In another example, in various embodiments, one or more pooled storage sleds 1132 may be included among the physical infrastructure 1100A of data center 1100, each of which may comprise a pool of storage resources that is globally accessible to other sleds via optical fabric 1112 and dual-mode optical switching infrastructure 1114. In some embodiments, such pooled storage sleds 1132 may comprise pools of solid-state storage devices such as solid-state drives (SSDs). In various embodiments, one or more high-performance processing sleds 1134 may be included among the physical infrastructure 1100A of data center 1100. In some embodiments, high-performance processing sleds 1134 may comprise pools of high-performance processors, as well as cooling features that enhance air cooling to yield a higher thermal envelope of up to 250 W or more. In various embodiments, any given high-performance processing sled 1134 may feature an expansion connector 1117 that can accept a far memory expansion sled, such that the far memory that is locally available to that high-performance processing sled 1134 is disaggregated from the processors and near memory comprised on that sled. In some embodiments, such a high-performance processing sled 1134 may be configured with far memory using an expansion sled that comprises low-latency SSD storage. The optical infrastructure allows for compute resources on one sled to utilize remote accelerator/FPGA, memory, and/or SSD resources that are disaggregated on a sled located on the same rack or any other rack in the data center. The remote resources can be located one switch jump or two switch jumps away in the spine-leaf network architecture described above with reference to
In various embodiments, one or more layers of abstraction may be applied to the physical resources of physical infrastructure 1100A in order to define a virtual infrastructure, such as a software-defined infrastructure 1100B. In some embodiments, virtual computing resources 1136 of software-defined infrastructure 1100B may be allocated to support the provision of cloud services 1140. In various embodiments, particular sets of virtual computing resources 1136 may be grouped for provision to cloud services 1140 in the form of SDI services 1138. Examples of cloud services 1140 may include—without limitation—software as a service (SaaS) services 1142, platform as a service (PaaS) services 1144, and infrastructure as a service (IaaS) services 1146.
In some embodiments, management of software-defined infrastructure 1100B may be conducted using a virtual infrastructure management framework 1150B. In various embodiments, virtual infrastructure management framework 1150B may be designed to implement workload fingerprinting techniques and/or machine-learning techniques in conjunction with managing allocation of virtual computing resources 1136 and/or SDI services 1138 to cloud services 1140. In some embodiments, virtual infrastructure management framework 1150B may use/consult telemetry data in conjunction with performing such resource allocation. In various embodiments, an application/service management framework 1150C may be implemented in order to provide QoS management capabilities for cloud services 1140. The embodiments are not limited in this context.
Based on data center operation information such as may be collected at 1202, a maintenance task to be completed may be identified at 1204. In one example, based on data center operation information indicating that processing resources on a given sled are non-responsive to communications from resources on other sleds, it may be determined at 1204 that the sled is to be pulled for testing. In another example, based on data center operation information indicating that a particular DIMM has reached the end of its estimated service life, it may be determined that the DIMM is to be replaced. At 1206, a set of physical actions associated with the maintenance task may be determined, and those physical actions may be performed at 1208 in order to complete the maintenance task. For instance, in the aforementioned example in which it is determined at 1204 that a DIMM is to be replaced, the physical actions identified at 1206 and performed at 1208 may include traveling to a particular rack in order to access a sled comprising the DIMM, removing the DIMM from a socket on the sled, and inserting a replacement DIMM into the socket. The embodiments are not limited to this example.
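The flow described above lends itself to a compact summary in code. The following Python sketch is provided for illustration only; the type names, helper functions, and decision rules (e.g., identify_task, plan_actions) are hypothetical and are not mandated by the embodiments.

```python
# Illustrative sketch of the flow described above (operations 1202-1208).
# All names and decision rules here are hypothetical assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MaintenanceTask:
    kind: str    # e.g., "replace_dimm" or "pull_sled_for_testing"
    target: str  # identifier of the affected sled or component

def identify_task(operation_info: dict) -> Optional[MaintenanceTask]:
    """1204: identify a maintenance task from collected operation information."""
    if operation_info.get("sled_nonresponsive"):
        return MaintenanceTask("pull_sled_for_testing", operation_info["sled_id"])
    if operation_info.get("dimm_end_of_life"):
        return MaintenanceTask("replace_dimm", operation_info["dimm_id"])
    return None

def plan_actions(task: MaintenanceTask) -> list:
    """1206: determine the set of physical actions associated with the task."""
    if task.kind == "replace_dimm":
        return ["travel_to_rack", "remove_dimm_from_socket", "insert_replacement_dimm"]
    return ["travel_to_rack", "pull_sled_for_testing"]

def perform(operation_info: dict) -> None:
    task = identify_task(operation_info)              # 1204
    if task is not None:
        for action in plan_actions(task):             # 1206
            print("performing", action, "on", task.target)  # 1208 (placeholder)

perform({"dimm_end_of_life": True, "dimm_id": "rack-3/sled-2/dimm-5"})
```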
In various embodiments, according to an automated maintenance scheme implemented in data center 1300, robots 1360 may be used to service, repair, replace, clean, test, configure, upgrade, move, position, and/or otherwise manipulate equipment housed in racks 1302. Racks 1302 may be arranged in such fashion as to define and/or accommodate access pathways via which robots 1360 can physically access such equipment. Robots 1360 may traverse such access pathways in conjunction with moving around in data center 1300 to perform various tasks. Physical features of equipment housed in racks 1302 may be designed to facilitate robotic manipulation/handling. It is to be appreciated that in various embodiments, the equipment housed in racks 1302 may include some equipment that is not robotically accessible/serviceable. Further, in some embodiments, there may be some equipment within data center 1300 that is robotically accessible/serviceable but is not housed in racks 1302. The embodiments are not limited in this context.
Locomotion elements 1462 may generally comprise physical elements enabling automated maintenance device 1400 to move around within a data center. In various embodiments, locomotion elements 1462 may comprise wheels. In some embodiments, locomotion elements 1462 may comprise caterpillar tracks. In various embodiments, automated maintenance device 1400 may provide the motive power/force required for motion. For example, in some embodiments, automated maintenance device 1400 may feature a battery that provides power to drive wheels or tracks used by automated maintenance device 1400 for moving around in a data center. In various other embodiments, the motive power/force may be provided by an external source. The embodiments are not limited in this context.
Manipulation elements 1463 may generally comprise physical elements that are usable to manipulate various types of equipment in a data center. In some embodiments, manipulation elements 1463 may include one or more robotic arms. In various embodiments, manipulation elements 1463 may include one or more multi-link manipulators. In some embodiments, manipulation elements 1463 may include one or more end effectors usable for gripping various types of equipment, components, and/or other objects within the data center. In various embodiments, manipulation elements 1463 may include one or more end effectors comprising impactive grippers, such as jaw or claw grippers. In some embodiments, manipulation elements 1463 may include one or more end effectors comprising ingressive grippers, which may feature pins, needles, hackles, or other elements that are to physically penetrate the surface of an object being gripped. In various embodiments, manipulation elements 1463 may include one or more end effectors comprising astrictive grippers, which may grip objects using air suction, magnetic adhesion, or electroadhesion. The embodiments are not limited to these examples.
Sensory elements 1464 may generally comprise physical elements that are usable to sense various aspects of ambient conditions within a data center. Examples of sensory elements 1464 may include cameras, alignment guides/sensors, distance sensors, proximity sensors, barcode readers, RFID/NFC readers, temperature sensors, airflow sensors, air quality sensors, humidity sensors, and pressure sensors. The embodiments are not limited to these examples.
Communication elements 1465 may generally comprise a set of electronic components and/or circuitry operable to perform functions associated with communications between automated maintenance device 1400 and one or more external devices. In a given embodiment, such communications may include wireless communications, wired communications, or both. In various embodiments, communication elements 1465 may include elements operative to generate/construct packets, frames, messages, and/or other information to be wirelessly communicated to external device(s), and/or to process/deconstruct packets, frames, messages, and/or other information wirelessly received from external device(s). In various embodiments, for example, communication elements 1465 may include baseband circuitry supporting wireless communications according to one or more wireless communication protocols/standards. In some embodiments, communication elements 1465 may include elements operative to generate, process, construct, and/or deconstruct packets, frames, messages, and/or other information communicated over wired media. In various embodiments, for example, communication elements 1465 may include network interface circuitry supporting wired communications according to one or more wired communication protocols/standards. The embodiments are not limited in this context.
In various embodiments, interfaces 1466 may include one or more communication interfaces 1466A. As reflected in
Communication interfaces 1466A may generally comprise interfaces usable to transmit and/or receive signals via one or more communication media, which may include wired media, wireless media, or both. In various embodiments, communication interfaces 1466A may include one or more wireless communication interfaces, such as radio frequency (RF) interfaces and/or optical wireless communication (OWC) interfaces. In some embodiments, communication interfaces may additionally or alternatively include one or more wired communication interfaces, such as interface(s) for communicating over media such as coaxial cable, twisted pair, and optical fiber. The embodiments are not limited to these examples.
In various embodiments, interfaces 1466 may include one or more testing interfaces 1466B. Testing interfaces 1466B may generally comprise interfaces via which automated maintenance device 1400 is able to test physical components/resources of one or more types, which may include—without limitation—one or more of physical storage resources 205-1, physical accelerator resources 205-2, physical memory resources 205-3, and physical compute resources 205-4 of
In various embodiments, interfaces 1466 may include one or more power interfaces 1466C. Power interfaces 1466C may generally comprise interfaces via which automated maintenance device 1400 can draw and/or source power. In various embodiments, power interfaces 1466C may include one or more interfaces via which automated maintenance device 1400 can draw power from external source(s). In some embodiments, automated maintenance device 1400 may feature one or more power interfaces 1466C configured to provide charge to one or more batteries (not shown), and automated maintenance device 1400 may draw its operating power from those one or more batteries. In various embodiments, automated maintenance device 1400 may feature one or more power interfaces 1466C via which it can directly draw operating power. In various embodiments, automated maintenance device 1400 may feature one or more power interfaces 1466C via which it can source power to external devices. For example, in various embodiments, automated maintenance device 1400 may feature a power interface 1466C via which it can source power to charge a battery of a second automated maintenance device. The embodiments are not limited to this example.
In some embodiments, interfaces 1466 may include one or more user interfaces 1466D. User interfaces 1466D may generally comprise interfaces via which information can be provided to human technicians and/or user input can be accepted from human technicians. Examples of user interfaces 1466D may include displays, touchscreens, speakers, microphones, keypads, mice, trackballs, trackpads, joysticks, fingerprint readers, retinal scanners, buttons, switches, and the like. The embodiments are not limited to these examples.
Memory/storage elements 1467 may generally comprise a set of electronic components and/or circuitry capable of retaining data, such as any of various types of data that may be generated, transmitted, received, and/or used by automated maintenance device 1400 during normal operation. In some embodiments, memory/storage elements 1467 may include one or both of volatile memory and non-volatile memory. For example, in various embodiments, memory/storage elements 1467 may include one or more of read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, hard disks, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices, solid state drives (SSDs), or any other type of media suitable for storing information. The embodiments are not limited to these examples.
OMC elements 1468 may generally comprise a set of components and/or circuitry capable of performing computing operations required to implement logic for managing and controlling the operations of automated maintenance device 1400. In various embodiments, OMC elements 1468 may include processing circuitry, such as one or more processors/processing units. In some embodiments, an automation engine 1469 may execute on such processing circuitry. Automation engine 1469 may generally be operative to conduct overall management, control, coordination, and/or oversight of the operations of automated maintenance device 1400. In various embodiments, this may include management, coordination, control, and/or oversight of the operations/usage of various other elements within automated maintenance device 1400, such as any or all of locomotion elements 1462, manipulation elements 1463, sensory elements 1464, communication elements 1465, interfaces 1466, and memory/storage elements 1467. The embodiments are not limited in this context.
In some embodiments, management/coordination functionality of automation coordinator 1555 may be provided by a coordination engine 1572. In various embodiments, coordination engine 1572 may execute on processing circuitry of automation coordinator 1555. In various embodiments, coordination engine 1572 may generate automation commands 1573 for transmission to robots 1360 in order to instruct robots 1360 to perform automated maintenance tasks and/or actions associated with such tasks. In some embodiments, robots 1360 may provide automation coordinator 1555 with various types of feedback 1574 in order to—for example—acknowledge automation commands 1573, report the results of attempted maintenance tasks, provide information regarding the statuses of components, resources, and/or equipment, provide information regarding the statuses of robots 1360 themselves, and/or report measurements of one or more aspects of ambient conditions in the data center. The embodiments are not limited to these examples.
In some embodiments, coordination engine 1572 may consider various types of information in conjunction with automated maintenance coordination/management. As reflected in
Physical infrastructure information 1575 may generally comprise information identifying equipment, devices, components, interconnects, physical resources, and/or other infrastructure elements that comprise portions of the physical infrastructure of data center 1300, and describing characteristics of such elements. Data center operations information 1576 may generally comprise information describing various aspects of ongoing operations within data center 1300. In some embodiments, for example, data center operations information 1576 may include information describing one or more workloads currently being processed in data center 1300. In various embodiments, data center operations information 1576 may include metrics characterizing one or more aspects of current operations in data center 1300. For example, in some embodiments, data center operations information 1576 may include performance metrics characterizing the relative level of performance currently being achieved in data center 1300, efficiency metrics characterizing the relative level of efficiency with which the physical resources of data center 1300 are being used to handle the current workloads, and utilization metrics generally indicative of current usage levels of various types of resources in data center 1300. In various embodiments, data center operations information 1576 may include telemetry data 1571, such as automation coordinator 1555 may receive via telemetry framework 1570 or from robots 1360. The embodiments are not limited in this context.
Maintenance task information 1577 may generally comprise information identifying and describing ongoing and pending maintenance tasks of data center 1300. Maintenance task information 1577 may also include information identifying and describing previously completed maintenance tasks. In various embodiments, maintenance task information 1577 may include a pending task queue 1578. Pending task queue 1578 may generally comprise information identifying a set of maintenance tasks that need to be performed in data center 1300. Maintenance equipment information 1579 may generally comprise information identifying and describing automated maintenance equipment—such as robots 1360—of data center 1300. In some embodiments, maintenance equipment information 1579 may include a candidate device pool 1580. Candidate device pool 1580 may generally comprise information identifying a set of robots 1360 that are currently available for use in data center 1300. The embodiments are not limited in this context.
In various embodiments, based on telemetry data 1571, automation coordinator 1555 may identify automated maintenance tasks to be performed in data center 1300 by robots 1360. For example, based on telemetry data 1571 indicating a high bit error rate at a DIMM, automation coordinator 1555 may determine that a robot 1360 should be assigned to replace that DIMM. In some embodiments, automation coordinator 1555 may use telemetry data 1571 to prioritize among automated maintenance tasks, such as tasks comprised in pending task queue 1578. For example, automation coordinator 1555 may use telemetry data 1571 to assess the respective expected performance impacts of multiple automated maintenance tasks in pending task queue 1578, and may assign an automated maintenance task with the highest expected performance impact first. In some embodiments, in identifying and/or prioritizing among automated maintenance tasks, automation coordinator 1555 may consider any or all of physical infrastructure information 1575, data center operations information 1576, maintenance task information 1577, and maintenance equipment information 1579 in addition to—or in lieu of—telemetry data 1571.
In a first example, automation coordinator 1555 may assign a low priority to an automated maintenance task involving replacement of a malfunctioning compute sled based on physical infrastructure information 1575 indicating that another sled in a different rack can be used as a substitute, such that the malfunctioning compute sled need not be replaced immediately. In a second example, automation coordinator 1555 may assign a high priority to an automated maintenance task involving replacing a malfunctioning memory sled based on data center operations information 1576 indicating that a scarcity of memory constitutes a performance bottleneck with respect to workloads being processed in data center 1300. In a third example, automation coordinator 1555 may determine not to add a new maintenance task to pending task queue 1578 based on a determination that a maintenance task already present in pending task queue 1578 may render the new maintenance task unnecessary and/or moot. In a fourth example, in determining an extent to which to prioritize an automated maintenance task that requires the use of particular robots 1360 featuring specialized capabilities, automation coordinator 1555 may consider maintenance equipment information 1579 indicating whether any robots 1360 featuring such specialized capabilities are currently available. The embodiments are not limited to these examples.
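By way of illustration, the following Python sketch shows one way a coordinator might prioritize among tasks in pending task queue 1578 by expected performance impact; the scoring function and data shapes are assumptions made solely for the example and are not prescribed by the embodiments.

```python
# Hypothetical sketch of telemetry-driven prioritization over pending task
# queue 1578: the task with the highest expected performance impact is
# assigned first.

from typing import Optional

def expected_impact(task: dict, telemetry: dict) -> float:
    """Estimate the performance impact of completing a task. A real
    coordinator might weigh bit error rates, bottleneck metrics, and
    substitute availability; this toy version reads a precomputed score."""
    return telemetry.get(task["target"], {}).get("impact_score", 0.0)

def next_task(pending_queue: list, telemetry: dict) -> Optional[dict]:
    """Remove and return the pending task with the highest expected impact."""
    if not pending_queue:
        return None
    task = max(pending_queue, key=lambda t: expected_impact(t, telemetry))
    pending_queue.remove(task)
    return task

telemetry = {"dimm-17": {"impact_score": 0.9}, "sled-04": {"impact_score": 0.2}}
queue = [{"kind": "replace_sled", "target": "sled-04"},
         {"kind": "replace_dimm", "target": "dimm-17"}]
print(next_task(queue, telemetry))  # -> the DIMM replacement task
```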
In various embodiments, based on telemetry data 1571, automation coordinator 1555 may control the positioning and/or movement of robots 1360 within data center 1300. For example, having used telemetry data 1571 to identify a region of data center 1300 within which a greater number of hardware failures have been and/or are expected to be observed, automation coordinator 1555 may position robots 1360 more densely within that identified region than within other regions of data center 1300. The embodiments are not limited in this context.
In some embodiments, in response to automated maintenance decisions—such as may be reached based on any or all of telemetry data 1571, physical infrastructure information 1575, data center operations information 1576, maintenance task information 1577, and maintenance equipment information 1579—automation coordinator 1555 may send automation commands 1573 to robots 1360 in order to instruct robots 1360 to perform operations associated with automated maintenance tasks. For example, upon determining that a particular compute sled should be replaced, automation coordinator 1555 may send an automation command 1573 in order to instruct a robot 1360 to perform a sled replacement procedure to replace the sled. In various embodiments, automation coordinator 1555 may inform robots 1360 of various parameters characterizing assigned automated maintenance tasks by including such parameters in automation commands 1573. For instance, in the context of the preceding example, the automation command 1573 may contain fields specifying a sled ID uniquely identifying the sled to be replaced and a rack ID and/or sled space ID identifying the location of that sled within the data center, as well as analogous parameters associated with the replacement sled. The embodiments are not limited to this example.
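For illustration, one possible shape of such an automation command 1573 is sketched below in Python; the field names and JSON encoding are assumptions, as the embodiments do not mandate any particular command format.

```python
# One hypothetical encoding of an automation command 1573 carrying the
# sled replacement parameters mentioned above.

from dataclasses import dataclass, asdict
import json

@dataclass
class SledReplacementCommand:
    command: str                 # procedure the robot is to perform
    sled_id: str                 # unique ID of the sled to be replaced
    rack_id: str                 # rack housing that sled
    sled_space_id: int           # sled space within that rack
    replacement_sled_id: str     # analogous parameters for the replacement sled
    replacement_rack_id: str
    replacement_sled_space_id: int

cmd = SledReplacementCommand(
    command="replace_sled",
    sled_id="sled-0421", rack_id="rack-13", sled_space_id=3,
    replacement_sled_id="sled-0877", replacement_rack_id="rack-02",
    replacement_sled_space_id=6)

# e.g., serialized for transmission from the coordinator to a robot 1360
print(json.dumps(asdict(cmd)))
```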
It is worthy of note that in various embodiments, with respect to some aspects of automated maintenance operations, decision-making may be handled in a distributed—rather than centralized—fashion. In such embodiments, robots 1360 may make some automated maintenance decisions autonomously. In some such embodiments, as illustrated in
Position data 1681 may generally comprise data for use by automation coordinator 1555 to determine/track the positions and/or movements of robots 1360 within data center 1300. In some embodiments, position data 1681 may comprise data associated with an indoor positioning system. In some such embodiments, the indoor positioning system may be a radio-based system, such as a Wi-Fi-based or Bluetooth-based indoor positioning system. In some other embodiments, a non-radio based positioning system, such as a magnetic, optical, or inertial indoor positioning system may be used. In various embodiments, the indoor positioning system may be a hybrid system, such as one that combines two or more of radio-based, magnetic, optical, and inertial indoor positioning techniques. The embodiments are not limited in this context.
Assistance data 1682 may generally comprise data for use by automation coordinator 1555 to provide human maintenance personnel with information aiding them in the identification and/or performance of manual maintenance tasks. In various embodiments, a given robot 1360 may generate assistance data 1682 in response to identifying a maintenance issue that it cannot correct/resolve in an automated fashion. For instance, after identifying a component that needs to be replaced and determining that it cannot perform the replacement itself, a robot 1360 may take a picture of the component and provide assistance data 1682 comprising that picture to automation coordinator 1555. Automation coordinator 1555 may then cause the picture to be presented on a display for reference by human maintenance personnel in order to aid visual identification of the component to be replaced. The embodiments are not limited to this example.
In some embodiments, the performance and/or reliability of various types of hardware in data center 1300 may potentially be affected by one or more aspects of the ambient conditions within data center 1300, such as ambient temperature, pressure, humidity, and air quality. For example, a rate at which corrosion occurs on metallic contacts of components such as DIMMs may depend on the ambient temperature and humidity. In various embodiments, it may thus be desirable to monitor various types of environmental parameters at various locations during ongoing operations of data center 1300.
In some embodiments, robots 1360 may be configured to support environmental condition monitoring by measuring one or more aspects of ambient conditions within the data center during ongoing operations and providing those collected measurements to automation coordinator 1555 in the form of environmental data 1683. In various embodiments, robots 1360 may collect environmental data 1683 using sensors or sensor arrays comprising sensory elements such as sensory elements 1464 of
In various embodiments, access to dynamic, continuous, and location-specific measurements of such parameters may enable a data center operator to predict failures, dynamically configure systems for best performance, and dynamically move resources for data center optimization. In some embodiments, based on environmental data 1683 provided by robots 1360, a data center operator may be able to predict accelerated failure of parts versus standard factory specification and replace parts earlier (or move to lower priority tasks). In various embodiments, environmental data 1683 provided by robots 1360 may enable a data center operator to initiate service tickets ahead of predicted failure timelines. For example, a cleaning of DIMM contacts may be initiated in order to avoid corrosion build-up to the level where failures start occurring. In some embodiments, environmental data 1683 provided by robots 1360 may enable a data center operator to continuously and dynamically configure servers based on, for example, altitude, pressure and other parameters that may be important to such things as fan speeds and cooling configurations which in turn may affect performance of a server in a given environment and temperature. In various embodiments, environmental data 1683 provided by robots 1360 may enable a data center operator to detect and move data center resources automatically from zones/locations of the data center that may be affected by equipment failures or environment variations detected by the robot's sensors. For example, based on environmental data 1683 indicating an excessive temperature or air quality deterioration in a particular data center region, servers and/or other resources may be relocated from the affected region to a different region. The embodiments are not limited to these examples.
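As a concrete illustration of acting on environmental data 1683, the following Python sketch opens a hypothetical cleaning ticket when measured temperature and humidity imply elevated corrosion risk; the risk model, threshold, and names are illustrative assumptions only, not a model prescribed by the embodiments.

```python
# Hedged sketch: when location-specific measurements imply accelerated
# contact corrosion, initiate a service ticket ahead of predicted failure.

from dataclasses import dataclass

@dataclass
class EnvironmentalSample:
    location: str        # rack or region identifier reported by a robot
    temperature_c: float
    humidity_pct: float

def corrosion_risk(sample: EnvironmentalSample) -> float:
    """Toy risk score: corrosion accelerates with heat and humidity."""
    return (sample.temperature_c / 40.0) * (sample.humidity_pct / 60.0)

def evaluate(samples: list) -> list:
    """Return service tickets for locations whose risk exceeds a threshold."""
    return [f"clean DIMM contacts near {s.location}"
            for s in samples if corrosion_risk(s) > 1.0]

print(evaluate([EnvironmentalSample("rack-13", 44.0, 71.0)]))
```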
In some embodiments, robot 1760 may perform one or more automated maintenance tasks involving the installation and/or removal of sleds at racks of a data center such as data center 1300. In various embodiments, for example, robot 1760 may be operative to install a sled 1704 at rack 1702. In some embodiments, robot 1760 may install sled 1704 by inserting it into an available sled space of rack 1702. In various embodiments, in conjunction with inserting sled 1704, robot 1760 may grip particular physical elements designed to accommodate robotic manipulation/handling. In some embodiments, robot 1760 may use image recognition and/or other location techniques to locate the elements to be gripped, and may insert sled 1704 while gripping those elements. In various embodiments, rather than installing sled 1704, robot 1760 may instead remove sled 1704 from rack 1702 and install a replacement sled 1704B. In some embodiments, robot 1760 may install replacement sled 1704B in a same sled space as was occupied by sled 1704, once it has removed sled 1704. In various other embodiments, robot 1760 may install replacement sled 1704B in a different sled space, such that it does not need to remove sled 1704 before installing replacement sled 1704B. The embodiments are not limited in this context.
In some embodiments, robot 1760 may perform one or more automated maintenance tasks involving upkeep, repair, and/or replacement of particular components on sleds of a data center such as data center 1300. In various embodiments, robot 1760 may be used to power up a component 1706 in accordance with a scheme for periodically powering up components in the data center in order to improve the reliability of such components. In some embodiments, for example, storage and/or memory components may tend to malfunction when left idle for excessive periods of time, and thus robots may be used to power up such components according to a defined cycle. In such an embodiment, robot 1760 may be operative to power up an appropriate component 1706 by plugging that component 1706 into a powered interface/slot. The embodiments are not limited to this example.
In various embodiments, robot 1760 may be operative to manipulate a given component 1706 in accordance with a scheme for automated upkeep of pooled memory resources of a data center. According to such a scheme, robots may be used to assess/troubleshoot apparently malfunctioning memory resources such as DIMMs. In some embodiments, according to such a scheme, robot 1760 may identify a component 1706 comprising a memory resource such as a DIMM, remove that component 1706 from a slot on sled 1704, and clean the component 1706. Robot 1760 may then test the component 1706 to determine whether the issue has been resolved, and may determine to pull sled 1704 for “back-room” servicing if it finds that the problem persists. In various embodiments, robot 1760 may test the component 1706 after reinserting it into its slot on sled 1704. In some other embodiments, robot 1760 may be configured with a testing slot into which it can insert the component 1706 for the purpose of testing. The embodiments are not limited in this context.
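The remove/clean/reinsert/test cycle just described might be outlined as follows. This Python sketch is illustrative only; the robot-control primitives are hypothetical stand-ins for whatever actuation interface a robot 1760 actually exposes.

```python
# Illustrative outline of the DIMM troubleshooting cycle described above.

class RobotStub:
    """Minimal stand-in so the sketch runs; a real robot would actuate hardware."""
    def remove_component(self, sled, dimm):   print("remove", dimm, "from", sled)
    def clean_component(self, dimm):          print("clean contacts of", dimm)
    def reinsert_component(self, sled, dimm): print("reinsert", dimm, "into", sled)
    def test_component(self, sled, dimm):     return False  # pretend the fault persists
    def pull_sled(self, sled):                print("pull", sled)

def service_dimm(robot, sled: str, dimm: str) -> str:
    robot.remove_component(sled, dimm)
    robot.clean_component(dimm)
    robot.reinsert_component(sled, dimm)   # test after reinsertion, per one variant above
    if robot.test_component(sled, dimm):
        return "resolved"
    robot.pull_sled(sled)                  # escalate if the problem persists
    return "sled pulled for back-room servicing"

print(service_dimm(RobotStub(), "sled-7", "dimm-3"))
```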
As shown in
In various embodiments, according to a procedure for automated CPU cache servicing, robot 1860 may remove CPU 1806A and heat sink 1806C from sled 1804 in order to gain physical access to cache memory 1806B. In some embodiments, robot 1860 may remove sled 1804 from rack 1802 prior to removing CPU 1806A and heat sink 1806C from sled 1804. In various other embodiments, robot 1860 may remove CPU 1806A and heat sink 1806C from sled 1804 while sled 1804 remains seated within a sled space of rack 1802. In some embodiments, robot 1860 may first remove heat sink 1806C, and then remove CPU 1806A. In various other embodiments, robot 1860 may remove both heat sink 1806C and CPU 1806A simultaneously and/or as a collective unit (i.e., without removing heat sink 1806C from CPU 1806A). In some embodiments, after replacing cache memory 1806B, robot 1860 may reinstall CPU 1806A and heat sink 1806C upon sled 1804, which it may then reinsert into a sled space of rack 1802 in embodiments in which it was previously removed. The embodiments are not limited in this context.
As shown in
In some embodiments, according to a procedure for automated compute state storage and/or transfer, robot 1960 may insert a memory card 1918 into connector 1906B. In various embodiments, robot 1960 may remove compute sled 1904 from rack 1902 prior to inserting memory card 1918 into connector 1906B. In some other embodiments, robot 1960 may insert memory card 1918 into connector 1906B while compute sled 1904 remains seated within a sled space of rack 1902. In still other embodiments, memory card 1918 may be present and coupled with connector 1906B prior to initiation of the automated compute state storage and/or transfer procedure. In various embodiments, memory card 1918 may comprise a set of physical memory resources 1906C. In some embodiments, once memory card 1918 is inserted into/coupled with connector 1906B, a compute state 1984 of compute sled 1904 may be stored on memory card 1918 using one or more of the physical memory resources 1906C comprised thereon. In various embodiments, compute state 1984 may include respective states of each CPU 1906A comprised on compute sled 1904. In some embodiments, compute state 1984 may also include states of one or more memory resources comprised on compute sled 1904. The embodiments are not limited in this context.
In various embodiments, robot 1960 may perform an automated compute state storage/transfer procedure in order to preserve the compute state of compute sled 1904 during upkeep/repair of compute sled 1904. In some such embodiments, once compute state 1984 is stored on memory card 1918, robot 1960 may remove memory card 1918 from connector 1906B, perform upkeep/repair of compute sled 1904, reinsert memory card 1918 into connector 1906B, and then restore compute sled 1904 to the compute state 1984 stored on memory card 1918. For instance, in an example embodiment, robot 1960 may remove a CPU 1906A from a socket on compute sled 1904 and insert a replacement CPU into that socket, and then cause compute sled 1904 to be restored to the compute state 1984 stored on memory card 1918. In various other embodiments, robot 1960 may perform an automated compute state storage/transfer procedure in order to replace compute sled 1904 with another compute sled. In some such embodiments, once compute state 1984 is stored on memory card 1918, robot 1960 may remove memory card 1918 from connector 1906B, insert memory card 1918 into a connector on a replacement compute sled, insert the replacement compute sled into a sled space of rack 1902 or another rack, and cause the replacement compute sled to realize the compute state 1984 stored on memory card 1918. The embodiments are not limited in this context.
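The two variants of the procedure described above (upkeep/repair versus replacement of the compute sled) can be summarized as ordered step lists. The following Python sketch illustrates only the sequencing; the step wording is an assumption, and no particular mechanism for capturing compute state 1984 is implied.

```python
# Illustrative ordering of the automated compute state storage/transfer
# procedure, covering both variants described above.

def compute_state_transfer_steps(repair: bool) -> list:
    """Return the ordered steps for the repair or replacement variant."""
    steps = ["insert memory card 1918 into connector 1906B",
             "store compute state 1984 onto the memory card",
             "remove memory card 1918 from connector 1906B"]
    if repair:
        steps += ["perform upkeep/repair (e.g., swap a CPU 1906A)",
                  "reinsert memory card 1918 into connector 1906B",
                  "restore compute sled 1904 to the stored compute state 1984"]
    else:
        steps += ["insert the card into a connector on a replacement compute sled",
                  "insert the replacement sled into a sled space",
                  "cause the replacement sled to realize compute state 1984"]
    return steps

for step in compute_state_transfer_steps(repair=True):
    print("-", step)
```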
It is worthy of note that the absence of automation coordinator 1555 in
At 2104, a determination may be made to initiate automated performance of the maintenance task. For example, having added an identified maintenance task to pending task queue 1578 in operating environment 1500 of
At 2106, an automated maintenance device to which to assign the maintenance task may be selected. For example, among one or more robots 1360 comprised in candidate device pool 1580 in operating environment 1500 of
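For illustration, the following Python sketch shows one hypothetical way to select a device from candidate device pool 1580, filtering on required capabilities and preferring the nearest eligible robot; the capability tags, position format, and distance metric are assumptions made for the example.

```python
# Hypothetical selection of an automated maintenance device (operation 2106).

from typing import Optional

def select_device(candidates: list, required: set,
                  task_location: tuple) -> Optional[dict]:
    """Pick the nearest robot whose capabilities cover the task's needs."""
    def distance(robot):
        rx, ry = robot["position"]
        tx, ty = task_location
        return ((rx - tx) ** 2 + (ry - ty) ** 2) ** 0.5

    eligible = [r for r in candidates if required <= set(r["capabilities"])]
    return min(eligible, key=distance) if eligible else None

pool = [{"id": "robot-1", "capabilities": {"dimm_swap"}, "position": (0.0, 5.0)},
        {"id": "robot-2", "capabilities": {"dimm_swap", "sled_swap"},
         "position": (9.0, 2.0)}]
print(select_device(pool, {"sled_swap"}, (8.0, 1.0)))  # -> robot-2
```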
At 2108, one or more automation commands may be sent to cause an automated maintenance device selected at 2106 to perform an automated maintenance procedure associated with the maintenance task. For example, in operating environment 1500 of
At 2204, an automated maintenance procedure may be identified based on the one or more automation commands received at 2202. For example, based on one or more automation commands 1573 received from automation coordinator 1555 in operating environment 1500 of
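One simple way to realize such identification is a dispatch table mapping received command types onto maintenance procedures, as in the hedged Python sketch below; the command names and procedure bodies are hypothetical.

```python
# Minimal dispatch-table sketch for operation 2204: map a received
# automation command onto an automated maintenance procedure.

def sled_replacement(params): return f"replacing sled {params['sled_id']}"
def dimm_replacement(params): return f"replacing DIMM {params['dimm_id']}"

PROCEDURES = {
    "replace_sled": sled_replacement,
    "replace_dimm": dimm_replacement,
}

def handle_command(command: dict) -> str:
    """Identify the procedure for a command (2204), then carry it out."""
    procedure = PROCEDURES[command["command"]]
    return procedure(command.get("params", {}))

print(handle_command({"command": "replace_dimm",
                      "params": {"dimm_id": "dimm-17"}}))
```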
A second automated maintenance device with which to collaborate during performance of the collaborative maintenance procedure may be identified at 2304, and interdevice coordination information may be sent to the second automated maintenance device at 2306 in order to initiate the collaborative maintenance procedure. For example, in operating environment 2000 of
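By way of example, the interdevice coordination information sent at 2306 might take a form such as the following; the field names and encoding are assumptions for illustration only.

```python
# Illustrative shape of interdevice coordination information used to
# initiate a collaborative maintenance procedure between two devices.

import json

coordination_info = {
    "procedure": "collaborative_sled_replacement",  # hypothetical task name
    "initiator": "robot-A",
    "collaborator": "robot-B",
    "rendezvous": {"rack_id": "rack-13", "sled_space_id": 3},
    "role_assignments": {"robot-A": "support", "robot-B": "extract"},
}

# e.g., serialized and transmitted from the first device to the second
print(json.dumps(coordination_info))
```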
As used in this application, the terms “system” and “component” and “module” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the exemplary computing architecture 2500. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message may be a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
The computing architecture 2500 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth. The embodiments, however, are not limited to implementation by the computing architecture 2500.
As shown in
The system bus 2508 provides an interface for system components including, but not limited to, the system memory 2506 to the processing unit 2504. The system bus 2508 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. Interface adapters may connect to the system bus 2508 via a slot architecture. Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like.
The system memory 2506 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information. In the illustrated embodiment shown in
The computer 2502 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 2514, a magnetic floppy disk drive (FDD) 2516 to read from or write to a removable magnetic disk 2518, and an optical disk drive 2520 to read from or write to a removable optical disk 2522 (e.g., a CD-ROM or DVD). The HDD 2514, FDD 2516 and optical disk drive 2520 can be connected to the system bus 2508 by a HDD interface 2524, an FDD interface 2526 and an optical drive interface 2528, respectively. The HDD interface 2524 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
The drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For example, a number of program modules can be stored in the drives and memory units 2510, 2512, including an operating system 2530, one or more application programs 2532, other program modules 2534, and program data 2536.
A user can enter commands and information into the computer 2502 through one or more wire/wireless input devices, for example, a keyboard 2538 and a pointing device, such as a mouse 2540. Other input devices may include microphones, infra-red (IR) remote controls, radio-frequency (RF) remote controls, game pads, stylus pens, card readers, dongles, fingerprint readers, gloves, graphics tablets, joysticks, keyboards, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors, styluses, and the like. These and other input devices are often connected to the processing unit 2504 through an input device interface 2542 that is coupled to the system bus 2508, but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth.
A monitor 2544 or other type of display device may also be connected to the system bus 2508 via an interface, such as a video adaptor 2546. The monitor 2544 may be internal or external to the computer 2502. In addition to the monitor 2544, a computer typically includes other peripheral output devices, such as speakers, printers, and so forth.
The computer 2502 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer 2548. The remote computer 2548 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 2502, although, for purposes of brevity, only a memory/storage device 2550 is illustrated. The logical connections depicted include wire/wireless connectivity to a local area network (LAN) 2552 and/or larger networks, for example, a wide area network (WAN) 2554. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
When used in a LAN networking environment, the computer 2502 may be connected to the LAN 2552 through a wire and/or wireless communication network interface or adaptor 2556. The adaptor 2556 can facilitate wire and/or wireless communications to the LAN 2552, which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 2556.
When used in a WAN networking environment, the computer 2502 can include a modem 2558, or may be connected to a communications server on the WAN 2554, or has other means for establishing communications over the WAN 2554, such as by way of the Internet. The modem 2558, which can be internal or external and a wire and/or wireless device, connects to the system bus 2508 via the input device interface 2542. In a networked environment, program modules depicted relative to the computer 2502, or portions thereof, can be stored in the remote memory/storage device 2550. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
The computer 2502 may be operable to communicate with wire and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.16 over-the-air modulation techniques). This includes at least Wi-Fi (or Wireless Fidelity), WiMax, and Bluetooth™ wireless technologies, among others. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices. Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).
As shown in
The clients 2602 and the servers 2604 may communicate information between each other using a communication framework 2606. The communications framework 2606 may implement any well-known communications techniques and protocols. The communications framework 2606 may be implemented as a packet-switched network (e.g., public networks such as the Internet, private networks such as an enterprise intranet, and so forth), a circuit-switched network (e.g., the public switched telephone network), or a combination of a packet-switched network and a circuit-switched network (with suitable gateways and translators).
The communications framework 2606 may implement various network interfaces arranged to accept, communicate, and connect to a communications network. A network interface may be regarded as a specialized form of an input output interface. Network interfaces may employ connection protocols including without limitation direct connect, Ethernet (e.g., thick, thin, twisted pair 10/100/1000 Base T, and the like), token ring, wireless network interfaces, cellular network interfaces, IEEE 802.11a-x network interfaces, IEEE 802.16 network interfaces, IEEE 802.20 network interfaces, and the like. Further, multiple network interfaces may be used to engage with various communications network types. For example, multiple network interfaces may be employed to allow for the communication over broadcast, multicast, and unicast networks. Should processing requirements dictate a greater amount of speed and capacity, distributed network controller architectures may similarly be employed to pool, load balance, and otherwise increase the communicative bandwidth required by clients 2602 and the servers 2604. A communications network may be any one of, or a combination of, wired and/or wireless networks including without limitation a direct interconnection, a secured custom connection, a private network (e.g., an enterprise intranet), a public network (e.g., the Internet), a Personal Area Network (PAN), a Local Area Network (LAN), a Metropolitan Area Network (MAN), an Operating Missions as Nodes on the Internet (OMNI), a Wide Area Network (WAN), a wireless network, a cellular network, and other communications networks.
As used herein, the term “circuitry” may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group), and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality. In some embodiments, the circuitry may be implemented in, or functions associated with the circuitry may be implemented by, one or more software or firmware modules. In some embodiments, circuitry may include logic, at least partially operable in hardware. Embodiments described herein may be implemented into a system using any suitably configured hardware and/or software.
The device 2700 may implement some or all of the structure and/or operations for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and logic circuit 2728 in a single computing entity, such as entirely within a single device. Alternatively, the device 2700 may distribute portions of the structure and/or operations for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and logic circuit 2728 across multiple computing entities using a distributed system architecture, such as a client-server architecture, a 3-tier architecture, an N-tier architecture, a tightly-coupled or clustered architecture, a peer-to-peer architecture, a master-slave architecture, a shared database architecture, and other types of distributed systems. The embodiments are not limited in this context.
In one embodiment, radio interface 2710 may include a component or combination of components adapted for transmitting and/or receiving single-carrier or multi-carrier modulated signals (e.g., including complementary code keying (CCK), orthogonal frequency division multiplexing (OFDM), and/or single-carrier frequency division multiple access (SC-FDMA) symbols), although the embodiments are not limited to any specific over-the-air interface or modulation scheme. Radio interface 2710 may include, for example, a receiver 2712, a frequency synthesizer 2714, and/or a transmitter 2716. Radio interface 2710 may include bias controls, a crystal oscillator, and/or one or more antennas 2718-f. In another embodiment, radio interface 2710 may use external voltage-controlled oscillators (VCOs), surface acoustic wave filters, intermediate frequency (IF) filters, and/or RF filters, as desired. Due to the variety of potential RF interface designs, an expansive description thereof is omitted.
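For readers unfamiliar with the multi-carrier modulation referenced above, the following non-limiting sketch (Python with NumPy) shows how an OFDM symbol is formed by mapping bits onto subcarriers and applying an inverse FFT. The subcarrier count, QPSK constellation, and cyclic-prefix length are illustrative assumptions, not parameters of radio interface 2710.

    import numpy as np

    def ofdm_symbol(bits, n_subcarriers=64, cp_len=16):
        """Map bits to QPSK subcarriers and form one OFDM symbol (illustrative)."""
        assert len(bits) == 2 * n_subcarriers, "QPSK carries 2 bits per subcarrier"
        pairs = np.asarray(bits, dtype=float).reshape(-1, 2)
        # Each bit pair selects one of four unit-energy QPSK constellation points.
        symbols = ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)
        # The inverse FFT turns the frequency-domain subcarriers into the
        # time-domain waveform that would be up-converted for transmission.
        time_domain = np.fft.ifft(symbols, n=n_subcarriers)
        # A cyclic prefix is prepended to absorb multipath delay spread.
        return np.concatenate([time_domain[-cp_len:], time_domain])

    rng = np.random.default_rng(0)
    tx = ofdm_symbol(rng.integers(0, 2, size=128))
    print(len(tx))  # 80 samples: 16-sample cyclic prefix + 64 data samples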
Baseband circuitry 2720 may communicate with radio interface 2710 to process receive and/or transmit signals and may include, for example, a mixer for down-converting received RF signals, an analog-to-digital converter 2722 for converting analog signals to digital form, a digital-to-analog converter 2724 for converting digital signals to analog form, and a mixer for up-converting signals for transmission. Further, baseband circuitry 2720 may include a baseband or physical layer (PHY) processing circuit 2726 for PHY link layer processing of respective receive/transmit signals. Baseband circuitry 2720 may include, for example, a medium access control (MAC) processing circuit 2727 for MAC/data link layer processing. Baseband circuitry 2720 may include a memory controller 2732 for communicating with MAC processing circuit 2727 and/or a computing platform 2730, for example, via one or more interfaces 2734.
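The role of the down-converting mixer and the filtering that follows it may be illustrated by another non-limiting sketch (Python with NumPy; the sample rate, intermediate frequency, and moving-average filter are illustrative assumptions). A digitized carrier is shifted to baseband by multiplication with a complex exponential, mirroring what baseband circuitry 2720 does in hardware.

    import numpy as np

    fs = 1_000_000   # assumed sample rate of the digitized input, in Hz
    f_if = 100_000   # assumed intermediate frequency of the received carrier, in Hz
    t = np.arange(1024) / fs
    rf = np.cos(2 * np.pi * f_if * t)  # stand-in for a digitized RF/IF signal

    # Mixing: multiplying by a complex exponential shifts the carrier to 0 Hz,
    # the role played by the down-converting mixer described above.
    mixed = rf * np.exp(-2j * np.pi * f_if * t)

    # A crude moving-average low-pass filter rejects the image at 2 * f_if,
    # standing in for the filtering that follows a hardware mixer.
    kernel = np.ones(64) / 64
    baseband = np.convolve(mixed, kernel, mode="same")
    print(abs(baseband[512]))  # approximately 0.5, the recovered DC component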
In some embodiments, PHY processing circuit 2726 may include a frame construction and/or detection module, in combination with additional circuitry such as a buffer memory, to construct and/or deconstruct communication frames. Alternatively or in addition, MAC processing circuit 2727 may share processing for certain of these functions or perform these processes independent of PHY processing circuit 2726. In some embodiments, MAC and PHY processing may be integrated into a single circuit.
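A frame construction and detection module of the kind described above can be sketched, again in a non-limiting way, as follows. The sync word, length field, and CRC-32 trailer in this Python fragment are illustrative assumptions rather than any particular PHY or MAC frame format.

    import struct
    import zlib

    SYNC = b"\x7e\x7e"  # hypothetical sync pattern marking the start of a frame

    def build_frame(payload: bytes) -> bytes:
        """Construct a frame: sync word, 2-byte length, payload, CRC-32 trailer."""
        header = SYNC + struct.pack(">H", len(payload))
        crc = struct.pack(">I", zlib.crc32(header + payload))
        return header + payload + crc

    def parse_frame(frame: bytes) -> bytes:
        """Deconstruct a frame, verifying the sync word, length, and CRC trailer."""
        if frame[:2] != SYNC:
            raise ValueError("missing sync word")
        (length,) = struct.unpack(">H", frame[2:4])
        payload, trailer = frame[4:4 + length], frame[4 + length:]
        (crc,) = struct.unpack(">I", trailer)
        if crc != zlib.crc32(frame[:4 + length]):
            raise ValueError("CRC mismatch")
        return payload

    assert parse_frame(build_frame(b"telemetry")) == b"telemetry"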
The computing platform 2730 may provide computing functionality for the device 2700. As shown, the computing platform 2730 may include a processing component 2740. In addition to, or alternatively to, the baseband circuitry 2720, the device 2700 may execute processing operations or logic for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and logic circuit 2728 using the processing component 2740. The processing component 2740 (and/or PHY 2726 and/or MAC 2727) may comprise various hardware elements, software elements, or a combination of both. Examples of hardware elements may include devices, logic devices, components, processors, microprocessors, circuits, processor circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), memory units, logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. Examples of software elements may include software components, programs, applications, computer programs, application programs, system programs, software development programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints, as desired for a given implementation.
The computing platform 2730 may further include other platform components 2750. Other platform components 2750 may include common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components (e.g., digital displays), power supplies, and so forth. Examples of memory units may include without limitation various types of computer readable and machine readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information.
Device 2700 may be, for example, an ultra-mobile device, a mobile device, a fixed device, a machine-to-machine (M2M) device, a personal digital assistant (PDA), a mobile computing device, a smart phone, a telephone, a digital telephone, a cellular telephone, user equipment, an eBook reader, a handset, a one-way pager, a two-way pager, a messaging device, a computer, a personal computer (PC), a desktop computer, a laptop computer, a notebook computer, a netbook computer, a handheld computer, a tablet computer, a server, a server array or server farm, a web server, a network server, an Internet server, a workstation, a mini-computer, a mainframe computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, a multiprocessor system, a processor-based system, consumer electronics, programmable consumer electronics, a game device, a display, a television, a digital television, a set top box, a wireless access point, a base station, a node B, a subscriber station, a mobile subscriber center, a radio network controller, a router, a hub, a gateway, a bridge, a switch, a machine, or a combination thereof. Accordingly, functions and/or specific configurations of device 2700 described herein may be included or omitted in various embodiments of device 2700, as suitably desired.
Embodiments of device 2700 may be implemented using single input single output (SISO) architectures. However, certain implementations may include multiple antennas (e.g., antennas 2718-f) for transmission and/or reception using adaptive antenna techniques for beamforming or spatial division multiple access (SDMA) and/or using multiple-input multiple-output (MIMO) communication techniques.
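The benefit of combining signals across multiple antennas may be illustrated with a further non-limiting sketch (Python with NumPy; the four-antenna channel and matched-filter weights are illustrative assumptions). Aligning the weights with the conjugate of the channel makes the per-antenna contributions add coherently, which is the essence of the beamforming mentioned above.

    import numpy as np

    rng = np.random.default_rng(1)
    n_antennas = 4  # assumed antenna count standing in for antennas 2718-f
    h = (rng.normal(size=n_antennas) + 1j * rng.normal(size=n_antennas)) / np.sqrt(2)

    # Matched-filter (maximum-ratio) beamforming: weight each antenna by the
    # conjugate of its channel coefficient so the signals combine in phase.
    w = h.conj() / np.linalg.norm(h)

    single_antenna_power = abs(h[0]) ** 2   # SISO reference
    beamformed_power = abs(h @ w) ** 2      # coherent combining across antennas
    print(beamformed_power / single_antenna_power)  # array gain over one antenna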
The components and features of device 2700 may be implemented using any combination of discrete circuitry, application specific integrated circuits (ASICs), logic gates, and/or single chip architectures. Further, the features of device 2700 may be implemented using microcontrollers, programmable logic arrays, and/or microprocessors, or any combination of the foregoing where appropriate. It is noted that hardware, firmware, and/or software elements may be collectively or individually referred to herein as “logic” or “circuit.”
It should be appreciated that the exemplary device 2700 shown in the accompanying block diagram may represent one functionally descriptive example of many potential implementations. Accordingly, division, omission, or inclusion of the block functions depicted therein does not infer that the hardware components, circuits, software, and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
In the exemplary broadband wireless access system 2800, radio access networks (RANs) 2812 and 2818 are capable of coupling with evolved node Bs (eNBs) 2814 and 2820, respectively, to provide wireless communication between one or more fixed devices 2816 and Internet 2810 and/or between one or more mobile devices 2822 and Internet 2810. One example of a fixed device 2816 and a mobile device 2822 is device 2700, described above.
Broadband wireless access system 2800 may further comprise a visited core network (CN) 2824 and/or a home CN 2826, each of which may be capable of providing one or more network functions including but not limited to proxy and/or relay type functions, for example, authentication, authorization and accounting (AAA) functions, dynamic host configuration protocol (DHCP) functions, or domain name service controls or the like, domain gateways such as public switched telephone network (PSTN) gateways or voice over internet protocol (VoIP) gateways, and/or internet protocol (IP) type server functions, or the like. However, these are merely examples of the types of functions that are capable of being provided by visited CN 2824 and/or home CN 2826, and the scope of the claimed subject matter is not limited in these respects. Visited CN 2824 may be referred to as a visited CN where it is not part of the regular service provider of fixed device 2816 or mobile device 2822, for example where fixed device 2816 or mobile device 2822 is roaming away from its respective home CN 2826. Visited CN 2824 may likewise be so referred to where broadband wireless access system 2800 is part of the regular service provider of fixed device 2816 or mobile device 2822 but is in another location or state that is not the main or home location of fixed device 2816 or mobile device 2822. The embodiments are not limited in this context.
Fixed device 2816 may be located anywhere within range of one or both of eNBs 2814 and 2820, such as in or near a home or business, to provide home or business customer broadband access to Internet 2810 via eNBs 2814 and 2820 and RANs 2812 and 2818, respectively, and home CN 2826. It is worthy of note that although fixed device 2816 is generally disposed in a stationary location, it may be moved to different locations as needed. Mobile device 2822 may be utilized at one or more locations if mobile device 2822 is within range of one or both of eNBs 2814 and 2820, for example. In accordance with one or more embodiments, operation support system (OSS) 2828 may be part of broadband wireless access system 2800 to provide management functions for broadband wireless access system 2800 and to provide interfaces between functional entities of broadband wireless access system 2800. Broadband wireless access system 2800 is merely one example of such a system, and the embodiments are not limited in this context.
In various embodiments, wireless network 2900 may comprise a wireless local area network (WLAN), such as a WLAN implementing one or more Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (sometimes collectively referred to as “Wi-Fi”). In some other embodiments, wireless network 2900 may comprise another type of wireless network, and/or may implement other wireless communications standards. In various embodiments, for example, wireless network 2900 may comprise a WWAN or WPAN rather than a WLAN. The embodiments are not limited to this example.
In some embodiments, wireless network 2900 may implement one or more broadband wireless communications standards, such as 3G or 4G standards, including their revisions, progeny, and variants. Examples of 3G or 4G wireless standards may include without limitation any of the IEEE 802.16m and 802.16p standards, 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) and LTE-Advanced (LTE-A) standards, and International Mobile Telecommunications Advanced (IMT-ADV) standards, including their revisions, progeny and variants. Other suitable examples may include, without limitation, Global System for Mobile Communications (GSM)/Enhanced Data Rates for GSM Evolution (EDGE) technologies, Universal Mobile Telecommunications System (UMTS)/High Speed Packet Access (HSPA) technologies, Worldwide Interoperability for Microwave Access (WiMAX) or the WiMAX II technologies, Code Division Multiple Access (CDMA) 2000 system technologies (e.g., CDMA2000 1×RTT, CDMA2000 EV-DO, CDMA EV-DV, and so forth), High Performance Radio Metropolitan Area Network (HIPERMAN) technologies as defined by the European Telecommunications Standards Institute (ETSI) Broadband Radio Access Networks (BRAN), Wireless Broadband (WiBro) technologies, GSM with General Packet Radio Service (GPRS) system (GSM/GPRS) technologies, High Speed Downlink Packet Access (HSDPA) technologies, High Speed Orthogonal Frequency-Division Multiplexing (OFDM) Packet Access (HSOPA) technologies, High-Speed Uplink Packet Access (HSUPA) system technologies, 3GPP Rel. 8-12 of LTE/System Architecture Evolution (SAE), and so forth. The embodiments are not limited in this context.
In various embodiments, wireless stations 2904, 2906, and 2908 may communicate with access point 2902 in order to obtain connectivity to one or more external data networks. In some embodiments, for example, wireless stations 2904, 2906, and 2908 may connect to the Internet 2912 via access point 2902 and access network 2910. In various embodiments, access network 2910 may comprise a private network that provides subscription-based Internet connectivity, such as an Internet Service Provider (ISP) network. The embodiments are not limited to this example.
In various embodiments, two or more of wireless stations 2904, 2906, and 2908 may communicate with each other directly by exchanging peer-to-peer communications, rather than routing such communications through access point 2902. The embodiments are not limited to this example.
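Such a direct exchange can be sketched, in a non-limiting way, with two datagram sockets standing in for two wireless stations. The Python fragment below uses hypothetical loopback addresses and ports; a real WLAN peer-to-peer exchange would traverse the air interface rather than the loopback interface.

    import socket

    # Two stations exchange datagrams directly, without an access-point relay.
    sta_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sta_a.bind(("127.0.0.1", 6001))
    sta_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sta_b.bind(("127.0.0.1", 6002))

    sta_a.sendto(b"peer-hello", ("127.0.0.1", 6002))  # station-to-station
    data, addr = sta_b.recvfrom(1024)
    sta_b.sendto(b"peer-ack", addr)                   # reply straight back
    print(data, sta_a.recvfrom(1024)[0])              # b'peer-hello' b'peer-ack'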
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represent various logic within the processor and which, when read by a machine, cause the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores,” may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor. Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
The following examples pertain to further embodiments:
Example 1 is a method for automated data center maintenance, comprising processing, by processing circuitry of an automated maintenance device, an automation command received from an automation coordinator for a data center, identifying an automated maintenance procedure based on the received automation command, and performing the identified automated maintenance procedure.
Example 2 is the method of Example 1, the identified automated maintenance procedure to comprise a sled replacement procedure.
Example 3 is the method of Example 2, the sled replacement procedure to comprise replacing a compute sled.
Example 4 is the method of Example 3, the sled replacement procedure to comprise removing the compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 5 is the method of Example 4, the memory card to store a compute state of the compute sled.
Example 6 is the method of Example 5, the sled replacement procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 7 is the method of Example 2, the sled replacement procedure to comprise replacing an accelerator sled.
Example 8 is the method of Example 2, the sled replacement procedure to comprise replacing a memory sled.
Example 9 is the method of Example 2, the sled replacement procedure to comprise replacing a storage sled.
Example 10 is the method of Example 1, the identified automated maintenance procedure to comprise a component replacement procedure.
Example 11 is the method of Example 10, the component replacement procedure to comprise removing a component from a socket of a sled, and inserting a replacement component into the socket.
Example 12 is the method of Example 11, the component to comprise a processor.
Example 13 is the method of Example 11, the component to comprise a field-programmable gate array (FPGA).
Example 14 is the method of Example 11, the component to comprise a memory module.
Example 15 is the method of Example 11, the component to comprise a non-volatile storage device.
Example 16 is the method of Example 15, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 17 is the method of Example 16, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 18 is the method of Example 10, the component replacement procedure to comprise a cache memory replacement procedure.
Example 19 is the method of Example 18, the cache memory replacement procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 20 is the method of Example 19, the cache memory replacement procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
Example 21 is the method of Example 1, the identified automated maintenance procedure to comprise a component servicing procedure.
Example 22 is the method of Example 21, the component servicing procedure to comprise servicing a component on a sled.
Example 23 is the method of Example 22, the component servicing procedure to comprise removing the sled from a sled space of a rack.
Example 24 is the method of any of Examples 22 to 23, the component servicing procedure to comprise removing the component from the sled.
Example 25 is the method of any of Examples 22 to 24, the component servicing procedure to comprise testing the component.
Example 26 is the method of any of Examples 22 to 25, the component servicing procedure to comprise cleaning the component.
Example 27 is the method of any of Examples 22 to 26, the component servicing procedure to comprise power-cycling the component.
Example 28 is the method of any of Examples 22 to 27, the component servicing procedure to comprise capturing one or more images of the component.
Example 29 is the method of Example 28, comprising sending the one or more captured images to the automation coordinator.
Example 30 is the method of any of Examples 22 to 29, the component to comprise a processor.
Example 31 is the method of any of Examples 22 to 29, the component to comprise a field-programmable gate array (FPGA).
Example 32 is the method of any of Examples 22 to 29, the component to comprise a memory module.
Example 33 is the method of any of Examples 22 to 29, the component to comprise a non-volatile storage device.
Example 34 is the method of Example 33, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 35 is the method of Example 34, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 36 is the method of any of Examples 1 to 35, comprising identifying the automated maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 37 is the method of any of Examples 1 to 36, comprising performing the identified automated maintenance procedure based on one or more maintenance task parameters.
Example 38 is the method of Example 37, the one or more maintenance task parameters to be comprised in the received automation command.
Example 39 is the method of Example 37, at least one of the one or more maintenance task parameters to be comprised in a second automation command received from the automation coordinator.
Example 40 is the method of any of Examples 37 to 39, the one or more maintenance task parameters to include one or more location parameters.
Example 41 is the method of Example 40, the one or more location parameters to include a rack identifier (ID) associated with a rack within the data center.
Example 42 is the method of any of Examples 40 to 41, the one or more location parameters to include a sled space identifier (ID) associated with a sled space within the data center.
Example 43 is the method of any of Examples 40 to 42, the one or more location parameters to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 44 is the method of any of Examples 37 to 43, the one or more maintenance task parameters to include a sled identifier (ID) associated with a sled within the data center.
Example 45 is the method of any of Examples 37 to 44, the one or more maintenance task parameters to include a component identifier (ID) associated with a component on a sled within the data center.
Example 46 is the method of any of Examples 1 to 45, the automation command to be comprised in signals received via a communication interface of the automated maintenance device.
Example 47 is the method of Example 46, the communication interface to comprise a radio frequency (RF) interface, the signals to comprise RF signals.
Example 48 is the method of any of Examples 1 to 47, comprising sending a message to the automation coordinator to acknowledge the received automation command.
Example 49 is the method of any of Examples 1 to 48, comprising sending a message to the automation coordinator to report a result of the automated maintenance procedure.
Example 50 is the method of any of Examples 1 to 49, comprising sending position data to the automation coordinator, the position data to indicate a position of the automated maintenance device within the data center.
Example 51 is the method of any of Examples 1 to 50, comprising sending assistance data to the automation coordinator, the assistance data to comprise an image of a component that is to be manually replaced or serviced.
Example 52 is the method of any of Examples 1 to 51, comprising sending environmental data to the automation coordinator, the environmental data to comprise measurements of one or more aspects of ambient conditions within the data center.
Example 53 is the method of Example 52, comprising one or more sensors to generate the measurements comprised in the environmental data.
Example 54 is the method of any of Examples 52 to 53, the environmental data to comprise one or more temperature measurements.
Example 55 is the method of any of Examples 52 to 54, the environmental data to comprise one or more humidity measurements.
Example 56 is the method of any of Examples 52 to 55, the environmental data to comprise one or more air quality measurements.
Example 57 is the method of any of Examples 52 to 56, the environmental data to comprise one or more pressure measurements.
Example 58 is a computer-readable storage medium storing instructions that, when executed, cause an automated maintenance device to perform a method according to any of Examples 1 to 57.
Example 59 is an automated maintenance device, comprising processing circuitry and computer-readable storage media storing instructions for execution by the processing circuitry to cause the automated maintenance device to perform a method according to any of Examples 1 to 57.
Example 60 is a method for coordination of automated data center maintenance, comprising identifying, by processing circuitry, a maintenance task to be performed in a data center, determining to initiate automated performance of the maintenance task, selecting an automated maintenance device to which to assign the maintenance task, and sending an automation command to cause the automated maintenance device to perform an automated maintenance procedure associated with the maintenance task.
Example 61 is the method of Example 60, comprising identifying the maintenance task based on telemetry data associated with one or more physical resources of the data center.
Example 62 is the method of Example 61, comprising receiving the telemetry data via a telemetry framework of the data center.
Example 63 is the method of any of Examples 61 to 62, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 64 is the method of any of Examples 61 to 63, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 65 is the method of any of Examples 61 to 64, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 66 is the method of any of Examples 61 to 65, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 67 is the method of any of Examples 60 to 66, comprising identifying the maintenance task based on environmental data received from one or more automated maintenance devices of the data center.
Example 68 is the method of Example 67, the environmental data to include one or more temperature measurements.
Example 69 is the method of any of Examples 67 to 68, the environmental data to include one or more humidity measurements.
Example 70 is the method of any of Examples 67 to 69, the environmental data to include one or more air quality measurements.
Example 71 is the method of any of Examples 67 to 70, the environmental data to include one or more pressure measurements.
Example 72 is the method of any of Examples 60 to 71, comprising adding the maintenance task to a pending task queue following identification of the maintenance task.
Example 73 is the method of Example 72, comprising determining to initiate automated performance of the maintenance task based on a determination that the maintenance task constitutes a highest priority task among one or more maintenance tasks comprised in the pending task queue.
Example 74 is the method of any of Examples 60 to 73, comprising selecting the automated maintenance device from among one or more automated maintenance devices in a candidate device pool.
Example 75 is the method of any of Examples 60 to 74, comprising selecting the automated maintenance device based on one or more capabilities of the automated maintenance device.
Example 76 is the method of any of Examples 60 to 75, comprising selecting the automated maintenance device based on position data received from the automated maintenance device.
Example 77 is the method of any of Examples 60 to 76, the automation command to comprise a maintenance task code indicating a task type associated with the maintenance task.
Example 78 is the method of any of Examples 60 to 77, the automation command to comprise location information associated with the maintenance task.
Example 79 is the method of Example 78, the location information to include a rack identifier (ID) associated with a rack within the data center.
Example 80 is the method of any of Examples 78 to 79, the location information to include a sled space identifier (ID) associated with a sled space within the data center.
Example 81 is the method of any of Examples 78 to 80, the location information to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 82 is the method of any of Examples 60 to 81, the automation command to comprise a sled identifier (ID) associated with a sled within the data center.
Example 83 is the method of any of Examples 60 to 82, the automation command to comprise a physical resource identifier (ID) associated with a physical resource within the data center.
Example 84 is the method of any of Examples 60 to 81, the maintenance task to comprise replacement of a sled.
Example 85 is the method of Example 84, the sled to comprise a compute sled, an accelerator sled, a memory sled, or a storage sled.
Example 86 is the method of any of Examples 60 to 81, the maintenance task to comprise replacement of one or more components of a sled.
Example 87 is the method of any of Examples 60 to 81, the maintenance task to comprise repair of one or more components of a sled.
Example 88 is the method of any of Examples 60 to 81, the maintenance task to comprise testing of one or more components of a sled.
Example 89 is the method of any of Examples 60 to 81, the maintenance task to comprise cleaning of one or more components of a sled.
Example 90 is the method of any of Examples 60 to 81, the maintenance task to comprise power-cycling one or more memory modules.
Example 91 is the method of any of Examples 60 to 81, the maintenance task to comprise power-cycling one or more non-volatile storage devices.
Example 92 is the method of any of Examples 60 to 81, the maintenance task to comprise storing a compute state of a compute sled, replacing the compute sled with a second compute sled, and transferring the stored compute state to the second compute sled.
Example 93 is the method of any of Examples 60 to 81, the maintenance task to comprise replacing one or more cache memory modules of a processor.
Example 94 is a computer-readable storage medium storing instructions that, when executed by an automation coordinator for a data center, cause the automation coordinator to perform a method according to any of Examples 60 to 93.
Example 95 is an apparatus, comprising processing circuitry and computer-readable storage media storing instructions for execution by the processing circuitry to perform a method according to any of Examples 60 to 93.
Example 96 is a method for automated data center maintenance, comprising identifying, by processing circuitry of an automated maintenance device, a collaborative maintenance procedure to be performed in a data center, identifying a second automated maintenance device with which to collaborate during performance of the collaborative maintenance procedure, and sending interdevice coordination information to the second automated maintenance device to initiate the collaborative maintenance procedure.
Example 97 is the method of Example 96, comprising identifying the collaborative maintenance procedure based on telemetry data associated with one or more physical resources of the data center.
Example 98 is the method of Example 97, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 99 is the method of any of Examples 97 to 98, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 100 is the method of any of Examples 97 to 99, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 101 is the method of any of Examples 97 to 100, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 102 is the method of any of Examples 96 to 101, comprising identifying the collaborative maintenance procedure based on environmental data comprising measurements of one or more aspects of ambient conditions within the data center.
Example 103 is the method of Example 102, comprising one or more sensors to generate the measurements comprised in the environmental data.
Example 104 is the method of any of Examples 102 to 103, the environmental data to comprise one or more temperature measurements.
Example 105 is the method of any of Examples 102 to 104, the environmental data to comprise one or more humidity measurements.
Example 106 is the method of any of Examples 102 to 105, the environmental data to comprise one or more air quality measurements.
Example 107 is the method of any of Examples 102 to 106, the environmental data to comprise one or more pressure measurements.
Example 108 is the method of Example 96, comprising identifying the collaborative maintenance procedure based on an automation command received from an automation coordinator for the data center.
Example 109 is the method of Example 108, comprising identifying the collaborative maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 110 is the method of any of Examples 96 to 109, comprising selecting the second automated maintenance device from among a plurality of automated maintenance devices in a candidate device pool for the data center.
Example 111 is the method of any of Examples 96 to 110, comprising identifying the second automated maintenance device based on a parameter comprised in a command received from an automation coordinator for the data center.
Example 112 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise replacing a sled.
Example 113 is the method of Example 112, the sled to comprise a compute sled.
Example 114 is the method of Example 113, the collaborative maintenance procedure to comprise removing the compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 115 is the method of Example 114, the memory card to store a compute state of the compute sled.
Example 116 is the method of Example 115, the collaborative maintenance procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 117 is the method of Example 112, the sled to comprise an accelerator sled, a memory sled, or a storage sled.
Example 118 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise replacing a component on a sled.
Example 119 is the method of Example 118, the component to comprise a processor.
Example 120 is the method of Example 118, the component to comprise a field-programmable gate array (FPGA).
Example 121 is the method of Example 118, the component to comprise a memory module.
Example 122 is the method of Example 118, the component to comprise a non-volatile storage device.
Example 123 is the method of Example 122, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 124 is the method of Example 123, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 125 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 126 is the method of Example 125, the collaborative maintenance procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
Example 127 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise servicing a component on a sled.
Example 128 is the method of Example 127, the collaborative maintenance procedure to comprise removing the sled from a sled space of a rack.
Example 129 is the method of any of Examples 127 to 128, the collaborative maintenance procedure to comprise removing the component from the sled.
Example 130 is the method of any of Examples 127 to 129, the collaborative maintenance procedure to comprise testing the component.
Example 131 is the method of any of Examples 127 to 130, the collaborative maintenance procedure to comprise cleaning the component.
Example 132 is the method of any of Examples 127 to 131, the collaborative maintenance procedure to comprise power-cycling the component.
Example 133 is the method of any of Examples 127 to 132, the collaborative maintenance procedure to comprise capturing one or more images of the component.
Example 134 is the method of any of Examples 127 to 133, the component to comprise a processor.
Example 135 is the method of any of Examples 127 to 133, the component to comprise a field-programmable gate array (FPGA).
Example 136 is the method of any of Examples 127 to 133, the component to comprise a memory module.
Example 137 is the method of any of Examples 127 to 133, the component to comprise a non-volatile storage device.
Example 138 is the method of Example 137, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 139 is the method of Example 138, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 140 is the method of any of Examples 96 to 139, the interdevice coordination information to comprise a rack identifier (ID) associated with a rack within the data center.
Example 141 is the method of any of Examples 96 to 140, the interdevice coordination information to comprise a sled space identifier (ID) associated with a sled space within the data center.
Example 142 is the method of any of Examples 96 to 141, the interdevice coordination information to comprise a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 143 is the method of any of Examples 96 to 142, the interdevice coordination information to comprise a sled identifier (ID) associated with a sled within the data center.
Example 144 is the method of any of Examples 96 to 143, the interdevice coordination information to comprise a component identifier (ID) associated with a component on a sled within the data center.
Example 145 is a computer-readable storage medium storing instructions that, when executed, cause an automated maintenance device to perform a method according to any of Examples 96 to 144.
Example 146 is an automated maintenance device, comprising processing circuitry and computer-readable storage media storing instructions for execution by the processing circuitry to cause the automated maintenance device to perform a method according to any of Examples 96 to 144.
Example 147 is an automated maintenance device, comprising means for receiving an automation command from an automation coordinator for a data center, means for identifying an automated maintenance procedure based on the received automation command, and means for performing the identified automated maintenance procedure.
Example 148 is the automated maintenance device of Example 147, the identified automated maintenance procedure to comprise a sled replacement procedure.
Example 149 is the automated maintenance device of Example 148, the sled replacement procedure to comprise removing a compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 150 is the automated maintenance device of Example 149, the memory card to store a compute state of the compute sled.
Example 151 is the automated maintenance device of Example 150, the sled replacement procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 152 is the automated maintenance device of Example 148, the sled replacement procedure to comprise replacing an accelerator sled, a memory sled, or a storage sled.
Example 153 is the automated maintenance device of Example 147, the identified automated maintenance procedure to comprise a component replacement procedure.
Example 154 is the automated maintenance device of Example 153, the component replacement procedure to comprise removing a component from a socket of a sled, and inserting a replacement component into the socket.
Example 155 is the automated maintenance device of Example 154, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 156 is the automated maintenance device of Example 153, the component replacement procedure to comprise a cache memory replacement procedure.
Example 157 is the automated maintenance device of Example 156, the cache memory replacement procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 158 is the automated maintenance device of Example 157, the cache memory replacement procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
Example 159 is the automated maintenance device of Example 147, the identified automated maintenance procedure to comprise a component servicing procedure.
Example 160 is the automated maintenance device of Example 159, the component servicing procedure to comprise servicing a component on a sled.
Example 161 is the automated maintenance device of Example 160, the component servicing procedure to comprise removing the sled from a sled space of a rack.
Example 162 is the automated maintenance device of any of Examples 160 to 161, the component servicing procedure to comprise removing the component from the sled.
Example 163 is the automated maintenance device of any of Examples 160 to 162, the component servicing procedure to comprise testing the component.
Example 164 is the automated maintenance device of any of Examples 160 to 163, the component servicing procedure to comprise cleaning the component.
Example 165 is the automated maintenance device of any of Examples 160 to 164, the component servicing procedure to comprise power-cycling the component.
Example 166 is the automated maintenance device of any of Examples 160 to 165, the component servicing procedure to comprise capturing one or more images of the component.
Example 167 is the automated maintenance device of any of Examples 160 to 166, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 168 is the automated maintenance device of any of Examples 147 to 167, comprising means for identifying the automated maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 169 is the automated maintenance device of any of Examples 147 to 168, comprising means for performing the identified automated maintenance procedure based on one or more maintenance task parameters.
Example 170 is the automated maintenance device of Example 169, the one or more maintenance task parameters to be comprised in the received automation command.
Example 171 is the automated maintenance device of Example 169, at least one of the one or more maintenance task parameters to be comprised in a second automation command received from the automation coordinator.
Example 172 is the automated maintenance device of any of Examples 169 to 171, the one or more maintenance task parameters to include one or more location parameters.
Example 173 is the automated maintenance device of Example 172, the one or more location parameters to include a rack identifier (ID) associated with a rack within the data center.
Example 174 is the automated maintenance device of any of Examples 172 to 173, the one or more location parameters to include a sled space identifier (ID) associated with a sled space within the data center.
Example 175 is the automated maintenance device of any of Examples 172 to 174, the one or more location parameters to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 176 is the automated maintenance device of any of Examples 169 to 175, the one or more maintenance task parameters to include a sled identifier (ID) associated with a sled within the data center.
Example 177 is the automated maintenance device of any of Examples 169 to 176, the one or more maintenance task parameters to include a component identifier (ID) associated with a component on a sled within the data center.
Example 178 is the automated maintenance device of any of Examples 147 to 177, the automation command to be comprised in signals received via a communication interface of the automated maintenance device.
Example 179 is the automated maintenance device of Example 178, the communication interface to comprise a radio frequency (RF) interface, the signals to comprise RF signals.
Example 180 is the automated maintenance device of any of Examples 147 to 179, comprising means for sending a message to the automation coordinator to acknowledge the received automation command.
Example 181 is the automated maintenance device of any of Examples 147 to 180, comprising means for sending a message to the automation coordinator to report a result of the automated maintenance procedure.
Example 182 is the automated maintenance device of any of Examples 147 to 181, comprising means for sending position data to the automation coordinator, the position data to indicate a position of the automated maintenance device within the data center.
Example 183 is the automated maintenance device of any of Examples 147 to 182, comprising means for sending assistance data to the automation coordinator, the assistance data to comprise an image of a component that is to be manually replaced or serviced.
Example 184 is the automated maintenance device of any of Examples 147 to 183, comprising means for sending environmental data to the automation coordinator, the environmental data to comprise measurements of one or more aspects of ambient conditions within the data center.
Example 185 is the automated maintenance device of Example 184, comprising means for generating the measurements comprised in the environmental data.
Example 186 is the automated maintenance device of any of Examples 184 to 185, the environmental data to comprise one or more temperature measurements.
Example 187 is the automated maintenance device of any of Examples 184 to 186, the environmental data to comprise one or more humidity measurements.
Example 188 is the automated maintenance device of any of Examples 184 to 187, the environmental data to comprise one or more air quality measurements.
Example 189 is the automated maintenance device of any of Examples 184 to 188, the environmental data to comprise one or more pressure measurements.
Example 189 is an apparatus for coordination of automated data center maintenance, comprising means for identifying a maintenance task to be performed in a data center, means for determining to initiate automated performance of the maintenance task, means for selecting an automated maintenance device to which to assign the maintenance task, and means for sending an automation command to cause the automated maintenance device to perform an automated maintenance procedure associated with the maintenance task.
Example 190 is the apparatus of Example 189, comprising means for identifying the maintenance task based on telemetry data associated with one or more physical resources of the data center.
Example 191 is the apparatus of Example 190, comprising means for receiving the telemetry data via a telemetry framework of the data center.
Example 192 is the apparatus of any of Examples 190 to 191, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 193 is the apparatus of any of Examples 190 to 192, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 194 is the apparatus of any of Examples 190 to 193, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 195 is the apparatus of any of Examples 190 to 194, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 196 is the apparatus of any of Examples 189 to 195, comprising means for identifying the maintenance task based on environmental data received from one or more automated maintenance devices of the data center.
Example 197 is the apparatus of Example 196, the environmental data to include one or more temperature measurements.
Example 198 is the apparatus of any of Examples 196 to 197, the environmental data to include one or more humidity measurements.
Example 199 is the apparatus of any of Examples 196 to 198, the environmental data to include one or more air quality measurements.
Example 200 is the apparatus of any of Examples 196 to 199, the environmental data to include one or more pressure measurements.
Example 201 is the apparatus of any of Examples 189 to 200, comprising means for adding the maintenance task to a pending task queue following identification of the maintenance task.
Example 202 is the apparatus of Example 201, comprising means for determining to initiate automated performance of the maintenance task based on a determination that the maintenance task constitutes a highest priority task among one or more maintenance tasks comprised in the pending task queue.
Example 203 is the apparatus of any of Examples 189 to 202, comprising means for selecting the automated maintenance device from among one or more automated maintenance devices in a candidate device pool.
Example 204 is the apparatus of any of Examples 189 to 203, comprising means for selecting the automated maintenance device based on one or more capabilities of the automated maintenance device.
Example 205 is the apparatus of any of Examples 189 to 204, comprising means for selecting the automated maintenance device based on position data received from the automated maintenance device.
Example 206 is the apparatus of any of Examples 189 to 205, the automation command to comprise a maintenance task code indicating a task type associated with the maintenance task.
Example 207 is the apparatus of any of Examples 189 to 206, the automation command to comprise location information associated with the maintenance task.
Example 208 is the apparatus of Example 207, the location information to include a rack identifier (ID) associated with a rack within the data center.
Example 209 is the apparatus of any of Examples 207 to 208, the location information to include a sled space identifier (ID) associated with a sled space within the data center.
Example 210 is the apparatus of any of Examples 207 to 209, the location information to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 211 is the apparatus of any of Examples 189 to 210, the automation command to comprise a sled identifier (ID) associated with a sled within the data center.
Example 212 is the apparatus of any of Examples 189 to 211, the automation command to comprise a physical resource identifier (ID) associated with a physical resource within the data center.
Example 213 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise replacement of a sled.
Example 214 is the apparatus of Example 213, the sled to comprise a compute sled, an accelerator sled, a memory sled, or a storage sled.
Example 215 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise replacement of one or more components of a sled.
Example 216 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise repair of one or more components of a sled.
Example 217 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise testing of one or more components of a sled.
Example 218 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise cleaning of one or more components of a sled.
Example 219 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise power-cycling one or more memory modules.
Example 220 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise power-cycling one or more non-volatile storage devices.
Example 221 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise storing a compute state of a compute sled, replacing the compute sled with a second compute sled, and transferring the stored compute state to the second compute sled.
Example 222 is the apparatus of any of Examples 189 to 212, the maintenance task to comprise replacing one or more cache memory modules of a processor.
Example 223 is an automated maintenance device, comprising means for identifying a collaborative maintenance procedure to be performed in a data center, means for identifying a second automated maintenance device with which to collaborate during performance of the collaborative maintenance procedure, and means for sending interdevice coordination information to the second automated maintenance device to initiate the collaborative maintenance procedure.
Example 224 is the automated maintenance device of Example 223, comprising means for identifying the collaborative maintenance procedure based on telemetry data associated with one or more physical resources of the data center.
Example 225 is the automated maintenance device of Example 224, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 226 is the automated maintenance device of any of Examples 224 to 225, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 227 is the automated maintenance device of any of Examples 224 to 226, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 228 is the automated maintenance device of any of Examples 224 to 227, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 229 is the automated maintenance device of any of Examples 223 to 228, comprising means for identifying the collaborative maintenance procedure based on environmental data comprising measurements of one or more aspects of ambient conditions within the data center.
Example 230 is the automated maintenance device of Example 229, comprising one or more sensors to generate the measurements comprised in the environmental data.
Example 231 is the automated maintenance device of any of Examples 229 to 230, the environmental data to comprise one or more temperature measurements.
Example 232 is the automated maintenance device of any of Examples 229 to 231, the environmental data to comprise one or more humidity measurements.
Example 233 is the automated maintenance device of any of Examples 229 to 232, the environmental data to comprise one or more air quality measurements.
Example 234 is the automated maintenance device of any of Examples 229 to 233, the environmental data to comprise one or more pressure measurements.
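As one non-limiting illustration of how the environmental data of Examples 229 to 234 might drive identification of a maintenance procedure, the sketch below screens sensor readings against fixed limits. The threshold values and all names are assumptions introduced for illustration; the disclosure does not specify limit values.

```python
# Illustrative limits only; the disclosure does not specify threshold values.
LIMITS = {
    "temperature_c": 35.0,    # Example 231: temperature measurements
    "humidity_pct": 60.0,     # Example 232: humidity measurements
    "air_quality_ppm": 50.0,  # Example 233: air quality measurements
    "pressure_kpa": 103.0,    # Example 234: pressure measurements
}

def metrics_out_of_range(measurements: dict) -> list:
    """Return the ambient metrics whose readings exceed their limits."""
    return [m for m, v in measurements.items() if m in LIMITS and v > LIMITS[m]]

readings = {"temperature_c": 41.2, "humidity_pct": 44.0}
if metrics_out_of_range(readings):
    # An out-of-range reading could prompt identification of a
    # collaborative maintenance procedure (Example 229).
    print("flagged:", metrics_out_of_range(readings))
```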
Example 235 is the automated maintenance device of Example 223, comprising means for identifying the collaborative maintenance procedure based on an automation command received from an automation coordinator for the data center.
Example 236 is the automated maintenance device of Example 235, comprising means for identifying the collaborative maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 237 is the automated maintenance device of any of Examples 223 to 236, comprising means for selecting the second automated maintenance device from among a plurality of automated maintenance devices in a candidate device pool for the data center.
Example 238 is the automated maintenance device of any of Examples 223 to 237, comprising means for identifying the second automated maintenance device based on a parameter comprised in a command received from an automation coordinator for the data center.
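Selecting a collaborator from the candidate device pool of Example 237 might, for instance, combine a capability filter with position data of the kind contemplated in Examples 204 and 205, as in the following sketch. The Device structure and the nearest-capable-device policy are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Device:
    device_id: str
    capabilities: set   # e.g., {"lift_sled", "swap_memory_card"}
    position: tuple     # (x, y) position reported by the device

def select_collaborator(pool, required_capability, own_position):
    """Pick the nearest device in the candidate pool that can perform the task."""
    capable = [d for d in pool if required_capability in d.capabilities]
    if not capable:
        return None
    return min(capable,
               key=lambda d: (d.position[0] - own_position[0]) ** 2 +
                             (d.position[1] - own_position[1]) ** 2)
```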
Example 239 is the automated maintenance device of any of Examples 223 to 238, the collaborative maintenance procedure to comprise replacing a sled.
Example 240 is the automated maintenance device of Example 239, the sled to comprise a compute sled.
Example 241 is the automated maintenance device of Example 240, the collaborative maintenance procedure to comprise removing the compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 242 is the automated maintenance device of Example 241, the memory card to store a compute state of the compute sled.
Example 243 is the automated maintenance device of Example 242, the collaborative maintenance procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 244 is the automated maintenance device of Example 239, the sled to comprise an accelerator sled, a memory sled, or a storage sled.
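The sled-swap sequence of Examples 241 to 243 may be easier to follow written out as an ordered series of steps. In the sketch below, every robot.* call is a hypothetical actuator primitive introduced solely for illustration.

```python
def replace_compute_sled(robot, rack_id, sled_space_id, replacement_sled_id):
    """Ordered steps of Examples 241 to 243: swap a compute sled while
    preserving its compute state on a removable memory card."""
    failed = robot.remove_sled(rack_id, sled_space_id)     # remove from the sled space
    card = robot.remove_memory_card(failed)                # card stores the compute state (Ex. 242)
    replacement = robot.fetch_sled(replacement_sled_id)
    robot.insert_memory_card(replacement, card)
    robot.insert_sled(rack_id, sled_space_id, replacement)
    robot.initiate_state_restoration(replacement)          # restore the stored state (Ex. 243)
```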
Example 245 is the automated maintenance device of any of Examples 223 to 238, the collaborative maintenance procedure to comprise replacing a component on a sled.
Example 246 is the automated maintenance device of Example 245, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 247 is the automated maintenance device of any of Examples 223 to 238, the collaborative maintenance procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 248 is the automated maintenance device of Example 247, the collaborative maintenance procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
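Similarly, the cache-module replacement of Example 248 amounts to a strictly ordered six-step procedure, sketched below with the same hypothetical robot.* primitives as above.

```python
def replace_cache_modules(robot, sled_id, socket_id, replacement_modules):
    """Ordered steps of Example 248; the primitives are illustrative only."""
    robot.remove_heat_sink(sled_id, socket_id)
    processor = robot.remove_processor(sled_id, socket_id)  # exposes the underlying cache modules
    robot.remove_cache_modules(sled_id, socket_id)
    robot.insert_cache_modules(sled_id, socket_id, replacement_modules)
    robot.reinsert_processor(sled_id, socket_id, processor)
    robot.reinstall_heat_sink(sled_id, socket_id)
```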
Example 249 is the automated maintenance device of any of Examples 223 to 238, the collaborative maintenance procedure to comprise servicing a component on a sled.
Example 250 is the automated maintenance device of Example 249, the collaborative maintenance procedure to comprise removing the sled from a sled space of a rack.
Example 251 is the automated maintenance device of any of Examples 249 to 250, the collaborative maintenance procedure to comprise removing the component from the sled.
Example 252 is the automated maintenance device of any of Examples 249 to 251, the collaborative maintenance procedure to comprise testing the component.
Example 253 is the automated maintenance device of any of Examples 249 to 252, the collaborative maintenance procedure to comprise cleaning the component.
Example 254 is the automated maintenance device of any of Examples 249 to 253, the collaborative maintenance procedure to comprise power-cycling the component.
Example 255 is the automated maintenance device of any of Examples 249 to 254, the collaborative maintenance procedure to comprise capturing one or more images of the component.
Example 256 is the automated maintenance device of any of Examples 249 to 255, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 257 is the automated maintenance device of any of Examples 223 to 256, the interdevice coordination information to comprise a rack identifier (ID) associated with a rack within the data center.
Example 258 is the automated maintenance device of any of Examples 223 to 257, the interdevice coordination information to comprise a sled space identifier (ID) associated with a sled space within the data center.
Example 259 is the automated maintenance device of any of Examples 223 to 258, the interdevice coordination information to comprise a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 260 is the automated maintenance device of any of Examples 223 to 259, the interdevice coordination information to comprise a sled identifier (ID) associated with a sled within the data center.
Example 261 is the automated maintenance device of any of Examples 223 to 260, the interdevice coordination information to comprise a component identifier (ID) associated with a component on a sled within the data center.
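The interdevice coordination information of Examples 257 to 261 could, as one possibility, be serialized into a compact message for transmission to the second automated maintenance device. The field names and JSON encoding below are assumptions, and the transport is left unspecified.

```python
import json

def make_coordination_message(rack_id, sled_space_id=None, slot_id=None,
                              sled_id=None, component_id=None):
    """Bundle the coordination fields of Examples 257 to 261, omitting unused ones."""
    fields = {"rack_id": rack_id, "sled_space_id": sled_space_id,
              "slot_id": slot_id, "sled_id": sled_id, "component_id": component_id}
    return json.dumps({k: v for k, v in fields.items() if v is not None})

# The initiating device sends this to the second device to begin the
# collaborative maintenance procedure (Example 223).
payload = make_coordination_message("R07", sled_space_id="2", sled_id="SLED-0042")
```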
Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood by those skilled in the art, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components, and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the embodiments.
Some embodiments may be described using the expressions “coupled” and “connected,” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
It should be noted that the methods described herein do not have to be executed in the order described, or in any particular order. Moreover, various activities described with respect to the methods identified herein can be executed in serial or parallel fashion.
Although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. Thus, the scope of various embodiments includes any other applications in which the above compositions, structures, and methods are used.
It is emphasized that the Abstract of the Disclosure is provided to comply with 37 C.F.R. §1.72(b), requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate preferred embodiment. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the terms “comprising” and “wherein,” respectively. Moreover, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims
1. An automated maintenance device, comprising:
- processing circuitry; and
- non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to cause the automated maintenance device to: receive an automation command from an automation coordinator for a data center; identify an automated maintenance procedure based on the received automation command; and perform the identified automated maintenance procedure in the data center.
2. The automated maintenance device of claim 1, the automated maintenance procedure to comprise replacing a compute sled in the data center.
3. The automated maintenance device of claim 2, the automated maintenance procedure to comprise:
- removing the compute sled from a sled space within a rack;
- removing a memory card from a connector slot of the compute sled, the memory card to store a compute state of the compute sled;
- inserting the memory card into a connector slot of a replacement compute sled;
- inserting the replacement compute sled into the sled space; and
- initiating a restoration of the stored compute state on the replacement compute sled.
4. The automated maintenance device of claim 1, the automated maintenance procedure to comprise replacing one or more cache memory modules of a processor on a sled.
5. The automated maintenance device of claim 4, the automated maintenance procedure to comprise:
- removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor;
- removing the one or more cache memory modules;
- inserting one or more replacement cache memory modules; and
- reinserting the processor into the socket.
6. The automated maintenance device of claim 5, the automated maintenance procedure to comprise:
- removing a heat sink from atop the processor prior to removing the processor from the socket; and
- reinstalling the heat sink after reinserting the processor into the socket.
7. The automated maintenance device of claim 1, comprising a radio frequency (RF) interface to receive a wireless signal comprising the automation command.
8. An apparatus for coordination of automated data center maintenance, comprising:
- processing circuitry; and
- non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to: identify a maintenance task to be performed in a data center; determine to initiate automated performance of the maintenance task; select an automated maintenance device to which to assign the maintenance task; and send an automation command to cause the automated maintenance device to perform an automated maintenance procedure associated with the maintenance task.
9. The apparatus of claim 8, the non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to identify the maintenance task based on telemetry data associated with one or more physical resources of the data center.
10. The apparatus of claim 8, the non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to identify the maintenance task based on environmental data received from one or more automated maintenance devices of the data center.
11. The apparatus of claim 8, the non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to add the maintenance task to a pending task queue following identification of the maintenance task.
12. The apparatus of claim 11, the non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to determine to initiate automated performance of the maintenance task based on a determination that the maintenance task constitutes a highest priority task among one or more maintenance tasks comprised in the pending task queue.
13. The apparatus of claim 12, the non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to select the automated maintenance device from among one or more automated maintenance devices in a candidate device pool.
14. A method for automated data center maintenance, comprising:
- receiving, at an automated maintenance device, an automation command from an automation coordinator for a data center;
- identifying, by processing circuitry of the automated maintenance device, an automated maintenance procedure based on the received automation command; and
- performing the identified automated maintenance procedure in the data center.
15. The method of claim 14, the automated maintenance procedure to comprise replacing a compute sled in the data center.
16. The method of claim 15, the automated maintenance procedure to comprise:
- removing the compute sled from a sled space within a rack;
- removing a memory card from a connector slot of the compute sled, the memory card to store a compute state of the compute sled;
- inserting the memory card into a connector slot of a replacement compute sled;
- inserting the replacement compute sled into the sled space; and
- initiating a restoration of the stored compute state on the replacement compute sled.
17. The method of claim 14, the automated maintenance procedure to comprise replacing one or more cache memory modules of a processor on a sled.
18. The method of claim 17, the automated maintenance procedure to comprise:
- removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor;
- removing the one or more cache memory modules;
- inserting one or more replacement cache memory modules; and
- reinserting the processor into the socket.
19. The method of claim 18, the automated maintenance procedure to comprise:
- removing a heat sink from atop the processor prior to removing the processor from the socket; and
- reinstalling the heat sink after reinserting the processor into the socket.
20. At least one non-transitory computer-readable storage medium comprising a set of instructions that, when executed by an automation coordinator for a data center, cause the automation coordinator to:
- identify a maintenance task to be performed in the data center;
- determine to initiate automated performance of the maintenance task;
- select an automated maintenance device to which to assign the maintenance task; and
- send an automation command to cause the automated maintenance device to perform an automated maintenance procedure associated with the maintenance task.
21. The at least one non-transitory computer-readable storage medium of claim 20, comprising instructions that, when executed by the automation coordinator, cause the automation coordinator to identify the maintenance task based on telemetry data associated with one or more physical resources of the data center.
22. The at least one non-transitory computer-readable storage medium of claim 20, comprising instructions that, when executed by the automation coordinator, cause the automation coordinator to identify the maintenance task based on environmental data received from one or more automated maintenance devices of the data center.
23. The at least one non-transitory computer-readable storage medium of claim 20, comprising instructions that, when executed by the automation coordinator, cause the automation coordinator to add the maintenance task to a pending task queue following identification of the maintenance task.
24. The at least one non-transitory computer-readable storage medium of claim 23, comprising instructions that, when executed by the automation coordinator, cause the automation coordinator to determine to initiate automated performance of the maintenance task based on a determination that the maintenance task constitutes a highest priority task among one or more maintenance tasks comprised in the pending task queue.
25. The at least one non-transitory computer-readable storage medium of claim 24, comprising instructions that, when executed by the automation coordinator, cause the automation coordinator to select the automated maintenance device from among one or more automated maintenance devices in a candidate device pool.
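As a non-limiting illustration, the pending task queue and dispatch behavior recited in claims 11 to 13 (and mirrored in claims 23 to 25) might be realized along the lines of the following sketch. The priority scheme, class and method names, and device selection policy are assumptions made for illustration rather than limitations of the claims.

```python
import heapq
import itertools

class AutomationCoordinator:
    """Illustrative sketch of claims 11 to 13: a pending task queue ordered
    by priority, with dispatch to a device drawn from a candidate pool."""
    def __init__(self, candidate_pool):
        self.pending = []              # min-heap; lower number = higher priority
        self.pool = list(candidate_pool)
        self._seq = itertools.count()  # tie-breaker so tasks never compare directly

    def add_task(self, priority, task):
        """Add an identified maintenance task to the pending queue (claim 11)."""
        heapq.heappush(self.pending, (priority, next(self._seq), task))

    def dispatch_next(self, send):
        """Initiate the highest-priority pending task (claim 12) by sending an
        automation command to a device from the candidate pool (claim 13)."""
        if not self.pending or not self.pool:
            return None
        _, _, task = heapq.heappop(self.pending)
        device = self.pool[0]          # selection policy deliberately elided
        send(device, task)
        return task
```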
Type: Application
Filed: Jul 19, 2017
Publication Date: Jan 25, 2018
Inventors: MOHAN J. KUMAR (Aloha, OR), MURUGASAMY K. NACHIMUTHU (Beaverton, OR), AARON GORIUS (Upton, MA), MATTHEW J. ADILETTA (Bolton, MA), MYLES WILDE (Charlestown, MA), MICHAEL T. CROCKER (Portland, OR), DIMITRIOS ZIAKAS (Hillsboro, OR)
Application Number: 15/654,615