SYSTEMS AND METHODS FOR SENSING, RECORDING, ANALYZING AND REPORTING ENVIRONMENTAL CONDITIONS IN DATA CENTERS AND SIMILAR FACILITIES
The present disclosure pertains to utilizing hardware and software to control and record environmental and other data obtained from sensors and other devices placed throughout a facility, and to analyzing and displaying the information in a detailed status report of the environmental conditions inside the facility. Once the data is analyzed, the software can provide recommendations to implement measures that increase the efficiency of the facility.
This application is a continuation of U.S. patent application Ser. No. 16/787,711 (now U.S. Pat. No. 11,284,544), filed on Feb. 11, 2020, entitled “SYSTEMS AND METHODS FOR SENSING, RECORDING, ANALYZING AND REPORTING ENVIRONMENTAL CONDITIONS IN DATA CENTERS AND SIMILAR FACILITIES,” which is a continuation-in-part of U.S. patent application Ser. No. 16/383,216 (now U.S. Pat. No. 10,863,330), filed on Apr. 12, 2019, entitled “SYSTEMS AND METHODS FOR SENSING, RECORDING, ANALYZING AND REPORTING ENVIRONMENTAL CONDITIONS IN DATA CENTERS AND SIMILAR FACILITIES,” which is a continuation-in-part of U.S. patent application Ser. No. 15/369,537 (now U.S. Pat. No. 10,516,981), filed on Dec. 5, 2016, entitled “SYSTEMS AND METHODS FOR SENSING, RECORDING, ANALYZING AND REPORTING ENVIRONMENTAL CONDITIONS IN DATA CENTERS AND SIMILAR FACILITIES,” which claimed priority to U.S. provisional application No. 62/262,715, filed Dec. 3, 2015, entitled “SYSTEMS AND METHODS FOR SENSING, RECORDING, ANALYZING AND REPORTING ENVIRONMENTAL CONDITIONS IN DATA CENTERS AND SIMILAR FACILITIES,” all of which are hereby incorporated by reference in their entirety as though fully set forth herein.
TECHNICAL FIELD
The present disclosure pertains to sensing, measuring, recording, and reporting environmental parameters and conditions in facilities such as data centers. In particular, the present disclosure pertains to systems and methods of utilizing specialized electronics and software to control and record environmental conditions, power consumption, and/or other business and/or technical data obtained from the specialized electronics and/or other devices, placed throughout a facility. In some examples, the data may be analyzed and/or displayed in one or more detailed status reports (and/or other reports).
In some examples, the specialized electronics contemplated by the present disclosure may include sensors configured to sense, detect, determine, measure, and/or record temperatures, air pressures, air flows, various humidities, power consumption, indoor locations (detected by onboard electronics, GPS, beacons, time of flight/time of arrival, etc.), motion, occupancy, light, and/or vibrations. In some examples, the sensors may be mounted, attached, retained and/or otherwise positioned at various locations and/or heights throughout the facilities. In some examples, the sensors may be part of one or more sensor systems comprising one or more sensor modules and/or sensor strands.
In some examples, software may analyze the data from the sensors (e.g., using one or more Computational Fluid Dynamics (CFD) analysis techniques) and provide a detailed view into the environment, conditions, and/or equipment within the facility. For example, the software may analyze data from the sensors and determine power density, cooling requirements, cooling supply, air flow, temperature gradients, and/or other information pertaining to the facility. In some examples, the software may also provide visualizations that can be used to help understand the analytics and/or performance of the facility. Further, the software may allow entry of hypothetical data and/or analysis to test theoretical scenarios and/or circumstances. In some examples, the software may provide one or more recommendations to implement measures that increase the efficiency of the facility. In some examples, the software may consider standard operating procedures, best practices, audit and compliance logs, fault detection, and/or other information when making the recommendation(s).
In some examples, some or all of the data collected by the sensors and/or analytical data determined by the system may be mapped and/or otherwise associated with one or more geographic locations (e.g., of the sensors) within the facilities. In some examples, the data collected by the system and/or analytical data determined by the system may be used to assist in controlling the supporting infrastructure at the facility, such as, for example, Heating, Ventilation and Air Conditioning (HVAC) equipment, lighting systems, computing systems, security systems, and/or other appropriate systems. In some examples, the data collected by the system and/or analytical data determined by the system may be used to assist in ensuring that facilities and infrastructure adapt to the most efficient operation as Information Technology (IT) loads in those facilities evolve.
BACKGROUND
Data centers store computer systems, such as, for example, computer servers. Such computer servers are sometimes used to host and/or facilitate network applications. Data centers also use a variety of associated support systems, such as, for example, environmental controls (air conditioning, fire suppression devices, etc.) as well as various security devices.
Data centers typically cost a substantial amount to build and maintain. Part of the cost is the enormous amount of electricity data centers need to run properly. The Department of Energy (DOE) has estimated that approximately half of the energy used to power a data center is used for the cooling and powering of equipment, with the other half going to actually running the servers and other computing equipment. According to DOE statistics, data center electricity use doubled between 2001 and 2006, from 30 to 60 billion kilowatt-hours of electricity, and stood at about 100 billion kilowatt-hours of electricity as of 2013. This amounts to about 2% of all U.S. electricity use and is increasing. Already, there are millions of data centers in the U.S., amounting to about one center per 100 individuals, and this is expected to continue to grow as more computing applications for large and small companies are moved to these facilities.
Data centers are often large enough to need to be housed in large buildings. There are often thousands of computing devices in a large data center. Additionally, the physical arrangement of the computing equipment can change inside data centers. Unfortunately, Computer Aided Design (CAD) drawings used for asset management are constantly out of date due to frequent upgrades and/or changes to the arrangement of equipment inside a data center.
Computing and/or power demands can also shift rapidly within data centers. For example, if the data center acquires a new customer that requires a substantial amount of computing power, the processors in the data center could see dramatically higher utilization in a short time frame. This increased utilization may produce significantly more heat in the physical space that the processors occupy. At the same time, support infrastructure (e.g., cooling systems, airflow distribution, humidity controls, etc.) may remain relatively static. This can drive up overall operating costs. Over time, incremental changes to the computing and/or power demands may increase the demands on the support infrastructure until the support infrastructure is no longer adequate to safely support the operation of the computing equipment. In some cases, IT changes can be so significant that the environmental conditions push elements of the IT equipment into fault tolerances and can put quality of service at risk.
Limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with the present disclosure as set forth in the remainder of the present application with reference to the drawings.
BRIEF SUMMARY
The present disclosure pertains to a system and method of utilizing software to control and record environmental and other data obtained from sensors and other devices, placed throughout a facility such as a data center. The system and methods are configured to analyze the information obtained from the sensors and to display the information in a detailed status report of the environmental conditions inside the facility, using graphs, maps, charts, windows, dashboards, histograms, scatter plots, and other types of presentation strategies.
As described herein, the disclosure provides for the sensing and measuring of environmental parameters and conditions, which may include some or all of the following: temperature, air pressure, humidity, power consumption, and others, at various locations and heights throughout the facility. By using the software associated with the system, the user can receive detailed views into the power density, cooling requirements, and cooling supply of the facility. The software can also be configured to provide standard and custom visualizations to view and understand either the low level data or the high level analytics. Either way, the user is provided analysis and an understanding of the performance of the facility.
In some examples, the system of the present disclosure is a combination of five main components: sensor network nodes, known location nodes, gateway/edge nodes, a cloud computing component, and a user interface. In some examples, a mesh network (a wireless personal area network or WPAN) and/or a wide area network (WAN) are deployed to connect all five main components.
The present disclosure describes systems that have the potential to create flexibility and improve the performance of major support infrastructure assets within data centers that are traditionally static. Data center operators will have access to environmental information that previously did not exist or was not feasible to measure at this level of granularity, all in real time. Operators will also have the ability to reconfigure the sensor fleet with nominal input or configuration required, making the data required to keep the infrastructure in sync with the IT immediately available. Ultimately, data center operators are empowered to take action with their infrastructure to improve redundancy, efficiency, and IT equipment performance, lower PUE, and decrease operating costs.
One of the benefits of the present disclosure is that data center operators will be able to review and revise their environmental settings, as described herein, and reduce electricity usage in a way that can offer immediate and ongoing savings. The amount of savings is highly dependent on the size of the data center, the equipment installed, and the infrastructure systems. For example, a typical 800 kW data center could see between $50,000 and $300,000 in power savings annually, and these savings are expected to grow as the trend toward larger data centers and higher power density continues.
Another benefit of the present disclosure is the reduced cost in new designs. The design and commission of new data centers often start off with significantly over designed infrastructure, including designs that go beyond necessary redundancy requirements. Over time, data center operators slowly reach the “limits” of the infrastructure as the IT equipment rapidly changes inside and increases in density. Typically, once those limits have been reached, a consultant or an internal team is called in to redesign and often over design the system update, starting the process all over again.
The present disclosure, using appropriate sensors and similar devices, allows data center operators to have access to data at the granularity required, which does not currently exist. This information can be used for thorough thermodynamic analysis of the environmental systems, allowing for vastly improved infrastructure efficiencies, and in many cases deferring the need for major upgrades. Infrastructure upgrades can vary in cost, but typically range between $50,000 and $2,000,000 depending on the size and scope of the improvement. The present disclosure also provides for a reduction in the operational cost of sensor management and data gathering.
In yet another benefit of the present disclosure, the sensors and software help improve infrastructure utilization and efficiency, increase reliability, and better protect against brownout/blackout power shortages. The present disclosure also improves monitoring and alarms that can serve as an early warning sign to help prevent a data center outage. According to an Emerson study in 2011, data centers worldwide suffered complete outages an average of 2.5 times during the year, with each outage lasting an average of 134 minutes. The downtime cost of a data center averages approximately $300,000 per hour, resulting in approximately $1,700,000 in downtime cost per data center per year.
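The annual downtime figure follows from the per-outage numbers quoted above. A minimal sketch of the arithmetic is given below; the function and variable names are illustrative only and are not part of the disclosure.

```python
# Figures quoted in the paragraph above (Emerson study, 2011).
OUTAGES_PER_YEAR = 2.5
MINUTES_PER_OUTAGE = 134
COST_PER_HOUR_USD = 300_000


def annual_downtime_cost(outages_per_year, minutes_per_outage, cost_per_hour):
    """Return (hours of downtime per year, downtime cost in USD per year)."""
    hours = outages_per_year * minutes_per_outage / 60
    return hours, hours * cost_per_hour


hours, cost = annual_downtime_cost(
    OUTAGES_PER_YEAR, MINUTES_PER_OUTAGE, COST_PER_HOUR_USD
)
print(f"{hours:.2f} h/year, about ${cost:,.0f}/year")
```

The result, roughly 5.6 hours and about $1.675 million per year, is consistent with the rounded $1,700,000 figure.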
In the present disclosure, the sensors and similar devices attach to the outside of server racks using magnets, bolts, clips, or plugs, or any other attachment techniques that would not interfere with the operation of the system and sensors, as known by those having ordinary skill in the art and depending on the device. Other configurations are possible, as long as the devices can be located properly to sense environmental and other data. The system software can be cloud hosted, virtualized, or run locally.
The system software controls the sensors and measures data from the sensors that are placed throughout the facility. The system software can also display detailed information or status of the environmental conditions inside the data center. Some of the environmental parameters to be measured include, but are not limited to, temperature, air pressure, humidity, and IT power. The system software provides a detailed view into the power density, cooling requirements, and cooling supply of the data center, among other information, including, but not limited to, Computational Fluid Dynamics (CFD) analysis indicating air flow and temperature gradient throughout the facility.
Standard and custom visualizations will be used to view the low level data, and high level analytics will be used to analyze the performance of the data center and recommend or allow the implementation of measures that increase the efficiency. Standard operating procedures, best practices, audit and compliance logs, and fault detection are built into the software, as described herein. Further, the software can allow for hypothetical analysis to test theoretical scenarios and circumstances. All of the actual measured values and calculated analytics determined by the system can be mapped to the geographic location of the sensors.
In the present disclosure, the hardware products, sensors and similar products, utilize magnets or other attachment devices to attach to the side of a server rack. The hardware products can even be used to replace existing power supply power cables, in some configurations. By utilizing multiple sensors and magnets that attach to shelves at various heights on the rack, the sensors can measure vital temperatures at multiple points on the rack, as opposed to a single temperature measurement. The sensors can also be used to measure relative humidity and ambient pressure which gives a full picture of the data center environment in general and at specific locations, which can be automatically mapped in the data center by the sensors. Power monitors can replace the existing server power supply cables, and the sensor configurations are completely customizable and flexible for a variety of data center configurations and for growth.
In some examples, a computing system of the present disclosure may use sensor data to keep track of pertinent activities and/or events involving server racks in a data center. This may assist in keeping track of and/or managing valuable data center assets. This may also help with fulfilling certain tracking and/or logging obligations for server tenants.
In addition to keeping track of events involving server racks, in some examples, the computing system may also be used to keep track of the health of cooling equipment. The cooling equipment may be located within the data center and/or outside of the data center. Keeping track of the health of the cooling equipment can be an important task, as proper operation of cooling equipment is essential to the continued functioning of a data center.
In some examples, the computing system may also be used to determine inefficiencies within the data center (e.g., pertaining to the environmental conditions of the data center). In some examples, the computing system may additionally recommend corrective action to remedy the inefficiencies. Because of the high cost of operating a data center, the cost savings that come with correcting even small inefficiencies can be significant. Likewise, the cost of allowing inefficiencies to fester can be significant.
Other objects and advantages of the present disclosure will become apparent to one having ordinary skill in the art after reading the specification in light of the drawing figures. However, the spirit and scope of the present disclosure should not be limited to the description of the exemplary embodiments contained herein.
The figures are not necessarily to scale. Where appropriate, the same or similar reference numerals are used in the figures to refer to similar or identical elements. For example, reference numerals utilizing lettering (e.g., rack sensor strand 2020a, plenum sensor strand 2020b) refer to instances of the same reference numeral that does not have the lettering (e.g., sensor strands 2020).
DETAILED DESCRIPTION
Some examples of the present disclosure may relate to a system, comprising a sensor system configured to mount to a server rack within a data center, the sensor system comprising a sensor configured to measure data within the data center, a computing system configured to receive the data, the computing system comprising processing circuitry, and memory circuitry comprising machine readable instructions which, when executed, cause the processing circuitry to determine a position of the sensor within the data center, determine an efficiency indicator based on the data measured by the sensor and the position of the sensor, determine whether there is an inefficiency within the data center based on the efficiency indicator, and in response to determining there is an inefficiency, recommend a solution to the inefficiency.
In some examples, the sensor comprises a first sensor, the sensor system further comprises a second sensor configured to measure data within the data center, the position of the first sensor comprises a first position, the memory circuitry comprises machine readable instructions which, when executed, further cause the processing circuitry to determine a second position of the second sensor within the data center, and the efficiency indicator is determined based on the data measured by the first and second sensors, as well as the first position of the first sensor, and the second position of the second sensor. In some examples, the position of the sensor is determined using position data obtained via a local positioning system or a relative positioning system of the data center. In some examples, the data comprises thermal data, humidity data, or pressure data. In some examples, the efficiency indicator comprises a hot spot, an airflow direction, an airflow magnitude, a horizontal temperature gradient, a vertical temperature gradient, or a server rack utilization. In some examples, the inefficiency comprises a temperature above a maximum temperature threshold, a reversed air flow, a horizontal temperature gradient below a low delta threshold, a reversed horizontal temperature gradient, a vertical temperature gradient above a high delta threshold, or a reversed vertical temperature gradient. In some examples, recommending the solution comprises generating a diagram showing a location of a server rack or cooling component within the data center that is impacted by the inefficiency or that will be impacted by the solution, generating a cost saving analysis that includes the solution, or generating a work order to implement the solution.
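The flow described above (measure, locate, derive an efficiency indicator, test for an inefficiency, recommend a solution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Reading` fields, the threshold values, the choice to treat a negative gradient as "reversed", and the pairing of each inefficiency with a solution are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    rack: str        # rack identifier (hypothetical naming)
    face: str        # "inlet" (cold aisle side) or "outlet" (hot aisle side)
    height_m: float  # sensor height above the floor, from the positioning system
    temp_c: float    # measured temperature


# Hypothetical values; the disclosure names the thresholds, not their values.
MAX_TEMP_C = 32.0    # maximum (inlet) temperature threshold
LOW_DELTA_C = 5.0    # low delta threshold, horizontal (outlet minus inlet) gradient
HIGH_DELTA_C = 6.0   # high delta threshold, vertical (top minus bottom) gradient

# Assumed pairing of each inefficiency with one possible solution.
RECOMMENDATIONS = {
    "temperature above maximum threshold": "cooling system configuration change",
    "horizontal gradient below low delta threshold": "consolidate load to fewer racks",
    "reversed horizontal gradient": "install containment around the rack",
    "vertical gradient above high delta threshold": "install blanking panels in the rack",
    "reversed vertical gradient": "modify the air supply medium",
}


def _mean_temp(readings):
    return sum(r.temp_c for r in readings) / len(readings)


def find_inefficiencies(readings):
    """Return a sorted list of (rack, inefficiency) pairs."""
    found = set()
    for rack in {r.rack for r in readings}:
        inlet = sorted(
            (r for r in readings if r.rack == rack and r.face == "inlet"),
            key=lambda r: r.height_m,
        )
        outlet = [r for r in readings if r.rack == rack and r.face == "outlet"]
        if any(r.temp_c > MAX_TEMP_C for r in inlet):
            found.add((rack, "temperature above maximum threshold"))
        if inlet and outlet:
            horizontal = _mean_temp(outlet) - _mean_temp(inlet)
            if horizontal < 0:
                found.add((rack, "reversed horizontal gradient"))
            elif horizontal < LOW_DELTA_C:
                found.add((rack, "horizontal gradient below low delta threshold"))
        if len(inlet) >= 2:
            vertical = inlet[-1].temp_c - inlet[0].temp_c  # top minus bottom
            if vertical < 0:
                found.add((rack, "reversed vertical gradient"))
            elif vertical > HIGH_DELTA_C:
                found.add((rack, "vertical gradient above high delta threshold"))
    return sorted(found)


def recommend(readings):
    """Attach the assumed solution to each detected inefficiency."""
    return [(rack, issue, RECOMMENDATIONS[issue])
            for rack, issue in find_inefficiencies(readings)]


readings = [
    Reading("R1", "inlet", 0.3, 20.0),
    Reading("R1", "inlet", 1.8, 29.0),
    Reading("R1", "outlet", 1.0, 27.0),
]
for row in recommend(readings):
    print(row)
```

In this sketch the sensor position enters through the `face` and `height_m` fields, which is what lets two temperature readings be interpreted as a horizontal or vertical gradient rather than as unrelated values.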
Some examples of the present disclosure relate to a method of determining inefficiencies in a data center, comprising measuring data within the data center via a sensor of a sensor system configured to mount to a server rack, determining a position of the sensor within the data center, determining an efficiency indicator based on the data measured by the sensor and the position of the sensor, determining whether there is an inefficiency within the data center based on the efficiency indicator, and in response to determining there is an inefficiency, recommending a solution to the inefficiency.
In some examples, the sensor comprises a first sensor, the data is measured via the first sensor and a second sensor of the server rack sensor system, the position comprises a first position, the method further comprises determining a second position of the second sensor within the data center, and the efficiency indicator is determined based on the first position and second position, as well as the data measured by the first sensor and second sensor. In some examples, determining the position of the sensor comprises determining the position via a local positioning system or a relative positioning system of the data center. In some examples, the data comprises thermal data, humidity data, or pressure data. In some examples, the efficiency indicator comprises a hot spot, an airflow direction, a change in temperature, or a temperature gradient. In some examples, the inefficiency comprises a temperature above a temperature threshold, a reverse air flow, a change in temperature above a high delta threshold, a change in temperature below a low delta threshold, a temperature gradient above a gradient threshold, or a reversed temperature gradient. In some examples, the solution comprises a reconfiguration of a server mounted in the server rack, a consolidation of a processing load to fewer server racks, a disbursement of the processing load to more server racks, an installation of a blanking panel in the server rack, an installation of a containment solution around the server rack, a modification of an air supply medium, or a cooling system configuration change.
Some examples of the present disclosure relate to a non-transitory machine readable medium, comprising machine readable instructions which, when executed by a processor, determine a position of a sensor within a data center, the sensor being part of a sensor system mounted to a server rack within the data center, the sensor being configured to measure data within the data center, determine an efficiency indicator based on the data measured by the sensor and the position of the sensor, determine whether there is an inefficiency within the data center based on the efficiency indicator, and in response to determining there is an inefficiency, recommend a solution to the inefficiency.
In some examples, the position of the sensor is determined using position data obtained via a local positioning system or a relative positioning system of the data center. In some examples, the data comprises thermal data, humidity data, or pressure data. In some examples, the efficiency indicator comprises a hot spot, an airflow direction, a change in temperature, or a temperature gradient. In some examples, the inefficiency comprises a temperature above a temperature threshold, a reverse air flow, a change in temperature above a high delta threshold, a change in temperature below a low delta threshold, a temperature gradient above a gradient threshold, or a reversed temperature gradient. In some examples, the solution comprises a reconfiguration of a server mounted in the server rack, a consolidation of a processing load to fewer server racks, a disbursement of the processing load to more server racks, an installation of a blanking panel in the server rack, an installation of a containment solution around the server rack, a modification of an air supply medium, or a cooling system configuration change.
Some examples of the present disclosure relate to a cooling monitoring system, comprising a sensor system configured to mount to cooling equipment of a data center, the sensor system configured to measure a cooling equipment parameter, a computing system configured to receive the cooling equipment parameter, the computing system comprising processing circuitry, and memory circuitry comprising a stored health threshold and computer readable instructions which, when executed, cause the processing circuitry to determine a health of the cooling equipment based on the cooling equipment parameter, compare the health of the cooling equipment to the stored health threshold, and in response to determining the health of the cooling equipment is below the stored health threshold, perform an action.
In some examples, the action comprises generating a notification. In some examples, the action comprises determining whether there exists a work order corresponding to a planned or contemporaneous maintenance of the cooling equipment, in response to determining the work order does exist, indicating or confirming the work order is still needed, and in response to determining the work order does not exist, generating a notification or a new work order. In some examples, the computing system further comprises communication circuitry, the sensor system is configured to measure the cooling equipment parameter during a measuring time period, and the action comprises communicating, via the communication circuitry, with a security system regarding security data corresponding to the measuring time period, and associating the security data with the cooling equipment and the measuring time period in the memory circuitry.
In some examples, the memory circuitry further comprises a parameter signature, and wherein the health of the cooling equipment is determined based on a comparison of the cooling equipment parameter to the parameter signature. In some examples, the parameter signature is associated with good health. In some examples, the health of the cooling equipment is determined based on a degree of difference between the cooling equipment parameter and the parameter signature. In some examples, the cooling equipment parameter comprises a first cooling equipment parameter, the sensor system is configured to measure a second cooling equipment parameter, and the health of the cooling equipment is determined based on the first cooling equipment parameter and the second cooling equipment parameter. In some examples, the cooling equipment parameter comprises a temperature in or around the cooling equipment, a pressure in or around the cooling equipment, a humidity in or around the cooling equipment, a vibration of the cooling equipment, a vibration harmonic of the cooling equipment, or a power characteristic of the cooling equipment.
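One way to read "degree of difference" from a good-health parameter signature is as a normalized distance that is then compared against the stored health threshold. The sketch below assumes the parameter is a short vector (for example, vibration harmonic amplitudes); the scoring formula, the 0-to-1 scale, and the threshold value are assumptions made for illustration.

```python
import math

# Hypothetical stored health threshold (1.0 = matches the good-health signature).
HEALTH_THRESHOLD = 0.7


def health_score(measured, signature):
    """Map the degree of difference from a good-health signature to a 0..1 score."""
    if len(measured) != len(signature):
        raise ValueError("measured parameter and signature must have the same length")
    scale = math.sqrt(sum(s * s for s in signature))
    if scale == 0:
        raise ValueError("signature must not be all zeros")
    # Euclidean distance between the measurement and the signature.
    difference = math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, signature)))
    return max(0.0, 1.0 - difference / scale)


def needs_action(measured, signature, threshold=HEALTH_THRESHOLD):
    """True when the derived health falls below the stored health threshold."""
    return health_score(measured, signature) < threshold


good_signature = [1.0, 0.5, 0.2]     # illustrative harmonic amplitudes, good health
todays_harmonics = [1.0, 0.9, 0.6]   # illustrative measurement with elevated harmonics
print(round(health_score(todays_harmonics, good_signature), 2),
      needs_action(todays_harmonics, good_signature))
```

A second cooling equipment parameter could be handled by appending its values to both vectors, or by scoring it separately and taking the lower of the two scores.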
Some examples of the present disclosure relate to a method of monitoring cooling equipment of a data center, comprising measuring a cooling equipment parameter via a sensor system mounted on a component of the cooling equipment, determining, via processing circuitry, a health of the cooling equipment based on the cooling equipment parameter, comparing the health of the cooling equipment to a stored health threshold, and in response to determining the health of the cooling equipment is below the stored health threshold, performing an action.
In some examples, performing the action comprises generating a notification. In some examples, performing the action comprises determining whether there exists a work order corresponding to a planned or contemporaneous maintenance of the cooling equipment, in response to determining the work order does exist, indicating or confirming the work order is still needed, and in response to determining the work order does not exist, generating a notification or a new work order. In some examples, the cooling equipment parameter is measured during a measuring time period, and performing the action comprises communicating, via communication circuitry, with a security system regarding security data corresponding to the measuring time period, and associating the security data with the cooling equipment and the measuring time period in memory circuitry. In some examples, determining the health of the cooling equipment comprises determining the health based on a comparison of the cooling equipment parameter to a parameter signature stored in memory. In some examples, the parameter signature is associated with good health. In some examples, determining the health of the cooling equipment further comprises determining a degree of difference between the cooling equipment parameter and the parameter signature. In some examples, the cooling equipment parameter comprises a first cooling equipment parameter, and wherein the method further comprises measuring a second cooling equipment parameter via the sensor module, wherein determining the health of the cooling equipment comprises determining the health based on the first cooling equipment parameter and the second cooling equipment parameter. In some examples, the cooling equipment parameter comprises a temperature in or around the cooling equipment, a humidity in or around the cooling equipment, a vibration of the cooling equipment, a vibration harmonic of the cooling equipment, or a power characteristic of the cooling equipment.
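The work order branch of the action described above can be sketched as a small lookup. The `WorkOrderLog` structure, the status strings, and the choice to both create a work order and send a notification when none exists are assumptions; the disclosure only requires a notification or a new work order in that case.

```python
from dataclasses import dataclass, field


@dataclass
class WorkOrderLog:
    # equipment id -> status string (hypothetical in-memory stand-in for a
    # maintenance management system)
    open_orders: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)


def act_on_low_health(log, equipment_id):
    """Confirm an existing work order, or create one and notify if none exists."""
    if equipment_id in log.open_orders:
        log.open_orders[equipment_id] = "confirmed: still needed"
        return "confirmed"
    log.open_orders[equipment_id] = "new"
    log.notifications.append(
        f"Health below threshold for {equipment_id}; work order created"
    )
    return "created"


log = WorkOrderLog()
print(act_on_low_health(log, "CRAC-3"))  # no work order yet
print(act_on_low_health(log, "CRAC-3"))  # work order now exists
```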
Some examples of the present disclosure relate to a server rack monitoring system, comprising a sensor system configured to mount to a server rack, the sensor system configured to measure a server rack parameter, a computing system configured to receive the server rack parameter, the computing system comprising processing circuitry, and memory circuitry comprising one or more stored parameter signatures and computer readable instructions which, when executed, cause the processing circuitry to determine whether a server rack event has occurred based on a comparison of the server rack parameter with the one or more stored parameter signatures, and in response to determining the server rack event has occurred, perform an action.
In some examples, the action comprises logging the server rack event in memory circuitry. In some examples, the sensor system is configured to measure the server rack parameter during a measurement time period, and wherein logging the server rack event in memory circuitry comprises associating the server rack event with the server rack and the measurement time period in memory circuitry. In some examples, the action comprises determining whether there exists a work order corresponding to the server rack event, in response to determining there does exist a work order corresponding to the server rack event, indicating the work order is in process, and in response to determining there does not exist a work order corresponding to the server rack event, generating a notification. In some examples, the computing system further comprises communication circuitry, the sensor system is configured to measure the server rack parameter during a measuring time period, and the action comprises communicating, via the communication circuitry, with a security system regarding security data corresponding to the measuring time period in response to determining that the server rack event occurred, and associating the security data with the server rack event, server rack, and the measuring time period in the memory circuitry. In some examples, the server rack parameter comprises a temperature in or around the server rack, a humidity in or around the server rack, a pressure in or around the server rack, a light intensity around the server rack, a vibration of the server rack, or a power characteristic of the server rack. In some examples, the sensor system is configured to adjust a setting of the sensor system in response to the server rack parameter being outside of a threshold range.
In some examples, the setting comprises: an enablement of a sensor of the sensor system, a sample rate of the sensor system, a maximum frequency rate of the sensor system, a maximum measurement range of the sensor system, an operating mode of the sensor system, a power mode of the sensor system, a performance mode of the sensor system, or a bandwidth of the sensor system. In some examples, the memory circuitry comprises a plurality of parameter signatures and a previous server rack event, and the computer readable instructions, when executed, further cause the processing circuitry to select the one or more parameter signatures from the plurality of parameter signatures based on the previous server rack event. In some examples, the server rack event comprises a door open event, a door close event, a server installation event, a server removal event, a cable event, a rack disturbance event, a fan event, a drive failure, a server restart, a natural disaster, or an abnormal operation.
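By way of non-limiting illustration, the comparison of a measured server rack parameter with stored parameter signatures can be sketched as follows. The sketch assumes that a signature is a short window of vibration samples and that a cosine similarity at or above a threshold counts as a match; the names `SIGNATURES`, `similarity`, and `match_event`, the sample values, and the 0.9 threshold are hypothetical and are not taken from the disclosure.

```python
import math

# Hypothetical stored signatures: event name -> reference sample window.
SIGNATURES = {
    "door_open": [0.0, 0.8, 0.3, 0.1, 0.0],
    "server_removal": [0.0, 0.2, 0.9, 0.9, 0.2],
}

def similarity(a, b):
    """Cosine similarity between two equal-length sample windows."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_event(window, signatures=SIGNATURES, threshold=0.9):
    """Return the best-matching event name, or None if no signature clears the threshold."""
    best_name, best_score = None, threshold
    for name, reference in signatures.items():
        score = similarity(window, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Selecting signatures based on a previous server rack event could be modeled by passing a reduced `signatures` mapping, for example only the door close signature after a door open event has been logged.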
Some examples of the present disclosure relate to a method of server rack monitoring, comprising measuring a server rack parameter via a sensor system mounted to a server rack, determining, via processing circuitry, whether a server rack event has occurred based on a comparison of the server rack parameter with one or more parameter signatures stored in memory circuitry, and in response to determining the server rack event has occurred, performing an action.
In some examples, the action comprises logging the server rack event in the memory circuitry. In some examples, the server rack parameter is measured during a measurement time period, and logging the server rack event in the memory circuitry comprises associating the server rack event with the server rack and the measurement time period in the memory circuitry. In some examples, the action comprises determining whether there exists a work order corresponding to the server rack event, in response to determining there does exist a work order corresponding to the server rack event, indicating the work order is in process, and in response to determining there does not exist a work order corresponding to the server rack event, issuing an alert. In some examples, the action comprises associating security data with the server rack event in memory. In some examples, the server rack parameter comprises a temperature in or around the server rack, a humidity in or around the server rack, a vibration of the server rack, or a power characteristic of the server rack. In some examples, the method further comprises adjusting a setting of the sensor system in response to the server rack parameter being outside of a threshold range. In some examples, the setting comprises an enablement of a sensor of the sensor system, a sample rate of the sensor system, a maximum measurement range of the sensor system, an operating mode of the sensor system, a power mode of the sensor system, a performance mode of the sensor system, or a bandwidth of the sensor system. In some examples, the method further comprises determining a previous server rack event and selecting the one or more parameter signatures from a plurality of parameter signatures based on the previous server rack event. In some examples, the server rack event comprises a door open event, a door close event, a server installation event, a server removal event, a cable event, a fan event or a server restart.
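By way of non-limiting illustration, the logging and work order actions described above can be sketched as follows. The sketch assumes that a work order corresponds to an event when it names the same rack and its scheduled window overlaps the measurement time period; that matching rule, the class names `RackEvent` and `WorkOrder`, and the alert text are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RackEvent:
    rack_id: str
    kind: str
    start: float  # measurement time period start (epoch seconds)
    end: float    # measurement time period end (epoch seconds)

@dataclass
class WorkOrder:
    rack_id: str
    start: float
    end: float
    status: str = "scheduled"

def handle_event(event, work_orders, log):
    """Log the event, then mark a matching work order in process or return an alert."""
    log.append(event)  # associates the event with its rack and time period
    for order in work_orders:
        overlaps = order.start <= event.end and event.start <= order.end
        if order.rack_id == event.rack_id and overlaps:
            order.status = "in process"
            return None
    return f"ALERT: unexpected {event.kind} on rack {event.rack_id}"
```

An event on a rack with a scheduled work order returns `None` and updates the order's status, while the same event on a rack without one returns an alert string.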
The present disclosure pertains to systems and methods for obtaining environmental measurements (temperature, pressure, humidity, current, voltage, power, etc.) and associating them with sensor location or positional data and time data at a facility, such as a data center (“the environmental reporting system”). These devices are designed to operate as Internet of Things (IoT) devices that communicate over a customized low power mesh network. They are designed to solve two very complex problems simply: (1) environmental thermodynamic analysis and (2) sensor fleet management.
As described herein, the environmental reporting system provides for the sensing, analyzing and measuring of environmental parameters and conditions, which may include some or all of the following: temperature, air pressure, humidity, and power consumption, and others, at various locations and heights throughout the facility. By using the software associated with the system, the user can receive detailed views into the power density, cooling requirements, and cooling supply of the facility. The software can also be configured to provide standard and custom visualizations to view and understand either the low level data, or the high level analytics, so that the user is provided with analysis and an understanding of the performance of the facility.
Setting up and installing a sensor in the preferred embodiment is straightforward: turn it on, push a button or use NFC to securely connect it to the network (no need to type in a Wi-Fi name or password), and use the magnets to attach it to a server rack (rack module). The sensor will begin to securely communicate encrypted traffic over the mesh network. Utilizing triangulation and trilateration technology, the sensors precisely, accurately, and automatically locate themselves in physical space and communicate their location data along with the environmental data.
This allows for the creation of a novel system and methods for measuring, analyzing and reporting environmental data that was previously unavailable, and at improved granularity. This allows for the generation of software to analyze locations of the sensors, collate the data, and create a 3D representation of the environment. Since the system collects time series data, as the space changes over time, the system gains valuable insights to the complicated thermodynamics and fluid dynamics in play. The ultimate result is better infrastructure management and greatly reduced energy costs.
The system is very robust and self-healing because of the energy scavenging hardware design and customized low power mesh network. The mesh network allows all the devices to use each other as relays to send data back to the server that collects it into the database, as opposed to a traditional star network topology that communicates back to a single point, typically a Wi-Fi router. If a device fails, traffic can automatically reroute through the next nearest node, so the network is, in effect, self-repairing.
An additional benefit to the mesh network protocol is that each additional device extends the range of the overall network by the net range radius of the additional device. This is similar to the idea of “daisy chains” in wired connections.
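By way of non-limiting illustration, the rerouting behavior of the mesh network can be sketched with a breadth-first search that avoids failed nodes. The topology in `LINKS`, the node names, and the function `route` are hypothetical; the disclosure does not specify the routing algorithm, so this stands in for whatever routing the network actually uses.

```python
from collections import deque

def route(links, source, sink, failed=frozenset()):
    """Breadth-first search for the fewest-hop path that avoids failed nodes."""
    if source in failed or sink in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no surviving path to the sink

# Hypothetical topology: node -> neighbors within radio range; "GW" is the gateway.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "GW"],
    "C": ["A", "D"],
    "D": ["C", "GW"],
    "GW": ["B", "D"],
}
```

With all nodes healthy, node A reaches the gateway through B; if B fails, the same call with `failed={"B"}` finds the longer path through C and D.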
The sensor network nodes 102 consist of three different node types, each intended to measure different aspects of the data center 120 (see
The known location nodes 104 are permanently installed in the data center 120 and are used to assist in increasing the accuracy of indoor positioning. The known location nodes 104 also exist in the WPAN 112.
The gateway/edge nodes 106 connect the sensor network nodes 102 to the cloud 108, and provide processing power for analytics and decision making that require low latency. The gateway/edge nodes 106 exist in both the WPAN 112 and WAN 114.
The cloud 108 stores all of the data, provides processing power for the core analytics, and hosts the interface 110. The cloud 108 is understood by one having ordinary skill in the art.
The interface 110 is for the client to view the data and analytics, make decisions, and control the network and environment in the facility. The interface 110 is also used for displaying reports and other output and is understood by one having ordinary skill in the art.
The environmental reporting system 100 utilizes a mesh network 112, such as a wireless personal area network or WPAN, along with a wide area network 114 or WAN to connect all of the components. In the preferred embodiment, the WPAN 112 is the network created by the sensor network nodes. The WPAN 112 will exceed industry standard encryption methods and will be implemented via AES 128-bit encryption. Keys will be stored in dedicated tamper proof hardware and encrypted via 256-bit elliptic curve encryption. The WAN 114 is used for the bridge to communicate with the cloud. HTTPS and VPN tunnels will be implemented for communication purposes.
Of course, other connection platforms can be used to provide connections between the nodes, as understood by one having ordinary skill in the art. Additionally, the preferred embodiment utilizes power nodes 116 and coordinator nodes 118, which may be nodes of any type described above.
As an exemplary embodiment of the present disclosure, three separate hardware devices will be described: a rack node 68, a plenum node 70, and a power meter node 72. Each of the three sensor network node types and the known location nodes will consist of the same core but each has different sensor arrays to perform their specific functions.
The core 64 provides the ability to charge the internal battery from micro USB or energy harvesting mechanisms, monitor the battery, regulate power, read and write to the sensor array, wirelessly communicate with other modules, provide indoor positioning, accept user input, and provide user output. The sensor array 66 is made up of the sensors that are connected to each node type.
The following features of the core 64 and sensor array 66 functional block diagrams set forth in
Functional block [1] 20 is the external interface for charging a module from an external source. Charging 20 will be performed via a micro USB port 22 and will conform to the BC1.1 specification. All supporting integrated hardware will be selected to conform to this specification and to adequately support the power requirements of all of the functional blocks. Functional block [1] 20 will provide user output through functional block [7] 52.
Functional block [2] 24 is the onboard battery charging/energy harvesting/power source. Potential onboard power sources 24 include, but are not limited to, photovoltaic cells 26 and thermoelectric generators 28. The photovoltaic cells will use thin-film technology and the thermoelectric generators will use Peltier elements. Both of the power sources will be selected and sized to adequately support the power requirements of all of the functional blocks. Photovoltaic cells 26 will be utilized when a light source is available and thermoelectric generators 28 will be utilized when a temperature differential is available. Functional block [2] 24 will provide user output through functional block [7] 52.
Functional block [3] 30 is the battery/power source. A rechargeable 18650 lithium ion battery 32 will be used. The Microchip 34 (MCP73831T and/or MCP73831-2ATI/MC) will be used for charge management. The Maxim MAX17043 and/or MAX17048G+ will be used for charge status monitoring, or a fuel gauge 36. The battery will be sized to adequately support the power requirements of all of the functional blocks without power being supplied from functional block [1] 20 or functional block [2] 24 for a minimum of two years. Functional block [3] 30 will provide user output through functional block [7] 52.
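By way of non-limiting illustration, the two-year sizing requirement can be turned into an average current budget. The 3,000 mAh capacity is an assumed figure for an 18650 cell and the 0.01 mA sleep current is a hypothetical value; neither is taken from the disclosure. The 5 mA awake current corresponds to the normal operation limit stated for the core. The sketch ignores self-discharge and any energy harvested by functional block [2] 24.

```python
HOURS_PER_YEAR = 365 * 24

def max_average_current_ma(capacity_mah, years):
    """Largest average draw (mA) that a cell of the given capacity can sustain."""
    return capacity_mah / (years * HOURS_PER_YEAR)

def max_awake_fraction(budget_ma, awake_ma, sleep_ma):
    """Fraction of time the node may be awake without exceeding the budget."""
    return (budget_ma - sleep_ma) / (awake_ma - sleep_ma)

# Assumed capacity and hypothetical sleep current; see the paragraph above.
budget = max_average_current_ma(capacity_mah=3000, years=2)
awake_fraction = max_awake_fraction(budget, awake_ma=5.0, sleep_ma=0.01)
```

Under these assumptions the average budget is roughly 0.17 mA, which suggests that the node would need to spend most of its time in a low power sleep mode.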
Functional block [4a] 38 is for wireless communication 38. Wireless communication 38 will be accomplished via 6LoWPAN (and/or a proprietary routing algorithm) on the 802.15.4 protocol. The preferred wireless radio is the decaWave DW1000. The wireless communication/carrier frequency will support 1,000+ nodes with low sampling frequency and low data rate. Typical ranges that will have to be supported are 50 feet in a data center environment. All wireless communications will be encrypted with AES 128-bit encryption, and keys will be stored using 256-bit elliptic curve encryption. Hardware encryption will be done with the Atmel ATECC508A and/or ATECC608A. Functional block [4a] will provide user output through functional block [7] 52.
In an alternative embodiment, wireless communication 38 could be accomplished via low power Bluetooth. Bluetooth hardware could be selected to support the following protocols: Bluetooth 4.2 or newer, mesh networking (Bluetooth 4.2 or newer, CSRMesh, or custom developed), sleeping mesh networking (Bluetooth 4.2 or newer, CSRMesh, or custom developed), and beacons (iBeacon or uBeacon). NFC could be used to commission and configure a module via another NFC enabled device (smartphone). NFC hardware could also be selected to support ISO/IEC 14443 and ISO/IEC 18000-3. Functional block [4a] will provide user output through functional block [7] 52.
Functional block [4b] 38 also represents the indoor positioning. The indoor positioning will be accomplished with an ultra-wide band radio, which is the same or similar radio used for wireless communication in functional block [4a]. Indoor positioning will have an accuracy of <10 cm.
Functional block [5] 40 is data acquisition and orchestration. The hardware for the data acquisition and orchestration 40 will support analog and digital inputs, as well as the SPI, I2C, USART, and/or USB protocols, and general purpose processing to orchestrate the operations of the node. The preferred embodiment uses an ATMEL SAML21 and/or SAME70 microcontroller 42 for data acquisition and orchestration. Functional block [5] 40 will be used to interface all of the other functional blocks.
Functional block [6] 44 is the user input. User input 44 will consist of a device on/off switch, button, touch pad, or other such technology 46, and a device commissioning switch 48, button, touch pad, or other such technology. The device commissioning input 48 will be used in place of or in tandem with the device commissioning from functional block [4] 34.
Functional block [7] 52 is the user output 52. User output 52 will consist of three RGB LEDs 54 (although more or less can be incorporated). In one configuration, the first RGB LED, power on LED will indicate if the unit is on, off, or has low power. The second RGB LED, status LED, will indicate the status of the wireless communications, indoor positioning and commissioning. The third RGB LED, notification LED, will indicate if the module is measuring alert or exception conditions. Different LED color combinations can be used for different indications.
Functional block [8] 58 is the sensor array 66. The sensors in the sensor array 66 are broken into two classifications, environment sensors 60 and power sensors 62. The environment sensors 60 are temperature, humidity, pressure, occupancy, movement, and lighting level. The temperature sensors to be selected will be a contact RTD sensor and digital sensor. The humidity sensor to be selected will be a digital relative humidity sensor. The pressure sensor to be selected will be a digital barometric pressure sensor. Pressure differentials will be used to calculate air flows. The power sensors 62 are current and voltage. Voltage and current sensors 62 will be selected to measure RMS values.
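By way of non-limiting illustration, an RMS value can be computed from a window of raw voltage or current samples as the square root of the mean of the squared samples. The function names and the synthetic sine input are hypothetical; a deployed sensor might instead report RMS directly in hardware.

```python
import math

def rms(samples):
    """Root mean square of a window of instantaneous samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine_samples(peak, n=1000):
    """One full cycle of a sine wave sampled n times (synthetic test input)."""
    return [peak * math.sin(2 * math.pi * k / n) for k in range(n)]
```

For a sine wave sampled over a whole cycle, `rms` returns the peak value divided by the square root of two, which is the familiar relationship between peak and RMS mains voltage.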
Exemplary sensors include temperature sensors (Bosch BME280, Murata NXFT), humidity sensors (Bosch BME280), pressure sensors (Bosch BME280), light sensors (thin film), occupancy sensors, inertial movement sensors (STMicroelectronics LSM9DS1), and current sensors.
Communication from the gateway edge nodes 106 to the sensor network 102, 116, 118, and known location nodes 104 will be done over the WPAN 112. The gateway/edge nodes 106 will be able to communicate with the decaWave DW1000 radios in the sensor network nodes 102, 116, 118 and known location nodes 104. This can be done through a software defined radio (SDR) or through a USB interface (via the SAML21) to the decaWave radio.
The gateway/edge node 106 can be selected from commercially available IoT gateways and configured or modified to work with the sensor network nodes 102, 116, 118, and known location nodes 104. The gateway/edge node 106 is made up of four functional blocks: the power source block 152, the WPAN communication block (implemented as either the SDR option 154 or the USB interface option 156), the WAN communication block 158, and the server block 160.
The gateway/edge node 106 will be powered from redundant 120 V single phase power supplies 162. Communication from the gateway/edge nodes 106 to the cloud 108 will be done over the WAN 114. This will be accomplished with a wired Ethernet connection 164, a Wi-Fi connection 166, or a cellular connection 168. All traffic will be routed through a VPN.
The server 160 will be a general purpose server 172 capable of running a host operating system (OS), preferably Linux. The OS will run the application code required to utilize functional block [2a] 158 and functional block [2b] 154, 156. In addition to this, application specific code will be located on the server 160.
In WPAN communication 154, using the SDR option 150, the gateway/edge nodes 106 will have an SDR 174 that will be configured to communicate with, for example, the decaWave DW1000. In the SDR option 150 no physical modifications to the white labeled IoT gateway will be required. It will however be necessary to configure the SDR 174.
In the WPAN communication 156, using the USB radio option 170, the gateway/edge node 106 will have a USB port 176, which will be connected to a microcontroller 178, for example, the Atmel SAML21, which will act as a USB peripheral. The microcontroller 178 will be connected to a decaWave DW1000 180, as the decaWave DW1000 180 requires a host microcontroller to communicate over USB 176. In the USB radio option 170, physical modifications will be needed to facilitate communication of the gateway/edge router with the WPAN 112. These modifications utilize the same microcontroller as in the other nodes to provide a USB interface to the same radio used in the other nodes, which will allow the same radio drivers that are used in the other nodes to be reused. These physical modifications will reside either internal or external to the white labeled IoT gateway.
As described herein, modules communicate over a customized network that allows the devices to operate wirelessly, reliably, and for long periods of time with a low power consumption. This allows the module network to heal itself in the event that a module fails or loses power. The network is extremely robust and does not require a centralized point to communicate data. Modules will talk to the nearest device enabling a “daisy chain” of communication. This allows the network to operate with a range that grows with the communication radius of each device.
Potential protocols include, but are not limited to, 6LoWPAN, Bluetooth 4.2 or newer, CSRMesh, or a proprietary developed network that may utilize any of the aforementioned protocols. In the preferred embodiment, the gateway/edge nodes 106 will be selected from white labeled commercially available IoT gateways. The gateway/edge nodes 106 gather data from the sensor network, store a rolling window locally, and send the data to the cloud 108. The gateway/edge nodes 106 will also be responsible for analyzing the incoming data and performing any required low latency processes.
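By way of non-limiting illustration, the locally stored rolling window can be sketched with a bounded queue that discards the oldest reading once it is full while forwarding every reading upstream. The class name `GatewayBuffer`, the window size, and the `send` callable standing in for the cloud upload are hypothetical.

```python
from collections import deque

class GatewayBuffer:
    """Keeps the most recent readings locally and forwards each one upstream."""

    def __init__(self, max_readings, send):
        self._window = deque(maxlen=max_readings)  # oldest entries drop off automatically
        self._send = send                          # hypothetical cloud uploader

    def ingest(self, reading):
        self._window.append(reading)
        self._send(reading)

    def snapshot(self):
        """Return the current rolling window, oldest reading first."""
        return list(self._window)
```

Keeping the window on the gateway lets low latency edge processing work on recent data without a round trip to the cloud.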
Additionally, sleeping mesh networks are a specific subset of mesh network that allow for reduced power consumption. In between communications, modules in a sleeping mesh network can further reduce their power consumption by shutting off their receive and transmit functions and relying on a precise internal clock to re-enable them for periods of communication.
Modules will automatically be located using triangulation and trilateration protocols from time of flight/time of arrival measurements and customized hardware controls that drive energy usage down to very low levels. This allows the module to tie sensor array measurements to a location and thus create a detailed map of the modules and surroundings.
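By way of non-limiting illustration, trilateration from time of flight measurements can be sketched in two dimensions with three anchors at known positions. Each time of flight is converted to a distance, the first range equation is subtracted from the other two to obtain a linear system, and that system is solved directly. The function names and the anchor coordinates in the usage note are hypothetical, and a deployed system would likely use more anchors, three dimensions, and a noise-tolerant solver.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_to_distance(seconds):
    """Convert a one-way time of flight to a distance in meters."""
    return SPEED_OF_LIGHT * seconds

def trilaterate(anchors, distances):
    """Solve for (x, y) given three anchor positions and the range to each."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    # Subtracting the first circle equation from the others gives A @ [x, y] = b.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

For anchors at (0, 0), (10, 0), and (0, 10) meters and exact ranges to a node at (3, 4), the function recovers that position; the anchors here play the role of the known location nodes.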
Commissioning will be defined as the automated process of adding a module to the network, configuring the module, and testing and verifying the communications and sensor array.
In the preferred embodiment, the rack nodes 102 will consist of the core and the following sensor array: seven temperature sensors, a humidity sensor, a pressure sensor, a light sensor, an occupancy sensor, and an inertial movement sensor. The plenum nodes 102 will consist of the core and the following sensor array: a temperature sensor, a humidity sensor, a pressure sensor, and an inertial movement sensor. The power nodes 116 will consist of the core and the following sensor array: a temperature sensor, a humidity sensor, a pressure sensor, a current sensor, and an inertial movement sensor. The known location nodes 104 will consist of the core and the following sensor array: a temperature sensor, a humidity sensor, a pressure sensor, and an inertial movement sensor. The gateway/edge nodes 106 will be selected from white labeled commercially available IoT gateways.
The hardware will be designed to be tamper proof. An attempted read of the firmware will cause the firmware to be erased. This will be deployed via a RTC tamper alert with a backup coin cell battery and the Atmel ATECC508A and/or ATECC608A. All active core and sensor parts will have registered IDs. Any part without a registered ID will be rejected. This tamper resistance will be implemented via a blockchain structure.
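By way of non-limiting illustration, rejecting parts without a registered ID can be sketched with a simple hash chain. The disclosure does not specify the blockchain structure, so the SHA-256 chaining, the block layout, and the part identifiers below are all assumptions made for the sketch.

```python
import hashlib

GENESIS = "0" * 64  # placeholder hash preceding the first block

def block_hash(prev_hash, part_id):
    """Hash that binds a part ID to everything registered before it."""
    return hashlib.sha256(f"{prev_hash}:{part_id}".encode()).hexdigest()

def build_chain(part_ids):
    """Register part IDs in order, linking each block to its predecessor."""
    chain, prev = [], GENESIS
    for part_id in part_ids:
        digest = block_hash(prev, part_id)
        chain.append({"part_id": part_id, "prev": prev, "hash": digest})
        prev = digest
    return chain

def chain_is_valid(chain):
    """Recompute every link; any edited block breaks the chain."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(prev, block["part_id"]):
            return False
        prev = block["hash"]
    return True

def is_registered(chain, part_id):
    """Accept a part only if the chain is intact and lists its ID."""
    return chain_is_valid(chain) and any(b["part_id"] == part_id for b in chain)
```

Altering any registered ID after the fact changes its recomputed hash, so validation fails and every part is rejected until the chain is restored.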
Additionally, the core requirements are as follows: Operating Voltage: 3.3 V, Operating Temperature: −20° C. to 65° C., Operating Humidity: 0% RH to 100% RH, Operating Pressure: 300 hPa to 1100 hPa, Power Consumption: ≤5 mA normal operation. The sensor array requirements are as follows: Operating Voltage: 3.3 V, Interface: Analog or digital (I2C, SPI, or USART), Operating Temperature: −20° C. to 65° C., Operating Humidity: 0% RH to 100% RH, Operating Pressure: 300 hPa to 1100 hPa, Power Consumption: ≤0.5 mA normal operation.
The passive support components requirement is as follows: Operating Temperature: −20° C. to 65° C., Operating Humidity: 0% RH to 100% RH, Operating Pressure: 300 hPa to 1100 hPa. The environmental conditions are as follows: Operating Temperature: −20° C. to 65° C., Operating Humidity: 0% RH to 100% RH, Operating Pressure: 300 hPa to 1100 hPa. The service requirements are as follows: Users will be able to replace/recharge the battery, replace the antenna and everything else will be performed via field service or RMAs.
The firmware requirements for the sensor network nodes are modeled in two sections: the main state machine including synchronous interrupts and the asynchronous interrupts.
Otherwise the node 16 will set its wake up timers then enter a low power sleep mode 214. The sensor read timer 216 is used to sample the data from the sensors 16 and the wake up timer 218 is used to send the data sampled from the sensors to the gateway/edge node 106. The wake up timer 218 will be a multiple of the sensor read timer 216. This allows for more energy efficient operation.
Once the sensor read timer has elapsed 220, with nominal wake up, the node 16 will read from the sensors 60 in the sensor array 58 and store the values into a buffer 222. If there were any errors from reading the sensors 60, those will be handled as well 224. When these steps are complete, the node 16 will reset its sensor read timer 214 and return to a full sleep, starting the process over.
Once the wake up timer has elapsed 226 (the wake up timer is a multiple of the sensor read timer and has lower priority, so when both timers elapse together, the sensor read timer process will run first), the node 16 will completely wake itself up 228 and establish communication with the network 230. If there are errors in establishing communication with the network, those will be handled 232.
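By way of non-limiting illustration, the relationship between the sensor read timer and the wake up timer can be sketched as a schedule in which the wake up period is a whole multiple of the read period and the read always runs first when both elapse together. The function name `schedule`, the action labels, and the example periods are hypothetical.

```python
def schedule(sensor_read_period_s, wake_multiple, total_s):
    """Return (time, actions) for each timer expiry; reads run before transmits."""
    events = []
    wake_period_s = sensor_read_period_s * wake_multiple
    t = sensor_read_period_s
    while t <= total_s:
        actions = ["read_sensors"]               # higher priority timer
        if t % wake_period_s == 0:
            actions.append("wake_and_transmit")  # lower priority timer
        events.append((t, actions))
        t += sensor_read_period_s
    return events
```

With a 10 second read period and a multiple of 3, the node buffers three sets of readings in low power sleep for every full wake up and transmission.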
After this step, the node 16 will check if a location update is required 234. There are two forms of location updates 236, IMU and network. An IMU update will be triggered by the movement interrupt state machine 238, as described herein. If an IMU location update is to be performed, the node 16 will package all of the data from the IMU to be transmitted back to the gateway/edge node 106 later. If a network location update is to be performed, which will be a command issued over the WPAN 112 from the gateway/edge node 106, the node 16 will perform network ranging with its peers in the WPAN 112 and package the data to be transmitted back to the gateway/edge node 106 later.
The next step in the sequence is for the node 16 to read or acquire from its diagnostic sensors 240 (network status from the wireless radio and battery status from the fuel gauge and battery charger) and package the data acquired. The node 16 will then read, process, and package the data stored from the sensor read timer routine 242. Based on the configuration of the node 16, the node 16 will then look at the packaged data to see if an alert condition has been determined 244.
An example of an alert condition could be a temperature value that is too high or a low battery. If there is an alert condition, the user output will be updated 246; otherwise the user output will be reset 248. Once these steps have been performed, the node 16 will transmit all of the packaged data 250 over the WPAN 112 to the gateway/edge node 106, and any errors will be resolved 251.
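By way of non-limiting illustration, the alert condition check and the resulting user output update can be sketched as follows. The configuration keys, the limit values, the packaged data layout, and the LED colors are hypothetical and stand in for whatever configuration is stored on the node.

```python
# Hypothetical node configuration: limits that define an alert condition.
CONFIG = {"max_temperature_c": 35.0, "min_battery_pct": 15.0}

def find_alerts(packaged, config=CONFIG):
    """Compare packaged readings with configured limits and list any alerts."""
    alerts = []
    if max(packaged["temperatures_c"]) > config["max_temperature_c"]:
        alerts.append("temperature_high")
    if packaged["battery_pct"] < config["min_battery_pct"]:
        alerts.append("battery_low")
    return alerts

def update_user_output(alerts):
    """Set the notification LED when alerts exist; otherwise reset it."""
    return "red" if alerts else "off"
```

The alert list would travel with the rest of the packaged data, so the gateway/edge node receives both the raw values and the node's own assessment.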
Finally, the node 16 will check for an over the air or OTA update 252. This will be issued from the gateway/edge node 106. If the OTA update was only for a new configuration, the node 16 will perform the update 254, reset its timers, and go back to sleep 214, starting the process over again. If the OTA was a firmware update 256, the node will perform the firmware update and reset itself back to network initialization 210. If there were any errors, those will be resolved 258.
Upon the interrupt firing, the node 16 will read the data from the IMU and store it to a buffer 304. Then the node 16 will check to see if the interrupt is still valid 306, i.e., if the node 16 is still being moved. If the interrupt is no longer valid, the node 16 will set an IMU location update 308 that will be handled by the main state machine 200, as described above, and exit 310.
If the interrupt is still valid, the node will set a timer 312 that will be used to trigger the next read of the data from the IMU 314, when the timer elapses 316, thus starting the process over again. All of this will be done while the node 16 is still sleeping.
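By way of non-limiting illustration, the movement interrupt sequence can be sketched as a loop in which each iteration stands in for one expiry of the timer 312. The function name, the `imu_reads` iterator, and the `still_moving` callable are hypothetical stand-ins for the IMU hardware and its interrupt line.

```python
def movement_interrupt(imu_reads, still_moving):
    """Buffer IMU samples until movement stops, then flag an IMU location update.

    imu_reads: iterator yielding IMU samples (stand-in for the sensor).
    still_moving: callable returning True while the interrupt remains valid.
    """
    buffer = []
    while True:
        buffer.append(next(imu_reads))  # read IMU and store to buffer (304, 314)
        if not still_moving():          # is the interrupt still valid? (306)
            # Set the IMU location update (308) and exit (310).
            return {"imu_location_update": True, "samples": buffer}
        # Otherwise the timer (312) elapses (316) and the loop reads again.
```

The returned flag is what the main state machine would later inspect when it checks whether a location update is required.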
Commissioning, as described herein, is the process of registering, updating configuration, and adding a node 16 to the WPAN 112. If the node 16 has not been commissioned, it will enter the commissioning sequence 408, and then re-initialize itself with the new configuration parameters 404. Commissioning can also be manually initiated as indicated by the commissioning interrupt 410. This will be in the form of a user input that can happen at any time.
Otherwise the network will be initialized 412, and the node 16 will establish communication with the network 414. If there are errors in establishing communication with the network, those will be resolved 416. After communications with the network have been established 414, the node 16 will broadcast its location 418 to assist in network location updates. As described herein and shown in
Next, the application 600 will establish communication with the WPAN 608 and with the WAN 610, and will resolve any errors in establishing communication with the WPAN 612 or with the WAN 614, as appropriate.
Next, the application 600 will run four continual sub-processes: monitor cloud instructions 616, monitor network status 618, collect sensor data 620, and perform edge processing 622.
The monitoring cloud instructions 616 sub-process will maintain communication with the cloud 108 to listen for instructions. These instructions could include, but are not limited to, pushing OTA updates, updating configurations, requests for data, and updating status.
The monitoring network status 618 sub-process will continually monitor the status of the WPAN 112.
The collect sensor data 620 sub-process will continually orchestrate the process of gathering the data from the WPAN 112.
The perform edge processing 622 sub-process will perform any necessary processing on the data from the WPAN 112 that is not done on the other nodes 16 or the cloud 108. This sub-process will be utilized to lower latency and decrease power usage. Examples of edge processing are performing data center equipment control decisions, communicating with data center equipment, and assisting with real time calculations.
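By way of non-limiting illustration, the four continual sub-processes can be sketched as cooperative tasks that run side by side. The use of `asyncio`, the placeholder work functions, and the bounded `cycles` parameter (which lets the sketch terminate, whereas the real sub-processes would loop indefinitely) are assumptions made for the sketch.

```python
import asyncio

async def sub_process(name, work, log, cycles, interval_s=0.0):
    """Run one sub-process for a fixed number of cycles, yielding between cycles."""
    for _ in range(cycles):
        log.append((name, work()))
        await asyncio.sleep(interval_s)  # lets the other sub-processes run

async def run_gateway(cycles=2):
    """Run the four sub-processes concurrently and return what each one did."""
    log = []
    await asyncio.gather(
        sub_process("monitor_cloud_instructions", lambda: "no instructions", log, cycles),
        sub_process("monitor_network_status", lambda: "wpan ok", log, cycles),
        sub_process("collect_sensor_data", lambda: {"temp_c": 24.1}, log, cycles),
        sub_process("perform_edge_processing", lambda: "no action", log, cycles),
    )
    return log
```

Calling `asyncio.run(run_gateway())` interleaves the four tasks; separate threads or operating system processes would serve equally well on a gateway running a general purpose OS.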
The rack modules 68 are made up of a housing 802, and will be attached to the rack with magnets 804, although other attachment methods can be used. The rack modules 68 also contain an antenna 806, which can be internal or external, and energy harvesting functionality 808, as described herein. The housing 802 contains perforations 810 for ambient condition measurements, and a flexible cable 812. Temperature sensors 814 are used to determine the temperature, and each rack module 68 contains inputs 816, such as buttons, and outputs 818, such as LEDs.
In the preferred embodiment, each rack module 68 will be capable of measuring temperatures at three different heights (¼, ½ and ¾ of the rack height), humidity at a single height, barometric pressure at a single height, and identifying its location.
The sensor network nodes 68 must be able to withstand standard drop tests from 12 feet, withstand 400 lbs. of pressure, and endure >2,000 on/off cycles. The nodes 68 will be made out of molded plastic, rubber cable sheathings and magnets, with a smooth texture. The color will be orange, grey, and black, and they will need to have mounts for the main PCB, the user input/output PCB, the antenna, the energy harvesting mechanisms, and the flexible cable. Holes or perforations will need to be made to attach the antenna, expose the user input/output, mount the flexible cable, and let ambient conditions into the unit. Magnets will have to be affixed to the module. The unit should be less than 3 inches long, 2 inches wide and 1 inch deep, except that the harvesting mechanism may extend past the stated dimension by 1 inch.
In the preferred embodiment, each plenum or subfloor module 70 will be capable of measuring temperature, barometric pressure, and identifying its location.
Similarly, the plenum nodes 70 must be able to withstand standard drop tests from 12 feet, withstand 400 lbs. of pressure, and endure >2,000 on/off cycles. The plenum nodes 70 will be made out of molded plastic, rubber cable sheathings and magnets, with a smooth texture. The color will be orange, grey, and black, and they will need to have mounts for the main PCB, the user input/output PCB, the antenna, the energy harvesting mechanisms, and the flexible cable. Holes or perforations will need to be made to attach the antenna, expose the user input/output, mount the flexible cable, and let ambient conditions into the unit. Magnets will have to be affixed to the module. The unit should be less than 3 inches long, 3 inches wide and 1 inch deep, except that the harvesting mechanism may extend past the stated dimension by 1 inch.
The inline module 74 will replace each server's standard power cord with an inline power meter module 74. The inline module 74 will also have a housing 830, a power outlet plug 832, an antenna 806, user input 816 and output 818, and a power supply plug 834. Each inline power module 74 will be capable of measuring server current, server voltage, and identifying its location.
The clamp on module 76 will attach to any power supply cable in the data center between 120 and 480 volts. The clamp on module 76 will also have a housing 840, but no power outlet plug 832 or power supply plug 834. Instead, the clamp on module 76 will use a split core CT 842 and a flexible cable 844 to attach to the device, along with an antenna 806, and user input 816 and output 818. Each clamp on power module 76 will be capable of measuring server current and identifying its location.
Similar to the units described above, the in-line power node 74 will be made out of molded plastic, rubber 120 V cable, power output plug, power supply plug, with a smooth texture. The color will be orange, grey and black, and they will need to have mounts for the main PCB, the user input/output PCB, the antenna, the energy harvesting mechanisms, and the flexible cable. Holes or perforations will need to be made to attach the antenna, expose the user input/output, mount the flexible cable, and let ambient conditions into the unit. The unit needs to be in line with a server power supply cable. The unit should be less than 3 inches long, 2 inches wide and 1 inch deep, except that the harvesting mechanism may extend past the stated dimension by 1 inch.
The clamp-on power node 76 will likewise be made out of molded plastic, rubber 120 V cable, power output plug, power supply plug, with a smooth texture. The color will be orange, grey and black, and they will need to have mounts for the main PCB, the user input/output PCB, the antenna, the energy harvesting mechanisms, and the flexible cable. Holes or perforations will need to be made to attach the antenna, expose the user input/output, mount the flexible cable, and let ambient conditions into the unit. A split core CT will have to be attached to the device. The unit should be less than 3 inches long, 2 inches wide and 1 inch deep, except that the harvesting mechanism may extend past the stated dimension by 1 inch.
The presentation layer 702 is responsible for generating HTML and JavaScript code that is to be delivered to the user interface 110 (e.g., modern web browser). In the preferred embodiment, the use of browser plugins will be avoided due to security issues. The core libraries, frameworks, and technologies that will be used in the presentation layer 702 are, for example, HTML5, CSS3, JavaScript, HTML Canvas, Node.js, React.js, WebPack, WebGL, three.js, and D3.js.
The business logic layer 704 holds all the formulas and proprietary technology. The business logic layer 704 is also responsible for communicating with the services 714, presentation layer 702, persistence layer 706, and in some cases the gateway/edge node 106. As an example, it may be more efficient to do some calculations on the collected data and then store it in the database 712. The business logic layer 704 can perform such calculations before the data is stored in the database 712. The business logic layer 704 is also responsible for mapping the data transfer objects from the persistence layer 706 to the presentation layer 702. This mapping avoids sending unnecessary information to the portal and keeps the HTML/JavaScript objects and payload small. The core libraries, frameworks, and technologies that will be used in the business logic layer 704 are Java, Python, STAN, Jetty, Spring JDBC, REST, and Maven.
The persistence layer 706 is responsible for converting language specific code to SQL. This layer 706 is also responsible for mapping one object to one or more tables in the database 712. The opposite is also true: this layer 706 is able to combine a few tables into one object for the client data (in this case the services 714 or business logic layer 704). Although some of the SQL code may be generated dynamically at run time, most of the SQL code is kept inside the SQL repository 708. This repository 708 can be used in the future if the main programming language of the portal is changed. The core libraries, frameworks, and technologies that will be used in the persistence layer 706 are Java, Jetty, Spring JDBC, REST, and Maven.
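The mapping of data transfer objects described above can be sketched as follows. This is a minimal illustration in Python only; the class names, field names, and the `to_view` helper are hypothetical and do not appear in this disclosure. It shows one way fields that the portal does not need might be dropped so that the payload stays small.

```python
from dataclasses import dataclass, asdict


@dataclass
class SensorReadingRecord:
    """Hypothetical object as loaded by the persistence layer."""
    node_id: str
    timestamp: str
    temperature_f: float
    raw_rtd_ohms: float      # internal detail, not needed by the portal
    firmware_version: str    # internal detail, not needed by the portal


@dataclass
class SensorReadingView:
    """Hypothetical slimmed-down object sent to the presentation layer."""
    node_id: str
    timestamp: str
    temperature_f: float


def to_view(record: SensorReadingRecord) -> SensorReadingView:
    # Copy only the fields the interface displays.
    return SensorReadingView(record.node_id, record.timestamp, record.temperature_f)


record = SensorReadingRecord("node-1", "2016-12-05T00:00:00Z", 72.5, 108.7, "1.0.0")
payload = asdict(to_view(record))
```

In this sketch, `payload` carries three fields instead of five.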
The SQL repository 708 is a subset of the persistence layer 706 that contains SQL code for the services 714. Some SQL may need to be generated dynamically but the most common SQL scripts will be stored in the SQL repository 708. The SQL repository 708 will be able to handle multiple programming languages.
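As a hedged illustration of keeping SQL in a repository separate from application code, the following Python sketch stores named statements in a dictionary and combines two tables into one object. The table names, column names, and the `readings_by_rack` helper are hypothetical, and the standard-library `sqlite3` module is used only as a stand-in for the database 712 so that the example is self-contained.

```python
import sqlite3

# Hypothetical repository: named SQL statements kept apart from application code.
SQL_REPOSITORY = {
    "create_nodes": "CREATE TABLE nodes (node_id TEXT PRIMARY KEY, rack TEXT)",
    "create_readings": "CREATE TABLE readings (node_id TEXT, temperature_f REAL)",
    "insert_node": "INSERT INTO nodes (node_id, rack) VALUES (?, ?)",
    "insert_reading": "INSERT INTO readings (node_id, temperature_f) VALUES (?, ?)",
    "readings_by_rack": (
        "SELECT n.node_id, n.rack, r.temperature_f "
        "FROM nodes n JOIN readings r ON r.node_id = n.node_id "
        "WHERE n.rack = ?"
    ),
}


def readings_by_rack(conn, rack):
    """Combine rows from two tables into one dictionary per reading."""
    rows = conn.execute(SQL_REPOSITORY["readings_by_rack"], (rack,)).fetchall()
    return [{"node_id": n, "rack": k, "temperature_f": t} for n, k, t in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SQL_REPOSITORY["create_nodes"])
conn.execute(SQL_REPOSITORY["create_readings"])
conn.execute(SQL_REPOSITORY["insert_node"], ("node-1", "rack-A"))
conn.execute(SQL_REPOSITORY["insert_reading"], ("node-1", 71.0))
result = readings_by_rack(conn, "rack-A")
```

Because the statements are plain strings keyed by name, the same repository could in principle be read by code written in another language.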
The constants engine 710 is a subset of the persistence layer 706 that contains constants used in static equations. Examples of constants include converting from temperature RTD values to degrees Fahrenheit, triangulation and trilateration constants, and unit conversions.
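By way of a non-limiting illustration, the conversion from an RTD value to degrees Fahrenheit could look like the Python sketch below. The disclosure does not specify the RTD type, so the sketch assumes a PT100 platinum element with the standard IEC 60751 coefficients and temperatures at or above 0 °C; the constant and function names are hypothetical.

```python
import math

# Assumed constants (not from this disclosure): PT100 element, IEC 60751 coefficients.
R0_OHMS = 100.0
A = 3.9083e-3
B = -5.775e-7


def rtd_ohms_to_celsius(resistance_ohms):
    """Invert R = R0 * (1 + A*T + B*T^2), which holds for T >= 0 degrees C."""
    discriminant = A * A - 4.0 * B * (1.0 - resistance_ohms / R0_OHMS)
    return (-A + math.sqrt(discriminant)) / (2.0 * B)


def celsius_to_fahrenheit(celsius):
    return celsius * 9.0 / 5.0 + 32.0


def rtd_ohms_to_fahrenheit(resistance_ohms):
    return celsius_to_fahrenheit(rtd_ohms_to_celsius(resistance_ohms))
```

Keeping `R0_OHMS`, `A`, and `B` in one place mirrors the idea of a constants engine: a different sensor type would change the constants but not the calling code.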
The database 712 will store all of the data generated from the sensor network nodes 102, 116, 118, known location nodes 104, gateway/edge nodes 106, interface 110, and user actions. In the preferred embodiment, the database 712 is PostgreSQL.
The services layer 714 is responsible for offering a series of REST services to a client. A client can be a third party service, sensor, gateway, or the interface. Security is an important factor when building the services layer 714. This layer 714 should be very selective to deny any client that is not trusted. A certificate based security model will be used for this communication. This layer 714 will use the business logic layer 704 to store some information into the database 712. This layer 714 can also use the information in the database 712 to compute some information for the end client.
As described herein, the gateway/edge node 106 will feed data from the sensor network nodes 102, 116, 118 and known location nodes 104 to the database 712 and business logic layer 704. The data will be sent through a VPN IPsec tunnel to the cloud 108.
As described herein, the interface 110 provides the visual experience for the user. It will be delivered through a modern web browser that supports HTML5, CSS3, and WebGL. The interface will consist of a series of dashboards, data visualizations, analytics, and conversations.
Additionally, in the preferred embodiment, the technologies used for security directly on the cloud 108 are OpenLDAP, Apache Shiro, and 256 bit file system/container encryption. Communication between the cloud 108 and gateway/edge nodes 106 will be secured through an IPsec VPN tunnel. Communication between the cloud 108 and interface 110 will be secured via https. Authentication and authorization will be used to access the cloud 108 and interface 110, as well as the features and components of the features.
The cloud application is modeled in five sub-processes. The gateway/edge node data sub-process is responsible for connecting the gateway/edge node 106 and retrieving data. The gateway/edge node 106 will provide a REST service that the cloud application can use to accomplish this. Once the data has been retrieved, the business logic layer 704 and persistence layer 706 will be used to process and store the data in the database.
The gateway/edge node instructions sub-process is responsible for relaying and receiving instructions from the gateway/edge node 106 and any associated required data. These instructions could include, but are not limited to, pushing OTA updates, updating configurations, requests for data, and updating a status.
The interface host and requests sub-process is responsible for serving the interface 110 and processing and/or providing requests to the interface 110.
The service host and requests sub-process is responsible for serving the services 714 and responding to requests.
The monitoring and logging sub-process monitors the cloud 108, cloud application, interface 110, and user actions. The outputs are processed and stored in the database 712 and will be used to identify internal quality issues, identify how users use the interface 110, and provide quantitative data for AB testing.
The interface 110 is divided up into layout and features. The layout depicts the functional layout for the interface window and the widgets. The window is the main layout for the interface 110 and will be accessible through a web browser. There are two main layout features in the window, the feature container and the widget container.
The feature container displays the icons for the different features supported by the interface 110 and an ability to navigate through the different features. The widget container displays the different widgets for the selected feature and an ability to navigate through the different features. The widget layout describes the default minimum layout for any widget. This includes the widget content, a way to reposition the widget in the widget container, and a way to access the widget settings.
The features supported in the interface include dashboards; data center selection; data visualization; data center views; alerts, events and exceptions; trends; CFD modeling; auditing; planning; and workflow and conversations. Additionally, there are universal features, common to most systems, including data browser; export; content, insights, action; conversation; machine learning; and help, as understood by one having ordinary skill in the art.
Customizable dashboards can be created by using widgets from any of the features described herein. Default dashboards can be created to show the status of the data center, performance of the data center, suggested insights and actions to improve the performance, alerts, events, and exceptions. If multiple data centers are to be used in the interface 110, it will be possible to select between them, or combinations of them. This will be done by visually presenting the data centers on a geographic map and displaying a snapshot of the default dashboards for each data center.
Different combinations of data can be selected, including multiple data sets, to be visualized. Aggregation of data can be selected, including selecting multiple sets of data to be visualized as one set of data. As an example, a user or operator can select all temperatures to be combined in a statistical manner and then visualized. Transformations of data can be selected, such as applying an equation to a combination of data sets to be visualized. As an example, the user can add two power data sets to visualize the sum.
Many different charts and types of charts can be used to visualize the data. Examples include table, line, control, bar or pie chart. Also, the environmental reporting system 100 can plot out the data in histograms, scatter plots, violin plots or contour lines, among others. The environmental reporting system 100 can show a stratification, or a visualization showing the data set differentials at different heights inside of the data center. Also, custom data presentation views will utilize data visualization with prepackaged views. Examples of this are visual presentations of temperature differentials, cooling unit utilizations, and supply and return temperatures.
The user can access different data sets. For example, the user can select the date range, use time and date values to select the date range, or use conditional statements to select the data range, to visualize the data. As an example, the user can choose to only view the data sets when a single data set has a value over 80. Further, the user can select the frequency to plot the data visualization, which can be done by averaging the data, taking the minimum, taking the maximum, or representing all three.
The data view can be expanded when viewed. Data ticks will be available to see exact values and timestamps. And, when aggregate data is being viewed, it will be possible to select the individual data set within the aggregate. An example of this is selecting the maximum value or group of values in a violin plot.
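The aggregation and transformation examples above can be sketched in Python as follows. This is a minimal illustration that assumes all data sets are sampled at the same instants; the function names `aggregate` and `add_series` are hypothetical.

```python
from statistics import mean


def aggregate(series_list, reducer=mean):
    """Combine several equally sampled series into one, sample by sample."""
    return [reducer(samples) for samples in zip(*series_list)]


def add_series(a, b):
    """Transformation example: element-wise sum of two power data sets."""
    return [x + y for x, y in zip(a, b)]


temperatures = [[70.0, 72.0], [74.0, 76.0], [72.0, 74.0]]
average_temperature = aggregate(temperatures)
hottest_temperature = aggregate(temperatures, max)
total_power = add_series([1.0, 2.0], [3.0, 4.0])
```

Passing a different `reducer` (for example `min`, `max`, or `statistics.median`) changes the statistical combination without changing the calling code.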
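As a rough sketch of the conditional selection and plotting frequency described above, the following Python example filters samples above a limit and reduces samples to fixed-width time buckets reporting the average, minimum, and maximum. It assumes samples are `(timestamp_seconds, value)` pairs with integer timestamps; the function names are hypothetical.

```python
from statistics import mean


def filter_over(samples, limit):
    """Keep only samples whose value exceeds the limit (e.g., over 80)."""
    return [(t, v) for t, v in samples if v > limit]


def downsample(samples, bucket_seconds):
    """Group samples into time buckets and report avg, min, and max per bucket."""
    buckets = {}
    for t, v in samples:
        start = t // bucket_seconds * bucket_seconds
        buckets.setdefault(start, []).append(v)
    return {
        start: {"avg": mean(vals), "min": min(vals), "max": max(vals)}
        for start, vals in sorted(buckets.items())
    }


samples = [(0, 78.0), (30, 82.0), (60, 81.0), (90, 79.0)]
over_80 = filter_over(samples, 80)
per_minute = downsample(samples, 60)
```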
Global controls can also be applied to a data visualization, such as normalization or even using a secondary axis to view data of different scales.
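The disclosure does not state which normalization is applied. As one possible choice, a min-max normalization rescales each data set to the range 0 to 1 so that data sets of different scales can share an axis. The Python sketch below assumes that choice; the function name is hypothetical.

```python
def normalize(values):
    """Min-max normalization: map the smallest value to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every point to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```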
The data center view provides automated drawing and rendering of the data center in a three-dimensional view. This will use the location data from the nodes 16. Rules can be applied to fit the constraints of data when drawing and rendering. It will be possible to modify the automated drawing and rendering to correct any of the errors from automation. It will also be possible to navigate through the three-dimensional view, which can be done through panning, zooming, and rotating. All of these will be implemented in an intuitive way.
Current sensor reading values can be overlaid on the data center with the addition of sparklines. Filters can be used to select which type of node or sensor to display. Filters can also be used to select which areas to display. Current sensor reading conditional filters can be used to select which sensors to display. An example would be only displaying all temperature values over 80.
Alerts, events, and exceptions; auditing reports; CFD visualizations; and planning scenarios can be overlaid on the data center.
Alerts are individual data points or groups of data points that violate a rule. Events are groups or patterns of alerts that are statistically similar. Exceptions are trends in data sets that can indicate the potential triggering of an alert or event. The environmental reporting system 100 will provide the ability to view alerts, exceptions, and events, and manage each of them. Alerts, events, and exceptions can also be overlaid on the data center view and data visualization features.
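To make the distinction between alerts and exceptions concrete, the Python sketch below checks readings against a rule and, separately, fits a least-squares line to a data set to estimate when a limit would be reached if the trend continued. The linear trend is an assumption chosen for illustration, not a method stated in this disclosure, and the names are hypothetical. The fit needs at least two samples taken at different times.

```python
def find_alerts(readings, rule):
    """Alerts: data points that violate a rule (rule returns True on violation)."""
    return [r for r in readings if rule(r)]


def linear_fit(times, values):
    """Ordinary least-squares slope and intercept."""
    n = len(times)
    mean_t, mean_v = sum(times) / n, sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_limit(times, values, limit):
    """Exception: time at which a rising trend would reach the limit, else None."""
    slope, intercept = linear_fit(times, values)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


alerts = find_alerts([75.0, 85.0, 79.0], lambda value: value > 80.0)
predicted = time_to_limit([0, 1, 2, 3], [70.0, 72.0, 74.0, 76.0], 80.0)
```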
The trends feature can be used to identify trends in a single data set or amongst multiple data sets. Methods that will be employed are multivariate regression, pattern recognition, and machine learning, among others. Regression and statistical modeling will be used to discover relationships in the data and data center operations. Models with these relationships will be used to benchmark and track various parameters. PUE and power analysis and forecasting will be used to show how power is being distributed and utilized in the data center.
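PUE (power usage effectiveness) is conventionally defined as total facility power divided by IT equipment power. A minimal Python sketch of that ratio follows; the function and parameter names are hypothetical, and how the two power totals are assembled from the power modules is not specified here.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw
```

For example, a facility drawing 150 kW in total with 100 kW going to IT equipment would have a PUE of 1.5.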
A CFD modeling feature will provide the ability to model the air flow and temperature gradients: (see https://en.wikipedia.org/wiki/Computational_fluid_dynamics). The output of the CFD modeling will indicate how the air moves through the data center along with the temperature gradients as the air moves. This will be shown in an animation that will be overlaid onto the data center view. It will be possible to perform “what if” analysis by reconfiguring the equipment in the data center. The output of this what if analysis will be another CFD animation and estimated performance of the data center.
An auditing feature will provide historical audit logs for SLA compliance (to manage the SLA's for the data center and report against them), data center performance (historical data center performance reports will be available and annotatable), data center occupancy logs (occupancy sensors on the nodes will be used to measure occupancy at the highest level possible). The user can also correlate occupancy with alerts, events, and exceptions. This will be available in a report. Additionally, interface usage logs will keep track of user access of the interface, features, and feature components, as that access will be logged and presented in a report.
Capacity planning will be available utilizing the CFD modeling. This will be an extension of the what if analysis that involves adding new equipment to the data center. Also, workflow and conversations will provide the ability to create tasks for the data center and manage them in a custom workflow. Computer generated tasks can be generated as well. Conversations can also be held around the tasks and workflow.
As for the data browser, when selecting data to be viewed, the selection will be presented in a location based view. This will manifest itself in the form of a slimmed down version of the data center view, and will make it possible to select the correct data to view without having to look up a node's identifier based on its location.
Additionally, all data and reports will have the ability to be exported as a PDF, CFD, or raw database dump, and any content that the interface is displaying will have corresponding insights and actions as applicable. The insights functionality can be used to identify, report and forecast a number of different environmental and other issues. Examples include hot and cold spot identification, temperature condition reporting, air flow condition reporting, humidity condition reporting, alarming and alarm forecasting, fault detection and fault forecasting, opportunity and efficiency identification, efficiency and savings reporting and forecasting, and 3D representation of the sensors in space, among others.
Specific examples include line plots for a temperature sensor that has an alert condition, an automated analysis of the alert condition examining the cause of the alert, and a recommended action to correct the alert. Many other examples exist.
It will be possible to comment on all data and features. Comments will be tied to the data and visible with the data or in the workflow and conversations feature. Comments may also be automatically generated through the machine learning feature.
The interface will further utilize machine learning to identify changes it should make to itself or the network. The changes will be presented to the user for approval. For example, if the sampling frequency of a node is too low to adequately perform a what if analysis, the user is presented with the situation and a recommended change to the sampling frequency. The user can then approve or deny this change. The machine learning algorithm will also be able to surface relevant features and data sets contextually, based on how the user uses them.
Every feature will have the ability to provide help to the user or operator. The help feature provided will be contextual, based on what the user is doing. It will also be possible for the interface to contact personnel, provide them with the context, and allow them to assist the user.
The rack modules 68 are spaced throughout the data center 80 to get an accurate representation of the data center 80 environmental conditions above the subfloor or plenum. Rack module 68 typical spacing could be on the front and back of every three racks or every group of racks. The plenum or subfloor modules 70 are spaced throughout the subfloor or plenum of the data center 80 to get an accurate representation of the environmental conditions of the subfloor or plenum. Ideally the plenum or subfloor modules 70 would be underneath the perforated tiles closest to the rack modules 68. The inline power modules 74 are to be installed on the primary power supply of every server. Additional installations of the clamp on power modules 76 are shown on a power distribution rail and a computer room cooling unit.
As described in detail herein, the software that works with these devices manages the network that the devices communicate on, collects the data, and analyzes the data to create information and insights about the data center environment and data center power consumption. The software will perform certain functions, as detailed herein. Additionally, there will be software configuration.
Also supported are Augmented Reality (AR) representations of the installed sensors. This will enable the user to walk around and see real-time analysis overlaid on top of real physical objects while wearing an AR device (for example, Microsoft HoloLens, Magic Leap, or any other AR devices). In such a data center scenario, a user wearing an AR device could see where every sensor was located as they looked around, relevant metrics related to the sensors (temperature, pressure, humidity, voltage, current, etc.) and could view real-time analysis of the environment, such as heat flow and air flow representations, etc.
Further, the system could provide AR What If Analysis. AR capabilities allow the user to perform and experience simulations in the same physical real-world space that the sensors occupy. For example, the user could look at the space, virtually change the pressures at different points, and look inside the space to see how temperatures and pressures change as a result of the proposed pressure changes.
Referring now to
The method 1400 can include installing the sensor module 1402 in a rack of the data center, as in step 1410, the sensor module 1402 powering on, as in step 1412, the sensor module 1402 beginning a location sensing procedure, as in step 1414, and the sensor module 1402 sending results of the location sensing procedure to the gateway 1406, as in step 1416. The location sensing procedure in step 1414 is described in greater detail with reference to
The method 1400 can also include installing the one or more anchors 1404 in known locations in the data center, as in step 1420. As described above, the one or more anchors 1404 can be installed at or near at least three corners of the data center. The one or more anchors 1404 can be installed prior to or contemporaneously with the installation of the sensor module 1402 (step 1410).
The method 1400 can also include installing the gateway 1406 within the data center, as in step 1430, configuring and launching a gateway application, as in step 1432, initializing communication between the gateway 1406 and the cloud server 1408, as in step 1434, and the gateway 1406 sending location data received from the sensor module 1402 to the cloud server 1408, as in step 1436.
Further still, the method 1400 can include the cloud server 1408 and the cloud software drawing a map of the data center, as in step 1440. According to an exemplary embodiment, the cloud software can draw the map of the data center using at least the location data received from the gateway 1406 and measured by the sensor module 1402. The map drawing procedure in step 1440 is described in greater detail with reference to
Referring to
The method 1500 can include the sensor module 1402, which has been installed in the rack of the data center (
The method 1500 can further include the sensor module 1402 recording distances from the sensor module 1402 to the one or more anchors 1404 based on the time of flight data, as in step 1512, and the sensor module 1402 sending the distances to the gateway 1406, as in step 1514.
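As a hedged illustration of deriving a distance from time of flight data, the Python sketch below multiplies the propagation time by the speed of light. The disclosure does not state whether one-way or two-way ranging is used, so both variants are shown; the function names are hypothetical, and the vacuum speed of light is used as an approximation for radio propagation in air.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_one_way_tof(tof_seconds):
    """Distance when the one-way propagation time is known directly."""
    return SPEED_OF_LIGHT_M_PER_S * tof_seconds


def distance_from_two_way_ranging(round_trip_seconds, reply_delay_seconds):
    """Distance when only the round trip and the responder's reply delay are known."""
    return SPEED_OF_LIGHT_M_PER_S * (round_trip_seconds - reply_delay_seconds) / 2.0
```

A propagation time of 10 nanoseconds corresponds to roughly 3 meters, which indicates the timing resolution such a measurement needs.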
The method 1500 can also include the gateway 1406, which can be installed within the data center (
Further still, the method 1500 can include the cloud server 1408 and the cloud software receiving the distances from the gateway 1406, as in step 1540, and the cloud server 1408 using trilateration, triangulation, and multilateration to calculate X, Y, and Z coordinates for the sensor module 1402, as in step 1542. According to an exemplary embodiment, the results of step 1542 can be used to draw the map of the data center. While
Referring now to
The method 1600 can further include the processor grouping sensor modules according to the orientation data, as in 1606. In some embodiments, the processor can group all sensor modules in the data center that face essentially the same direction. For example, the processor can create a first group comprising all sensor modules oriented towards north and a second group comprising all sensor modules not oriented toward north. As another example, the processor can create a first group including all sensor modules oriented towards north, a second group including all sensor modules oriented toward south, a third group including all sensor modules oriented towards east, and a fourth group including all sensor modules oriented towards west. As yet another example, the processor can create a first group comprising all sensors facing north or south and a second group comprising all sensors facing east or west. After grouping all sensor modules in the data center, the method 1600 can include the processor getting a first group of modules, as in step 1608.
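The four-group example above can be sketched in Python as follows. The sketch assumes the orientation data is a compass heading in degrees measured clockwise from north, which is not stated in this disclosure; the function names are hypothetical.

```python
from collections import defaultdict

CARDINALS = ["north", "east", "south", "west"]


def nearest_cardinal(heading_deg):
    """Snap a compass heading (degrees clockwise from north) to the nearest cardinal."""
    return CARDINALS[int(((heading_deg % 360) + 45) // 90) % 4]


def group_by_orientation(module_headings):
    """Group sensor module identifiers by the cardinal direction they face."""
    groups = defaultdict(list)
    for module_id, heading in module_headings.items():
        groups[nearest_cardinal(heading)].append(module_id)
    return dict(groups)


groups = group_by_orientation({"S1": 2.0, "S2": 358.0, "S3": 91.0})
```

The two-group variants described above (north versus not north, or north/south versus east/west) could be obtained by merging the resulting groups.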
After getting the first group in step 1608, the method 1600 can include the processor generating a theoretical line through a first sensor module of the group of sensor modules at an angle perpendicular to an orientation angle of the first sensor module, as in step 1610. For example, if the first sensor module has an orientation angle of “north”, the theoretical line generated by the processor in step 1610 can extend from east to west. Referring now to
Referring to both
Moreover, the method 1600 can further include the processor determining whether any plenum modules are between a sensor module under consideration (e.g. sensor module S2) and a previous member determined to be on the theoretical line 1710, as in step 1618. In the example of sensor module S2, the processor has not currently considered a plenum module before selecting sensor module S2, so the processor can determine that there is not a plenum module between S2 and S1, and the processor can continue to method step 1620. If there is a plenum module between a sensor module under consideration and the previous member determined to be on the theoretical line 1710, the method 1600 can include the processor creating a new row of racks (i.e. data center equipment) for a rendered map, as in step 1622.
The method 1600 can further include the processor determining if the calculated perpendicular distance of the sensor module under consideration exceeds a threshold (e.g. 1 meter), as in step 1620. According to an exemplary embodiment, the threshold can be one meter, although other distances are contemplated. In the example of sensor module S2, the processor can determine that the distance D4 is very small (e.g. less than one meter). As such, the processor can continue to method step 1624 and add the sensor module under consideration (e.g. sensor module S2) to the same row of racks as the first sensor (e.g. sensor module S1). If the perpendicular distance between the sensor module under consideration (e.g. sensor module S2) and the theoretical line 1710 exceeds the threshold, the method 1600 can include the processor creating a new row of racks on the rendered map, as in step 1622.
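The perpendicular-distance test can be illustrated with the simplified Python sketch below. It assumes planar coordinates in meters with x pointing east and y pointing north, and a shared compass heading for the group. It omits the plenum-module check of step 1618, and matching a module against the first module of each existing row is an assumption made for illustration rather than the exact flow of the method 1600. All names are hypothetical.

```python
import math


def perpendicular_distance(reference_xy, heading_deg, point_xy):
    """Distance from a point to the line through reference_xy that runs
    perpendicular to the facing direction given by heading_deg."""
    h = math.radians(heading_deg)
    facing_x, facing_y = math.sin(h), math.cos(h)  # unit vector of the facing direction
    dx = point_xy[0] - reference_xy[0]
    dy = point_xy[1] - reference_xy[1]
    # The facing direction is normal to the line, so project onto it.
    return abs(dx * facing_x + dy * facing_y)


def assign_rows(modules, heading_deg, threshold_m=1.0):
    """Place each (module_id, (x, y)) into a row whose line lies within the threshold."""
    rows = []  # the first entry of each row defines that row's theoretical line
    for module_id, xy in modules:
        for row in rows:
            if perpendicular_distance(row[0][1], heading_deg, xy) <= threshold_m:
                row.append((module_id, xy))
                break
        else:
            rows.append([(module_id, xy)])  # too far from every line: start a new row
    return [[module_id for module_id, _ in row] for row in rows]


rows = assign_rows(
    [("S1", (0.0, 0.0)), ("S2", (2.0, 0.2)), ("S3", (1.0, 3.0)), ("S4", (4.0, 3.1))],
    heading_deg=0.0,
)
```

With a north-facing group, the line through S1 runs east to west, so only the north-south offset of each module counts toward the threshold.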
The method 1600 can repeat steps 1610-1622 for all sensor modules in the first group of sensor modules, and the method 1600 can include the processor determining if any modules remain in the sorted list of sensor modules created in step 1614, as in step 1626. If any modules remain, the method 1600 can return to step 1616 and consider a new module not previously considered. Returning to the example shown in
However, if the processor has considered all sensor modules in the group, the method 1600 can include the processor aligning modules along each determined row of racks using the sensor location data received in step 1604, as in step 1628. The processor can add cabinets to each row of racks created through steps 1610-1622. The processor can also draw data center cabinets between modules in the same row of racks, as in 1630, and eventually render, store, and display the map created through steps 1602-1628, as in step 1632. According to an exemplary embodiment, rendering the map can include drawing the determined rows of racks and also representing the sensor modules in the rows of racks based on the sensor location data. Furthermore, in some embodiments, the method 1600 can apply domain knowledge about data centers to increase map accuracy. For example, domain knowledge includes assumptions about the size of racks in data centers, which are typically standardized or commonly sized, typical distance between racks, and general alignment of racks. Furthermore, using domain knowledge the processor can determine an end of a rack by determining that two sensors along a same row of racks are laterally spaced apart more than a predetermined distance (e.g. 3 meters) because sensor modules can be placed equidistant from each other in a rack.
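The gap-based rule described above can be sketched as follows. This Python example assumes each sensor module in a row has already been reduced to a single lateral position in meters along the row; the function name and the sample values are hypothetical.

```python
def split_on_gaps(positions_m, max_gap_m=3.0):
    """Split lateral positions along one row wherever neighbors are farther
    apart than max_gap_m, treating each gap as the end of a run of racks."""
    ordered = sorted(positions_m)
    if not ordered:
        return []
    segments = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap_m:
            segments.append([current])
        else:
            segments[-1].append(current)
    return segments


segments = split_on_gaps([0.0, 1.8, 3.6, 9.0, 10.8])
```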
As described above, using sensor locations and domain knowledge, a processor can automatically render a map of a data center. However, data centers frequently “move”, in that racks may change locations or orientations or equipment is swapped out for other equipment. As such, the map of the data center must be updated frequently anytime the data center “moves”. Referring now to
As shown, the method 1800 can include a sensor module (such as a sensor module 1402) detecting a change in the environment of the data center suggesting that the data center is changing in configuration, as in 1802. The sensor module may be equipped with an accelerometer to detect vibration, but the sensor module may detect movement using a combination of vibration, changes in temperature, changes in humidity, and changes in pressure to detect changes in the data center configuration.
After detecting movement, the sensor module can determine if it has come to rest by determining if it has been stationary for a predetermined period of time, as in step 1804. Once the sensor module comes to rest, the sensor module can determine its new location by gathering time of flight data and communicating with one or more anchors (e.g. the anchors 1404), as in step 1806. Step 1806 can substantially correspond with the method 1500 illustrated with reference to
As noted above, sensor modules can move within a data center when the data center changes configuration. In some situations, movement of a sensor module can result in a loss of communication with one or more of the anchors. Despite losing communication with the one or more anchors, the sensor module can still detect its location as long as the sensor module can communicate with three other sensor modules that have been located (e.g. by communicating with the one or more anchors). Referring now to
As shown, the method 1900 can begin by a sensor module (such as a sensor module 1402) attempting to communicate with one or more anchor modules (e.g. the one or more anchor modules 1404), as in step 1902, and the sensor module determining whether it can communicate with three anchor modules, as in step 1904. If the sensor module cannot communicate with three anchor modules, the method 1900 can include the sensor module establishing communication with other sensor modules, as in step 1906, and the sensor module determining its location by gathering time of flight data with the other sensor modules or a combination of other sensor modules and one or two anchors, as in step 1908. The other sensor modules can provide their locations to the sensor module, and the sensor module can send the time of flight data and the names or known locations of the other sensor modules to a gateway, which relays the data to the cloud server.
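The fallback described above, followed by a position calculation from three references, can be sketched in Python as below. The sketch is two-dimensional only (X and Y); obtaining Z would need at least one further reference that is not coplanar with the others, or some other constraint. Each reference is assumed to be a tuple `(x, y, distance)` in meters, the linearized trilateration is one common formulation rather than the specific calculation of this disclosure, and all names are hypothetical.

```python
def choose_references(reachable_anchors, located_modules, needed=3):
    """Prefer reachable anchors, then fill up with already-located sensor modules."""
    references = list(reachable_anchors[:needed])
    references += located_modules[: needed - len(references)]
    if len(references) < needed:
        raise ValueError("not enough references to determine a location")
    return references


def trilaterate_2d(references):
    """Solve for (x, y) from three (x_i, y_i, d_i) references.

    Subtracting the first circle equation from the other two removes the
    squared unknowns and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = references
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("references are collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# One reachable anchor plus two already-located sensor modules.
anchors = [(0.0, 0.0, 5.0)]
located = [(10.0, 0.0, 65**0.5), (0.0, 10.0, 45**0.5)]
x, y = trilaterate_2d(choose_references(anchors, located))
```

Measured distances contain noise, so a practical system would likely use more than three references and a least-squares fit; the exact solve here is only meant to show the geometry.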
Alternatively, if the sensor module determines that it can communicate with three anchors in step 1904, the sensor module can determine its location by calculating time of flight data by communicating with three anchor modules, as in step 1910 (see
In the example of
In some examples, the module diagnostic circuitry 2011 may be configured to perform internal diagnostics on the sensor module 2002, such as by, for example, reading and/or responding to status updates and/or issues of various subsystems. In some examples, the module communication circuitry 2010 may be configured for communication via an ultra-wide band (UWB) protocol, a short wavelength ultra-high frequency protocol (commonly referred to as Bluetooth), a cellular and/or IEEE 802.11 standard (commonly referred to as WiFi) protocol, a transmission control protocol (TCP), an internet protocol (IP), an Ethernet protocol, an NFC protocol, and/or an RFID protocol. In some examples, the module processing circuitry 2008 may include one or more processors. In some examples, the module memory circuitry 2006 may store machine readable instructions configured for execution by the module processing circuitry 2008.
In some examples, the module sensors 2012 may comprise one or more temperature sensors, humidity sensors, pressure sensors, light sensors, and/or vibration sensors. In some examples, one or more of the vibration sensors may be implemented via one or more inertial measurement units (IMUs). In some examples, an inertial measurement unit may comprise one or more multi-axis (e.g., 3 axis) accelerometers, gyroscopes, and/or magnetometers. In the example of
In some examples, the module sensors 2012 may include some or all of the sensor array 66. In some examples, the module sensors 2012 may include driving circuitry. In some examples, circuitry for driving the module sensors 2012 may be included as part of the module processing circuitry 2008. In some examples, the module housing 2004 may include holes, apertures, and/or perforations to facilitate measurement of ambient conditions by the module sensors 2012.
In some examples, some or all of the module sensors 2012 may operate according to one or more sensor settings. In some examples, the sensor settings may be stored in the module memory circuitry 2006. In some examples, the sensor settings may include enable/disable settings, a sample rate, a maximum frequency rate, a maximum measurement range, an operating mode, a power mode, a performance mode, and/or a bandwidth. In some examples, the module memory circuitry 2006 may also store one or more thresholds (and/or other corollary data) that define a “normal” range of measurement values. For example, there may be an upper threshold, a lower threshold, and some indicator specifying that a “normal” (and/or expected) measurement value of a particular sensor would be between the upper and lower thresholds, above the upper threshold, or below the lower threshold. In some examples, the module memory circuitry 2006 may additionally store one or more sensor signatures that are defined as being “abnormal.”
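As a minimal sketch of how such threshold data might be represented, the following Python fragment stores an upper threshold, a lower threshold, and an indicator of which region counts as "normal." The names `NormalRegion`, `NormalRange`, and `is_normal`, and the numeric values, are hypothetical and chosen only for illustration; they are not part of the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class NormalRegion(Enum):
    """Which side of the thresholds counts as a normal measurement."""
    BETWEEN = "between"          # normal if lower <= value <= upper
    ABOVE_UPPER = "above_upper"  # normal if value > upper
    BELOW_LOWER = "below_lower"  # normal if value < lower


@dataclass(frozen=True)
class NormalRange:
    lower: float
    upper: float
    region: NormalRegion

    def is_normal(self, value: float) -> bool:
        if self.region is NormalRegion.BETWEEN:
            return self.lower <= value <= self.upper
        if self.region is NormalRegion.ABOVE_UPPER:
            return value > self.upper
        return value < self.lower


# Illustrative values only (degrees Celsius); real thresholds would be
# determined empirically as described in the text.
inlet_temp = NormalRange(lower=18.0, upper=27.0, region=NormalRegion.BETWEEN)
print(inlet_temp.is_normal(22.0))
print(inlet_temp.is_normal(30.0))
```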
In some examples, the “abnormal” signatures and/or “normal” range of measurement values may be determined empirically, through one or more experimental testing procedures. In some examples, a sensor module 2002 (and/or one or more of its module sensors 2012) may be able to make minor alterations to what is considered “normal” to account for different locales and/or non-experimental conditions. In some examples, the sensor settings and/or threshold data may be different for each sensor of the module sensors 2012. In some examples, the sensor settings and/or threshold data may be the same for all sensors of the module sensors 2012. In some examples, some sensors of the module sensors 2012 may have the same sensor settings and/or threshold data, while others have different settings and/or threshold data.
In the examples of
In the example of
In the example of
While three sensor strands 2020 are shown in the example of
In some examples, the sensor module 2002 may be configured to determine, store (e.g., in module memory circuitry 2006), and/or communicate one or more (e.g., relative) locations corresponding to each sensor strand 2020 and/or strand sensor 2026 connected to the sensor module 2002. For example, a sensor module may associate a particular (e.g., relative) location (e.g., plenum, rear of server rack, front of server rack, etc.) with one or more of the module ports 2018 configured for connection to a sensor strand 2020. In such an example, the sensor module 2002 would know that data received via the module port(s) 2018 should be associated with that location. As another example, each sensor strand 2020 and/or strand sensor 2026 may output identification information along with its sensor measurement(s), and the sensor module 2002 may associate a particular location with particular identification information. Thereby, after the sensor module 2002 determines its own location (e.g., via the method 1500, method 1600, and/or method 1900), it can determine the location of each sensor strand 2020 and/or strand sensor 2026.
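The port-based association described above can be sketched as a simple lookup. In this hedged Python example, the function names, the port numbering, and the coordinate tuple are all assumptions made for illustration; the disclosure does not specify a data format.

```python
from typing import Dict, Tuple

Location = Tuple[float, float, float]  # hypothetical (x, y, z) module position


def resolve_strand_locations(
    module_location: Location, port_labels: Dict[int, str]
) -> Dict[int, Tuple[Location, str]]:
    """Pair each module port with the module position and a relative label."""
    return {port: (module_location, label) for port, label in port_labels.items()}


def tag_measurement(port: int, value: float,
                    resolved: Dict[int, Tuple[Location, str]]) -> dict:
    """Attach location information to a measurement received on a given port."""
    location, label = resolved[port]
    return {
        "port": port,
        "value": value,
        "module_location": location,
        "relative_location": label,
    }


# Hypothetical configuration: port 0 feeds a plenum strand, port 1 a front strand.
resolved = resolve_strand_locations(
    (1.0, 2.0, 0.0), {0: "plenum", 1: "front of server rack"}
)
sample = tag_measurement(0, 17.5, resolved)
```

Once the module's own position is known, every measurement arriving on a port inherits that position plus the relative label assigned to the port.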
In the example of
In the example of
In the example of
In the example of
In the example of
In the example of
In the example of
In the example of
In some examples, the cloud memory circuitry 2408 may implement the SQL repository 708 and/or database 712 shown as part of the cloud 108 of
In the example of
In the example of
In some examples, one or more of the nodes 16 of the sensor system 2000 may be implemented via one or more sensor modules 2002 and/or sensor strands 2020, such as discussed above. In the example of
In some examples, the cooling equipment 124 and/or power equipment 122 may be internal and/or external to the data center 2300 itself. For example, cooling equipment 124 within the data center 2300 may include CRAC units 2124, fans 2126, and/or other cooling equipment 124 within the data center 2300. Power equipment 122 within the data center 2300 may include circuit breakers, power supplies, power cords, and/or other power equipment. Cooling equipment 124 external to the data center may include, for example, chillers, chiller motors, water pumps, cooling tower fans, cooling tower motors, and/or other cooling equipment 124. Power equipment external to the data center 2300 may include, for example, power transformers and/or power lines external to the data center 2300.
In the example of
In the example of
In the example of
In the example of
In some examples, the cloud computing system 2402 may be used to keep track of pertinent activities and/or events involving server racks 2100 in the data center 2300. This may assist in keeping track of and/or managing valuable data center assets. This may also help with fulfilling certain tracking and/or logging obligations for server tenants.
In some examples, the cloud computing system 2402 may execute one or more instances of the server rack event procedure 2500 in order to keep track of the events involving server racks 2100 in the data center 2300. In some examples, the server rack event procedure 2500 may comprise machine readable instructions configured for execution by the cloud processing circuitry 2404. While presented as being stored in cloud memory circuitry 2408 of the cloud computing system 2402, and executed by the cloud processing circuitry 2404, in some examples, portions of the server rack event procedure 2500 may be performed by other components and/or systems outside of the cloud computing system 2402.
In the example of
In the example of
In the example of
In some examples, a server rack parameter signature may comprise one or more measurements and/or output signal patterns (e.g., of the sensor system 2000) of one or more server rack parameters that occur over a measurement time period. For example, an output signal of a temperature sensor, humidity sensor, pressure sensor, light sensor, vibration sensor (e.g., IMU), and/or power sensor of the sensor system 2000 over a measurement time period may constitute a server rack parameter signature if it pertains to a server rack 2100. In some examples, a server rack parameter signature may pertain to more than one server rack parameter. For example, a sensor module 2002 mounted to a server rack 2100 may provide a server rack parameter signature comprising an output signal of its IMU and its temperature sensor over a given time period. In some examples, the temperature and IMU output signals may be considered a single parameter signal. In some examples, a power parameter signature may be obtained from the power system 2420 rather than the sensor system 2000.
In the example of
In some examples, the server rack event procedure 2500 may keep track of recent server rack events (e.g., via the events 2412 stored in cloud memory circuitry 2408). In some examples, the server rack event procedure 2500 may only compare the recently measured parameter signatures with certain known/stored parameter signatures based on the recent server rack events. For example, the server rack event procedure 2500 may only compare the measured parameter signature(s) with parameter signatures associated with a server installation event if a door open event has recently been detected (since the installation event requires the door open event to occur first).
In the example of
In some examples, the server rack event procedure 2500 may be unable to find a known parameter signature 2410 that is close enough to a measured parameter signature to qualify as a match. Yet, because of block 2504, the server rack event procedure 2500 knows that something has occurred, even if just what has occurred is unclear. In such an example, the server rack event procedure 2500 may categorize the event in a catchall event category, such as “abnormal operation.” In some examples, the server rack event procedure 2500 may also determine a severity of the abnormal operation. Such a severity may be based, for example, on an extent to which the measured parameter signature(s) differ from the normal/expected measurements and/or known parameter signatures.
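One way the comparison, the event-based filtering, and the catchall category might fit together is sketched below in Python. The root-mean-square distance metric, the `PREREQUISITES` table, the function names, and the use of the distance as a stand-in for severity are assumptions for illustration only; the disclosure does not prescribe a particular similarity measure.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical prerequisite table: an installation requires a prior door open.
PREREQUISITES = {"server_installation": "door_open"}


def classify_event(
    measured: Sequence[float],
    known: Dict[str, Sequence[float]],
    recent_events: Set[str],
    match_threshold: float,
) -> Tuple[str, float]:
    """Return (event name, distance); the distance doubles as a rough severity."""
    best_event, best_dist = None, math.inf
    for event, signature in known.items():
        prereq = PREREQUISITES.get(event)
        if prereq is not None and prereq not in recent_events:
            continue  # skip signatures whose prerequisite event has not occurred
        dist = rms_distance(measured, signature)
        if dist < best_dist:
            best_event, best_dist = event, dist
    if best_event is not None and best_dist <= match_threshold:
        return best_event, best_dist
    return "abnormal operation", best_dist  # catchall category


known = {"door_open": [0.0, 1.0, 0.0], "server_installation": [1.0, 1.0, 1.0]}
print(classify_event([0.0, 1.0, 0.1], known, set(), 0.2)[0])
print(classify_event([1.0, 1.0, 1.0], known, set(), 0.2)[0])
print(classify_event([1.0, 1.0, 1.0], known, {"door_open"}, 0.2)[0])
```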
In the example of
In some examples, the server rack event procedure 2500 may determine that all the server racks 2100 have been impacted. For example, the server rack event procedure 2500 may determine that there are multiple measured parameter signatures for multiple server racks 2100 that correspond to the same stored parameter signatures 2410 and/or events 2412. In some examples, this may occur where, for example, the event is of a regional nature, such as a seismic event, weather event, natural disaster, large scale disturbance, etc. In some examples, the server rack event procedure 2500 may analyze parameter signatures across multiple measurement time periods to determine one larger event, such as, for example, an event of a regional nature.
In the example of
In some examples, the action may involve the work order system 2418. For example, the server rack event procedure 2500 may communicate with the work order system 2418 to determine whether there exists one or more work orders that correspond to one or more of the determined server rack event(s). For example, the server rack event procedure 2500 may determine that a work order indicating that a server installation and/or removal is planned for a certain server rack 2100 at a certain date/time corresponds to a detected server installation and/or removal event for that server rack 2100 (e.g., if the dates/times are close). As another example, the server rack event procedure 2500 may determine that a work order indicating maintenance planned for a known faulty fan or drive corresponds to a determined faulty fan and/or drive event. If there does exist one or more work orders that correspond to the determined server rack event(s), the server rack event procedure 2500 may indicate (e.g., via the work order system 2418 and/or some notification/message) that a work order was found corresponding to the detected event and/or that the work order is in progress. If no work order exists, then the server rack event procedure 2500 may generate a new work order and/or generate a notification (e.g., such as discussed above) indicating that no work order exists, that a new work order is being generated, and/or giving the details of the server rack event.
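The date/time proximity check described above might be sketched as follows. The `WorkOrder` fields, the function name, and the four-hour tolerance are hypothetical; the disclosure does not define the work order system's interface or what counts as "close."

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class WorkOrder:
    rack_id: str
    kind: str            # e.g., "server_installation"
    scheduled: datetime  # planned date/time of the work


def find_matching_work_order(
    orders: List[WorkOrder],
    rack_id: str,
    kind: str,
    event_time: datetime,
    tolerance: timedelta = timedelta(hours=4),  # assumed meaning of "close"
) -> Optional[WorkOrder]:
    """Return the first order for the same rack and event kind scheduled near the event."""
    for order in orders:
        if (
            order.rack_id == rack_id
            and order.kind == kind
            and abs(order.scheduled - event_time) <= tolerance
        ):
            return order
    return None  # caller may then generate a new work order and/or a notification


orders = [WorkOrder("rack-7", "server_installation", datetime(2020, 2, 11, 9, 0))]
match = find_matching_work_order(
    orders, "rack-7", "server_installation", datetime(2020, 2, 11, 10, 30)
)
```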
In some examples, the action may involve the security system 2416. For example, the server rack event procedure 2500 may communicate with the security system 2416 to access security data corresponding to the time period(s) and/or location(s) of the server rack events (and/or the measured parameter signature(s)). Thereafter, the server rack event procedure 2500 may store the security data in cloud memory circuitry 2408 and/or associate the security data with the other data surrounding the server rack event, such as discussed above.
In the example of
In addition to keeping track of events involving server racks 2100, in some examples, the cloud computing system 2402 may also be used to keep track of the health of cooling equipment 124. This is an important task, as proper operation of cooling equipment 124 can be essential to the continued function of a data center 2300. In some examples, the cloud computing system 2402 may execute one or more instances of a cooling equipment health procedure 2600 in order to keep track of the health of the cooling equipment 124 for the data center 2300.
In some examples, the cooling equipment health procedure 2600 may comprise machine readable instructions configured for execution by the cloud processing circuitry 2404. While presented as being stored in cloud memory circuitry 2408 of the cloud computing system 2402, and executed by the cloud processing circuitry 2404, in some examples, portions of the cooling equipment health procedure 2600 may be performed by other components and/or systems outside of the cloud computing system 2402.
In the example of
In the example of
In the example of
In some examples, the cooling equipment health procedure 2600 may determine health based, at least in part, on a comparison of one or more of the parameter signatures from block 2604 with one or more known parameter signatures 2410 stored in cloud memory circuitry 2408. In some examples, the known parameter signatures 2410 may be predetermined and/or prerecorded during an empirical testing process that monitors certain cooling equipment parameters at different points in the lifecycle of different kinds of cooling equipment 124. For example, empirical testing may reveal certain distinctive (e.g., vibration) signatures that occur when a piece of cooling equipment 124 (e.g., a motor or fan of a CRAC unit or chiller) is brand new, healthy, aged but ok, inefficient/faulty, and/or breaking down/near terminal failure.
When one or more unique and/or distinctive patterns of one or more sensor outputs are observed to occur for a given piece of cooling equipment 124 of a given health, those patterns may be stored and/or recorded as known parameter signatures 2410 associated with the cooling equipment 124 and/or health. Thus, if the cooling equipment health procedure 2600 determines that a measured parameter signature matches or is significantly similar to (e.g., within some threshold range of) a known parameter signature 2410 associated with a given health, then the cooling equipment health procedure 2600 may determine that the cooling equipment 124 to which that measured parameter signature pertains may be of the same or a similar health. Similarly, if the cooling equipment health procedure 2600 determines that a measured parameter signature differs significantly from (e.g., is outside the threshold range of) a known parameter signature 2410 associated with a given health (e.g., good health), then the cooling equipment health procedure 2600 may determine that the cooling equipment 124 is not of that health. The cooling equipment health procedure 2600 may additionally determine the current health of the cooling equipment 124 based on just how different the measured parameter signature is from the known parameter signature 2410.
In some examples, the cooling equipment health procedure 2600 may additionally, or alternatively, base the health evaluation of the cooling equipment 124 on whether the cooling equipment 124 is behaving as healthy cooling equipment 124 (and/or cooling equipment 124 of a given health) is expected to behave. For example, the cooling equipment health procedure 2600 may determine whether a cooling output (and/or input/output temperature gradient) of a piece of cooling equipment 124 (e.g., a CRAC unit) is what is expected. In some examples, the cooling equipment health procedure 2600 may determine an expected cooling output based on a target cooling output (e.g., determined by the sensor system 2000, user input via UI 110, and/or some other system), the power use of the cooling equipment 124 (e.g., as measured and/or reported by the sensor system 2000 and/or power system 2420), and/or a (e.g., previously determined) health of the cooling equipment 124. As another example, the cooling equipment health procedure 2600 may determine whether an air pressure differential produced by cooling equipment 124 (e.g., a fan) is what is expected, given the power use of the cooling equipment 124 and/or health of the cooling equipment 124.
In some examples, the cooling equipment health procedure 2600 may first determine health based on whether a measured parameter signature corresponding to the cooling equipment 124 matches (and/or is significantly similar to) a known parameter signature 2410 associated with a known health (e.g., excellent, good, moderate, poor, or very poor). In some examples, if a measured parameter signature corresponding to the cooling equipment 124 does not match (and/or is not significantly similar to) a known parameter signature 2410, the cooling equipment health procedure 2600 may instead base the health determination on which known parameter signature 2410 is most similar to the measured parameter signature. In some examples, if a measured parameter signature is outside of a threshold standard deviation from any known parameter signature 2410, the cooling equipment health procedure 2600 may determine the health is inconclusive, and/or generate an error.
In some examples, the cooling equipment health procedure 2600 may (e.g., detrimentally) modify the initially determined health of cooling equipment 124 if sensor measurements indicate that the cooling equipment 124 is not behaving as expected. For example, cooling equipment 124 first determined to be in excellent health may have its health status modified to good, moderate, poor, or very poor if a cooling output of the cooling equipment 124 is not what is expected given the target cooling output, power use, and/or (e.g., first determined) health of the cooling equipment 124. In some examples, the degree to which health status is modified may be based on a degree of difference (e.g., number of standard deviations) between expectation and measurement.
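The two-stage evaluation described above (an initial health from the most similar known signature, followed by a downgrade when behavior deviates from expectation) can be sketched in Python. The distance metric, the function names, and the rule of one health level per full standard deviation are assumptions made for illustration, not the disclosed method.

```python
import math
from typing import Dict, Optional, Sequence

HEALTH_LEVELS = ["excellent", "good", "moderate", "poor", "very poor"]


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def initial_health(
    measured: Sequence[float],
    known: Dict[str, Sequence[float]],
    inconclusive_threshold: float,
) -> Optional[str]:
    """Pick the health whose known signature is most similar, or None if too far."""
    best = min(known, key=lambda health: rms_distance(measured, known[health]))
    if rms_distance(measured, known[best]) > inconclusive_threshold:
        return None  # inconclusive
    return best


def adjust_health(health: str, expected: float, measured: float, sigma: float) -> str:
    """Downgrade one level per full standard deviation of deviation (assumed rule)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    steps = int(abs(expected - measured) / sigma)
    index = min(HEALTH_LEVELS.index(health) + steps, len(HEALTH_LEVELS) - 1)
    return HEALTH_LEVELS[index]


known = {"excellent": [0.1, 0.1], "poor": [0.9, 0.8]}
first = initial_health([0.15, 0.1], known, 0.3)
final = adjust_health(first, expected=10.0, measured=7.5, sigma=1.0)
```

Here a cooling output 2.5 standard deviations below expectation moves the status down two levels; other mappings from deviation to downgrade are equally plausible.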
In the example of
In the example of
In the example of
In some examples, the action may involve the work order system. For example, the cooling equipment health procedure 2600 may communicate with the work order system 2418 to determine whether there exists one or more work orders that correspond to the unhealthy cooling equipment 124. For example, the cooling equipment health procedure 2600 may determine that a work order indicating that maintenance or replacement is planned for the cooling equipment 124 corresponds to the unhealthy cooling equipment 124. If there does exist one or more work orders that correspond to the cooling equipment 124, the cooling equipment health procedure 2600 may indicate (e.g., via the work order system 2418 and/or UI 110), that a corresponding work order has been found and/or that the work order is still needed. If no work order exists, then the cooling equipment health procedure 2600 may generate a new work order and/or generate a notification (e.g., such as discussed above) indicating that no work order exists, that a new work order is being generated, and/or giving the details of the unhealthy cooling equipment 124.
In some examples, the action may involve the security system 2416. For example, the cooling equipment health procedure 2600 may communicate with the security system 2416 to access security data corresponding to the measurement time period(s) and/or pertinent location(s). Thereafter, the cooling equipment health procedure 2600 may store the security data in cloud memory circuitry 2408 and/or associate the security data with the other data surrounding the cooling equipment 124, such as discussed above.
In the example of
In some examples, the cloud computing system 2402 may also be used to determine inefficiencies within the data center (e.g., pertaining to the environmental conditions of the data center) and/or recommend corrective action to remedy the inefficiencies. Because of the high cost of operating a data center, the cost savings that come with correcting even small inefficiencies can be significant. Likewise, the cost of allowing inefficiencies to fester can be significant.
In some examples, the cloud computing system 2402 may execute one or more instances of a recommendation procedure 2700 in order to determine the inefficiencies and/or recommend corrective actions and/or solutions. In some examples, the corrective actions may be physical corrections, such as, for example, changing how air flow is ducted and/or closing off empty areas of server racks 2100. In some examples, the corrective actions may be more virtual actions that can be implemented via control systems, such as, for example, changing a target temperature output of the cooling equipment 124 and/or changing a fan speed. In some examples, the cloud computing system 2402 may use one or more thermodynamic and/or CFD models to continuously analyze data obtained via the sensor system 2000 in real time to determine inefficiencies and/or make recommendations. In some examples, the ability of sensors to continuously update their location via a local positioning system (e.g., method 1500) and/or relative positioning system (e.g., method 1900) allows such modeling and/or analysis to be done in real time. In contrast, conventional CFD models tend to be snapshots that can quickly become outdated as configurations of the data center 2300 change.
In the example of
In the example of
One simple example of an efficiency indicator is a hot spot indicator. A hot spot may be a location (and/or spot) within the data center 2300 that is particularly warm (or cold). In some examples, the recommendation procedure 2700 may analyze the sensor data obtained at block 2702 to determine one or more hot spots within the data center 2300. In some examples, the recommendation procedure 2700 may determine a hot spot for each server rack 2100, such as, for example, the warmest (and/or coldest) temperature recorded by a sensor proximate to each server rack 2100. In some examples, the recommendation procedure 2700 may determine a hot spot for one or more groupings of server racks 2100. In some examples, the recommendation procedure 2700 may determine a hot spot as being any sensor measurement above (or below) a given temperature threshold (e.g., of the thresholds 2414). In some examples, the temperature threshold may be different depending on the location within the data center 2300. For example, different server racks 2100, aisles, and/or areas may be associated with different temperature thresholds.
In some examples, a hot spot may be indicative of an inefficiency if the temperature at the hot spot is outside some maximum (or minimum) temperature threshold for the data center 2300 as a whole or one or more servers 2110 and/or server racks 2100 in particular. In some examples, particular servers 2110, server racks 2100, groups of servers 2110, groups of server racks 2100, and/or areas within the data center 2300 may have different maximum (or minimum) temperature thresholds, such as, for example, if they contain different hardware, or are for different clients.
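A hedged sketch of the hot spot determination follows, taking the warmest reading near each rack and comparing it against a per-rack maximum threshold. The rack identifiers, the function name, and all temperature values are hypothetical; only the warm case is shown, and the cold case would mirror it with minimum thresholds.

```python
from typing import Dict, List


def find_hot_spots(
    readings: Dict[str, List[float]],
    max_thresholds: Dict[str, float],
    default_max: float,
) -> Dict[str, float]:
    """Return racks whose warmest nearby reading exceeds their maximum threshold."""
    hottest = {rack: max(temps) for rack, temps in readings.items() if temps}
    return {
        rack: temp
        for rack, temp in hottest.items()
        if temp > max_thresholds.get(rack, default_max)
    }


# Illustrative data (degrees Celsius); rack-1 has its own threshold,
# rack-2 falls back to the facility-wide default.
readings = {"rack-1": [22.0, 31.5, 25.0], "rack-2": [21.0, 23.0]}
hot = find_hot_spots(readings, {"rack-1": 30.0}, default_max=27.0)
```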
Another example of an efficiency indicator is a temperature gradient (or temperature difference) between a front and back of a server rack 2100. In some examples, the recommendation procedure 2700 may analyze the sensor data obtained at block 2702 to determine temperature gradients for one or more server racks 2100, and/or groups of server racks 2100, such as by, for example, analyzing data from rack strands 2020a on the front and back of the server rack(s) 2100. In some examples, different front/back temperature gradients may be determined at different heights on the server rack 2100. In some examples, the recommendation procedure 2700 may expect the front/back server rack 2100 temperature gradient to be within a certain threshold range (e.g., of the thresholds 2414). In some examples, the recommendation procedure 2700 may determine the temperature gradient as being indicative of an inefficiency if the front/back server rack 2100 temperature gradient is outside of this threshold range.
Another example of an efficiency indicator is a vertical temperature gradient between a bottom and top of a server rack 2100 (and/or a floor and ceiling of the data center 2300). In some examples, the vertical temperature gradient may be a measure of a gradient between the plenum 2106 and a top of a server rack 2100 (and/or ceiling of the data center 2300). In some examples, the recommendation procedure 2700 may analyze the sensor data obtained at block 2702 to determine one or more vertical temperature gradients for the data center 2300, such as by, for example, analyzing data from sensor modules 2002, sensor strands 2020, and/or other sensors of the sensor system 2000. In some examples, the recommendation procedure 2700 may expect the vertical temperature gradient to be within a certain threshold range (e.g., of the thresholds 2414). In some examples, the recommendation procedure 2700 may determine the vertical temperature gradient as being indicative of an inefficiency if the vertical temperature gradient is outside of this threshold range.
Another example of an efficiency indicator is utilization of server racks 2100. In some examples, the recommendation procedure 2700 may analyze power data (e.g., from the power system 2420 and/or sensor system 2000) for one or more server racks 2100 to determine whether the server racks 2100 are being utilized efficiently. In some examples, the recommendation procedure 2700 may expect each server 2110 of a server rack 2100 to use an amount of power falling within a server power threshold range (e.g., of the thresholds 2414). Further, the recommendation procedure 2700 may expect each server rack 2100 of the data center 2300 to house servers 2110 that cumulatively use an amount of power falling within a rack power threshold range (e.g., of the thresholds 2414). Power use below that range may indicate utilization that is too low, and which may be better consolidated in other server racks 2100. Power use above the range may indicate utilization that is too high, which may produce an excessive amount of heat for that server rack 2100, such that the servers 2110 may be better served if dispersed. In some examples, the recommendation procedure 2700 may determine the utilization of a server rack 2100 to be indicative of an inefficiency if the cumulative power use of the servers 2110 of a server rack 2100 falls outside of the threshold range.
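The rack-level utilization check can be sketched as a comparison of cumulative server power against a rack power threshold range. The function name, the labels it returns, and the wattage figures below are illustrative assumptions.

```python
from typing import Sequence, Tuple


def classify_rack_utilization(
    server_power_w: Sequence[float], rack_range_w: Tuple[float, float]
) -> str:
    """Compare the cumulative server power of a rack to its threshold range."""
    total = sum(server_power_w)
    low, high = rack_range_w
    if total < low:
        return "under-utilized"  # candidate for consolidation
    if total > high:
        return "over-utilized"   # candidate for dispersal
    return "ok"


# Illustrative range of 1000 W to 5000 W per rack.
print(classify_rack_utilization([200.0, 250.0, 300.0], (1000.0, 5000.0)))
```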
Another example of an efficiency indicator is airflow velocity. Airflow velocity, as used in this disclosure, refers to a vector comprising an airflow direction and an airflow magnitude. In some examples, the recommendation procedure 2700 may analyze sensor data (e.g., pressure data) at different locations within the data center 2300, and determine airflow velocity (e.g., via pressure differentials). While, in theory, airflow and/or pressure differential could be measured directly, this would require more complex sensor and/or peripheral device installation.
In some examples, the recommendation procedure 2700 may expect air to flow in certain directions (and/or within a certain range of directions) at certain locations within the data center 2300. In some examples, the recommendation procedure 2700 may expect airflow magnitude to fall within certain threshold ranges (e.g., of the thresholds 2414) at certain locations within the data center 2300. For example, the recommendation procedure 2700 may expect air to flow from the cooling equipment 124 through the plenum 2106 and up through the perforated tiles 2108 to the front of the server racks 2100, then through the server racks 2100. Further, the recommendation procedure 2700 may expect the airflow magnitude to be larger near the impelling forces of the fans 2126, and lower elsewhere. In some examples, the recommendation procedure 2700 may determine the airflow velocity to be indicative of an inefficiency if the airflow velocity is significantly different (e.g., outside a threshold deviation) from what is expected.
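As a rough illustration of deriving an airflow velocity vector from a pressure differential, the sketch below assumes idealized, incompressible flow between two sensor locations and applies the Bernoulli relation v = sqrt(2·|Δp|/ρ), with air directed from the higher-pressure location toward the lower-pressure one. This simplification, the air density constant, and the function name are assumptions; the disclosure does not state how the velocity is computed, and a practical system might rely on a CFD model instead.

```python
import math
from typing import List, Sequence

AIR_DENSITY = 1.2  # kg/m^3, approximate value for air at room conditions


def airflow_velocity(
    p_a: float,
    p_b: float,
    pos_a: Sequence[float],
    pos_b: Sequence[float],
    rho: float = AIR_DENSITY,
) -> List[float]:
    """Estimate a velocity vector (m/s) between points a and b from pressures in Pa.

    The vector points from the higher-pressure point toward the lower-pressure one.
    """
    dp = p_a - p_b
    speed = math.sqrt(2.0 * abs(dp) / rho)
    delta = [b - a for a, b in zip(pos_a, pos_b)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0:
        raise ValueError("sensor positions must differ")
    sign = 1.0 if dp >= 0 else -1.0  # flow runs a -> b when p_a is higher
    return [sign * speed * d / norm for d in delta]


# A 6 Pa drop between two sensors spaced 1 m apart along the x axis.
v = airflow_velocity(101331.0, 101325.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

The resulting direction and magnitude could then be compared with the expected direction ranges and magnitude thresholds for that location.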
In the example of
In the example of
In some examples, the recommendation(s) output by the recommendation procedure 2700 may depend upon the inefficiencies (and/or efficiency indicators) determined at blocks 2704 and/or 2706. Thus, the recommendation procedure 2700 may first determine which of the efficiency indicators indicate there is an inefficiency prior to recommending one or more solutions. While the inefficiency of the data center as a whole may be caused, as a general matter, by failing to fully make use of a resource to achieve a target result or range of results (e.g., by precisely matching cooling supply with cooling demand), an analysis of the specific efficiency indicators may lead to more concrete and/or discrete solutions. Once one or more of the efficiency indicators are identified as indicating an inefficiency, the recommendation procedure 2700 can recommend one or more solutions to resolve both the limited inefficiencies of the efficiency indicators, and the overall inefficiencies of the data center 2300.
For example, the recommendation procedure 2700 may determine that the front/back temperature gradient indicates an inefficiency because the gradient is negative at some height. A negative front/back temperature gradient means that the temperature at the back of the server rack 2100 is lower than at the front at that height. In some examples, the recommendation procedure 2700 may determine that the likely cause of the negative front/back temperature gradient is one or more servers 2110 that are improperly installed.
In some examples, the fan of a server 2110 that is installed backwards will draw hot air into the server 2110 from the rear of the server rack 2100, and propel the air out the front of the server rack 2100. This is in contrast to a correctly installed server 2110, where the fan draws air into the server 2110 at the front of the server rack 2100 (where the cooling equipment 124 provides cooled air) and blows the air out the back. Before being blown out the back, the cool air is moved over the heated components, which cools the components and warms the air. Thus, when servers 2110 are properly installed, the temperature is typically cooler at the front of the server rack 2100, and warmer at the rear of the server rack 2100, creating a positive front/back temperature gradient.
Where the recommendation procedure 2700 instead determines that there is a negative front/back temperature gradient, the recommendation procedure 2700 may determine the likely cause is an improperly installed server 2110. In some examples, the recommendation procedure 2700 may recommend that the server rack 2100 be inspected for servers 2110 installed incorrectly in response to determining that there is likely an improperly installed server 2110. In some examples, in response to determining that there is likely an improperly installed server 2110, the recommendation procedure 2700 may further recommend fixing the installation of any servers 2110 found to be installed incorrectly.
As another example, the recommendation procedure 2700 may determine that the front/back temperature gradient is below the threshold range. There may be several causes for a low front/back temperature gradient. In some examples, there may simply be a low utilization of one or more server racks 2100, where less heat is generated by the servers 2110.
In examples where there is low utilization of one or more server racks 2100, the cold air drawn through the servers 2110 by the server fans will be heated less because the components are not as hot, resulting in a lower temperature gradient. In some examples, the recommendation procedure 2700 may analyze the utilization of server racks 2100 to confirm that low utilization could indeed be the culprit. If there is low utilization, the recommendation procedure 2700 may recommend consolidating processing operations into fewer servers 2110, and/or consolidating operational servers 2110 into fewer server racks 2100. If there is low utilization, the recommendation procedure 2700 may also recommend that the fully utilized server racks 2100 be more closely positioned to one another, and/or modifying the cooling equipment 124 to route more (or all) cool air to server racks 2100 where there are fully utilized servers 2110.
Another potential cause of a low front/back temperature gradient is infiltration, where the cool air provided in the aisle at the front of the server racks 2100 becomes intermixed with the hot air at the back of the server racks 2100. The intermixing may occur if, for example, there are open spaces in the server racks 2100 (e.g., where there are no servers 2110) that allow air to travel through the server racks 2100. The intermixing may also occur if, for example, there is space around the server racks 2100 (e.g., above, below, on the sides, etc.) where air can travel to intermix. In some examples, the recommendation procedure 2700 may analyze the airflow velocity to confirm that infiltration is the likely cause of the low front/back temperature gradient, as the airflow velocity may show lower magnitudes and/or changes of direction at infiltration points. If the recommendation procedure 2700 determines that infiltration is the likely culprit of the low front/back temperature gradient, the recommendation procedure 2700 may recommend blanking panels be installed within the server racks 2100, and/or containment solutions be implemented around the server racks 2100, to stop the infiltrating air flow.
Another potential cause of low front/back temperature gradient is excessive air flow. In some examples, if the recommendation procedure 2700 cannot confirm that low utilization or infiltration is the likely culprit, the recommendation procedure 2700 may determine that excessive air flow is the cause. Excessive air flow can mean significant amounts of wasted energy. In some examples, the recommendation procedure 2700 may recommend that the air flow output by the cooling equipment 124 (e.g., via fans 2126) be lowered, and/or that the air supply medium (e.g., ducts, plenum, etc.) be modified (e.g., by changing percent open of perforated tiles 2108), to change the air flow.
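The three candidate causes of a low front/back temperature gradient described above can be sketched as a simple decision procedure. The following Python sketch is illustrative only and is not the disclosed implementation of the recommendation procedure 2700: the names RackObservation and recommend_for_low_gradient, the threshold values LOW_DELTA_C and LOW_UTILIZATION, and the boolean airflow_anomaly flag are all assumptions introduced for this example. It checks utilization and infiltration independently and falls back to excessive air flow only when neither can be confirmed.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the disclosure does not specify numeric values.
LOW_DELTA_C = 5.0       # front/back gradient below this is treated as "low"
LOW_UTILIZATION = 0.3   # utilization fraction below this is treated as "low"


@dataclass
class RackObservation:
    """Assumed per-rack summary of sensor data (illustrative only)."""
    front_temp_c: float    # temperature at the front (cold aisle) of the rack
    back_temp_c: float     # temperature at the back (hot aisle) of the rack
    utilization: float     # fraction of rack capacity in use, 0.0 to 1.0
    airflow_anomaly: bool  # True if airflow velocity shows reduced magnitude
                           # and/or a change of direction near the rack


def recommend_for_low_gradient(obs: RackObservation) -> List[str]:
    """Return candidate recommendations for a low front/back gradient."""
    delta = obs.back_temp_c - obs.front_temp_c
    if delta >= LOW_DELTA_C:
        return []  # gradient is not low; nothing to recommend here

    recommendations: List[str] = []
    if obs.utilization < LOW_UTILIZATION:
        # Low utilization confirmed as a likely cause.
        recommendations.append(
            "consolidate processing into fewer servers and/or server racks")
    if obs.airflow_anomaly:
        # Infiltration confirmed as a likely cause.
        recommendations.append(
            "install blanking panels and/or containment solutions")
    if not recommendations:
        # Neither cause confirmed: assume excessive air flow.
        recommendations.append(
            "lower cooling air flow and/or modify the air supply medium")
    return recommendations
```

In this sketch a reversed (negative) gradient is also treated as low; a fuller procedure would likely handle that case separately.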
As another example, the recommendation procedure 2700 may determine the vertical temperature gradient efficiency indicator is indicative of an inefficiency because the gradient is negative or too high. In some examples, a negative temperature gradient may indicate that the air near the floor and/or bottom of the server rack 2100 (e.g., and/or coming out of the plenum 2106) is warmer than the air near the ceiling and/or top of the server rack 2100. In some examples, an excessively high (e.g., above a threshold) temperature gradient may indicate that the air near the ceiling and/or top of the server rack 2100 is much warmer than the air near the floor and/or the bottom of the server rack 2100. Both situations can be problematic for the purposes of correct and efficient cooling of the servers 2110 within the server racks 2100. In some examples, the recommendation procedure 2700 may recommend that the air supply medium (e.g., ducts, plenum, etc.) be modified (e.g., by changing percent open of perforated tiles 2108), to change the air flow and correct the vertical temperature gradient.
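The two vertical gradient conditions described above (negative, or above a threshold) can be expressed as a short classification. This is a minimal sketch under stated assumptions: the function name classify_vertical_gradient, the default threshold high_delta_c, and the convention that the gradient is the top temperature minus the bottom temperature are hypothetical and chosen only for illustration.

```python
def classify_vertical_gradient(top_temp_c: float,
                               bottom_temp_c: float,
                               high_delta_c: float = 8.0) -> str:
    """Classify a rack's vertical temperature gradient.

    The gradient is taken as top minus bottom (an assumed convention).
    high_delta_c is a hypothetical threshold, not a value from the disclosure.
    """
    gradient = top_temp_c - bottom_temp_c
    if gradient < 0:
        return "reversed"   # air near the floor is warmer than air near the top
    if gradient > high_delta_c:
        return "too_high"   # air near the top is much warmer than near the floor
    return "ok"
```

Either non-"ok" result would map to the same recommendation in the text above, namely modifying the air supply medium.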
As another example, the recommendation procedure 2700 may determine that the hot spot efficiency indicator is indicative of an inefficiency, because one or more hot spots are higher (or lower) than a maximum (or minimum) temperature threshold. In some examples, the recommendation procedure 2700 may recommend a modification of the air supply medium, cooling equipment 124 configuration (e.g., target temperature), and/or server rack 2100 arrangement (e.g., to disburse servers 2110 and/or server racks 2100) to address the hot spot inefficiency. As another example, the recommendation procedure 2700 may determine that the utilization efficiency indicator is indicative of an inefficiency, in which case the recommendation procedure 2700 may recommend disbursement and/or consolidation of servers 2110 and/or server racks 2100. In some examples, the recommendation procedure 2700 may determine that the airflow velocity efficiency indicator is indicative of an inefficiency because airflow velocity is different than what is expected. In such an example, the recommendation procedure 2700 may recommend a modification of the air supply medium, cooling equipment 124 configuration (e.g., air flow output), and/or server rack 2100 arrangement to address this inefficiency.
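The hot spot check described above compares measured temperatures against maximum and minimum thresholds. A minimal sketch of that comparison, keyed by sensor position, might look like the following. The names find_out_of_range, MAX_TEMP_C, and MIN_TEMP_C, the threshold values, and the (x, y, z) tuple used to represent a position are assumptions made for this example and do not come from the disclosure.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]  # assumed (x, y, z) sensor position

# Hypothetical thresholds; the disclosure does not give numeric values.
MAX_TEMP_C = 32.0
MIN_TEMP_C = 15.0


def find_out_of_range(readings: Dict[Position, float],
                      min_c: float = MIN_TEMP_C,
                      max_c: float = MAX_TEMP_C
                      ) -> List[Tuple[Position, float, str]]:
    """Return positions whose temperature is outside [min_c, max_c]."""
    flagged = []
    # Sort by position so the output order is deterministic.
    for pos, temp in sorted(readings.items()):
        if temp > max_c:
            flagged.append((pos, temp, "above_max"))
        elif temp < min_c:
            flagged.append((pos, temp, "below_min"))
    return flagged
```

Because each flagged entry carries its position, the result could be used to locate the affected server racks 2100 when selecting among the recommendations listed above.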
Through the systems and methods described herein, an administrator of a data center can monitor a data center to understand the current data center environment. Further, the systems and methods described herein allow for an administrator to monitor relevant events that occur pertaining to server racks 2100 within the data center, as well as the health of cooling equipment 124 within the data center. The systems and methods described herein additionally are able to determine if a cooling system within the data center is effectively and/or efficiently cooling and protecting the valuable equipment stored in the data center, and recommend solutions if otherwise. Using this data, the administrator can rearrange or move racks or equipment within the data center to protect the equipment within the data center from overheating, etc. Furthermore, the systems and methods described herein can provide an always accurate and up-to-date map of the data center even after the administrator changes the configuration of the data center, which demonstrates a significant improvement over the prior art systems that relied only on static and frequently out-of-date CAD drawings of the data center. The systems and methods described herein demonstrate a practical application and an improvement over the art.
The present method and/or system may be realized in hardware, software, or a combination of hardware and software. The present methods and/or systems may be realized in a centralized fashion in at least one computing system, or in a distributed fashion where different elements are spread across several interconnected computing or cloud systems. Some examples may comprise a non-transitory machine-readable (e.g., computer readable) medium (e.g., FLASH drive, optical disk, magnetic storage disk, or the like) having stored thereon one or more lines of code executable by a machine, thereby causing the machine to perform processes as described herein.
While the present method and/or system has been described with reference to certain examples, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present method and/or system. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departing from its scope. Therefore, it is intended that the present method and/or system not be limited to the particular examples disclosed, but that the present method and/or system will include all implementations falling within the scope of the appended claims.
As used herein, “and/or” means any one or more of the items in the list joined by “and/or”. As an example, “x and/or y” means any element of the three-element set {(x), (y), (x, y)}. In other words, “x and/or y” means “one or both of x and y”. As another example, “x, y, and/or z” means any element of the seven-element set {(x), (y), (z), (x, y), (x, z), (y, z), (x, y, z)}. In other words, “x, y and/or z” means “one or more of x, y and z”.
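As an illustration of this definition only, the sets listed above are the non-empty subsets of the listed items, which can be enumerated programmatically. The helper name and_or_combinations is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def and_or_combinations(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """Enumerate every non-empty combination of the given items."""
    result = []
    for size in range(1, len(items) + 1):
        # combinations() preserves the input order within each tuple.
        result.extend(combinations(items, size))
    return result
```

For two items this yields three combinations, and for three items it yields seven, matching the three-element and seven-element sets given above.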
As utilized herein, the terms “e.g.,” and “for example” set off lists of one or more non-limiting examples, instances, or illustrations.
As used herein, the terms “coupled,” “coupled to,” and “coupled with,” each mean a structural and/or electrical connection, whether attached, affixed, connected, joined, fastened, linked, and/or otherwise secured. As used herein, the term “attach” means to affix, couple, connect, join, fasten, link, and/or otherwise secure. As used herein, the term “connect” means to attach, affix, couple, join, fasten, link, and/or otherwise secure.
As used herein, the terms “circuits” and “circuitry” refer to physical electronic components (i.e., hardware) and any software and/or firmware (“code”) which may configure the hardware, be executed by the hardware, and/or otherwise be associated with the hardware. As used herein, for example, a particular processor and memory may comprise a first “circuit” when executing a first one or more lines of code and may comprise a second “circuit” when executing a second one or more lines of code. As utilized herein, circuitry is “operable” and/or “configured” to perform a function whenever the circuitry comprises the necessary hardware and/or code (if any is necessary) to perform the function, regardless of whether performance of the function is disabled or enabled (e.g., by a user-configurable setting, factory trim, etc.).
As used herein, the term “processor” means processing devices, apparatus, programs, circuits, components, systems, and subsystems, whether implemented in hardware, tangibly embodied software, or both, and whether or not it is programmable. The term “processor” as used herein includes, but is not limited to, one or more computing devices, hardwired circuits, signal-modifying devices and systems, devices and machines for controlling systems, central processing units, programmable devices and systems, field-programmable gate arrays, application-specific integrated circuits, systems on a chip, systems comprising discrete elements and/or circuits, state machines, virtual machines, data processors, processing facilities, and combinations of any of the foregoing. The processor may be, for example, any type of general purpose microprocessor or microcontroller, a digital signal processing (DSP) processor, an application-specific integrated circuit (ASIC), a graphic processing unit (GPU), a reduced instruction set computer (RISC) processor with an advanced RISC machine (ARM) core, etc. The processor may be coupled to, and/or integrated with a memory device.
As used herein, the term “memory” and/or “memory circuitry” means computer hardware or circuitry to store information for use by a processor and/or other digital device. The memory and/or memory circuitry can be any suitable type of computer memory or any other type of electronic storage medium, such as, for example, read-only memory (ROM), random access memory (RAM), cache memory, compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), a computer-readable medium, or the like. Memory can include, for example, a non-transitory memory, a non-transitory processor readable medium, a non-transitory computer readable medium, non-volatile memory, dynamic RAM (DRAM), volatile memory, ferroelectric RAM (FRAM), first-in-first-out (FIFO) memory, last-in-first-out (LIFO) memory, stack memory, non-volatile RAM (NVRAM), static RAM (SRAM), a cache, a buffer, a semiconductor memory, a magnetic memory, an optical memory, a flash memory, a flash card, a compact flash card, memory cards, secure digital memory cards, a microcard, a minicard, an expansion card, a smart card, a memory stick, a multimedia card, a picture card, flash storage, a subscriber identity module (SIM) card, a hard drive (HDD), a solid state drive (SSD), etc. The memory can be configured to store code, instructions, applications, software, firmware and/or data, and may be external, internal, or both with respect to the processor.
Claims
1. A system, comprising:
- a sensor system configured to mount to a server rack within a data center, the sensor system comprising a sensor configured to measure data within the data center; and
- a computing system configured to receive the data, the computing system comprising: processing circuitry, and memory circuitry comprising machine readable instructions which, when executed, cause the processing circuitry to: determine a position of the sensor within the data center, determine an efficiency indicator based on the data measured by the sensor and the position of the sensor, determine whether there is an inefficiency within the data center based on the efficiency indicator, and in response to determining there is an inefficiency, recommend a solution to the inefficiency.
2. The system of claim 1, wherein:
- the sensor comprises a first sensor,
- the sensor system further comprises a second sensor configured to measure data within the data center,
- the position of the first sensor comprises a first position,
- the memory circuitry comprises machine readable instructions which, when executed, further cause the processing circuitry to determine a second position of the second sensor within the data center, and
- the efficiency indicator is determined based on the data measured by the first and second sensors, as well as the first position of the first sensor, and the second position of the second sensor.
3. The system of claim 1, wherein the position of the sensor is determined using position data obtained via a local positioning system or a relative positioning system of the data center.
4. The system of claim 1, wherein the data comprises thermal data, humidity data, or pressure data.
5. The system of claim 1, wherein the efficiency indicator comprises a hot spot, an airflow direction, an airflow magnitude, a horizontal temperature gradient, a vertical temperature gradient, or a server rack utilization.
6. The system of claim 1, wherein the inefficiency comprises a temperature above a maximum temperature threshold, a reversed air flow, a horizontal temperature gradient below a low delta threshold, a reversed horizontal temperature gradient, a vertical temperature gradient above a high delta threshold, or a reversed vertical temperature gradient.
7. The system of claim 1, wherein recommending the solution comprises generating a diagram showing a location of a server rack or cooling component within the data center that is impacted by the inefficiency or that will be impacted by the solution, generating a cost saving analysis that includes the solution, or generating a work order to implement the solution.
8. A method of determining inefficiencies in a data center, comprising:
- measuring data within the data center via a sensor of a sensor system configured to mount to a server rack;
- determining a position of the sensor within the data center;
- determining an efficiency indicator based on the data measured by the sensor and the position of the sensor;
- determining whether there is an inefficiency within the data center based on the efficiency indicator; and
- in response to determining there is an inefficiency, recommending a solution to the inefficiency.
9. The method of claim 8, wherein:
- the sensor comprises a first sensor,
- the data is measured via the first sensor and a second sensor of the server rack sensor system,
- the position comprises a first position,
- the method further comprises determining a second position of the second sensor within the data center, and
- the efficiency indicator is determined based on the first position and second position, as well as the data measured by the first sensor and second sensor.
10. The method of claim 8, wherein determining the position of the sensor comprises determining the position via a local positioning system or a relative positioning system of the data center.
11. The method of claim 8, wherein the data comprises thermal data, humidity data, or pressure data.
12. The method of claim 8, wherein the efficiency indicator comprises a hot spot, an airflow direction, a change in temperature, or a temperature gradient.
13. The method of claim 8, wherein the inefficiency comprises a temperature above a temperature threshold, a reverse air flow, a change in temperature above a high delta threshold, a change in temperature below a low delta threshold, a temperature gradient above a gradient threshold, or a reversed temperature gradient.
14. The method of claim 8, wherein the solution comprises a reconfiguration of a server mounted in the server rack, a consolidation of a processing load to fewer server racks, a disbursement of the processing load to more server racks, an installation of a blanking panel in the server rack, an installation of a containment solution around the server rack, a modification of an air supply medium, or a cooling system configuration change.
15. A non-transitory machine readable medium, comprising machine readable instructions which, when executed by a processor:
- determine a position of a sensor within a data center, the sensor being part of a sensor system mounted to a server rack within the data center, the sensor being configured to measure data within the data center;
- determine an efficiency indicator based on the data measured by the sensor and the position of the sensor;
- determine whether there is an inefficiency within the data center based on the efficiency indicator; and
- in response to determining there is an inefficiency, recommend a solution to the inefficiency.
16. The non-transitory machine readable medium of claim 15, wherein the position of the sensor is determined using position data obtained via a local positioning system or a relative positioning system of the data center.
17. The non-transitory machine readable medium of claim 15, wherein the data comprises thermal data, humidity data, or pressure data.
18. The non-transitory machine readable medium of claim 15, wherein the efficiency indicator comprises a hot spot, an airflow direction, a change in temperature, or a temperature gradient.
19. The non-transitory machine readable medium of claim 15, wherein the inefficiency comprises a temperature above a temperature threshold, a reverse air flow, a change in temperature above a high delta threshold, a change in temperature below a low delta threshold, a temperature gradient above a gradient threshold, or a reversed temperature gradient.
20. The non-transitory machine readable medium of claim 15, wherein the solution comprises a reconfiguration of a server mounted in the server rack, a consolidation of a processing load to fewer server racks, a disbursement of the processing load to more server racks, an installation of a blanking panel in the server rack, an installation of a containment solution around the server rack, a modification of an air supply medium, or a cooling system configuration change.
Type: Application
Filed: Mar 22, 2022
Publication Date: Feb 9, 2023
Inventors: Mike Lingle (Deerfield, IL), Michael Conaboy (Westminster, CO), Patrick Boehnke (Oak Park, IL)
Application Number: 17/701,166