Server Hosting Questions
Questions you should ask your internet data centre before deciding where to host your mission critical servers.
Introduction:
Internet data centres (IDC) probably look much the same to the casual eye; mission critical computer systems and hosted servers racked up into cabinets, forming single rows with access corridors between.
The IDC will have some sort of environmental controls, such as air conditioning and fire suppression, as well as redundant or backup power supplies and redundant data communications. But not all Internet Data Centres are created equal. The devil is in the detail of what they provide.
For enterprises that demand high availability, or as close as possible to 100 per cent uptime, it is critical to first conduct a detailed audit of potential hosting providers.
This involves digging below the surface of marketing material to ensure adequate infrastructure and procedures are in place to maintain the required uptime of hosted server environments.
The main aim of an Internet Data Centre is to eliminate as many single points of failure as possible within the infrastructure and systems of a facility. This is achieved by "redundancy" (see note 1).
However, redundancy costs money, because instead of one system, you are purchasing two or more. In principle, a provider can keep adding layers of redundancy to reduce the risk of downtime or failure to an absolute minimum, but each layer adds cost, so the decision becomes a trade-off between risk and cost.
Some systems, for instance the EFTPOS system, must be available 100 per cent of the time. Consequently, an enormous amount of money is spent to ensure redundancy and fault tolerant systems are in place to get as close to 100 per cent uptime as possible.
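To put uptime targets in perspective, an availability percentage can be converted into the downtime it permits per year. The sketch below is illustrative only: the function name and the sample targets are assumptions made for this example, not figures from any particular IDC, and a year is taken as 24 x 365 hours.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a non-leap "24x365" operating year


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year implied by an uptime percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 per cent")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    # Sample targets chosen for illustration only.
    for target in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(target)
        print(f"{target}% uptime allows roughly {hours:.2f} hours of downtime a year")
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is why the last fraction of a per cent is the most expensive to buy.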
Three Key Issues for Internet Data Centres
The key issues for an IDC are the reliability of its power supply, its cooling systems and its network. Servers cannot operate unless all three are fully functional, 24 hours a day, 365 days a year. If cooling systems go down, server rooms heat up extremely quickly and, ultimately, servers must be shut down to avoid hardware and system damage. For companies running 24x365 internet-based operations, such as e-business solutions, emergency server shut-downs are not a viable option.
If the connectivity network fails, the server environment effectively becomes "unavailable."
Therefore, IDCs must have reliable, as well as redundant, power, cooling and connectivity network systems.
Redundancy is provided by either:
- two devices or systems running and functioning in parallel with one another, or
- a backup standby device or system that will automatically come into operation if the primary device/system fails.
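The effect of running devices in parallel can be estimated with simple probability. The sketch below is a simplification: it assumes each device fails independently and that fail-over always works, whereas real systems share failure causes, so the result is best read as an upper bound. The function name and the 99 per cent figure are illustrative assumptions.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` identical devices where any one can carry the load.

    Assumes independent failures: the system is down only when every unit is down.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("at least one unit is required")
    return 1.0 - (1.0 - unit_availability) ** units


if __name__ == "__main__":
    # Hypothetical device that is available 99 per cent of the time on its own.
    for count in (1, 2, 3):
        combined = parallel_availability(0.99, count)
        print(f"{count} unit(s) at 99% each -> {combined * 100:.4f}% combined")
```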
Scalability:
The ability to scale all three of these systems (power, cooling and connectivity network) is also incredibly important given the trend toward denser, more highly concentrated server hosting environments, for example the popularity of blade servers, which pack a high concentration of CPUs into a small space.
As a result, blade server enclosures can cause power and cooling issues for some data centres.
Virtually all power supplied to hosted servers (up to 99 per cent) is converted into heat that must be dissipated. Some older data centres have power systems incapable of handling the higher loads of modern server environments, and cooling systems unable to cool cabinets containing densely packed servers or blade server environments. While the demand for blade server hosting is increasing, the supply of IDCs that can handle them is lagging behind. So, if you want to deploy blade server enclosures, be sure your data centre can cool them effectively.
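Because nearly all the electrical power drawn by a cabinet ends up as heat, a first estimate of cooling demand follows directly from the power draw. The sketch below is a rough sizing aid rather than an engineering calculation: the 99 per cent heat fraction is the figure quoted above, the watt-to-BTU conversion is the standard one, and the cabinet loads and cooling capacity are hypothetical.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # standard conversion factor, rounded
HEAT_FRACTION = 0.99           # share of input power released as heat (figure quoted above)


def heat_load_watts(power_draw_watts: float, heat_fraction: float = HEAT_FRACTION) -> float:
    """Heat, in watts, that the cooling system must remove for a given power draw."""
    return power_draw_watts * heat_fraction


def heat_load_btu_per_hour(power_draw_watts: float) -> float:
    """The same heat load expressed in BTU per hour."""
    return heat_load_watts(power_draw_watts) * BTU_PER_HOUR_PER_WATT


def can_cool(cabinet_draws_watts: list[float], cooling_capacity_watts: float) -> bool:
    """True if the room's cooling capacity covers the combined heat of all cabinets."""
    total_heat = sum(heat_load_watts(draw) for draw in cabinet_draws_watts)
    return total_heat <= cooling_capacity_watts


if __name__ == "__main__":
    # Hypothetical room: two conventional cabinets and one blade cabinet.
    cabinets = [3_000.0, 3_000.0, 12_000.0]
    print(f"Blade cabinet heat: {heat_load_btu_per_hour(12_000.0):,.0f} BTU/h")
    print("Cooling sufficient:", can_cool(cabinets, cooling_capacity_watts=15_000.0))
```

In this made-up room a single blade cabinet produces twice the heat of the other two combined, which is the situation older cooling plant was never sized for.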
Questions to ask an IDC:
If you are considering an Internet Data Centre, below is a list of questions you should ask.
1. Business Stability and Environment Ownership:
a. How long have you been in business?
b. Is your company stable and profitable, and viable over the longer term?
c. Do you own your own Internet Data Centre? (ie, do you have direct control over the facility?)
d. What size of client do you service? Is the business sector your focus, or is residential mass market your priority?
Comments:
A number of hosting providers are in fact "Virtual Providers" - they do not own the data centre their gear is housed in and therefore do not have direct control over the infrastructure designed to support it.
The type and size of client being served will provide an indication of the data centre's capability. Larger organisations will likely undertake detailed audits of IDCs before signing up, so their presence suggests the IDC has met stringent requirements.
2. Power System Redundancy:
a. Do you have a mains power system that is scalable and capable of managing additional power requirements as I grow my hosted environment?
b. Do you have centralised Uninterruptible Power Supply (UPS) systems and backup diesel generator systems in place?
c. For High Availability customer requirements, do you have UPS and generator redundancy?
Comments:
A number of existing data centres were built some years ago, when the power requirements of server equipment were far more modest and server power use varied little (5 per cent or less between high and low CPU utilisation). The average power utilisation of server systems has been trending up for a number of years as the processing power of CPUs increases. Power management technologies now deployed in servers and other communication equipment have resulted in a huge variation in power draw between a server operating at full capacity and one in idle mode. For example, blade server power draw variation can be as much as 80-90 per cent.
A data centre, therefore, has to have the capacity to increase the total amount of power it delivers to its server rooms, and its switchboards and backup generation have to be equal to the task. Some data centres simply can't deliver, as their legacy power systems are not scalable. It is very difficult to re-engineer the power infrastructure in an old data centre, as older data centres do not have modern power boards that allow parts of the infrastructure to be bypassed while upgrades take place. Upgrades are therefore very difficult in these data centres, as they cannot take systems offline when they have customers reliant on 24x365 availability.
UPS and backup generators capable of carrying the full load of all hosted server equipment are mandatory for an IDC to ensure continuity of supply. Server equipment housed in High Availability facilities must be connected through redundant dual power supplies to the cabinet, dual UPS (run in parallel) and dual backup generators to provide a true 2N power supply (see note 2).
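One way to sanity-check a 2N claim is to confirm that each independent power path could carry the full load on its own. The sketch below is illustrative only: the class, field names and capacities are hypothetical, and it treats each path as limited by its weakest component.

```python
from dataclasses import dataclass


@dataclass
class PowerPath:
    """One independent power path with its own UPS and generator."""
    name: str
    ups_capacity_kw: float
    generator_capacity_kw: float

    def usable_capacity_kw(self) -> float:
        # A path can deliver no more than its weakest component allows.
        return min(self.ups_capacity_kw, self.generator_capacity_kw)


def is_2n(paths: list[PowerPath], full_load_kw: float) -> bool:
    """True if there are at least two paths and each can carry the full load alone."""
    return len(paths) >= 2 and all(p.usable_capacity_kw() >= full_load_kw for p in paths)


if __name__ == "__main__":
    # Hypothetical facility: path B has an undersized generator.
    path_a = PowerPath("A", ups_capacity_kw=500, generator_capacity_kw=600)
    path_b = PowerPath("B", ups_capacity_kw=500, generator_capacity_kw=350)
    print("2N at 400 kW:", is_2n([path_a, path_b], full_load_kw=400))
    print("2N at 300 kW:", is_2n([path_a, path_b], full_load_kw=300))
```

The point of the check is that two paths which only cover the load between them provide shared capacity, not redundancy.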
3. Key Infrastructure Maintenance:
a. Do you regularly maintain/check/test your Mains Power Supply systems and have service contracts with qualified personnel in place?
b. Do you regularly maintain/check/test your UPS and have service contracts with qualified personnel in place?
c. Do you regularly maintain/check/test your backup diesel generator(s) and have service contracts with qualified personnel in place?
d. Do you regularly maintain/check/test cooling systems and have service contracts with qualified personnel in place?
Comments:
Mains power, UPS, generator and cooling systems require specialists to maintain and service them effectively. Agreements containing suitable service levels should be in place with external specialists to ensure that regular maintenance and testing are undertaken.
4. Physical Structure and Location:
a. Do your co-location rooms have concrete floors, exterior walls and concrete ceilings?
b. Are your colocation rooms on the ground floor, in a basement or on the first or second floor of the building?
c. Is the data centre in an area prone to flooding? (ie, is it near a river or stream, at the bottom of a valley, or close to the harbour's edge?)
d. Is there plumbing running above the server cabinets?
e. Is the IDC directly in the landing path of an international airport?
f. Are there any fuel dumps, major gas pipelines, petrol stations, liquefied petroleum storage tanks or any highly combustible substances stored nearby?
g. Do you have raised flooring in your colocation rooms?
Comments:
Ideally, a co-location room should have concrete floors so cabinets and server racks can be seismically secured to the floor to mitigate earthquake issues and stop server cabinets toppling over. Concrete exterior walls mitigate security risks, and concrete ceilings are preferred to tin roofing due to the risk of leaks.
It is important that the co-location rooms are on the ground floor as server equipment is heavy - a single cabinet can hold 800 kg or more of gear, so the floor of a 100-cabinet colocation room would have to support 50-100 tonnes. For this reason, colocation rooms on the first and second floors of buildings designed for general office use run the risk of exceeding weight limits.
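The weight arithmetic is worth running for your own deployment. A minimal sketch, using the 800 kg per cabinet figure above: the floor rating is a hypothetical value that would in practice come from the building's structural engineer.

```python
KG_PER_TONNE = 1000


def total_load_tonnes(cabinet_count: int, kg_per_cabinet: float) -> float:
    """Combined weight of all cabinets in tonnes."""
    return cabinet_count * kg_per_cabinet / KG_PER_TONNE


def within_floor_rating(cabinet_count: int, kg_per_cabinet: float,
                        floor_rating_tonnes: float) -> bool:
    """True if the combined cabinet weight does not exceed the floor's rated load."""
    return total_load_tonnes(cabinet_count, kg_per_cabinet) <= floor_rating_tonnes


if __name__ == "__main__":
    load = total_load_tonnes(100, 800)
    print(f"100 cabinets at 800 kg each: {load:.0f} tonnes")
    # Hypothetical rating for an upper floor designed for office use.
    print("Within a 40 tonne rating:", within_floor_rating(100, 800, 40))
```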
Server hosting rooms situated in basements run the risk of being quickly flooded in the event of leaks or external floods, and often basement ceilings have mains water supplies running across them, meaning the hosted servers are seriously at risk if a leak occurs.
Obviously being in a flood-prone area is not a good idea - sensitive electronic equipment and water do not mix well.
It might be rare, but planes do crash, and the threat is significantly higher if the data centre is near a busy international airport, in the flight path of planes landing and taking off.
Likewise, customers who take their hosting environment seriously will not consider an IDC close to a site storing highly combustible substances due to the risk of serious damage if an incident ever happened.
Raised flooring in the server hosting rooms means that if water ever does leak or get into these rooms, the cabinets and server equipment are unlikely to come into contact with it in a minor flood.
Ideally, the data centre should have all water piping servicing Heating, Ventilating and Air-conditioning (HVAC) systems under the raised flooring, and have water sensor and alarm systems in place to detect potential leaks.
5. Connectivity to Outside World:
a. How many upstream international bandwidth providers do you use?
b. Are you load-balanced across your upstream providers?
c. How many fibre optic circuits do you have into your data centre?
d. Do you have primary and secondary circuits for redundancy purposes?
e. Are your circuits through one carrier or do you use multiple carriers?
f. Are circuits into your IDC in separate ducting or trenches out on the road and do they come into the building at different entry points?
General Comments:
For an IDC, reliable internet connectivity is critical. International bandwidth redundancy is very important as approximately 80 per cent of bandwidth used by New Zealanders is international (ie, residential and business customers accessing sites offshore rather than locally). It is only a matter of time before an international provider has short-term issues on its network due to congestion or hardware or software failure. So it is important that an IDC is not overly dependent on a single upstream provider but is "load-balanced" across multiple providers. If one fails, bandwidth is still available to hosted customers via the others.
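A simple test of upstream redundancy is whether peak traffic still fits when any one provider is lost. The sketch below assumes traffic can be shifted freely between the remaining providers. The provider names and capacities are hypothetical.

```python
def survives_single_failure(capacity_mbps: dict[str, float], peak_demand_mbps: float) -> bool:
    """True if peak demand still fits after losing any one upstream provider."""
    if len(capacity_mbps) < 2:
        return False  # a single upstream is a single point of failure
    total = sum(capacity_mbps.values())
    # Losing the largest provider is the worst case.
    worst_case_remaining = total - max(capacity_mbps.values())
    return worst_case_remaining >= peak_demand_mbps


if __name__ == "__main__":
    upstreams = {"provider_a": 1000.0, "provider_b": 1000.0, "provider_c": 500.0}
    print("Peak 1200 Mbps:", survives_single_failure(upstreams, 1200.0))
    print("Peak 1800 Mbps:", survives_single_failure(upstreams, 1800.0))
```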
The physical connectivity from an IDC to the outside world is also important. Having multiple circuits through a variety of carrier networks is ideal, with a primary connection and a secondary (fail-over) connection for each carrier's circuit.
Ensuring that each fibre optic circuit is physically separated will reduce the likelihood of all circuits being accidentally cut. Again, this is one of the details that mission critical enterprises will require from a potential hosting supplier. They may request street level maps of fibre optic circuits to ensure "physical diversity".
Physical diversity basically means that individual fibre connections are contained in separate trenches, which means they cannot all be accidentally dug up by a roading contractor.
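Given a street level map, the diversity check amounts to looking for any trench or building entry point used by more than one circuit. The sketch below is illustrative: the circuit records and identifiers are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Circuit(NamedTuple):
    carrier: str
    trench: str       # identifier of the duct or trench in the street
    entry_point: str  # where the fibre enters the building


def shared_routes(circuits: list[Circuit]) -> dict[str, list[str]]:
    """Map each trench or entry point used by more than one circuit to its carriers."""
    users: dict[str, list[str]] = defaultdict(list)
    for circuit in circuits:
        users[f"trench:{circuit.trench}"].append(circuit.carrier)
        users[f"entry:{circuit.entry_point}"].append(circuit.carrier)
    return {route: carriers for route, carriers in users.items() if len(carriers) > 1}


if __name__ == "__main__":
    circuits = [
        Circuit("carrier_a", trench="north-street", entry_point="east-wall"),
        Circuit("carrier_b", trench="north-street", entry_point="west-wall"),
        Circuit("carrier_c", trench="south-lane", entry_point="south-wall"),
    ]
    # An empty result would mean the circuits are physically diverse.
    print(shared_routes(circuits))
```

Here carriers A and B enter the building at different points but share a trench, so one digger could still cut both.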
6. Alarms, Monitoring, Fire Systems and Access:
a. Do you have monitoring and alarms on key parts of your infrastructure?
b. Do you have Closed Circuit TV monitoring IDC entranceways and co-location rooms?
c. What smoke detection systems do you have in place?
d. Do you have intrusion monitoring and alarms in place and are these monitored?
e. How is authorised access to the IDC managed?
f. Do you have flood monitors and alarms in your co-location space and in other key areas?
g. In the event of a fire, how are fire services alerted and are they conversant with dealing with a fire in a co-location space?
h. What kind of fire fighting equipment is in place?
General Comments:
All key equipment and systems in the IDC need to be monitored so engineers and building managers are alerted by alarms when a device or system starts, or even looks likely, to fail. Multiple types of alarm systems should be in place so that the IDC is not reliant on one system or person in the event of a failure. Ideally, the equivalent of a full Building Management System should be in place, so every part of the building and physical infrastructure is monitored and alarmed.
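Alerting when a system "looks likely" to fail usually means setting a warning threshold below the critical one. The sketch below shows the idea for a single temperature sensor. The threshold values and names are hypothetical and would be set per device in a real Building Management System.

```python
from enum import Enum


class AlarmLevel(Enum):
    OK = "ok"
    WARNING = "warning"    # trending toward failure: investigate now
    CRITICAL = "critical"  # failure in progress: act immediately


def classify_reading(value: float, warning_at: float, critical_at: float) -> AlarmLevel:
    """Classify a sensor reading where higher values are worse."""
    if warning_at >= critical_at:
        raise ValueError("warning threshold must be below the critical threshold")
    if value >= critical_at:
        return AlarmLevel.CRITICAL
    if value >= warning_at:
        return AlarmLevel.WARNING
    return AlarmLevel.OK


if __name__ == "__main__":
    # Hypothetical server-room air temperatures in degrees Celsius.
    for reading in (22.0, 28.5, 35.0):
        level = classify_reading(reading, warning_at=27.0, critical_at=32.0)
        print(f"{reading} C -> {level.value}")
```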
For security purposes there should only be a limited number of entranceways to colocation rooms, and these entranceways should have proximity card or biometric access systems in place. Alarm code systems should only let authorised personnel enter the IDC. There should be a minimum of two solid doors - either side of a "mantrap" - that authorised personnel must pass through using their proximity cards.
All entranceways should also be monitored by CCTV, and for high security rooms, CCTV cameras should operate down every cabinet row in the actual co-location room.
Police checks should be run on all staff working in the Network Operations Centre and on staff with direct access to server equipment. Access to co-location rooms should be limited to staff directly responsible for supporting hosted clients. Controls and procedures need to be in place to govern the activity of customers when accessing the server rooms.
Some colocation facilities run water-based cooling systems, which means there are copper water pipes running to process coolers that reside inside the co-location space or in adjacent corridors. Even where these pipes lie beneath the level of the raised floor, it is important that water detection alarm systems are in place to warn immediately of a water leak.
Not all New Zealand data centres have raised floors, so water detection alarms are even more important. The water detection alarm systems should also be near key electrical equipment, such as the centralised UPS systems.
Colocation rooms should have Very Early Smoke Detection Apparatus (VESDA) systems in place. These systems constantly monitor air particles, raising the alert of a potential fire before one actually starts. The VESDA system typically raises an alert directly with the fire department, as well as NOC and building management staff.
The fire department needs to know how to access the building, and how to respond to fires involving colocation equipment without causing more problems than they are solving.
Some co-location facilities have automated gas fire suppressant systems in place. However, these are very expensive to implement and maintain, and are not feasible in very large co-location rooms or those designed with a high stud for cooling. In the absence of gas suppressant systems, manually operated CO2 fire extinguishers should be readily available in key areas of the IDC.
7. Network Redundancy:
a. Do you operate a full mesh core network?
b. Do you run a 100 Mbps, 1 Gbps or 10 Gbps core network?
c. Are you able to provide Hot Standby Router Protocol (HSRP) services to customers?
d. Do you make BGP routing available to your customers who require this level of redundancy?
Comments:
A "full mesh network" is important to eliminate single points of failure within the core network of a data centre. A full mesh network means each device is connected to every other device in the core network, providing multiple paths through the network. This greatly enhances the uptime of the overall network and increases availability to the outside world for customers hosted in the facility.
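The cost of a full mesh grows quickly with the number of core devices, since every pair of devices needs its own link. The sketch below counts the links required and checks whether a given topology really is a full mesh. The device names are hypothetical.

```python
from itertools import combinations


def full_mesh_link_count(device_count: int) -> int:
    """Number of links needed to connect every device to every other device."""
    return device_count * (device_count - 1) // 2


def is_full_mesh(devices: list[str], links: list[tuple[str, str]]) -> bool:
    """True if every pair of devices is joined by a direct link."""
    present = {frozenset(link) for link in links}
    return all(frozenset(pair) in present for pair in combinations(devices, 2))


if __name__ == "__main__":
    core = ["core1", "core2", "core3", "core4"]
    links = [("core1", "core2"), ("core1", "core3"), ("core1", "core4"),
             ("core2", "core3"), ("core2", "core4")]
    print("Links needed:", full_mesh_link_count(len(core)))
    print("Full mesh:", is_full_mesh(core, links))  # the core3-core4 link is missing
```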
The bandwidth requirements of hosted enterprises are on average doubling every 18 months worldwide. This places huge demands on a hosting facility's networks that must cope with increases in network traffic. IDCs need to scale their core network capability to meet this requirement.
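The doubling rate quoted above can be turned into a rough capacity forecast. The sketch below assumes smooth exponential growth at that rate, which real traffic will not follow exactly. The starting figure is hypothetical.

```python
DOUBLING_PERIOD_MONTHS = 18  # growth rate quoted in the text above


def projected_bandwidth_mbps(current_mbps: float, months_ahead: float) -> float:
    """Bandwidth expected after `months_ahead`, doubling every 18 months."""
    return current_mbps * 2 ** (months_ahead / DOUBLING_PERIOD_MONTHS)


if __name__ == "__main__":
    # Hypothetical customer using 200 Mbps today.
    for months in (18, 36, 60):
        print(f"In {months} months: {projected_bandwidth_mbps(200.0, months):,.0f} Mbps")
```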
Hot Standby Router Protocol (HSRP) and BGP routing are important considerations as they help to eliminate reliance on the router that connects the hosted enterprise to the core network of the IDC.
8. Disaster Recovery:
a. What disaster recovery contingencies do you have?
b. Do you have insurance to cover a major catastrophe?
c. Can you provide multi-site hosting for redundancy purposes?
Comments:
An IDC should have disaster recovery plans to mitigate the risk of both minor and major catastrophes. These plans should cover eventualities ranging from a failure in a single system through to the complete destruction of the facility, for example in a major earthquake.
Insurance cover should be in place to cover the cost of restoring the facility and loss of business revenues if these occur.
The IDC should be able to provide offsite hosting for high availability customers, so that if the primary site becomes unavailable, hosted machines at the secondary site will still be available to the outside world.
To achieve this, an IDC will likely have an arrangement with another data centre to host important back-up network equipment offsite, as well as equipment for customers requiring offsite disaster recovery hosting. It may also have backup equipment for various services hosted overseas for redundancy and disaster recovery.
Note 1. Redundancy: A term used in ICT to describe parallel systems, or a system that can fail over to a backup system, designed to avoid loss of service and availability.
Note 2. True 2N Power Supply: A 2N power supply consists of two totally independent power infrastructure systems, each consisting of its own power cabling, array of power switchboards, UPS and diesel generator. Statistically, a 2N power supply is required to sustain 99.99% uptime.

