Server Room Reality: How Important Is the Cooling System?

If you do not have a large, expert-designed server room, it may be interesting to see how others handle security, safety and cooling. Read about a real incident; it might give you an idea of what to check next in your own server room.

How to Keep Your Server Room Cool Enough

If you work in an average IT department, you will agree that server rooms are far from what you see at IT exhibitions. The cruel reality is that the whole system sits in a small, usually overloaded room. How can we make sure the cooling system works all the time? It is a crucial part of the setup, even if we do not like to admit it. If the cooling system fails, you can have a big problem.

Situation:

A small server room with three racks, one air-conditioning system, a UPS for the servers, a fire alarm and access control.

True Example:

It was a Monday morning in December; it was snowing and there were traffic jams. There was an unplanned power outage, the servers shut down as planned, and everything looked like standard procedure. After a few hours the power came back and all servers started working normally.

Problem:

The cooling system did not start again, because its safety fuse had blown.

This is where the big problem started. The temperature in the server room began rising rapidly. Loud noise was coming from the server room, since all the fans were running at full speed. The temperature in the room was well over 50 °C (122 °F), and the walls were quite hot.

Procedure:

The alarm system sent several SMS messages to the admin's cell phone and we all started the emergency procedure. One of our coworkers with sufficient access rights opened the server room and started cooling it. All servers were already in their safety state and by then had begun shutting down.
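The alert-then-shutdown behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the setup we actually ran: the threshold values and the function names (`classify`, `state_changes`) are assumptions made for the example. It reports an action only when the state changes, so a hot room does not produce an SMS for every single reading.

```python
# Hypothetical thresholds in degrees Celsius; the article only states
# that the room ended up well over 50 °C, not where the alarm tripped.
WARN_C = 30.0
SHUTDOWN_C = 45.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def classify(temp_c: float) -> str:
    """Map one room temperature reading to a state."""
    if temp_c >= SHUTDOWN_C:
        return "shutdown"
    if temp_c >= WARN_C:
        return "alert"
    return "ok"


def state_changes(readings):
    """Return (reading, state) pairs for each reading that changes the state.

    The room is assumed to start in the "ok" state.
    """
    last_state = "ok"
    changes = []
    for temp_c in readings:
        state = classify(temp_c)
        if state != last_state:
            changes.append((temp_c, state))
            last_state = state
    return changes


if __name__ == "__main__":
    for temp_c, state in state_changes([22, 31, 38, 46, 52]):
        print(f"{temp_c} °C ({celsius_to_fahrenheit(temp_c):.0f} °F): {state}")
```

In a real room, the "alert" state would trigger the SMS and the "shutdown" state would start the controlled server shutdown; how those are wired up depends on the monitoring hardware in use.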

Damage:

There was no real hardware damage, since the servers were down for only half an hour. We could calculate the time and money lost to this event, but because of the weather everything was running slower than on a normal working day anyway.

Result:

We wanted to make improvements so that a similar incident would not happen again.

  • We installed two separate air-conditioning systems.
  • A fire alarm was already installed, but we also built in fire-extinguishing ampoules beneath the servers, just in case.
  • The second most important improvement was installing dedicated ventilation for when the air conditioning is not working; that way we can at least ensure constant air circulation, with fresh air coming in and hot air leaving the room.
  • The sensors, the added ventilation and the air-conditioning systems are on separate safety fuses.
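The point of the last item is that one blown fuse should never take out more than one part of the cooling chain. A quick way to check that on paper is sketched below; the component names and fuse labels (`F1` to `F4`) are made up for the example and do not describe our actual wiring.

```python
# Hypothetical assignment of cooling-related components to safety fuses.
FUSES = {
    "aircon_1": "F1",
    "aircon_2": "F2",
    "ventilation": "F3",
    "sensors": "F4",
}


def shared_fuses(assignment):
    """Return every fuse that feeds more than one component.

    An empty result means no single fuse can disable two components at once.
    """
    by_fuse = {}
    for component, fuse in assignment.items():
        by_fuse.setdefault(fuse, []).append(component)
    return {
        fuse: sorted(components)
        for fuse, components in by_fuse.items()
        if len(components) > 1
    }


if __name__ == "__main__":
    problems = shared_fuses(FUSES)
    if problems:
        print("Shared fuses found:", problems)
    else:
        print("Every component is on its own fuse.")
```

With the original setup from the incident, where a single fuse stopped all cooling, a check like this would have flagged the weak point before the outage did.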

Conclusion:

Despite all precautions, there is always a chance that we have forgotten something. But we can learn from our own and others' experience and try to minimize the chance of failure in the future.

One Response

dm
08.07.16

I like the example, a real story behind some theory!
