It may surprise some that despite its sheer size, a typical data centre is operated by a relatively small number of personnel. This is because the vast majority of the mechanical, electrical and plumbing systems run autonomously in the background, round the clock. And no wonder, considering the split-second reaction times needed to keep power flowing in the event of a power incident.
Of course, it falls to data centre engineers to ensure that the hardware and its fail-safes are kept in tip-top condition. They do so by carrying out proactive maintenance to fix things before they break, expediting repairs when they do, and being on hand to make the correct decisions in the event of an anomalous situation.
An eye on operations
While rarely talked about outside of specialist circles, the technical skills, professionalism, and excellence of data centre engineers are often the defining traits of a great data centre. Conversely, no amount of investment in facilities can make up for inadequacies in the operations team.
For a greenfield data centre, operational readiness entails having the operations team finalize and document all standard operating procedures and maintenance operational protocols (MOP) ahead of the go-live date. The idea is to ensure that pertinent processes are fine-tuned and exhaustively documented, leaving no room for ambiguity.
This is where Uptime Institute's Tier Certification of Operational Sustainability comes into the picture. Obtainable after one year, the accreditation looks into multiple considerations such as staffing levels, training processes, maintenance logs and various other procedures to appraise the overall operational rigor of a data centre.
Just as not everyone with a driving license has what it takes to drive a Formula One race car, merely being a technician or engineer does not automatically qualify someone to operate a data centre. The Operational Sustainability standard hence serves as an internationally recognized benchmark to independently audit and certify that the operations team has what it takes to run a Tier IV or Tier III data centre.
All about people
It is no secret that a significant proportion, if not an outright majority, of IT problems and outages can be traced back to human error. While it may be naïve to claim that one can eliminate such errors, it goes without saying that the presence of competent and experienced professionals does make a world of difference in reducing them.
Running a data centre is really a team effort, and hiring is a holistic affair that cannot be limited to well-honed areas such as technical skills, years of experience, or having the right mix of specialists – factors such as having a positive attitude and passion for the data centre environment must not be overlooked.
The latter is crucial because the data centre is a rigid and repetitive work environment where a lack of interest can culminate in high turnover. High turnover in turn makes mistakes more likely, as new team members are unfamiliar with procedures. On the other hand, a strong team that has bonded will allow members to achieve a level of synergy that is more than the sum of its parts.
Workers must also be empowered for lifelong learning by sending them for training and industry events, while technology in the form of video recordings can be leveraged to help them pick things up more quickly. Step-by-step maintenance tasks can be recorded and made available to new engineers and technicians coming on board, allowing them to easily learn and review key maintenance procedures.
A different kind of system
Finally, the Internet of Things (IoT) and modern software have opened the door to ensuring that maintenance procedures are not skipped or inadvertently missed. Hundreds of checks can be woven into a comprehensive system using both standard data centre equipment and cutting-edge IoT monitoring systems.
The objective is to blend IT systems and monitoring hardware to enforce the completion of mandatory processes, with reminders for missed steps and escalation for continued infringement. With Telin-3, we are able to implement these controls without the encumbrance of legacy hardware, and anchor them to a comprehensive framework ahead of construction.
For instance, operating software from different vendors was chosen for compatibility and integrated at the design stage before ground-breaking. Deep software integration makes it possible to connect standalone systems into a cohesive whole, while networked equipment and servers report automatically back to centralized servers. We call this integrated system the Telin Integrated Portal (TIP), which also takes care of asset procurement and management.
Ultimately, this wealth of data makes it possible to apply analytics to predict potential equipment failure, allowing preventive maintenance to be conducted before a failure occurs. The high level of automation minimizes disruptions from turnover or job reassignments, and plays an important part in reducing human error.
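To make the reminder-and-escalation idea described above more concrete, here is a minimal sketch in Python. It is not how TIP is implemented; the class name, function names, check names and grace periods are all hypothetical and chosen purely for illustration. It simply shows how a system could classify each mandatory check as done, pending, due for a reminder, or due for escalation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical thresholds, for illustration only.
REMINDER_GRACE = timedelta(hours=1)    # overdue this long: remind the assignee
ESCALATION_GRACE = timedelta(hours=4)  # overdue this long: escalate to a supervisor


@dataclass
class MaintenanceCheck:
    """One mandatory check (hypothetical data model)."""
    name: str
    due: datetime
    completed_at: Optional[datetime] = None


def check_status(check: MaintenanceCheck, now: datetime) -> str:
    """Classify a check as 'done', 'pending', 'remind' or 'escalate'."""
    if check.completed_at is not None:
        return "done"
    overdue = now - check.due  # negative if the check is not yet due
    if overdue >= ESCALATION_GRACE:
        return "escalate"
    if overdue >= REMINDER_GRACE:
        return "remind"
    return "pending"


def review(checks: List[MaintenanceCheck], now: datetime) -> Dict[str, str]:
    """Map each check that needs attention to the action it requires."""
    actions = {}
    for check in checks:
        status = check_status(check, now)
        if status in ("remind", "escalate"):
            actions[check.name] = status
    return actions


# Example run with made-up checks.
now = datetime(2024, 1, 1, 12, 0)
checks = [
    MaintenanceCheck("UPS battery inspection", due=now - timedelta(hours=5)),
    MaintenanceCheck("Chiller log review", due=now - timedelta(hours=2)),
    MaintenanceCheck("Generator test run", due=now - timedelta(hours=6),
                     completed_at=now - timedelta(hours=7)),
    MaintenanceCheck("Fire panel check", due=now + timedelta(hours=1)),
]
print(review(checks, now))
# {'UPS battery inspection': 'escalate', 'Chiller log review': 'remind'}
```

In a real deployment the list of checks would presumably be fed by the monitoring systems and work-order records rather than hard-coded, and the two actions would trigger notifications to the relevant staff.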
Operating a smooth-running data centre is hard work. Telin Singapore is committed to operational excellence, and with a solid team and TIP, we are confident that we have what it takes for the long haul.