Tuesday, January 10, 2006

[root@localhost] # uptime

OK, you got the service up. Now what? The next level is to keep it up and running. This involves a lot of hard work. Getting something up is just 20% of the work. The rest is the real challenge. It sure is fun to break into something, but trust me, keeping something up is more fun. Think of it even in terms of intruders: if you want to keep your systems up, you need to think like the people who want to break in too. This takes time and effort. It does not happen overnight. It’s like a war you fight. The whole point is that losing a battle does not mean you have lost the war. You learn from the battles you lose and add them to your experience. Keeping out the so-called “hackers” is just one part of it. There is a whole lot more than them to having an ideal “100% uptime service”.

There is a whole range of things you need to know, and know well, from the simple hardware and the network to the software. Everything about the system. Yes, even things like power. Duh! You thought you were an IT guy and had nothing to do with power? Hello, wake up buddy, welcome to the real world. Your systems should have redundant power and redundant everything else. Ah, and don’t forget the air conditioning. Yes! What if that fails and your systems overheat? You need to think through and visualize everything. I mean everything. What if someone trips over the power cable? I could go on and on. A lot of the time the simple things are the ones that get ignored. It takes making mistakes to learn and experience all that. I’ve had my bitter share of it too, which is good. I’ve learned, I’m still learning, and there is a whole lot more to digest. It’s easier said than done to get all of this in place: neat, documented cables and diagrams, well-documented scripts, organized backups and schedules. Everything becomes a nightmare at some point, but then, aaah, somehow it slowly starts to make sense and fall into place. If anyone can achieve a system that is 100% fail-proof, I will be a slave to that person. I think that’s practically impossible. But at the same time, I would say we should never stop working toward it.

It all comes down to the users. Yes. Bob Dylan says “you got to serve somebody”. We serve the users. They may be users, customers, whatever you call ’em; I’ll say users to keep it short. Keeping those applications up is a damn lot of fun. You don’t care what garbage they enter or process. As long as you keep them up, that’s the job. That keeps you happy, and during the whole process you learn so much. That’s the excitement. What if the database crashes? What if the machine has a hardware fault? How do you achieve optimal performance? How do you plan your systems so that if something crashes it fails over to the rest, and how do you get the faulty system back up ASAP? What if something goes wrong with another node during that time? Duh! These things sound crazy, but someone needs to figure them out. Clustering is one method that saves your ass from all those troubles: managing failovers and failbacks, load balancing, etc. Keeping it all up is challenge enough. Then there is the other challenge: if it all goes down in a worst-case scenario, how do you get it back up again? Do you have the backups? How about hardware and infrastructure? Ah, and I still need to think about and work on an alarm system. Stuff like an automatic SMS when something goes wrong, kind of an alert system for everything from the power to the applications and services, plus intruder alerts.
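Just to make the alarm idea concrete, here is a rough sketch in Python of what the simplest version might look like. It only checks whether a TCP port answers and hands a message to a notifier. Everything named here is an assumption for illustration: the `check_services` and `tcp_probe` helpers, the example hosts and ports, and the `notify` callback, which defaults to `print` and would have to be swapped for whatever SMS gateway or pager you actually have. It does not cover power, air conditioning or intruder alerts.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# (label, host, port): a hypothetical way to describe one monitored service.
Service = Tuple[str, str, int]


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_services(
    services: Iterable[Service],
    probe: Callable[[str, int], bool] = tcp_probe,
    notify: Callable[[str], None] = print,
) -> List[str]:
    """Probe each service once, notify for every one that is down,
    and return the labels of the services that failed."""
    down = []
    for label, host, port in services:
        if not probe(host, port):
            down.append(label)
            notify(f"ALERT: {label} ({host}:{port}) is not responding")
    return down


if __name__ == "__main__":
    # Example targets only; replace with your own services.
    SERVICES = [
        ("web", "127.0.0.1", 80),
        ("database", "127.0.0.1", 5432),
    ]
    check_services(SERVICES)
```

Run something like this from cron every few minutes and you have a crude alarm. The probe and the notifier are passed in as plain functions so the same loop can later check other things and send the alert some other way, and so it can be tested without touching the network.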

Anyway, I just thought of all this ’cos I am one frustrated systems administrator. Not many people in the IT sector in the Maldives have to think at this level. I don’t know if it’s a blessing or a curse. But I sure would be happy to see more people who think this way, who are really interested in getting the dirty work done, rather than talking about having “hacking days”. I feel they can do a lot more. Maybe I am wrong? I know there is so much talent out there; it needs to be utilized and given a chance.

Anonymous said...

from zerocool
Really excellent, chopey. I also learned a lot from this article, and I’m a sys and network admin too.