What it takes (IMHO) to be a system admin
Maybe it is naive of me, but it seems that one noble aspiration for
Linux users is to get a 'real job' as a system administrator (SA). A
question that has come up in recent discussions is 'What is
involved in moving from Linux user to system administrator?'
Here are some of my thoughts.
- Change your mindset
- As a competent Linux user on your own hardware you are God. Even an incompetent
Linux user is God on his own box. If you
want to reboot, so what? If you 'rpm -e <something important>', you
may be up half the night, but that's what caffeine is for. System
admins have no such luxury.
- The true SA is a combination of caretaker, security
guard, and babysitter. It is his/her job to keep the machine running
at almost all costs. In many cases corporate profits depend
directly on the SA's ability to protect and maintain the system.
- For instance, on one of the systems I
administer, a reboot takes half an hour (memory and CPU checking) and
idles approximately 600 people spread across 3 states.
There is no room for 'oops'.
- Paranoia is not only good, it is indispensable.
- Learn new toolsets
- If you want to move into the corporate arena, you must be able to
take whatever tools are lying around and get the job done. This means
being able to learn new tools and to use old tools in new ways.
- For instance, 'locate' is nice, but it quickly becomes apparent that
you have to know 'find'. As in 'find all of the files that are owned
by joe, larger than 1M, haven't been accessed in over 30 days and
delete them'.
- It won't be long before a compile/install breaks. You have to be able to work through
a makefile to find the missing include, to interpret the 'command not found' message, to
debug some vendor's install script that expects the default shell to be tcsh instead of
sh, and a dozen other things.
- One of the Perl scripts below mentions that I had to fix an error in a
system level include file to get the Log::Syslog module to work
correctly.
- Use your work as an excuse to learn more. I am using this 'project'
as an excuse to learn about txt2html.pl. This is a Perl script for
changing plain text files into basic HTML pages. As I learn something
here, I apply it to several other projects for various clients.
- Learn to handle pressure
- With your personal Linux box, you can always power down and walk away
if it gets too intense. Not so as an SA.
- Expect to have to work with others looking over your shoulder. It
adds a new level of pressure to have a Senior VP of a billion-dollar
company watching you type! (Or just the guy who signs your paycheck.)
- Forget about 'lusers'. They are users, and they are why you exist, whether they have a clue
or not. By definition, they come to you when something is 'wrong' in their world. It is your job
to make it right (and then to keep it from happening again).
- Never start from scratch. Find something close and modify it
- For scripting, start with the boot up scripts (/sbin/init.d,
/etc/rc.d, etc.)
- For Perl, Perl.org
- You can always see farther standing on someone else's shoulders
- Hang out with experts
- Unix Guru Universe
- Your local Linux group
- Don't be afraid of appearing ignorant. Fear staying
ignorant.
- Practice good debugging habits
- Understand it the way it is (broken) before you try to fix it
- Document what you have found (they're called backups)
- Change one thing and retest
- Debug ahead of time by trying to understand everything you can
- Learn manually, then codify
- Figure out the commands manually
- Remember the commands by writing a script for them and commenting the script.
- One of the beauties of Unix is that you don't have to type it again if
you don't want to. Write a script.
- Change scripting languages just for fun (and learning)
- Keep a notebook of tips and gems you've found
- Document what you do
- Comment your scripts liberally. The best comments (IMHO) are the ones
that explain 'Why?'.
- Use a standard README file in directories that you modify. That way
when they call you at night and say "Where is my file?" you have a
prayer of finding it.
- Documentation will score points with the next SA as well as
increasing your value in the community.
- Set personal standards (like creating README files). It is amazing what you can forget
in a week.
- Learn to share
- Many people will be very interested in what you are doing, both from
a business perspective and from a technical one. You must be able to
discern how much to share with whom.
- Share what you've learned with others (that's why I'm doing this page)
- Find somewhere online to hang out. Listen, learn and contribute. Mailing lists or IRC
can be good starting points.
- Remember to have fun
- I like to add humor to my scripts. Note the printed comments for the
kill scripts. I figure if I can make the night operator chuckle once
or twice, then he is more likely to help me.
- Make Unix your passion, not just your job. Don't consider becoming an
SA if it isn't your passion. There was recently a discussion on the
COLUG list about the meaning of a vanity license plate on an SUV:
'8MYCSH'. The Unix die-hards interpreted that as 'ate my c-shell'. More
likely it was meant as 'ate my cash'.
- When things get too intense, go visit Simon, the Bastard Operator from Hell
(or other system admin related humor sites)
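As a concrete version of the find task described under 'Learn new toolsets' (files owned by joe, larger than 1M, not accessed in over 30 days), here is a minimal shell sketch. It assumes a POSIX find. The helper name list_stale and the /home starting point are hypothetical placeholders, not taken from the scripts discussed below. In keeping with the paranoia rule, it only prints the matches so they can be reviewed before anything is deleted.

```shell
#!/bin/sh
# list_stale DIR USER   (hypothetical helper name)
# Print regular files under DIR that are owned by USER, larger than 1 MB,
# and not accessed in over 30 days.
list_stale() {
    # -size +2048 counts 512-byte blocks, so it means "more than 1 MB".
    find "$1" -type f -user "$2" -size +2048 -atime +30 -print
}

# Review the list first:
#   list_stale /home joe
# Only then rerun the same tests with a delete action:
#   find /home -type f -user joe -size +2048 -atime +30 -exec rm -f {} \;
```

Keeping the listing and the deletion as two separate steps costs one extra command and leaves room to catch an 'oops' before it happens.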
Here are some examples of scripts that I have written. I offer them as
a way to see inside one of the problems that an SA faces. The
essential problem is how to allow inexperienced operators to reset certain
processes without giving them the root password. The obvious first step
is to install sudo and add the kill command to the allowed commands.
But this is too broad, and too narrow at the same time. Too broad in
that they can kill things that they shouldn't (system processes, etc.),
too narrow in that they may need to kill hundreds of similar processes
quickly. (All of these scripts are written in Perl 5.005 and will work
on an HP-UX 10.20 system. I've made no effort to adapt them to Linux,
because that is not the point. All of the kill scripts require root
access or sudo.)
Following my own advice in #2 above ('Learn new toolsets'), I was reteaching myself Perl while writing these scripts.
They could also have been written using sh or awk.
killp.pl script
To address the first problem, I wrote the following Perl script. It
allows a person with sudo access to this script (not to kill in general)
to kill processes belonging to a person. System processes are
protected (see killp.pl).
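The idea of a guarded kill can be sketched in a few lines of sh. This is not killp.pl itself, only an illustration of the technique under stated assumptions: the protected account list, the function names, and the DRY_RUN switch are all hypothetical, and `ps -u USER -o pid=` assumes a POSIX ps (it has not been checked against HP-UX 10.20).

```shell
#!/bin/sh
# Hypothetical list of accounts whose processes must never be killed.
PROTECTED_USERS="root daemon bin sys adm lp"

# is_protected USER: succeed if USER is on the protected list.
is_protected() {
    case " $PROTECTED_USERS " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# killp USER: send TERM to every process owned by USER, unless protected.
# Set DRY_RUN=1 to print the PIDs instead of signalling them.
killp() {
    if [ -z "$1" ]; then
        echo "usage: killp USER" >&2
        return 2
    fi
    if is_protected "$1"; then
        echo "killp: $1 is protected, nothing killed" >&2
        return 1
    fi
    for pid in $(ps -u "$1" -o pid=); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would kill $pid"
        else
            kill -TERM "$pid"
        fi
    done
}
```

The sudo rule would then name the wrapper rather than kill. With a hypothetical group and path, a sudoers line might look like `%operators ALL = (root) /usr/local/sbin/killp`.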
killu.pl script
To address the second problem (killing batches of users) I created the
killu.pl script. It takes a user prefix and kills the processes of all
users whose login IDs start with that prefix. It allows for some
special IDs to be protected so you don't kill supervisors, etc. Some of
the protections can be overridden by specifying a longer prefix.
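The selection step (prefix match, minus protected IDs) might look roughly like this in sh. It is a sketch, not the logic of killu.pl: the function name select_logins and the supervisor logins are made up, and the 'longer prefix overrides protection' rule is deliberately not modeled because its exact behavior is not described here.

```shell
#!/bin/sh
# Hypothetical supervisor logins that a batch kill must skip.
PROTECTED_IDS="opsup1 opsup2"

# select_logins PREFIX: read login names on stdin (one per line) and print
# each distinct name that starts with PREFIX and is not protected.
select_logins() {
    if [ -z "$1" ]; then
        echo "usage: select_logins PREFIX" >&2
        return 2
    fi
    prefix=$1
    sort -u | while read -r login; do
        case "$login" in
            "$prefix"*) ;;            # prefix matches, keep checking
            *) continue ;;            # different prefix, skip
        esac
        case " $PROTECTED_IDS " in
            *" $login "*) continue ;; # protected ID, skip
        esac
        echo "$login"
    done
}

# Possible use:
#   who | awk '{ print $1 }' | select_logins ops
```

Refusing an empty prefix matters here, since an empty prefix would match every login on the system.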
Examples, other things
du.pl script
On larger systems (AIX, HP-UX, etc.) you will encounter the idea of volume
groups and logical volumes. This is a great idea which allows you to
create virtual disks from collections of smaller or larger disks.
Unfortunately, it spreads out the information so it is no longer easy to
answer the question 'How much space is left?'. I wrote this
du.pl script to combine the output of two commands
(vgdisplay and bdf).
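To give a feel for the vgdisplay half of that job, here is a small sh/awk sketch. It is not du.pl. It assumes vgdisplay prints the labels 'VG Name', 'PE Size (Mbytes)' and 'Free PE', in that order, with the value as the last field on each line. The function name vg_free_mb is hypothetical. Space that is unallocated in a volume group does not show up in bdf at all, which is why the two outputs need to be combined.

```shell
#!/bin/sh
# vg_free_mb: read vgdisplay-style text on stdin and print, for each
# volume group, how many megabytes are not yet allocated to any
# logical volume (free extents times extent size).
vg_free_mb() {
    awk '
        /^ *VG Name/            { vg = $NF }
        /^ *PE Size \(Mbytes\)/ { pe = $NF }
        /^ *Free PE/            { printf "%s %d MB unallocated\n", vg, pe * $NF }
    '
}

# Possible use on a system that has vgdisplay:
#   vgdisplay | vg_free_mb
```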
sortu script
Once you get over a screenful of users logged in, managing them, finding
the idle ones, etc., becomes a problem. This script, sortu.pl,
combines the hectic output from who into a more readable format as well
as allowing a summary to be generated.
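The summary part can be approximated with who and awk. The sketch below is not sortu.pl and ignores idle time. It only counts sessions per login name, assuming the login name is the first field of each line of who output. The function name summarize_who is hypothetical.

```shell
#!/bin/sh
# summarize_who: read who-style lines on stdin and print one line per
# login name with its number of sessions, sorted by name.
summarize_who() {
    awk '{ sessions[$1]++ }
         END { for (u in sessions) print u, sessions[u] }' | sort
}

# Possible use:
#   who | summarize_who
```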
Author
Jim Wildman has been mucking around with Unix since 1985. After a few years on Suns, he changed jobs and started
working on HPs. Then he added Linux in 1994 or '95 and a dabbling of AIX in '98. Those jobs
have included stints in electronics manufacturing, healthcare, the online sales industry, and
Internet/ecommerce consulting. He is currently
employed as a senior consultant by divine, Inc.
These pages were produced using some combination of RedHat Linux,
Quanta, txt2html, and of course vim.
Questions and comments can be sent to Jim at jim@rossberry.com.
All trademarks are the property of their respective holders. Last update Feb 26, 2000.