VizagInfo.com Home | VizagInfo Mirror of LG


LINUX GAZETTE

January 2004, Issue 98       Published by Linux Journal

Front Page  |  Back Issues  |  FAQ  |  Mirrors
The Answer Gang knowledge base (your Linux questions here!)
Search (www.linuxgazette.com)


Linux Gazette Staff and The Answer Gang

TAG Editor: Heather Stern
Senior Contributing Editor: Jim Dennis
Contributing Editors: Ben Okopnik, Dan Wilder, Don Marti

TWDT 1 (gzipped text file)
TWDT 2 (HTML file)
are files containing the entire issue: one in text format, one in HTML. They are provided strictly as a way to save the contents as one file for later printing in the format of your choice; there is no guarantee of working links in the HTML version.
Linux Gazette[tm], http://www.linuxgazette.com/
This page maintained by the Webmaster of Linux Gazette, webmaster@linuxgazette.com

Copyright © 1996-2004 Specialized Systems Consultants, Inc.

LINUX GAZETTE
...making Linux just a little more fun!
Pass on Passwords with scp
By Dave Sirof


In this article I'll show you how to use scp without passwords. Then I'll show you how to use this in two cool scripts. One script lets you copy a file to multiple Linux boxes on your network, and the other allows you to easily back up all your Linux boxes.

If you're a Linux sysadmin, you frequently need to copy files from one Linux box to another, or to distribute a file to multiple boxes. You could use ftp, but there are many advantages to using scp instead. Scp is much more secure than ftp: scp traffic travels across the LAN/WAN encrypted, while ftp uses clear text (even for passwords).

But what I like best about scp is that it's easily scriptable. Suppose you have a file that you need to distribute to 100 Linux boxes. I'd rather write a script to do it than type 100 sets of copy commands. If you use ftp in your script it can get pretty messy, because each Linux box you log into is going to ask for a password. But if you use scp in your script, you can set things up so the remote Linux boxes don't ask for a password. Believe it or not, this is actually much more secure than using ftp!

Here's an example demonstrating the most basic syntax for scp. To copy a file named 'abc.tgz' from your local PC to the /tmp dir of a remote PC called 'bozo', use:

scp abc.tgz root@bozo:/tmp

You will now be asked for bozo's root password, so we're not quite there yet. It's still asking for a password, so it's not easily scriptable. To fix that, follow this one-time procedure (then you can do endless "passwordless" scp copies):

1. Decide which user on the local machine will be using scp later on. Of course root gives you the most power, and that's how I personally have done it. I won't give you a lecture on the dangers of root here, so if you don't understand them, use a different user. Whatever you choose, log in as that user now for the rest of the procedure, and log in as that user when you use scp later on.


2. Generate a public / private key pair on the local machine. Say What? If you're not familiar with Public Key Cryptography, here's the 15 second explanation. In Public Key Cryptography, you generate a pair of mathematically related keys, one public and one private. Then you give your public key to anyone and everyone in the world, but you never ever give out your private key. The magic is in the mathematical makeup of the keys - anyone with your public key can encrypt a message with it, but only you can decrypt it with your private key. Anyway, the syntax to create the key pair is:

ssh-keygen -t rsa


3. In response you'll see:
"Generating public/private rsa key pair"
"Enter file in which to save the key ... "
Just hit enter to accept this.

4. In response you'll see:
"Enter passphrase (empty for no passphrase):"
You don't need a passphrase, so just hit enter twice.


5. In response you'll see:
"Your identification has been saved in ... "
"Your public key has been saved in ... "
Note the name and location of the public key just generated (it will always end in .pub).

6. Copy the public key just generated to all your remote Linux boxes. You can use scp or ftp or whatever to do the copy. Assuming you're using root (again, see my warning in step 1 above), the key must be contained in the file /root/.ssh/authorized_keys (watch the spelling!). Or if you are logging in as a user, e.g. clyde, it would be in /home/clyde/.ssh/authorized_keys. Note that the authorized_keys file can contain keys from other PCs, so if the file already exists and contains text, you need to append the contents of your public key file to it.
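The append in step 6 is the part people get wrong, so here is a small sketch of doing it safely. The file names are scratch stand-ins I made up for this demo; on the real remote box the target is /root/.ssh/authorized_keys (or the equivalent under the user's home dir):

```shell
#!/bin/sh
# Demo of appending a new public key without clobbering existing ones.
# These paths are stand-ins; on the remote box the target would be
# /root/.ssh/authorized_keys (or ~/.ssh/authorized_keys for a normal user).
NEWKEY=/tmp/demo_id_rsa.pub
AUTH=/tmp/demo_authorized_keys

echo "ssh-rsa AAAAnewkey root@localpc" > "$NEWKEY"   # pretend this is the key you just generated
echo "ssh-rsa AAAAoldkey root@otherpc" > "$AUTH"     # a key some other PC already installed

# append (>>), never overwrite (>), so the other PCs keep their access
cat "$NEWKEY" >> "$AUTH"
wc -l < "$AUTH"
```

After the append, both keys are present in the file, which is exactly what the remote sshd needs.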

That's it. Now with a little luck you should be able to scp a file to the remote box without using a password. So let's test it by trying our first example again. Copy a file named 'xyz.tgz' from your local PC to the /tmp dir of a remote PC called 'bozo':

scp xyz.tgz root@bozo:/tmp

Wow! It copied with no password!

A word about security before we go on. This local PC just became pretty powerful, since it now has access to all the remote PC's with only the one local password. So that one password better be very strong and well guarded.

Now for the fun part. Let's write a short script to copy a file called 'houdini' from the local PC to the /tmp dir of ten remote PCs, in ten different cities (with only 5 minutes' work). Of course it would work just the same with 100 or 1000 PCs. Suppose the 10 PCs are called: brooklyn, oshkosh, paris, bejing, winslow, rio, gnome, miami, minsk and tokyo. Here's the script:

#!/bin/sh
for CITY in brooklyn oshkosh paris bejing winslow rio gnome miami minsk tokyo
do
scp houdini root@$CITY:/tmp
echo $CITY " is copied"
done

Works like magic. With the echo line in the script you can watch as each city is completed, one after the next.
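A variation I find handy (my own addition, not from the original script): keep the host names in a file instead of hard-coding them in the loop. In this sketch, echo stands in for the real scp line so you can try the loop without any remote boxes:

```shell
#!/bin/sh
# Read target hosts from a file, one per line, instead of hard-coding them.
# The echo line stands in for:  scp houdini root@$CITY:/tmp
HOSTS=/tmp/demo_hosts
printf 'brooklyn\noshkosh\nparis\n' > "$HOSTS"

while read CITY
do
    echo "would copy houdini to root@$CITY:/tmp"
done < "$HOSTS" > /tmp/demo_hosts_log

cat /tmp/demo_hosts_log
```

Adding a city then means editing a one-line-per-host file rather than every script that loops over the machines.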

By the way, if you're new to shell scripting, here's a pretty good tutorial:
http://www.freeos.com/guides/lsst/.

As you may know, scp is just one part of the much broader ssh. Here's the cool part: when you followed my six-step procedure above, you also gained the ability to sit at your local PC and execute any command you like on any of the remote PCs (without a password, of course!). Here's a simple example, to view the date & time on the remote PC brooklyn:

ssh brooklyn "date"

Now let's put these two concepts together for one final and seriously cool script. It's a down-and-dirty way to back up all your remote Linux boxes. The example backs up the /home dir on each box. It's primitive compared to the abilities of commercial backup software, but you can't beat the price. Consider the fact that most commercial backup software charges license fees for each machine you back up. Instead of paying license fees to back up 100 remote PCs, you could use the script to back up the 100 PCs to one local PC, then back up that local PC with your commercial package and save the license fees for 99 PCs! Anyway, the script demonstrates the concepts, so you can write your own to suit your situation. Just put this script in a cron job on your local PC (no script is required on the remote PCs). Please read the comments carefully, as they explain everything you need to know:

#!/bin/sh

# Variables are upper case for clarity

# before using the script you need to create a dir called '/tmp/backups' on each
# remote box & a dir called '/usr/backups' on the local box


# on this local PC
# Set the variable "DATE" & format the date cmd output to look pretty
#
DATE=$(date +%b%d)

# this 'for loop' has 3 separate functions

for CITY in brooklyn oshkosh paris bejing winslow rio gnome miami minsk tokyo
do

# remove the tarball left on the remote box from the previous run of the
# script, to avoid filling up your HD; then echo it for troubleshooting
#
ssh $CITY "rm -f /tmp/backups/*.tgz"
echo $CITY " old tarball removed"

# create a tarball of the /home dir on each remote box & put it in /tmp/backups
# name the tarball uniquely with the date & city name
#
ssh $CITY "tar -zcvpf /tmp/backups/$CITY.$DATE.tgz /home/"
echo $CITY " is tarred"

# copy the tarball just created from the remote box to the /usr/backups dir on
# the local box
#
scp root@$CITY:/tmp/backups/$CITY.$DATE.tgz /usr/backups
echo $CITY " is copied"

done


# the rest of the script is for error checking only, so it's optional:

# on this local PC
# create an error file with today's date.
# If any box doesn't get backed up, it gets written to this file
#
touch /usr/backups/scp_error_$DATE

for CITY in brooklyn oshkosh paris bejing winslow rio gnome miami minsk tokyo

do

# Check if the tarball was copied to the local box. If not, write to the error file.
# Note the use of '||', which runs what's after it only if what's before it fails.
#
ls /usr/backups/$CITY.$DATE.tgz || echo " $CITY did not copy" >> /usr/backups/scp_error_$DATE


# Check if the tarball can be opened w/o errors. If there are errors, write to the error file.
tar ztvf /usr/backups/$CITY.$DATE.tgz || echo "tarball of $CITY is No Good" >> /usr/backups/scp_error_$DATE

done
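The '||' trick used in the error-checking loop is ordinary shell behavior and is easy to try on its own. This tiny sketch uses throwaway paths of my own choosing:

```shell
#!/bin/sh
# '||' runs the command on its right only when the one on its left fails
# (returns a non-zero exit status) - which is what makes it handy for logging.
ERRFILE=/tmp/demo_scp_error
: > "$ERRFILE"                  # start with an empty error file

ls /tmp > /dev/null 2>&1 || echo "ls /tmp failed" >> "$ERRFILE"        # succeeds: nothing logged
ls /no/such/dir > /dev/null 2>&1 || echo "missing dir" >> "$ERRFILE"   # fails: gets logged

cat "$ERRFILE"
```

Only the failing command leaves a line in the error file, which is exactly how the backup script spots a box that didn't copy.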

That's about it. In this article I've tried to give examples that demonstrate the concepts, not necessarily ones to be used "as is". Some of the syntax may not work in all distros, but in the interest of brevity I could not include all the possibilities. For example, if you are using Red Hat 6.2 or earlier, the syntax will require some changes (I'd be happy to give them to you if you email me). So be creative, and hopefully you can use some of this in your own environment.
Unless otherwise mentioned, this work copyright © 2003-2004 by SSC, Inc. All rights reserved.



Copyright © 2004, Dave Sirof. Copying license http://www.linuxgazette.com/copying.html
Published in Issue 98 of Linux Gazette, January 2004

LINUX GAZETTE
...making Linux just a little more fun!
Comics - January 2004
By Javier Malonda


The Ecol comic strip is written for escomposlinux.org (ECOL), the web site that supports es.comp.os.linux, the Spanish USENET newsgroup for Linux. The strips are drawn in Spanish and then translated to English by the author.

These images are scaled down to minimize horizontal scrolling. To see a panel in all its clarity, click on it.

[cartoon]
[cartoon]
[cartoon]

All Ecol cartoons are at tira.escomposlinux.org (Spanish), comic.escomposlinux.org (English) and http://tira.puntbarra.com/ (Catalan). The Catalan version is translated by the people who run the site; only a few episodes are currently available.

These cartoons are copyright Javier Malonda. They may be copied, linked or distributed by any means. However, you may not distribute modifications. If you link to a cartoon, please notify Javier, who would appreciate hearing from you.


Copyright © 2003, Javier Malonda. Copying license http://www.linuxgazette.com/copying.html
Published in Issue 98 of Linux Gazette, January 2004

Unless otherwise mentioned, this work copyright © 2003-2004 by SSC, Inc. All rights reserved.


LINUX GAZETTE
...making Linux just a little more fun!
Flashkard Printed Output
By Phil Hughes


Hal Stanton's article about FlashKard made me abandon some primitive flashcard software I had been working on. But what I wanted was a way to print the data on, well, flashcards. I was going to write one of my typical hacks--most likely using awk and troff--to print the cards, but I decided to try to actually work with the XML.

What I mean by working with the XML is using standard XML tools to format the data for output. I had never done this before, but I knew an important buzzword, XSLT, which stands for Extensible Stylesheet Language Transformations. This is a language designed to define transformations from XML to other formats. So, I started reading. Typically, XSLT is used to transform XML into HTML, but there is no restriction on what you can do with it, so I decided to give it a try.

But, I still needed two more pieces: what to transform it into and, as there would be some program logic involved to place the cards on the page in the right place, some general-purpose programming language. After considering the various programming language alternatives--Python being the one that sounded best--I realized that if I just generated PostScript then I could let the printer itself deal with the placement issues. Strange but, I figured, why not.

Output Format

Having picked PostScript, I sat down to actually decide how to place the cards on the page. Flashcards need to be double-sided and, at first, I thought of printing one side and then running the card stock back through the printer to do the other side. This is a logistical nightmare, as it is easy to get the paper in the printer wrong, have a registration problem or get out of order because of a printer jam.

I decided on an alternative approach which involves another high-tech device called a glue stick. The idea is to print the front and back of each card on the front of one page which you then fold in half, glue together and cut into the actual cards. The double layer of paper and the glue will make the cards heavy enough to work with.

At this point, it is time for a confession. This is not a beautiful, finished production system. What it is is something that works and a proof of concept. For a production environment, it is important to define card sizes and fonts in a configuration file. In addition, the message for each side is currently printed in a single line without consideration of size. Line folding needs to be implemented.

Ok, back to work. I picked a 1.5 x 2.5 inch card size which makes it possible to get nine cards--both front and back--on one side of letter-sized paper. There are 1 inch top and bottom margins and .5 inch left and right margins. In order to make folding and cutting easy, I want to print a fold line down the middle of the page (between the front sides and the back sides) and cut marks for the edges of the cards. With this fold, the printing on the back is upside down from the printing on the front. After considering this I decided it wasn't important--it just defined which way to turn over the card when using them.

The PostScript

Everything (that is, the PostScript and the XSL) is all in one file which you can download here. You can just ignore the XML stuff for now; note that if you try to display this in your browser, it will not display correctly because of the XML. You can see the sample output here.

If you have never worked in PostScript, get ready. PostScript is an RPN (Reverse Polish Notation) language. If you have ever used an HP calculator you will know what I am talking about. If not, the quick explanation is that you do things by putting items on a stack and then operating on that stack. For example, to add two numbers, you place the numbers on the stack and then execute the add operator which fetches the numbers, adds them and puts the result back on the stack. Note that I hate RPN languages. :-)

Disclaimer aside, PostScript is actually a very clean language and not a bad language to do the work we need to do. The way you work with PostScript is you describe everything you want to put on a page--characters, lines, filled-in areas and such--and then you tell it to print the page. That means that we don't have to remember a lot of stuff and then work down the page sequentially--we just move around and put what we want on the page.

In PostScript the basic unit of length is 1/72 of an inch. Personally, I am not very excited about working in such units, so I defined a function called inch which takes the current value on the stack, multiplies it by 72 and puts the result back on the stack.

/inch { 72 mul } def
This way, I just add the word inch after a number and it gets multiplied by 72.

If you look at the cutmarks function, you will see a whole bunch of moveto and lineto statements. As you might expect, these operators take two values off the stack (an x and a y coordinate, where 0,0 is the lower-left corner of the page and positive moves go to the right or up) and either move the current location to the specified coordinates or draw a line from the current location to the specified location.

Going down to the startit function, you can see all the setup work for the page. I define three 9-element arrays, x, yf and yb, which contain the x and y coordinates (yf for front, yb for back) of where to place the text for each of the nine cards. (Note that arrays in PostScript are indexed starting at 0.) The other two initialization steps are to define the font and font size to be used for the text and to set the card number counter cardno to 0.

Two other utility functions are defined, cardstep and pageout. pageout checks the current card number and, if it is greater than 0, draws the cutmarks (by calling the cutmarks function) and then prints the page using the showpage builtin. cardstep increments the card counter and then, if it is greater than 8, calls pageout to print the page and then resets cardno to 0 to prepare for the next page.

The last two functions are front and back. They move to the correct location on the page by indexing into the location arrays and then print the top value on the stack using the show builtin. The back function then calls cardstep to move along to the next position. Thus, the following two lines would print a card:

(Front Side) front
(Back Side) back

I said two lines, but the spacing isn't important in PostScript. You would get the same result if this information was on one line. The parentheses are used to delineate the string which is being placed on the stack.

All of the lines starting with a slash (/) just define functions. The real program starts with the line startit, which calls the startit initialization function. Next, a series of calls to front and back must be input, finally followed by a call to pageout to output the last page if there are any cards on it.

The XSL

I tested the PostScript with some sample data and it worked fine. So, on to the next part which is translating the XML from FlashKard into what is needed to drive the PostScript code. Two pieces are needed here, the XSL that I have to write and a program to read the XSL and the XML files from FlashKard and then output the PostScript to send to the printer.

The easy part was the program. xsltproc is exactly this program. One down. On to writing something in a language I have never seen before. But, could it be worse than writing in an RPN language?

As it turns out, there really isn't much to do. After some XSL boilerplate (<xsl:stylesheet ... >), I needed to define the output format to be text, as HTML is the default. What text means is "anything else". This is done with

<xsl:output method="text"/>

The first thing I want to output is the PostScript program itself. This is done by including it immediately after a <xsl:template match="/"> tag. The match of / matches the whole XML so it is processed at the start of the file. Note that I have put the %!PS on the same line as the xsl tag. This is necessary so that the printer will see this as the beginning of the first line of data. Otherwise the print spooler will think this is just text and print the PostScript rather than it being interpreted.

There is one other XSL tag before the matching </xsl:template> tag which is <xsl:apply-templates/>. This tells xsltproc that any other matching templates are to be applied here.

There is one other template, with a match expression of match="e". This matches the block describing an individual card, as explained in a comment to the FlashKard article. Within that block is an o block for the original language entry and a t block for the translation. Using the value-of feature, I grab these values, put them in parentheses and follow them with either front or back.

That's it folks. Assuming the XSL is in ks.xsl, entering the command

xsltproc ks.xsl creatures.kvtml | lpr
will give you your first set of flashcards.

As I mentioned before, this is a proof of concept. Generalizing the PostScript, dealing with line folding and writing a shell script wrapper for this command line would clean things up and make a useful program.

Phil Hughes, Group Publisher of SSC, likes to get his hands dirty every now and then. But, you won't find him driving a car with an automatic transmission or using Emacs.


Unless otherwise mentioned, this work copyright © 2003-2004 by SSC, Inc. All rights reserved.


Phil Hughes is the publisher of Linux Journal, and thereby Linux Gazette. He dreams of permanently tele-commuting from his home on the Pacific coast of the Olympic Peninsula. As an employer, he is "Vicious, Evil, Mean, & Nasty, but kind of mellow" as a boss should be.


Copyright © 2004, Phil Hughes. Copying license http://www.linuxgazette.com/copying.html
Published in Issue 98 of Linux Gazette, January 2004

LINUX GAZETTE
...making Linux just a little more fun!
New Life for troff
By Phil Hughes


I may be crazy because I still like working in troff but even for those of you who aren't crazy, here is something that will likely make you see troff in a new light.

Before I get carried away, let me get those unfamiliar with troff up to speed. troff is a program that was developed at AT&T Bell Labs that really made UNIX and, thus, Linux possible. UNIX, like Linux, started as a hobby project. But, back in 1970 you didn't go to the local supermarket and buy a computer to run UNIX on. You actually needed someone with a house-sized chunk of change to even think about running a UNIX system.

While UNIX was fun for a while, to have a future it needed to actually do something useful for the company that was paying that house-sized chunk of change. It turns out that troff was the magic application.

At Bell Labs, like virtually everywhere, phototypesetting was done by someone sitting down at a keyboard of a typesetter and, well, typing. The output was film or photographic paper and changes were usually made through careful use of an Xacto knife. There had to be a better way. It turned out the better way was a UNIX system, troff and the Graphic Systems CAT phototypesetter.

For most of us with a laser printer next to us, this sounds pretty obvious, but you couldn't buy a laser printer at the drugstore either in those days. This system consisted of a slow input device such as a ten-character-per-second teletype, a computer running a text editor which allowed you to enter text with some basic markup commands, another program that would read the markup and produce what the typesetter needed to see and, finally, a phototypesetter that talked to the computer.

The computer was a PDP-11, the editor was ed and the program to drive the phototypesetter was troff. The CAT phototypesetter was specifically designed to talk to this PDP-11/UNIX/troff combo. Its only input method was an RS-232 cable.

Enough Background--What's New?

Over the years troff has evolved. Its two-character commands have been expanded, its old limit of four fonts at the same time is long gone (that was a limitation of the CAT--the fonts were on film strips) and its ability to produce output for different devices has grown. The most common output format for years has been PostScript. If you have a PostScript printer you can output to it directly. If not, you can use GhostScript to perform the translation.

The problem is, with almost everything getting published on the Web, having information in PostScript is not the real answer. You need HTML. Well, troff now supports HTML as an output format.

Is this a big deal? Well, to start with, all the manual pages for the commands on your Linux system are written in troff using the man macros. If you want one of those pages in HTML all you need to do is run groff (the troff frontend program) and tell it you want HTML output. So, there are the first few thousand reasons. There are more.

Many books have been written in troff including all that work done at Bell Labs long ago. Many companies that relied on UNIX systems internally also did internal documentation using troff. And, well, for those of us who are still crazy, writing in troff isn't that bad.

Ok, How Do I Use It?

A good place to start would be to test it out on a man page. Generally man pages are stored in subdirectories of /usr/share/man in a compressed format. The subdirectory man1 will have all the man pages for commands. Try:

  cd /usr/share/man/man1
  ls
It is likely you will see a huge list of files with names such as ln.1.gz. This particular file is the man page for the ln command (the 1 indicates section one, commands) and the .gz indicates that it is compressed. The good news is that we don't have to save a decompressed version to work with it, as groff will read from standard input. Try:
  zcat ln.1.gz | groff -man -Thtml > /tmp/ln.html

If all goes well, you will have the HTML version of the ln man page in the file /tmp/ln.html. Point your browser at it and take a look.

Let me explain the pieces of the above command line:

  zcat ln.1.gz    decompress the man page source and write it to standard output
  groff           run the formatter
  -man            load the man macro package the page is written with
  -Thtml          select HTML as the output device
  > /tmp/ln.html  save the output in a file

What Does troff Input Look Like?

If you got this far you must think there is something useful going on with troff. So, let's take a quick look at what the input looks like. Because the above example uses the man macro package, it is not really an easy starting point. So, instead, here is a very basic troff program to show the basic concepts:

  .sp .5i
  .po .5i
  .ft HB
  .ps 24
  .ce 1
  Simple Test File
  .sp .2i
  .ps 11
  .vs 13
  .ft R
  This is the beginning of some simple text.
  As troff defaults to filling lines, a sentence per line makes editing easier.
  This all ends up in a \fIparagraph\fP with automatically filled and justified lines.
  .sp
  The .sp command can be used to create a blank line.
  With no argument, the value of the vertical spacing (.vs) is used.

As you can see, troff commands start with a dot and are two letters long. (Longer command names are supported in newer versions.) Here is what is happening:

  .sp .5i    space down half an inch
  .po .5i    set the page offset (the left margin) to half an inch
  .ft HB     change the font to Helvetica Bold
  .ps 24     set the point size to 24
  .ce 1      center the next input line
  .sp .2i    space down two tenths of an inch
  .ps 11     drop back to an 11 point size
  .vs 13     set the vertical spacing (distance between baselines) to 13 points
  .ft R      return to the Roman font
  \fI \fP    inline font changes: switch to italic, then back to the previous font

If you saved this file in test1, you can see the output by running the following command:

  groff test1 | gv -

Macro Packages

As you can see above, there is a lot of control but it requires a lot of obscure commands. If you write a lot of documents in the same basic format, you can get pretty sick of setting page offsets and font sizes. You may also want to change to indented paragraphs, have footnotes and create a table of contents. That is where macro packages come in.

You can think of the basic troff engine as working like one of those old Etch-a-Sketch kids' toys, with the addition of having fonts defined. By adding macro packages you can predefine a whole set of operations at the functional level--for example, a paragraph. Once this is done, you only need to change how the macro responds rather than having to change all the places where you inserted some troff commands (such as the .sp above) to change the document format.

It is not my intent to explain how all this works here, just to let you know the capabilities exist. The common macro packages are:

  man    the package used for manual pages
  mm     the memorandum macros, a general-purpose package
  ms     the original general-purpose manuscript macros
  me     another general-purpose package, from Berkeley

Today, you are most likely to see man used for formatting manual pages and mm for more general use. There is, however, nothing that says you cannot develop your own macro packages. SSC has used two locally-developed packages for close to twenty years.

The first of those packages was developed to produce SSC Pocket Reference Cards. These cards have 3.5 x 8 inch pages. Each page consists of titled boxes of command information and text using up to five colors. The macro package used here handles drawing the boxes, the colors and outputting two of these small pages on one physical page. One side benefit is that by having two different sets of macros, proofing can be done on a color printer and then the color-separated output for the printer can be produced without the need for any additional programs or changes to the actual document.

The other set of macros was developed for classroom teaching. Again, the capability of producing two different outputs by changing the set of macros used is exploited. The complete document includes large-format text plus small-format discussions. This means the student notebooks can contain a lot of explanatory text without cluttering up the slides used in the classroom.

What Is a Pre-Processor?

Earlier I said that troff works like an Etch-a-Sketch. That is, you have a workspace to write on rather than a line-by-line output device. While it is quite common to just work line-by-line, this means you can draw by moving back up the page. The troff preprocessors exploit this capability.

The most popular preprocessor is tbl which, as you might expect, is used to generate tables. It is very easy to use and allows tight control over table presentation, including boxes, titles and flowed text in boxes. Besides more control, every time I write a table in HTML, I remember how easy it used to be in tbl.

A less common but very powerful preprocessor is pic. pic allows you to draw pictures. A better explanation is boxes, circles, arrows and such. In other words, diagrams.

Conclusion

Hopefully, this article has given you an idea what troff is and what it can do. If all you need to do is convert current troff documents into HTML, you should have enough information to get on with the task. On the other hand, if you see a use beyond conversion, there is a lot more to learn. If this is the case, you are welcome to add a comment suggesting what else you would like to hear about.

Phil Hughes is Group Publisher of SSC's publications. He lives in Costa Rica where the telemarketers only speak Spanish.


Unless otherwise mentioned, this work copyright © 2003-2004 by SSC, Inc. All rights reserved.


Phil Hughes is the publisher of Linux Journal, and thereby Linux Gazette. He dreams of permanently tele-commuting from his home on the Pacific coast of the Olympic Peninsula. As an employer, he is "Vicious, Evil, Mean, & Nasty, but kind of mellow" as a boss should be.


Copyright © 2004, Phil Hughes. Copying license http://www.linuxgazette.com/copying.html
Published in Issue 98 of Linux Gazette, January 2004

LINUX GAZETTE
...making Linux just a little more fun!
An Anti-Spam Solution for Your Home or Office Network
By Sandro Mangovski


When I started using the Internet as a regular user, setting up e-mail was as easy as configuring a client to get your messages from the POP server. Later I became a sysadmin and the story changed, not only for me, but for my users too. Only a few years ago, when someone was abusing your e-mail address, you just complained to his ISP and the story was over. But today users get tons of spam, and all they can do is complain. To whom? To us, of course, the sysadmins. So, recently I told myself that I would do something about it, and started researching the subject on the Internet. I found a lot of great GPL software and made a choice of what to use, but there was still a lot of configuring to do. A few mornings, some documentation, and some of my creativity were enough to come up with a solution that satisfies the great majority of my users. Of course, there will always be some who hate the computer staff, and just want to bring their misery to their sysadmin; but we shall resist!

So here is my work, step-by-step so you can follow it. I hope it will help you to build your own anti-spam system, or just give you the guidelines of how to make your own idea come to life.

When I started working on this problem I decided to build a solution from scratch and reinstall or replace most of software which I had been using before. So I picked the new tools: Exim MTA, SpamAssassin, Anomy mail sanitizer, and vm-pop3d. I downloaded all the software and started building it, which is your first step also.

Step 1.

Exim MTA: Download the Exim .tar.gz package from www.exim.org and unpack it in some directory. At the time of writing, the latest Exim version is 4.24. cd to the directory that has been created (for example, exim-4.24). Next, copy the file src/EDITME to Local/Makefile, but first fill in some information in src/EDITME. The following is the least you should set up:

BIN_DIRECTORY=/usr/exim/bin
CONFIGURE_FILE=/usr/exim/configure
EXIM_USER=eximusr

I created this user just for Exim and I suggest you do this also. Of course, the username doesn't have to be the same.

I also set FIXED_NEVER_USERS=root as a security precaution. The file is very well commented, so if you need other options it is not hard to find out how to set them; but this configuration should do just fine for your office network.

If you are going to build the Exim monitor, a similar configuration process is required. The file exim_monitor/EDITME must be edited appropriately for your installation and saved under the name Local/eximon.conf.

If you are happy with the default settings described in exim_monitor/EDITME, then Local/eximon.conf can be empty, but it must exist.

After this pre-install configuration you'll need to compile the software. The commands make and make install should do the trick. After that, a little post-install configuration and you'll be almost done. Open the file /usr/exim/configure with your favorite editor and change

domainlist local_domains = @

to

 domainlist local_domains = @ : localhost : foo.bar

or some other domain you want Exim to deliver locally. Configuring Exim with virtual domains is beyond the scope of this document, though we'll touch on it at the end. What you need to do next is get Exim to run whenever the computer boots. I prefer doing this from inetd. To do so, add the following line to /etc/inetd.conf:

smtp stream tcp nowait eximusr /usr/sbin/tcpd /usr/exim/bin/exim -bs

Here eximusr is the user which was set in the EXIM_USER variable in src/EDITME.

Now restart inetd and telnet to your machine on port 25. You should get a line like this:

220 enterprise ESMTP Exim 4.24 Fri, 28 Nov 2003 20:03:32 +0100

which indicates that all went OK :) This is all you have to do with Exim for now.

Anomy mail sanitizer: Unpack the .tar.gz package from mailtools.anomy.net in some directory (mine is /usr/local) and cd to anomy/bin. Then run ./sanitizer.pl. You will probably get an error message, most likely about a missing Perl module; that's OK, and we will come back to it later. If you do get errors, leave them for the moment and read on.

SpamAssassin: Unpack the .tar.gz package and cd to the newly created directory. There are two ways to install it. The first is shorter and saves you some minor difficulties. Type the following commands in the shell as root:

perl -MCPAN -e shell
o conf prerequisites_policy ask
install Mail::SpamAssassin
quit

This installs SpamAssassin using the CPAN module. If this is your first time using that module, after the first command you will be asked a series of questions to configure CPAN.

This is the second way:

perl Makefile.PL
make
make install

When you run spamc or spamd you may hit the same problem as with Anomy. Don't worry; read on. Now we will explain the missing-module error. If your error message complains about a missing module, for example HTML/Test.pm, then install it using -MCPAN, or go to www.cpan.org, search for the module HTML::Test, and download it. Installing a Perl module by hand is not very difficult. Unpack the .tar.gz package and cd to the newly created directory. Type the following commands as root:

perl Makefile.PL
make
make test
make install

Now that you know how to install Perl modules, you can also fix those missing-module errors from Anomy. When installing modules you may get the same kind of error as with SpamAssassin or Anomy, because Perl modules may need other modules to work. So once again go to www.cpan.org and start over until you resolve all the requirements. For example, on my system I needed to install the following modules for both Anomy and SpamAssassin: HTML::Parser, HTML::Tree, HTML::Tagset, HTML::Sanitizer, MIME::Base64, Test::Harness, Test::Simple, Digest::MD5 and Devel::CoreStack. Unfortunately, I don't remember which modules were prerequisites of the others, so you will have to read the README files or follow the error messages until everything is installed.

There is one more thing to do with SpamAssassin. Since spamc is just a client for spamd, we need to make sure spamd is running when mail is passed through spamc. Just add spamd --daemonize to your init script.
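For instance, a one-line fragment like this in an init script or rc.local does the job (the spamd path is an assumption; use wherever make install actually placed the binary):

```shell
# start the SpamAssassin daemon at boot (path assumed)
/usr/local/bin/spamd --daemonize
```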

vm-pop3d: Unpack the .tar.gz source from www.reedmedia.net and cd into the newly created directory (do you see a pattern here?). Type the following commands as root:

./configure
make
make install

Now we have to make vm-pop3d run when the computer starts. Add this line to your /etc/inetd.conf file:

pop3 stream tcp nowait root /usr/sbin/tcpd /usr/local/sbin/vm-pop3d

Restart inetd and telnet to localhost on port 110. A line similar to this:

+OK POP3 Welcome to vm-pop3d 1.1.6 <14665.1070049711@enterprise>

means you are at the end of Step 1.

If you had any build problems that I didn't describe, browse the documentation, ask on Usenet, or try to figure out a way yourself, as I had to do in the following step: configuring all the software we built.

Step 2.

A little introduction, and a small request from me: read the parts of the Exim documentation about mail delivery, routers, and transports, just to have some background before we start working. Still, I'll summarize it here. When Exim receives a message, it goes from router to router until it is accepted. When a message is accepted, the router calls its transport to handle it. If the message isn't delivered after it has gone through the transport, it goes through more routers until it is accepted and delivered, or until an “undeliverable” error message is generated. That is the short version of the story. If you read carefully, you may have concluded that the order of the transports listed in Exim's configuration file is irrelevant, but the order of the routers is important.

Now you have to get your hands dirty. Open /usr/exim/configure with your favorite editor and add this to the routers section, before the routers that handle local delivery (i.e., after the dnslookup router):

# MAIL SCAN ROUTER
mail_scan_router:
no_verify
check_local_user
condition = "${if !eq{$header_X-I-came-to:}{scary devil's monastery}{1}}"
driver = accept
transport = mail_scan_transport

This router runs only if the message doesn't contain an X-I-came-to: scary devil's monastery header; in other words, only when it first arrives. That condition prevents the router loop that would otherwise be created. Now we have to add the transport this router calls when the condition is met, so add this anywhere in the transports section (remember, the order of transports is irrelevant):

#MAIL SCAN TRANSPORT
mail_scan_transport:
driver = pipe
command = /usr/exim/bin/exim -bS
use_bsmtp = true
transport_filter = /usr/exim/mail.sh
home_directory = "/tmp"
current_directory = "/tmp"
user = mail
group = mail
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =
headers_add = X-I-came-to: scary devil's monastery

This transport passes the message through a transport filter and adds an X header, which in combination with the condition in the router prevents any infinite filtering/spam-checking loops.

Now let's write mail.sh. This is the script that lets us run both the Anomy sanitizer and SpamAssassin within a single transport. It goes like this:

#!/bin/bash
# Transport filter: sanitize the message arriving on stdin, then score it
# with spamc; the filtered message leaves on stdout for Exim to deliver.
cd /usr/local/anomy/bin
./sanitizer.pl | /usr/local/bin/spamc
cd /tmp
# end of script

Maybe all this cd-ing seems strange, but I got errors when running Anomy from outside its directory. Anyway, copy this code into a file, save it in /usr/exim, and make sure its permissions let the user mail run it. For example, my permissions look like this:

-rwxr-xr-x root root
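In case it helps, here is a sketch of getting to those permissions. It practices on a scratch file so you can try it as-is; the real target is /usr/exim/mail.sh, and the chown there needs root:

```shell
# Demonstrate the rwxr-xr-x mode from the listing above on a scratch file.
f=$(mktemp)
chmod 755 "$f"        # rwxr-xr-x: owner writes, everyone may read and execute
stat -c '%a' "$f"     # prints 755 (GNU stat)
# For the real thing, as root:
#   chown root:root /usr/exim/mail.sh
#   chmod 755 /usr/exim/mail.sh
```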

Now a little more on the Exim configuration. When SpamAssassin scans a message, it adds an X-Spam-Status header to it. We will use that header to check whether the message is spam and to decide where it should be delivered. Add this just before the local_delivery router (remember, router order matters) in your Exim configuration file:

#SPAM DELIVERY ROUTER
spam_delivery_router:
condition="${if eq {${length_3:$header_X-Spam-Status:}}{Yes}{1}}"
check_local_user
driver=accept
transport = spam_delivery

So, if the first 3 characters of the X-Spam-Status: header are "Yes", the message is spam and we use the spam_delivery transport. Otherwise the message goes to normal local delivery. Now add this to the transports section of the configuration file:

spam_delivery:
driver = appendfile
create_directory
file=/var/spool/virtual/spam.foo/$local_part
delivery_date_add
envelope_to_add
return_path_add
group = mail
mode = 0660

This means that, for example, messages for the local user sandro (sandro@localhost) are delivered to /var/spool/virtual/spam.foo/sandro. Make sure the directories virtual and spam.foo look like this when you run ls -l in their parent directories:

drwxrwsrwx 3 mail mail 4096 Stu 27 19:05 virtual
drwxrwxrwx 2 mail mail 4096 Stu 28 21:08 spam.foo

Of course I don't need to remind you to restart inetd after these changes.

Now you can see what these two directories are for: delivering spam to local users. Yes, we will create the virtual domain spam.foo for vm-pop3d (the POP3 daemon, if you haven't caught on by now) so our users will be able to read their spam. Why, you ask? Because in my case many users complain about missing newsletters, commercials, etc. Mail in /var/spool/virtual/spam.foo gets deleted weekly (a simple script in cron.weekly), because the system's resources are limited and we don't want to waste them on spam more than we have to, do we?
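That weekly job can be little more than a find one-liner. Here is a sketch; the 7-day cutoff is my choice, and it runs against a scratch directory so you can test the idea without touching the real spool (the real job would set SPOOL=/var/spool/virtual/spam.foo and live in /etc/cron.weekly):

```shell
# Weekly spam-spool cleanup, demonstrated on a throwaway directory.
SPOOL=$(mktemp -d)
touch -d '10 days ago' "$SPOOL/olduser"   # simulate a stale spam mailbox
touch "$SPOOL/newuser"                    # and a fresh one
find "$SPOOL" -type f -mtime +7 -delete   # drop mailboxes untouched for a week
ls "$SPOOL"                               # only "newuser" remains
```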

OK, now to configure vm-pop3d. We don't need to do anything for local users' “real” mail, but we do for spam. Each local user will get an account in the spam.foo virtual domain, so the MUA configuration will be slightly different than for the “real” mailbox. For example, if a user has the local username vms, then his username for the spam mailbox is vms@spam.foo or vms:spam.foo. Of course, the passwords don't have to be the same for the two mailboxes. Note that this is similar in concept to Yahoo's(tm) Bulk Mail folder.

Now let's create those spam accounts. Create the directory /etc/virtual, which needs to look like this:

drwxr-xr-x 3 root root 4096 Stu 25 21:22 virtual.

It is not critical that the permissions are exactly the same as here, but vm-pop3d must be able to read the directory. So if you don't like these, play with them and come up with some other combination. I only say this because I don't want you to get the impression that my way is the only right way.

Now, under that directory, create the directory spam.foo with the same permissions. That directory will contain the passwd file for our virtual domain. We will create that file with a Perl script which I got from the vm-pop3d authors' website. Here is the script:

#!/usr/bin/perl
# Usage: pop3passwd username password  -- prints a "username:crypted" line
$name = $ARGV[0];
@salt_chars = ('a'..'z','A'..'Z','0'..'9');
# pick two random characters as the crypt() salt
$salt = $salt_chars[rand(62)] . $salt_chars[rand(62)];
$passwd = crypt ($ARGV[1], $salt);
print "$name:$passwd\n";

Now save this script in a file and make sure it is executable by root, or whoever is the mail admin.

The script is used like this: ./script_file_name username password >> /path_to_passwd/passwd. For example:

enterprise:/etc/virtual# pop3passwd mosor uncrackable >> spam.foo/passwd

Now add a spam.foo username for each of your local users, and the vm-pop3d configuration is finished.
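If you have many local users, a small loop can generate the whole passwd file. The sketch below is hypothetical: the UID cutoff, the sample input, the fixed salt, and the throwaway password are all my assumptions, but the crypt() call is the same one the script above uses:

```shell
# Create a spam.foo passwd entry for every "real" local user (UID >= 1000).
# A sample passwd-style file stands in for /etc/passwd so this runs anywhere.
cat > /tmp/sample_passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
sandro:x:1000:1000::/home/sandro:/bin/bash
mosor:x:1001:1001::/home/mosor:/bin/bash
EOF
OUT=/tmp/spamfoo_passwd
: > "$OUT"
awk -F: '$3 >= 1000 {print $1}' /tmp/sample_passwd |
while read -r u; do
    # same crypt() as the Perl script above; fixed salt "ab" for the demo
    printf '%s:%s\n' "$u" "$(perl -e 'print crypt("changeme","ab")')" >> "$OUT"
done
cat "$OUT"   # one "user:crypted" line per local user
```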

What comes next is some SpamAssassin fine-tuning. Open /etc/mail/spamassassin/local.cf and change report_safe 1 to report_safe 0.

If this option is set to 1, then when an incoming message is tagged as spam, instead of modifying the original message SpamAssassin will create a new report message and attach the original as a message/rfc822 MIME part (ensuring the original message is completely preserved, not easily opened, and easier to recover). If this option is set to 2, the original message is attached with a content type of text/plain instead of message/rfc822. This setting may be required for safety reasons on certain broken mail clients that automatically load attachments without any action by the user, though it may also make it somewhat more difficult to extract or view the original message. If this option is set to 0, incoming spam is only modified by adding some "X-Spam-" headers.

Another important thing when working with spam is an efficient learning technique. The man page is the best resource on SpamAssassin's learning strategy (I urge you to read it, and all the references inside), but it all comes down to this: the more the filter learns, the better it gets. The technical side of learning is this:

sa-learn --spam path_to/message_file

or

sa-learn --ham path_to/message_file

Of course, --spam is for spam mails and --ham is for mails which are not spam. It is equally important to let the filter learn both spam and ham.

And that is it. You have a working anti-spam system, congratulations.

There are a few more things. If you don't have a permanent Internet connection, fetchmail is the way you'll retrieve your mail. So let's configure fetchmail. Unpack the .tar.gz package and cd to the newly created directory (now I am even boring myself). Type as root:

./configure
make
make install

Now you need to configure mail fetching for the users. Each user needs a .fetchmailrc file in his home directory. A simple .fetchmailrc file looks like