...making Linux just a little more fun!
By Phil Hughes
Before I get carried away, let me get those unfamiliar with troff up to speed. troff is a program developed at AT&T Bell Labs that really made UNIX, and thus Linux, possible. UNIX, like Linux, started as a hobby project. But back in 1970, you didn't go to the local supermarket and buy a computer to run UNIX on. You actually needed someone with a house-sized chunk of change to even think about running a UNIX system.
While UNIX was fun for a while, to have a future it needed to actually do something useful for the company that was paying that house-sized chunk of change. It turns out that troff was the magic application.
At Bell Labs, like virtually everywhere, phototypesetting was done by someone sitting down at a keyboard of a typesetter and, well, typing. The output was film or photographic paper, and changes were usually made through careful use of an X-Acto knife. There had to be a better way. It turned out the better way was a UNIX system, troff and the Graphic Systems CAT phototypesetter.
For most of us with a laser printer next to us, this sounds pretty obvious, but you couldn't buy a laser printer at the drugstore either in those days. This system consisted of a slow input device such as a ten-character-per-second teletype, a computer running a text editor that allowed you to enter text with some basic markup commands, another program that would read the markup and produce what the typesetter needed to see and, finally, a phototypesetter that talked to the computer.
The computer was a PDP-11, the editor was ed and the program to drive the phototypesetter was troff. The CAT phototypesetter was specifically designed to talk to this PDP-11/UNIX/troff combo. Its only input method was an RS-232 cable.
Over the years troff has evolved. Its two-character command names have been extended, its limit of four fonts at the same time is long gone (that was a limitation of the CAT, whose fonts were on film strips) and the range of devices it can produce output for has grown. The most common output format for years has been PostScript. If you have a PostScript printer, you can send output to it directly. If not, you can use Ghostscript to perform the translation.
The problem is, with almost everything getting published on the Web, having information in PostScript is not the real answer. You need HTML. Well, troff now supports HTML as an output format.
Is this a big deal? Well, to start with, all the manual pages for the commands on your Linux system are written in troff using the man macros. If you want one of those pages in HTML all you need to do is run groff (the troff frontend program) and tell it you want HTML output. So, there are the first few thousand reasons. There are more.
Many books have been written in troff including all that work done at Bell Labs long ago. Many companies that relied on UNIX systems internally also did internal documentation using troff. And, well, for those of us who are still crazy, writing in troff isn't that bad.
A good place to start would be to test it out on a man page. Generally man pages are stored in subdirectories of /usr/share/man in a compressed format. The subdirectory man1 will have all the man pages for commands. Try:
cd /usr/share/man/man1
ls
It is likely you will see a huge list of files with names such as ln.1.gz. This particular file is the man page for the ln command (the 1 indicates section one, commands) and the gz indicates that it is compressed. The good news is that we don't have to save the decompressed version to work with it, as groff will read from standard input. Try:
zcat ln.1.gz | groff -man -Thtml > /tmp/ln.html
If all goes well, you will have the HTML version of the ln man page in the file /tmp/ln.html. Point your browser at it and take a look.
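Let me explain the pieces of the above command line.
zcat ln.1.gz decompresses the man page and writes the result to standard output.
The pipe (|) feeds that output to groff.
-man tells groff to format the input with the man macro package.
-Thtml selects HTML as the output device.
> /tmp/ln.html saves the result in the file /tmp/ln.html.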
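If you got this far, you must think there is something useful going on with troff. So, let's take a quick look at what the input looks like. Because the above example uses the man macro package, it is not really an easy starting point. Instead, here is a very basic troff file that shows the basic concepts:
.sp .5i
.po .5i
.ft HB
.ps 24
.ce 1
Simple Test File
.sp .2i
.ps 11
.vs 13
.ft R
This is the beginning of some simple text.
As troff defaults to filling lines, a sentence per line makes editing easier.
This all ends up in a \fIparagraph\fP with automatically filled and justified lines.
.sp
The .sp command can be used to create a blank line.
With no argument, the value of the vertical spacing (.vs) is used.
As you can see, troff commands start with a dot and are two letters long. (Longer command names are supported in newer versions.) Here is what is happening:
.sp .5i moves down half an inch from the top of the page.
.po .5i sets the page offset (the left margin) to half an inch.
.ft HB switches to the Helvetica Bold font.
.ps 24 sets the point size to 24.
.ce 1 centers the next input line, the title.
.sp .2i adds a fifth of an inch of space after the title.
.ps 11 and .vs 13 set 11-point type on 13-point line spacing.
.ft R switches back to the Roman font.
\fI and \fP are in-line escapes rather than commands: they switch to italic and then back to the previous font.
If you saved this file as test1, you can see the output by running the following command (groff produces PostScript by default, and gv is a PostScript viewer):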
groff test1 | gv -
As you can see above, there is a lot of control but it requires a lot of obscure commands. If you write a lot of documents in the same basic format, you can get pretty sick of setting page offsets and font sizes. You may also want to change to indented paragraphs, have footnotes and create a table of contents. That is where macro packages come in.
You can think of the basic troff engine as working like one of those old Etch-a-Sketch toys, with the addition of defined fonts. By adding a macro package, you can predefine a whole set of operations at the functional level, such as starting a paragraph. Once this is done, changing the document format only means changing how the macro responds, rather than changing every place where you inserted raw troff commands (such as the .sp above).
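To make the idea concrete, here is a minimal sketch of a home-made paragraph macro. The macro name P and the file names under /tmp are made up for this example; the .de request starts a macro definition and a line containing just .. ends it. Changing the body of P (say, dropping the indent) would restyle every paragraph in the file without touching the text.

```shell
#!/bin/sh
# Sketch: define a hypothetical paragraph macro P with .de and use it twice.
# Quoting 'EOF' stops the shell from touching troff's backslashes.
cat > /tmp/macro-demo.tr <<'EOF'
.\" P: blank line, then indent the first line of the paragraph by 0.3i
.de P
.sp
.ti 0.3i
..
.ps 11
.vs 13
.P
This is the first paragraph.
It is filled and justified as usual.
.P
This is the second paragraph.
How it is spaced and indented is decided only by the definition of P.
EOF
# Format to PostScript; view it with: gv /tmp/macro-demo.ps
groff -Tps /tmp/macro-demo.tr > /tmp/macro-demo.ps
```

A real macro package does the same thing on a much larger scale, with macros for headings, lists, footnotes and so on.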
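It is not my intent to explain how all this works here, just to let you know the capabilities exist. The common macro packages are:
ms, the manuscript macros
me, the macros developed at Berkeley
mm, the memorandum macros
man, the macros used for manual pages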
Today, you are most likely to see man used for formatting manual pages and mm for more general use. There is, however, nothing that says you cannot develop your own macro packages. SSC has used two locally developed packages for close to twenty years.
The first of those packages was developed to produce SSC Pocket Reference Cards. These cards have 3.5 x 8 inch pages. Each page consists of titled boxes of command information and text using up to five colors. The macro package used here handles drawing the boxes and the colors and outputs two of these small pages on one physical page. One side benefit is that by having two different sets of macros, proofing can be done on a color printer, and then the color-separated output for the printer can be produced without the need for any additional programs or changes to the actual document.
The other set of macros was developed for classroom teaching. Again, the capability of producing two different outputs by changing the set of macros used is exploited. The complete document includes large-format text plus small-format discussions. This means the student notebooks can contain a lot of explanatory text without cluttering up the slides used in the classroom.
Earlier I said that troff works like an Etch-a-Sketch. That is, you have a workspace to write on rather than a line-by-line output device. While it is quite common to just work line by line, this also means you can draw by moving back up the page. The troff preprocessors exploit this capability.
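As a small illustration of moving back on the page, the sketch below prints a line, backs up one line with .sp -1v, prints more text on the same baseline further to the right, and then draws a rule with the \D drawing escape. The file names are arbitrary; the point is only to show that output need not proceed strictly downward.

```shell
#!/bin/sh
# Sketch: troff keeps a page to draw on, so it can move back up as well as down.
# File names under /tmp are arbitrary.
cat > /tmp/move-demo.tr <<'EOF'
.ps 11
.vs 13
Text on the first line.
.\" Back up one line (1v) so the next output line shares the same baseline.
.sp -1v
.ti 3i
Same baseline, three inches in.
.sp .2i
.\" The escape on the next line draws a line 3 inches to the right.
\D'l 3i 0'
EOF
groff -Tps /tmp/move-demo.tr > /tmp/move-demo.ps
```

Preprocessors such as tbl and pic generate this kind of low-level positioning and drawing for you.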
The most popular preprocessor is tbl, which, as you might expect, is used to generate tables. It is very easy to use and allows tight control over table presentation, including boxes, titles and flowed text in boxes. Besides the extra control, every time I write a table in HTML, I remember how easy it used to be in tbl.
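For a flavor of tbl input, here is a minimal sketch of a small boxed table; the table contents are made up for illustration. The first line gives global options (center the table, draw a box around it, use : as the column separator), the next lines give a format for each column ending with a period, and the data rows follow. The -t option tells groff to run tbl before troff.

```shell
#!/bin/sh
# Sketch: a boxed two-column table; -t runs the tbl preprocessor first.
cat > /tmp/tbl-demo.tr <<'EOF'
.TS
center box tab(:);
cb | cb
l | l .
Command:Purpose
_
ls:list directory contents
ln:make links between files
.TE
EOF
groff -t -Tps /tmp/tbl-demo.tr > /tmp/tbl-demo.ps
```

Here cb means a centered bold column (used for the header row), l a left-aligned one, | a vertical rule between columns, and the lone _ a horizontal rule under the header. Swapping -Tps for -Thtml should give an HTML version.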
A less common but very powerful preprocessor is pic. pic allows you to draw pictures or, more accurately, boxes, circles, arrows and such. In other words, diagrams.
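Here is a minimal pic sketch: three labeled boxes joined by arrows, echoing the ed, troff and CAT chain from the history at the start of this article. The file names are arbitrary; the -p option tells groff to run pic before troff.

```shell
#!/bin/sh
# Sketch: a tiny pic diagram; -p runs the pic preprocessor first.
# Objects are placed left to right by default.
cat > /tmp/pic-demo.tr <<'EOF'
.PS
box "ed"; arrow; box "troff"; arrow; box "CAT"
.PE
EOF
groff -p -Tps /tmp/pic-demo.tr > /tmp/pic-demo.ps
```

pic describes the picture in terms of objects and their relationships, and works out the coordinates itself; view the result with gv /tmp/pic-demo.ps.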
Hopefully, this article has given you an idea of what troff is and what it can do. If all you need to do is convert current troff documents into HTML, you should have enough information to get on with the task. On the other hand, if you see a use beyond conversion, there is a lot more to learn. If this is the case, you are welcome to add a comment suggesting what else you would like to hear about.
Phil Hughes is Group Publisher of SSC's publications. He lives in Costa Rica where the telemarketers only speak Spanish.
Phil Hughes is the publisher of Linux Journal, and thereby Linux Gazette. He dreams of permanently tele-commuting from his home on the Pacific coast of the Olympic Peninsula. As an employer, he is "Vicious, Evil, Mean, & Nasty, but kind of mellow" as a boss should be.