.pn 0
.ls 1
.EQ
delim $$
.EN
.ev 1
.ps -2
.vs -2
.ev
\&
.sp 10
.ps +4
.ce
COMPUTER (IN)SECURITY \(em
.sp
.ce
INFILTRATING OPEN SYSTEMS
.ps -4
.sp 4
.ce
Ian H. Witten
.sp 2
.ce 4
Department of Computer Science
The University of Calgary
2500 University Drive NW
Calgary, Canada T2N 1N4
.sp 2
.ce 2
November 1986
Revised March 1987
.bp 1
.ls 2
.pp
Shared computer systems today are astonishingly insecure.
And users, on the whole, are blithely unaware of the weaknesses of the
systems in which they place \(em or rather, misplace \(em their trust.
Taken literally, of course, it is meaningless to ``trust'' a computer system
as such, for machines are neither trustworthy nor untrustworthy;
these are human qualities.
In trusting a system one is effectively trusting all those who create and
alter it, in other words, all who have access (whether licit or
illicit).
Security is a fundamentally \fIhuman\fP issue.
.pp
This article aims not to solve security problems but to raise reader
consciousness
of the multifarious cunning ways that systems can be infiltrated, and the
subtle but devastating damage that an unscrupulous infiltrator can wreak.
It is comforting, but highly misleading, to imagine that technical means of
enforcing security have guaranteed that the systems we use are safe.
It is true that in recent years some ingenious procedures have been invented
to preserve security.
For example, the advent of ``one-way functions'' (explained below) has
allowed the password file, once a computer system's central stronghold, to be
safely exposed to casual inspection by all and sundry.
But despite these innovations, astonishing loopholes exist in practice.
.pp
There are manifest advantages in ensuring security by technical means rather
than by keeping things secret.
Not only do secrets leak, but as individuals change projects,
join or leave the organization, become promoted and so on, they need to learn
new secrets and forget old ones.
With physical locks one can issue and withdraw keys to reflect changing
security needs.
But in computer systems, the keys constitute information which can be given
out but not taken back, because no-one can force people to forget.
In practice, such secrets require considerable administration to maintain
properly.
And in systems where security is maintained by tight control of information,
.ul
quis custodiet ipsos custodes
\(em who will guard the guards themselves?
.pp
There is a wide range of simple insecurities that many
systems suffer.
These are, in the main, exacerbated in open systems where information and
programs are shared among users \(em just those features that characterize
pleasant and productive working environments.
The saboteur's basic tool is the Trojan horse,
a widely trusted program which has been surreptitiously modified to do
bad things in secret.
``Bad things'' range from minor but rankling irritations through theft of
information to holding users to ransom.
The inevitable fragilities of operating systems can
be exploited by constructing programs which behave in some ways like primitive
living organisms.
Programs can be written which spread bugs like an epidemic.
They hide in binary code, effectively undetectable (because nobody ever
examines binaries).
They can remain dormant for months or years, perhaps quietly and imperceptibly
infiltrating their way into the very depths of a system, then suddenly pounce,
causing irreversible catastrophe.
A clever and subtle bug\(dg can survive
recompilation despite the fact that there is no record of it in the source
program.
.FN
\(dg Throughout this article the word ``bug'' is meant to bring to mind a
concealed snooping device as in espionage, or a micro-organism carrying
disease as in biology, rather than an inadvertent programming error.
.EF
This is the ultimate parasite.
It cannot be detected because it lives only in binary code.
And yet it cannot be wiped out by recompiling the source program!
We might wonder whether these techniques, which this article develops
and explains in the context of multi-user timesharing operating systems,
pose any threats to computer networks or even stand-alone micros.
.pp
Although the potential has existed for decades, the possibility of the kind of
``deviant'' software described here has been recognized only recently.
Or has it?
Probably some in the world of computer wizards and sorcerers have known for
years how systems can be silently, subtly infiltrated \(em and concealed
the information for fear that it might be misused (or for other reasons).
But knowledge of the techniques is spreading nevertheless, and I believe it
behooves us all \(em professionals and amateurs alike \(em to understand just
how our continued successful use of computer systems hangs upon a thread of
trust.
Those who are ignorant of the possibilities of sabotage can easily be
unknowingly duped by an unscrupulous infiltrator.
.pp
The moral is simple.
Computer security is a human business.
One way of maintaining security is to keep things secret, trusting people
(the very people who can do you most harm) not to tell.
The alternative is to open up the system and rely on technical means
of ensuring security.
But a system which is really ``open'' is also open to abuse.
The more sharing and productive the environment, the more potential exists for
damage.
You have to trust your fellow users, and educate yourself.
If mutual trust is the cornerstone of computer security, we'd better know it!
.sh "The trend towards openness"
.pp
Many people believe that computer systems can maintain security not
by keeping secrets but by clever technical mechanisms.
Such devices include electronic locks and keys, and schemes for maintaining
different sets of ``permissions'' or ``privileges'' for each user.
The epitome of this trend towards open systems is the well-known \s-2UNIX\s+2
operating system, whose developers, Dennis Ritchie and Ken Thompson, strove
to design a clean, elegant piece of software that could be understood,
maintained, and modified by users.
(In 1983 they received the prestigious ACM Turing Award for their work.)  \c
Ken Thompson has been one of the prime contributors to our knowledge of
computer (in)security, and was responsible for much of the work described in
this article.
.pp
The most obvious sense in which the \s-2UNIX\s+2 system
is ``open'' is illustrated by looking at its password file.
Yes, there is nothing to stop you from looking at this file!
Each registered user has a line in it, and Figure\ 1 shows mine.
It won't help you to impersonate me, however, because what it shows in the
password field is not my password but a scrambled version of it.
There is a program which computes encrypted passwords from plain ones, and
that is how the system checks my identity when I log in.
But the program doesn't work in reverse \(em it's what is called a ``one-way
function'' (see Panel\ 1).
It is effectively impossible to find the plain version from the encrypted one,
even if you know exactly what the encryption procedure does and try to work
carefully backward through it.
\fINobody\fR can recover my plain password from the information stored in the
computer.
If I forget it, not even the system manager can find out what it is.
The best that can be done is to reset my password to some standard one, so
that I can log in and change it to a new secret password.
(Needless to say this creates a window of opportunity for an imposter.)  \c
The system keeps no secrets.
Only I do.
.pp
Before people knew about one-way functions, computer systems maintained a
password file which gave everyone's plain password for the login procedure to
consult.
This was the prime target for anyone who tried to
break security, and the bane of system managers because of the
completely catastrophic nature of a leak.
Systems which keep no secrets avoid an unnecessary Achilles heel.
.pp
Another sense in which \s-2UNIX\s+2 is ``open'' is the accessibility of its
source code.
The software, written in the language ``C'', has been distributed
(to universities) in source form so that maintenance can be done locally.
The computer science research community has enjoyed numerous benefits from
this enlightened policy (one is that we can actually look at some of the
security problems discussed in this article).
Of course, in any other system there will inevitably be a large number of
people who have or have had access to the source code \(em even though it may
not be publicly accessible.
Operating systems are highly complex pieces of technology, created by large
teams of people.
A determined infiltrator may well be able to gain illicit access to source
code.
Making it widely available has the very positive effect of bringing the
problems out into the open and offering them up for public scrutiny.
.pp
Were it attainable, perfect secrecy would offer a high degree of security.
Many people feel that technical innovations like one-way functions and
open password files provide comparable protection.
The aim of this article is to show that this is a dangerous misconception.
In practice, security is often severely compromised by people who have
intimate knowledge of the inner workings of the system \(em precisely the
people you rely on to \fIprovide\fR the security.
This does not cause problems in research laboratories because they are
founded on mutual trust and support.
But in commercial environments, it is vital to be aware of any limitations on
security.
We must face the fact that
in a hostile and complex world, computer security is best preserved by
maintaining secrecy.
.sh "A pot-pourri of security problems"
.pp
Here are a few simple ways that security might be compromised.
.rh "Guessing a particular user's password."
Whether your password is stored in a secret file or encrypted by a one-way
function first, it offers no protection if it can easily be guessed.
This will be hard if it is chosen at random from a large enough set.
But for a short sequence of characters from a restricted alphabet
(like the lower-case letters), an imposter could easily try all possibilities.
And in an open system which gives access to the password file and one-way
function, this can be done mechanically, by a program!
.pp
In Figure\ 2, the number of different passwords is plotted against the length
of the password, for several different sets of characters.
For example, there are about ten million ($10 sup 7$) possibilities for a
5-character password chosen from the lower-case letters.
This may seem a lot, but if it takes 1\ msec to try each one, they can all be
searched in about 3\ hours.
If 5-character passwords are selected from the 62 alphanumerics, there
are more than 100 times as many and the search would take over 10\ days.
.pp
To make matters worse, people have a strong propensity to choose as
passwords such things as
.LB
.NP
English words
.NP
English words spelled backwards
.NP
first names, last names, street names, city names
.NP
the above with initial upper-case letters
.NP
valid car license numbers
.NP
room numbers, social security numbers, telephone numbers, etc.
.LE
Of course, this isn't particularly surprising since passwords have to be
mnemonic in order to be remembered!
But it makes it easy for an enterprising imposter to gather a substantial
collection of candidates (from dictionaries, mailing lists, etc) and search
them for your password.
At 1\ msec per possibility, it takes only 4\ minutes to search a 250,000-word
commercial dictionary.
.pp
A study some years ago of a collection of actual passwords that people used to
protect their accounts revealed the amazing breakdown reproduced in Figure\ 3.
Most fell into one of the categories discussed above, leaving fewer than
15% that were hard to guess.
Where does your own password stand in the pie diagram?
.rh "Finding any valid password."
There is a big difference between finding a particular person's password and
finding a valid password for any user.
You could start searching through the candidates noted above until you found
one which, when encrypted, matched one of the entries in the password file.
That way you find the most vulnerable user, and there are almost certain to be
some lazy and crazy enough to use easily-guessable passwords, four-letter
words, or whatever.
Hashing techniques make it almost as quick to check a candidate against a
group of encrypted passwords as against a single one.
.pp
A technique called ``salting'' protects against this kind of attack.
Whenever a user's password is initialized or changed, a small random number
called the ``salt'' is generated (perhaps from the time of day).
Not only is this combined with the password when it is encrypted, but as
Figure\ 1 shows it is also stored in the password file for everyone to see.
Every time someone claiming to be that user logs in, the salt is combined with
the password offered before being encrypted and compared
with whatever is stored in the password file.
For example, say my password was ``w#xs27'' (it isn't!).
If the salt is ``U6'' (as in Figure\ 1), the system will apply its one-way
function to ``w#xs27U6'' to get the encrypted password.
.pp
Since all can see the salt, it is no harder for anyone to guess
an individual user's password.
One can salt guesses just as the system does.
But it \fIis\fR harder to search a group of passwords, since the salt will be
different for each, rendering it meaningless to compare a single encrypted
password against all those in the group.
Suppose you were checking to see if anyone had the password ``hello''.
Without salting, you simply apply the one-way function to this word and
compare the result with everyone's encrypted password.
But with salting it's not so easy, since to see if my password is ``hello''
you must encrypt ``helloU6'', and the salt is different for everyone.
.rh "Forced-choice passwords."
The trouble with letting users choose their own passwords is that they often
make silly, easily-guessed choices.
Many systems therefore attempt to force people to choose more ``random''
passwords, and to change their password regularly.
All these attempts seem to be complete failures.
The fundamental problem is that people have to be able to remember their
passwords, because security is immediately compromised if they are written
down.
.pp
There are many amusing anecdotes about how people thwart systems that attempt
to dictate when they have to change their passwords.
I had been using a new system for some weeks when it insisted that I change my
password.
Resenting it ordering me about, I gave my old password as the new one.
But it was programmed to detect this ruse and promptly told me so.
I complained to the user sitting beside me.
``I know,'' she said sympathetically.
``What I always do is change it to something else and then immediately
change it back again!''  \c
Another system remembered your last several passwords, and insisted on a
once-a-month change.
So people began to use the name of the current month as their password!
.rh "Wiretaps."
Obviously any kind of password protection can be thwarted by a physical
wiretap.
All one has to do is watch as you log in and make a note of your password.
The only defense is encryption at the terminal.
Even then you have to be careful to ensure that someone can't intercept
your encrypted password and pose as you later on by sending this
\fIencrypted\fR string to the computer \(em after all, this is what the
computer sees when you log in legitimately!
To counter this, the encryption can be made time-dependent so that the same
password translates to different strings at different times.
.pp
Assuming that you, like 99.9% of the rest of us, don't go to the trouble of
terminal encryption, when was the last time you checked the line between your
office terminal and the computer for a physical wiretap?
.rh "Search paths."
We will see shortly that you place yourself completely at the mercy of other
users whenever you execute their programs, and they
can do some really nasty things like spreading infection to your files.
However, you don't necessarily have to execute someone else's program overtly,
for many systems make it easy to use other people's
programs without even realizing it.
This is usually a great advantage, for you can install programs so that you
or others can invoke them just like ordinary system programs, thereby
creating personalized environments.
.pp
Figure\ 4 shows part of the file hierarchy in our system.
The whole hierarchy is immense \(em I alone have something like 1650 files,
organized into 200 of my own directories under the ``ian'' node shown in the
Figure, and there are hundreds of other users \(em and what is shown is just a
very small fragment.
Users can set up a ``search path'' which tells the system
where to look for programs they invoke.
For example, my search path includes the 6 places that are circled.
Whenever I ask for a program to be executed, the system seeks it in these
places.
It also searches the ``current directory'' \(em the one where I happen to be
at the time.
.pp
To make it more convenient for you to set up a good working environment, it
is easy to put someone else's file directories on your search path.
But then they can do arbitrary damage to you, sometimes completely
accidentally.
For example, I once installed a spreadsheet calculator called ``sc'' in one
of my directories.
Unknown to me, another user suddenly found that the Simula compiler stopped
working and entered a curious mode where it cleared his VDT screen and wrote
a few incomprehensible characters on it.
There was quite a hiatus.
The person who maintained the Simula compiler was away,
but people could see no reason for the compiler to have been altered.
Of course, told like this it is obvious that the user had my directory on his
search path and I had created a name conflict with \fIsc\fR, the Simula
compiler.
But it was not obvious to the user, who rarely thought about the search path
mechanism.
And I never use the Simula compiler and had created the conflict in all
innocence.
Moreover, I didn't even know that other users had my directory on their search
paths!
This situation caused only frustration before the problem was diagnosed and
fixed.
But what if I were a bad guy who had created the new \fIsc\fR program to
harbor a nasty bug (say one which deleted the hapless user's files)?
381You don't necessarily have to put someone on your search path to run the
382risk of executing their programs accidentally.
383As noted above, the system (usually) checks your current working directory
384for the program first.
385Whenever you change your current workplace to another's directory, you
386might without realizing it begin to execute programs that had been
387planted there.
388.pp
389Suppose a hacker plants a program with the same name as a common
390utility program.
391How would you find out?
392The \s-2UNIX\s+2 \fIls\fR command lists all the files in a directory.
393Perhaps you could find imposters using \fIls\fR?  \(em Sorry.
394The hacker might have planted another program, called \fIls\fR, which
395simulated the real \fIls\fR exactly except that it lied about its own
396existence and that of the planted command!
397The \fIwhich\fR command tells you which version of a program you
398are using \(em whether it comes from the current directory, another user's
399directory, or a system directory.
400Surely this would tell you?  \(em Sorry.
401The hacker might have written another \fIwhich\fR which lied about itself,
402about \fIls\fR, and about the plant.
403.pp
404If you put someone else on your search path, or change into their directory,
405you're implicitly trusting them.
406You are completely at a user's mercy when you execute one of their programs,
407whether accidentally or on purpose.
.rh "Programmable terminals."
Things are even worse if you use a ``programmable'' terminal.
Then, the computer can send a special sequence of characters to command the
terminal to transmit a particular message whenever a particular key is struck.
For example, on the terminal I am using to type this article, you could
program the \s-2RETURN\s+2 key to transmit the message ``hello'' whenever it
is pressed.
All you need to do to accomplish this is to send my terminal the character
sequence
.LB
\s-2ESCAPE\s+2 P ` + { H E L L O } \s-2ESCAPE\s+2
.LE
(\s-2ESCAPE\s+2 stands for the \s-2ASCII\s+2 escape character, decimal 27,
which is invoked by a key labeled ``Esc''.)  \c
This is a mysterious and ugly incantation, and I won't waste time
explaining the syntax.
But it has an extraordinary effect.
Henceforth every time I hit the return key, my terminal will transmit the
string ``hello'' instead of the normal \s-2RETURN\s+2 code.
And when it receives this string, the computer I am connected to will try to
execute a program called ``hello''!
.pp
This is a terrible source of insecurity.
Someone could program my terminal so that it executed one of \fItheir\fR
programs whenever I pressed \s-2RETURN\s+2.
That program could reinstate the \s-2RETURN\s+2 code to make it
appear afterwards as though nothing had happened.
Before doing that, however, it could (for example) delete all my files.
.pp
The terminal can be reprogrammed just by sending it an ordinary character
string.
The string could be embedded in a file, so that the terminal would be bugged
whenever I viewed the file.
It might be in a seemingly innocuous message;
simply reading mail could get me in trouble!
It could even be part of a file \fIname\fR, so that the bug would appear
whenever I listed a certain directory \(em not making it my current directory,
as was discussed above, but just \fIinspecting\fR it.
But I shouldn't say ``appear'', for that's exactly what it might not do.
I may never know that anything untoward had occurred.
.pp
How can you be safe?
The programming sequences for my terminal all start with \s-2ESCAPE\s+2,
which is an \s-2ASCII\s+2 control character.
Anyone using such a terminal should whenever possible work through a
program that exposes control characters.
By this I mean a program that monitors output from the computer and translates
the escape code to something like the 5-character sequence ``<ESC>''.
Then a raw \s-2ESCAPE\s+2 itself never gets sent to the terminal,
so the reprogramming mechanism is never activated.
.pp
Not only should you avoid executing programs written by people you don't
trust, but in extreme cases you should take the utmost care in \fIany\fR
interaction with untrustworthy people \(em even reading their electronic
mail.
.sh "Trojan horses: getting under the skin"
.pp
The famous legend tells of a huge, hollow wooden horse filled with Greek
soldiers which was left, ostensibly as a gift, at the gates of the city of
Troy.
When it was brought inside, the soldiers came out at night and
opened the gates to the Greek army, which destroyed the city.
To this day, something used to subvert an organization from within by abusing
misplaced trust is called a Trojan horse.
.pp
In any computer system for which security is a concern, there must be things
that need protecting.
These invariably constitute some kind of information (since the computer is,
at heart, an information processor), and such information invariably outlasts
a single login session and is therefore stored in the computer's file system.
Consequently the file system is the bastion to be kept secure, and will be
the ultimate target of any invader.
Some files contain secret information that not just anyone may read,
others are vital to the operation of an organization and must at all costs
be preserved from surreptitious modification or deletion.
A rather different thing that must be protected is the ``identity'' of each
user.
False identity could be exploited by impersonating someone else in order to
send mail.
Ultimately, of course, this is the same as changing data in mailbox files.
Conversely, since for each and every secret file \fIsomeone\fR must
have permission to read and alter it, preserving file system security
requires that identities be kept intact.
.rh "What might a Trojan horse do?"
The simplest kind of Trojan horse turns a common program like a text editor
into a security threat by implanting code in it which secretly reads
or alters files it is not intended to.
An editor normally has access to all the user's
files (otherwise they couldn't be altered).
In other words, the program runs with the user's own privileges.
A Trojan horse in it can do anything the user himself could do, including
reading, writing, or deleting files.
.pp
It is easy to communicate stolen information back to the person who bugged
the editor.
Most blatantly, the access permission of a secret file could be changed so
that anyone can read it.
Alternatively the file could be copied temporarily to disk \(em most systems
allocate scratch disk space for programs that need to create temporary working
files \(em and given open access.
Another program could continually check for it and, when
it appeared, read and immediately delete it to destroy the trace.
More subtle ways of communicating small amounts of information might be to
rearrange disk blocks physically so that their addresses formed a code, or to
signal with the run/idle status of the process to anyone who monitored the
system's job queue.
Clearly, any method of communication will be detectable by others \(em in
theory.
But so many things go on in a computer system that messages can easily be
embedded in the humdrum noise of countless daily events.
.pp
Trojan horses don't necessarily do bad things.
Some are harmless but annoying, created to meet a challenge rather than to
steal secrets.
One such bug, the ``cookie monster'', signals its presence by announcing
to the unfortunate user ``I want a cookie''.
Merely typing the word ``cookie'' will satiate the monster and cause it to
disappear as though nothing had happened.
But if the user ignores the request, although the monster appears to go
away it returns some minutes later with ``I'm hungry; I really want a
cookie''.
As time passes the monster appears more and more frequently with increasingly
insistent demands, until it makes a serious
threat:  ``I'll remove some of your files if you don't give me a cookie''.
At this point the poor user realizes that the danger is real and is
effectively forced into appeasing the monster's appetite by supplying the word
``cookie''.
Although it makes an amusing story, it is not pleasant to imagine being
intimidated by an inanimate computer program.
.pp
A more innocuous Trojan horse, installed by a system programmer to commemorate
leaving her job, occasionally drew a little teddy-bear on the graph-plotter.
This didn't happen often (roughly every tenth plot), and even when it did
it occupied a remote corner of the paper, well outside the normal plotting
area.
But although they initially shared the joke, management soon ceased to
appreciate the funny side and ordered the programmer's replacement to get rid
of it.
Unfortunately the bug was well disguised and many fruitless hours were spent
seeking it in vain.
Management grew more irate and the episode ended when the originator
received a desperate phone-call from her replacement, whose job was by now at
risk, begging her to divulge the secret!
.rh "Installing a Trojan horse."
The difficult part is installing the Trojan horse into a trusted program.
System managers naturally take great care that only a few people get access
to suitable host programs.
If anyone outside the select circle of ``system people'' is ever given an
opportunity to modify a commonly-used program like a text editor
(for example, to add a new feature) all changes will be closely scrutinized by
the system manager before being installed.
Through such measures the integrity of system programs is preserved.
Note, however, that constant vigilance is required, for once bugged, a system
can remain compromised forever.
The chances of a slip-up may be tiny, but the consequences are unlimited.
.pp
One good way of getting bugged code installed in the system is to write a
popular utility program.
As its user community grows, more and more people will copy the program into
their disk areas so that they can use it easily.
Eventually, if it is successful, the utility will be installed as a ``system''
program.
This will be done to save disk space \(em so that the users can delete their
private versions \(em and perhaps also because the code can now be made
``sharable'' in that several simultaneous users can all execute a single copy
in main memory.
As a system program the utility may inherit special privileges, and so be
capable of more damage.
It may also be distributed to other sites, spreading the Trojan horse far and
wide.
.pp
Installing a bug in a system utility like a text editor puts anyone who uses
that program at the mercy of whoever perpetrated the bug.
But it doesn't allow that person to get in and do damage at any time, for
nothing can be done to a user's files until that user invokes the bugged
program.
Some system programs, however, have a special privilege which allows them
access to files belonging to \fIanyone\fR, not just the current user.
We'll refer to this as the ``ultimate'' privilege, since nothing could be more
powerful.
An example of a program with the ultimate privilege is the \fIlogin\fR program
which administers the logging in sequence, accepting the user name and
password and creating an appropriate initial process.
Although \s-2UNIX\s+2 \fIlogin\fR runs as a normal process, it must have the
power to masquerade as any user since that is in effect the goal of the
logging in procedure!
From an infiltrator's point of view, this would be an excellent
target for a Trojan horse.
For example, it could be augmented to grant access automatically to any user
who typed the special password ``trojanhorse'' (see Panel\ 2).
Then the infiltrator could log in as anyone at any time.
Naturally, any changes to \fIlogin\fR will be checked especially carefully
by the system administrators.
.pp
Some other programs are equally vulnerable \(em but not many.
Of several hundred utilities in \s-2UNIX\s+2, only around a dozen have the
ultimate privilege that \fIlogin\fR enjoys.
Among them are the \fImail\fR facility, the \fIpasswd\fR program which lets
users change their passwords, \fIps\fR which examines the status of all
processes in the system, \fIlquota\fR that enforces disk quotas, \fIdf\fR
which shows how much of the disk is free, and so on.
These specially-privileged programs are prime targets for Trojan horses since
they allow access to any file in the system at any time.
611.rh "Bugs can lurk in compilers."
612Assuming infiltrators can never expect to be able to modify the source code of
613powerful programs like \fIlogin\fR, is there any way a bug can be planted
614indirectly?
615Yes, there is.
616Remember that it is the object code \(em the file containing executable
617machine instructions \(em that actually runs the logging in process.
618It is this that must be bugged.
619Altering the source code is only one way.
620The object file could perhaps be modified directly, but this is likely to be
621just as tightly guarded as the \fIlogin\fR source.
622More sophisticated is a modification to the compiler itself.
623A bug could try to recognize when it is \fIlogin\fR that is being compiled,
624and if so, insert a Trojan horse automatically into the compiled code.
625.pp
626Panel\ 3 shows the idea.
627The \s-2UNIX\s+2 \fIlogin\fR program is written in the C programming language.
628We need to modify the compiler so that it recognizes when it is compiling
629the \fIlogin\fR program.
630Only then will the bug take effect, so that all other compilations proceed
631exactly as usual.
632When \fIlogin\fR is recognized, an additional line is inserted into it by
633the compiler, at the correct place \(em so that exactly the same bug is
634planted as in Panel\ 2.
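The recognition step can be sketched in C.
This is a simplified assumption about how a Panel\ 3-style bug might work:
the trigger pattern and the injected line are invented for illustration, and
a real compiler bug would emit the extra line into its code-generation
stream rather than return it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pattern assumed unique to the login source, and the
   extra line the compiler smuggles in after it. */
static const char *PATTERN = "password = crypt(";
static const char *PAYLOAD =
    "if (strcmp(typed, \"trojanhorse\") == 0) ok = 1;";

/* Returns the line to inject if this source line triggers the bug,
   or NULL for the (vast) majority of lines, which compile as usual. */
const char *inject_after(const char *source_line) {
    if (strstr(source_line, PATTERN) != NULL)
        return PAYLOAD;
    return NULL;
}
```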
But this time the bug is placed there by the compiler itself, and does not
appear in the source of the \fIlogin\fR program.
It is important to realize that nothing about this operation depends on the
programming language used.
All examples in this article could be redone using, say, Pascal.
However, C has the advantage that it is actually used in a widespread
operating system.
.pp
The true picture would be more complicated than this simple sketch.
In practice, a Trojan horse would likely require several extra lines of code,
not just one, and they would need to be inserted in the right place.
Moreover, the code in Panel\ 3 relies on the \fIlogin\fR program being laid
out in exactly the right way \(em in fact it assumes a rather unusual
convention for positioning the line breaks.
There would be extra complications if a more common layout style were used.
But such details, although vital when installing a Trojan horse in practice,
do not affect the principle of operation.
.pp
We have made two implicit assumptions that warrant examination.
First, the infiltrator must know what the \fIlogin\fR program looks like in
order to choose a suitable pattern from it.
This is part of what we mean by ``open-ness''.
Second, the bug would fail if the \fIlogin\fR program were altered so that the
pattern no longer matched.
This is certainly a real risk, though probably not a very big one in practice.
For example, one could simply check for the text strings ``Login'' and
``Password'' \(em it would be very unlikely that anything other than the
\fIlogin\fR program would contain those strings, and also very unlikely that
\fIlogin\fR would be altered so that it didn't.
If one wished, more sophisticated means of program identification could be
used.
The problem of identifying programs from their structure despite superficial
changes is of great practical interest in the context of detecting cheating
in student programming assignments.
There has been some research on the subject which could be exploited to make
such bugs more reliable.
.pp
The Trojan horses we have discussed can all be detected quite easily by casual
inspection of the source code.
It is hard to see how such bugs could be hidden effectively.
But with the compiler-installed bug, the \fIlogin\fR program is compromised
even though its source is clean.
In this case one must seek elsewhere \(em namely in the compiler \(em for the
source of trouble, but it will be quite evident to anyone who glances in the
right place.
Whether such bugs are likely to be discovered is a moot point.
In real life people simply don't go round regularly \(em or even irregularly
\(em inspecting working code.
.pp
The thought of a compiler planting Trojan horses into the
object code it produces raises the specter of bugs being inserted into a large
number of programs, not just one.
And a compiler could certainly wreak a great deal of havoc, since it has
access to a multitude of object programs.
Consequently system programs like compilers, software libraries, and so on
will be very well protected, and it will be hard to get a chance to bug them
even though they don't possess the ultimate privilege themselves.
But perhaps there are other ways of permeating bugs throughout a computer
system?
.pp
Unfortunately, there are.
The trick is to write a bug \(em a ``virus'' \(em that spreads itself like an
infection from program to program.
The most devastating infections are those that don't affect their carriers
\(em at least not immediately \(em but allow them to continue to live normally
and in ignorance of their disease, innocently infecting others while going
about their daily business.
People who are obviously sick aren't nearly so effective at spreading
disease as those who appear quite healthy!
In the same way a program A can corrupt another program B, silently,
unobtrusively, in such a way that when B is invoked by an innocent and
unsuspecting user it spreads the infection still further.
.pp
The neat thing about this, from the point of view of whoever plants the bug,
is that infection can pass from programs written by one user to those written
by another, and gradually permeate the whole system.
Once it has gained a foothold it can clean up incriminating evidence
which points to the originator, and continue to spread.
Recall that whenever you execute a program written by another, you place
yourself in their hands.
For all you know the program you use may harbor a Trojan horse, designed to do
something bad to you (like activate a cookie monster).
Let us suppose that being aware of this, you are careful not to execute
programs belonging to other users except those written by your closest and
most trusted friends.
Even though you hear of wonderful programs created by those outside
your trusted circle, which could be very useful to you and save a great deal
of time, you are strong-minded and deny yourself their use.
But maybe your friends are not so circumspect.
Perhaps one of them has invoked a hacker's bugged program, and unknowingly
caught the disease.
Some of your friend's own programs are infected.
Fortunately, perhaps, they aren't the ones you happen to use.
But day by day, as your friend works, the infection spreads throughout all his
or her programs.
And then you use one of them\ ...
.rh "How viruses work."
Surely this can't be possible!
How can mere programs spread bugs from one to the other?
Actually, it's very simple.
Imagine.
Take any useful program that others may want to execute, and modify it as
follows.
Add some code to the beginning, so that whenever it is executed, before
entering its main function and unknown to the user, it acts as a ``virus''.
In other words, it does the following.
It searches the user's files for one which is
.LB
.NP
an executable program (rather than, say, a text or data file)
.NP
writable by the user (so that they have permission to modify it)
.NP
not infected already.
.LE
Having found its victim, the virus ``infects'' the file.
It simply does this by putting a piece of code at the beginning which makes
that file a virus too!
Panel\ 4 shows the idea.
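The three-part victim test above can be sketched as pure selection logic in
C, assuming POSIX-style file modes; the infection marker check is a
hypothetical stub, and no file is actually touched.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical marker check: a real virus might look for a magic
   byte sequence it plants at the start of each infected file. */
static bool is_infected(const struct stat *st) {
    (void)st;
    return false;
}

/* The virus's victim test, mirroring the list above. */
bool is_victim(const struct stat *st) {
    return (st->st_mode & S_IXUSR)    /* an executable program */
        && (st->st_mode & S_IWUSR)    /* writable by the user */
        && !is_infected(st);          /* not infected already */
}
```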
.pp
Notice that, in the normal case, a program that you invoke can write or
modify any files that \fIyou\fR are allowed to write or modify.
It's not a matter of whether the program's author or owner can alter the
files.
It's the person who invoked the program.
Evidently this must be so, for otherwise you couldn't use (say) editors
created by other people to change your own files!
Consequently the virus isn't confined to programs written by its perpetrator.
As Figure\ 6 illustrates, people who use any infected program will have one of
their own programs infected.
Any time an afflicted program runs, it tries to pollute another.
Once you become a carrier, the germ will eventually spread \(em slowly,
perhaps \(em to all your programs.
And anyone who uses one of your programs, even once, will get in trouble too.
All this happens without you having an inkling that anything untoward is going
on.
.pp
Would you ever find out?
Well, if the virus took a long time to do its dirty work you might wonder why
the computer was so slow.
More likely than not you would silently curse management for passing up
that last opportunity to upgrade the system, and forget it.
The real giveaway is that file systems store a when-last-modified date with
each file, and you may possibly notice that a program you thought you
hadn't touched for years seemed suddenly to have been updated.
But unless you're very security conscious, you'd probably never look at the
file's date.
Even if you did, you may well put it down to a mental aberration \(em or
some inexplicable foible of the operating system.
.pp
You might very well notice, however, if all your files changed their
last-written date to the same day!
This is why the virus described above only infects one file at a time.
Sabotage, like making love, is best done slowly.
Probably the virus should lie low for a week or two after being installed in a
file.
(It could easily do this by checking its host's last-written date.)  \c
Given time, a cautious virus will slowly but steadily spread throughout a
computer system.
A hasty one is much more likely to be discovered.
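The lying-low test is a one-line comparison in C.
This sketch assumes the virus can read its host file's last-written time
(via \fIstat\fR); the two-week dormancy period follows the text, and the
function name is invented.

```c
#include <stdbool.h>
#include <time.h>

/* Two weeks, in seconds. */
#define DORMANT_SECONDS (14L * 24 * 60 * 60)

/* Stay dormant until the host's last-written date is comfortably old,
   so a flurry of freshly-dated files never appears. */
bool ready_to_spread(time_t host_mtime, time_t now) {
    return (now - host_mtime) > DORMANT_SECONDS;
}
```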
(Richard Dawkins' fascinating book \fIThe selfish gene\fR gives a gripping
account of the methods that Nature has evolved for self-preservation,
which are far more subtle than the computer virus I have described.
Perhaps this bodes ill for computer security in the future.)
.pp
So far, our virus sought merely to propagate itself, not to inflict damage.
But presumably its perpetrator had some reason for planting it.
Maybe they wanted to read a file belonging to some particular person.
Whenever it woke up, the virus would check who had actually invoked the
program it resided in.
If it was the unfortunate victim \(em bingo, it would spring into action.
Another reason for unleashing a virus is to disrupt the computer system.
Again, this is best done slowly.
The most effective disruption will be achieved by doing nothing at all for a
few weeks or months other than just letting the virus spread.
It could watch a certain place on disk for a signal to start doing damage.
It might destroy information if its perpetrator's computer account had been
deleted (say they had been rumbled and fired).
Or the management might be held to ransom.
Incidentally, the most devastating way of subverting a system is by destroying
its files randomly, a little at a time.
Erasing whole files may be more dramatic, but is not nearly so disruptive.
Contemplate the effect of changing a random bit on the disk every day!
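The corruption itself is a single exclusive-or.
In this sketch the bit position is a parameter rather than a random draw,
and the buffer stands in for a disk block; a real saboteur would pick the
block and bit at random and write the block back.

```c
#include <stddef.h>

/* Flip one bit in a buffer: byte bit_index/8, bit bit_index%8. */
void flip_bit(unsigned char *buf, size_t bit_index) {
    buf[bit_index / 8] ^= (unsigned char)(1u << (bit_index % 8));
}
```

Applied to the same bit twice, the damage undoes itself; applied to a fresh
random bit each day, it quietly rots every file on the disk.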
.rh "Experience with a virus."
Earlier I said ``Imagine''.
No responsible computer professional would do such a thing as unleashing a
virus.
Computer security is not a joke.
Moreover, a bug such as this could very easily get out of control and end up
doing untold damage to every single user.
.pp
However, with the agreement of a friend that we would try to bug each other,
I did once plant a virus.
Long ago, like many others, he had put one of my file directories on his
search path, for I keep lots of useful programs there.
(It is a tribute to human trust \(em or foolishness? \(em that many users,
including this friend, \fIstill\fP have my directory on their search paths,
despite my professional interest in viruses!)  \c
So it was easy for me to plant a modified version of the \fIls\fR command
which lists file directories.
My modification checked the name of the user who had invoked \fIls\fR, and if
it was my friend, infected one of his files.
Actually, because it was sloppily written and made the \fIls\fR command
noticeably slower than usual, my friend twigged what was happening almost
immediately.
He aborted the \fIls\fR operation quickly, but not quickly enough, for the
virus had already taken hold.
Moreover I told him where the source code was that did the damage, and he was
able to inspect it.
Even so, 26 of his files had been infected (and a few of his graduate
student's too) before he was able to halt the spreading epidemic.
.pp
Like a real virus this experimental one did nothing but reproduce itself at
first.
Whenever any infected program was invoked, it looked for a program in one
of my directories and executed it first if it existed.
Thus I was able to switch on the ``sabotage'' part whenever I wanted.
But my sabotage program didn't do any damage.
Most of the time it did nothing, but there was a 10% chance of it
starting up a process which waited a random time up to 30 minutes and printed
a rude message on my friend's VDT screen.
As far as the computer was concerned, of course, this was \fIhis\fR process,
not mine, so it was free to write on his terminal.
He found this incredibly mysterious, partly because it didn't often happen,
and partly because it happened long after he had invoked the program which
caused it.
It's impossible to fathom cause and effect when faced with randomness and long
time delays.
.pp
In the end, my friend found the virus and wiped it out.
(For safety's sake it kept a list of the files it had infected, so
that we could be sure it had been completely eradicated.)  \c
But to do so he had to study the source code I had written for the virus.
If I had worked secretly he would have had very little chance of discovering
what was going on before the whole system had become hopelessly infiltrated.
.rh "Exorcising a virus."
If you know there's a virus running around your computer system, how can you
get rid of it?
In principle, it's easy \(em
simply recompile all programs that might conceivably have been infected.
Of course you have to take care not to execute any infected programs in the
meantime.
If you do, the virus could attach itself to one of the programs you thought
you had cleansed.
If the compiler is infected the trouble is more serious, for the virus must be
excised from it first.
Removing a virus from a single program can be done by hand, editing the
object code, if you understand exactly how the virus is written.
.pp
But is it really feasible to recompile all programs at the same time?
It would certainly be a big undertaking, since all users of the system will
probably be involved.
Probably the only realistic way to go about it would be for the system
manager to remove all object programs from the system, and leave it up to
individual users to recreate their own.
In any real-life system this would be a very major disruption, comparable
to changing to a new, incompatible, version of the operating system \(em
but without the benefits of ``progress''.
.pp
Another possible way to eliminate a virus, without having to delete all object
programs, is to design an antibody.
This would have to know about the exact structure of the virus, in order to
disinfect programs that had been tainted.
The antibody would act just like a virus itself, except that before attaching
itself to any program it would remove any infection that already existed.
Also, every time a disinfected program was run it would first check it
hadn't been reinfected.
Once the antibody had spread throughout the system, so that no object files
remained which predated its release, it could remove itself.
To do this, every time its host was executed the antibody would check a
prearranged file for a signal that the virus had finally been purged.
On seeing the signal, it would simply remove itself from the object file.
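The antibody's per-run decision can be sketched as a small piece of C
logic.  The action names and the two boolean inputs are assumptions for
illustration; a real antibody would patch object files rather than return
an enum.

```c
#include <stdbool.h>

enum action { DISINFECT_AND_ATTACH, REMOVE_SELF, DO_NOTHING };

/* Each time an antibody-carrying program runs: cure any infection
   first; only a clean host may act on the all-purged signal. */
enum action antibody_step(bool host_infected, bool purge_signal_seen) {
    if (host_infected)
        return DISINFECT_AND_ATTACH;  /* cut out the virus, stay resident */
    if (purge_signal_seen)
        return REMOVE_SELF;           /* epidemic over: withdraw quietly */
    return DO_NOTHING;                /* stay resident and keep watching */
}
```

Note the ordering: checking for infection before honoring the purge signal
is exactly the precaution the next paragraph shows to be insufficient.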
.pp
Will this procedure work?
There is a further complication.
Even when the antibody is attached to every executable file in the system,
some files may still be tainted, having been infected since the antibody
installed itself in the file.
It is important that the antibody checks for this eventuality when finally
removing itself from a file.
But wait!  \(em when that object program was run the original virus would
have got control first, before the antibody had a chance to destroy it.
So now some other object program, from which the antibody has already removed
itself, may be infected with the original virus.
Oh no!
Setting a virus to catch a virus is no easy matter.
.sh "Surviving recompilation: the ultimate parasite"
.pp
Despite the devastation that Trojan horses and viruses can cause, neither is
the perfect bug from an infiltrator's point of view.
The trouble with a Trojan horse is that it can be seen in the source code.
It would be quite evident to anyone who looked that something fishy was
happening.
Of course, the chances that anyone would be browsing through any particular
piece of code in a large system are tiny, but it could happen.
The trouble with a virus is that although it lives in object code which
hides it from inspection, it can be eradicated by recompiling affected
programs.
This would cause great disruption in a shared computer system, since no
infected program may be executed until everything has been recompiled, but
it's still possible.
.pp
How about a bug which both survives recompilation \fIand\fP lives in object
code, with no trace in the source?
Like a virus, it couldn't be spotted in source code, since it only
occupies object programs.
Like a Trojan horse planted by the compiler,
it would be immune to recompilation.
Surely it's not possible!
.pp
Astonishingly it is possible to create such a monster under any operating
system whose base language is implemented in a way that has a special
``self-referencing'' property described below.
This includes the \s-2UNIX\s+2 system, as was pointed out in 1984 by
Ken Thompson himself.
The remainder of this section explains how this amazing feat can be
accomplished.
Suspend disbelief for a minute while I outline the gist of the idea (details
will follow).
.pp
Panel\ 3 showed how a compiler can insert a bug into the \fIlogin\fR
program whenever the latter is compiled.
Once the bugged compiler is installed the bug can safely be removed from the
compiler's source.
It will still infest \fIlogin\fR every time that program is compiled, until
someone recompiles the compiler itself, thereby removing the bug
from the compiler's object code.
Most modern compilers are written in the language they compile.
For example, C compilers are written in the C language.
Each new version of the compiler is compiled by the previous version.
Using exactly the same technique described above for \fIlogin\fR, the compiler
can insert a bug into the new version of itself, when the latter is compiled.
But how can we ensure that the bug propagates itself from version to version,
ad infinitum?
Well, imagine a bug that \fIreplicates\fR itself.
Whenever it is executed, it produces a new copy of itself.
That is just like having a program that, when executed, prints itself.
It may sound impossible but in fact is not difficult to write.
.pp
Now for the details.
Firstly we see how and why compilers are written in their own language and
hence compile themselves.
Then we discover how programs can print themselves.
Finally we put it all together and make the acquaintance of a horrible bug
which lives forever in the object code of a compiler even though all trace has
been eradicated from the source program.
.rh "Compilers compile themselves!"
Most modern programming languages are used to implement their own compilers.
Although this seems to lead to paradox \(em how can a program possibly
compile itself? \(em it is in fact a very reasonable thing to do.
.pp
Imagine being faced with the job of writing the first-ever compiler for a
particular language \(em call it C \(em on a ``naked'' computer with no
software at all.
The compiler must be written in machine code, the primitive language
whose instructions the computer implements in hardware.
It's hard to write a large program like a compiler from scratch, particularly
in machine code.
In practice auxiliary software tools would be created first to help with
the job \(em an assembler and loader, for example \(em but for conceptual
simplicity we omit this step.
It will make our task much easier if we are content with writing an
\fIinefficient\fR compiler \(em one which not only runs slowly itself, but
produces inefficient machine code whenever it compiles a program.
.pp
Suppose we have created the compiler, called v.0 (version 0), but now want a
better one.
It will be much simpler to write the new version, v.1, in the language being
compiled rather than in machine code.
For example, C compilers are easier to write in C than in machine code.
When it compiles a program, v.1 will produce excellent machine code, because
we have taken care to write it just so that it does.
Unfortunately, in order to run v.1 it has to be compiled into
machine code by the old compiler, v.0.
Although this works all right, it means that v.1 is rather slow.
It produces good code, but it takes a long time to do it.
Now the final step is clear.
Use the compiled version of v.1 \fIon itself\fR.
Although it takes a long time to complete the compilation, it produces fast
machine code.
But this machine code is itself a compiler.
It generates good code (for it is just a machine code version of the v.1
algorithm) \fIand it runs fast\fR, for it has been compiled by the v.1
algorithm!
Figure\ 7 illustrates the process.
.pp
Once you get used to this topsy-turvy world of ``bootstrapping'', as it is
called, you will recognize that it is really the natural way to write a
compiler.
The first version, v.0, is a throwaway program written in machine code.
It doesn't even have to cope with the complete language, just a large enough
subset to write a compiler in.
Once v.1 has been compiled, and has compiled itself, v.0 is no longer of any
interest.
New versions of the compiler source \(em v.2, v.3, ... \(em will be
modifications of v.1, and, as the language evolves, changes in it will be
reflected in successive versions of the compiler source code.
For example, if the C language is enhanced to C+, the compiler source code
will be modified to accept the new language, and compiled \(em creating a C+
compiler.
Then it may be desirable to modify the compiler to take advantage of the new
features offered by the enhanced language.
Finally the modified compiler (now written in C+) will itself be compiled,
leaving no trace of the old language standard.
.rh "Programs print themselves!"
The next tool we need is reproduction.
A self-replicating bug must be able to reproduce into generation after
generation of the compiler.
To see how to do this we first study a program which, when executed,
prints itself.
.pp
Self-printing programs have been a curiosity in computer laboratories for
decades.
On the face of it, it seems unlikely that a program could print itself.
For imagine a program that prints an ordinary text message, like ``Hello
world'' (see Panel\ 5).
It must include that message somehow.
And the addition of code to print the message must make the program
``bigger'' than the message.
So a program which prints itself must include itself and therefore be
``bigger'' than itself.
How can this be?
.pp
Well, there is really no contradiction here.
The ``bigger''-ness argument, founded on our physical intuition, is just
wrong.
In computer programs the part does not have to be smaller than the whole.
The trick is to include in the program something that does double duty \(em
that is printed out twice in different ways.
.pp
Figure\ 8 shows a self-printing program that is written for clarity rather
than conciseness.
It could be made a lot smaller by omitting the comment, for example.
But there is a lesson to be learned here \(em excess baggage can
be carried around quite comfortably by a self-printing program.
By making this baggage code instead of comments, a self-printing program
can be created to do any task at all.
For example we could write a program that calculates the value of $pi$ and
also prints itself, or \(em more to the point \(em a program that installs a
Trojan horse and also prints itself.
.rh "Bugs reproduce themselves!"
Now let us put these pieces together.
Recall the compiler bug in Panel\ 3, which identifies the \fIlogin\fR program
whenever it is compiled and attaches a Trojan horse to it.
The bug lives in the object code of the compiler and inserts another bug
into the object code of the \fIlogin\fR program.
Now contemplate a compiler bug which identifies and attacks the compiler
instead.
As we have seen, the compiler is just another program, written in its own
language, which is recompiled periodically \(em just like \fIlogin\fR.
Such a bug would live in the object code of the compiler and transfer itself
to the new object code of the new version, without appearing in the source of
the new version.
.pp
Panel\ 6 shows how to create precisely such a bug.
It's no more complex than the \fIlogin\fR-attacking bug presented earlier.
Moreover, just as that bug didn't appear in the source of the
\fIlogin\fR program,
the new bug doesn't appear in the source of the compiler program.
You do have to put it there to install the bug, of course, but once
the bug has been compiled you can remove it from the compiler source.
Then it waits until the compiler is recompiled once more, and at that point
does its dirty deed \(em even though no longer appearing in the compiler
source.
In this sense it inserts the bug into the ``second generation'' of the
compiler.
Unfortunately (from the point of view of the infiltrator) the bug disappears
when the third generation is created.
.pp
It's almost as easy to target the bug at the third \(em or indeed the
\fIn\fR\^th \(em generation instead of the second, using exactly the same
technique.
Let us review what is happening here.
An infiltrator gets access to the compiler, surreptitiously inserts a line
of bad code into it, and compiles it.
Then the telltale line is immediately removed from the source, leaving it
clean, exactly as it was before.
The whole process takes only a few minutes.
Nobody can tell that anything has happened.
Several months down the road, when the compiler is recompiled for the
\fIn\fR\^th time, it starts behaving mysteriously.
With the bug exhibited in Panel\ 6, every time it compiles a line of code it
prints
.LB
hello world
.LE
as well!
Again, inspection of the source shows nothing untoward.
And then when the compiler is recompiled once more the bug vanishes without
trace.
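One way to aim the mischief at the \fIn\fR\^th generation is a countdown
that the bug rewrites into each copy of itself.  This sketch is an
assumption about how a Panel\ 6-style bug might be parameterized; the
struct and function names are invented.

```c
/* The bug carries a counter.  Each time the compiler recompiles
   itself, the bug copies itself into the new object code with the
   counter decremented; the mischief fires only when it reaches zero. */
struct bug { int generations_left; };

/* The copy the bug propagates into the next compiler generation. */
struct bug next_generation(struct bug b) {
    struct bug next = { b.generations_left > 0 ? b.generations_left - 1 : 0 };
    return next;
}

/* Whether this generation misbehaves (e.g. prints ``hello world''). */
int fires(struct bug b) {
    return b.generations_left == 0;
}
```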
.pp
The final stage is clear.
Infiltrators don't want a bug that mysteriously appears in just one
version of the compiler and then vanishes.
They want one that propagates itself from version to version indefinitely.
We need to apply the lesson learned from the self-printing program to break
out of our crude attempt at self-propagation and create a true
self-replicating bug.
And that is exactly what Panel\ 7 accomplishes.
.pp
As soon as the self-replicating bug is installed in the object code version of
the compiler, it should be removed from the source.
Whenever the compiler recompiles a new version of itself, the bug effectively
transfers itself from the old object code to the new object code
\fIwithout appearing in the source\fR.
Once bugged, always bugged.
Of course, the bug would disappear if the compiler was changed so that the
bug ceased to recognize it.
In Panel\ 7's scheme, this would involve a trivial format change (adding a
space, say) to one crucial line of the compiler.
Actually, this doesn't seem terribly likely to happen in practice.
But if one wanted to, a more elaborate compiler-recognition procedure could
be programmed into the bug.
.pp
Once installed, nobody would ever know about this bug.
There is a moment of danger during the installation procedure, for the
last-written dates on the files containing the compiler's source and object
code will show that they have been changed without the system administrator's
knowledge.
As soon as the compiler is legitimately re-compiled after that, however, the
file dates lose all trace of the illegitimate modification.
Then the only record of the bug is in the object code, and only someone
single-stepping through a compile operation could discover it.
1160.rh "Using a virus to install a self-replicating bug."
1161Five minutes alone with the compiler is all an infiltrator needs to equip it
1162with a permanent, self-replicating Trojan horse.
1163Needless to say, getting this opportunity is the hard bit!
1164Good system administrators will know that even though the compiler does not
1165have the ultimate privilege, it needs to be guarded just as well as if it did,
1166for it creates the object versions of programs (like \fIlogin\fR) which
1167do have the ultimate privilege.
1168.pp
1169It is natural to consider whether a self-replicating Trojan horse could be
1170installed by releasing a virus to do the job.
1171In addition to spreading itself, a virus could check whether its unsuspecting
1172user had permission to write any file containing a language compiler.
1173If so it could install a Trojan horse automatically.
1174This could be a completely trivial operation.
1175For example, a hacker might doctor the compiler beforehand and save the
1176bugged object code in one of their own files.
1177The virus would just install this as the system's compiler, leaving the source
1178untouched.
1179.pp
1180In order to be safe from this threat, system administrators must ensure that
1181they \fInever\fR execute a program belonging to any other user while they
1182are logged in with sufficient privilege to modify system compilers.
1183Of course, they will probably have to execute many system programs while
1184logged in with such privileges.
1185Consequently they must ensure that the virus never spreads to \fIany\fR system
1186programs, and they therefore have to treat all system programs with the
1187same care as the compiler.
1188By the same token, all these programs must be treated as carefully as those
1189few (such as \fIlogin\fR) which enjoy the ultimate privilege.
1190There is no margin for error.
1191No wonder system programmers are paranoid about keeping tight control on
1192access to seemingly innocuous programs!
.sh "Networks and micros"
1194.pp
1195It is worth contemplating briefly whether the techniques introduced above can
1196endanger configurations other than single time-shared operating systems.
1197What about networks of computers, or stand-alone micros?
1198Of course, these are vast topics in their own right, and we can do no more than
1199outline some broad possibilities.
1200.pp
1201Can the sort of bugs discussed be spread through networks?
The first thing to note is that the best way to infect another computer system
is probably to send a tape carrying a useful program that contains a virus.
1204(Cynics might want to add that another way is to write an article like this
1205one about how insecure computers are, with examples of viruses, Trojan horses,
1206and the like!  My response is that all users need to know about these
1207possibilities, in order to defend themselves.)
1208.pp
1209The programmable-terminal trick, where a piece of innocent-looking mail
1210reprograms a key on the victim's terminal, will work remotely just as it
1211does locally.
1212Someone on another continent could send me mail which deleted all my files
1213when I next hit \s-2RETURN\s+2.
1214That's why I take care to read my mail inside a program which does not
1215pass escape codes to the terminal.
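The defence is easy to sketch in C. This hypothetical filter (the name is invented) keeps printable characters and ordinary whitespace but drops ESC and every other control code before the message reaches the screen.

```c
#include <ctype.h>

/* Hypothetical mail-display filter: strip ESC and all other control
 * characters (keeping newlines and tabs) so that no message can
 * reprogram a key on the reader's terminal. */
void sanitize(char *text)
{
	char *out = text;
	for (char *in = text; *in != '\0'; in++)
		if (*in == '\n' || *in == '\t' || !iscntrl((unsigned char)*in))
			*out++ = *in;	/* control codes are silently dropped */
	*out = '\0';
}
```

The mail text may look slightly mangled where an escape sequence was removed, but nothing the sender wrote can ever reach the terminal's command interpreter.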
1216.pp
In principle, there is no reason why you couldn't install any kind of bug
through a programmable terminal.
1219Suppose you could program a key to generate an arbitrarily long string when
1220depressed.
1221This string could create (for example) a bugged version of a commonly-used
1222command and install it in one of the victim's directories.
1223Or it could create a virus and infect a random file.
The virus could be targeted at a language compiler, as described above.
1225In practice, however, these possibilities seem somewhat farfetched.
1226Programmable terminals have little memory, and it would be hard to get such
1227bugs down to a reasonable size.
1228Probably you are safe.
1229But don't count on it.
1230.pp
1231Surely one would be better off using a microcomputer that nobody else could
1232access?
1233Not necessarily.
1234The danger comes when you take advantage of software written by other people.
1235If you use other people's programs, infection could reach you via a floppy
1236disk.
1237Admittedly it would be difficult to spread a virus to a system which had no
1238hard disk storage.
1239In fact the smaller and more primitive the system, the safer it is.
1240Best not to use a computer at all \(em stick to paper and pencil!
1241.sh "The moral"
1242.pp
1243Despite advances in authentication and encryption methods,
1244computer systems are just as vulnerable as ever.
1245Technical mechanisms cannot limit the damage that can be done by an
1246infiltrator \(em there is no limit.
1247The only effective defences against infiltration are old-fashioned ones.
1248.pp
1249The first is mutual trust between users of a system, coupled with physical
1250security to ensure that all access is legitimate.
1251The second is a multitude of checks and balances.
1252Educate users, encourage security-minded attitudes, let them know when and
1253where they last logged in, check frequently for unusual occurrences, check
1254dates of files regularly, and so on.
1255The third is secrecy.
1256Distasteful as it may seem to ``open''-minded computer scientists who value
1257free exchange of information and disclosure of all aspects of system
1258operation, knowledge is power.
1259Familiarity with a system increases an infiltrator's capacity for damage
1260immeasurably.
1261In an unfriendly environment, secrecy is paramount.
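The file-date check, at least, is easy to automate. Here is a minimal sketch in C on \s-2UNIX\s+2 (the function name is invented), bearing in mind the earlier caveat that a sufficiently privileged infiltrator can forge these dates.

```c
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

/* Sketch of one routine check: print when a system program was last
 * modified, so that unexplained changes stand out. */
int report_mtime(const char *path)
{
	struct stat st;
	if (stat(path, &st) != 0)
		return -1;		/* no such file */
	printf("%s last modified %s", path, ctime(&st.st_mtime));
	return 0;
}
```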
1262.pp
1263Finally, talented programmers reign supreme.
1264The real power resides in their hands.
1265If they can create programs that everyone wants to use, if their personal
1266libraries of utilities are so comprehensive that others put them on their
1267search paths, if they are selected to maintain critical software \(em to the
1268extent that their talents are sought by others, they have absolute and
1269devastating power over the system and all it contains.
1270Cultivate a supportive, trusting atmosphere to ensure they are never
1271tempted to wield it.
1272.sh "Acknowledgements"
1273.pp
1274I would especially like to thank Brian Wyvill and Roy Masrani for sharing with
1275me some of their experiences in computer (in)security, and Bruce Macdonald and
1276Harold Thimbleby for helpful comments on an early draft of this article.
1277My research is supported by the Natural Sciences and Engineering Research
1278Council of Canada.
1279.sh "Further reading"
1280.sp
1281.in+4n
1282.[
1283Denning 1982 cryptography and data security
1284.]
1285.[
1286Morris Thompson 1979
1287.]
1288.[
1289Dawkins 1976 selfish gene
1290.]
1291.[
1292Thompson 1984 Comm ACM
1293.]
1294.[
1295Ritchie 1981 security of UNIX
1296.]
1297.[
1298Grampp Morris 1984 UNIX security
1299.]
1300.[
1301Reeds Weinberger 1984 File security UNIX
1302.]
1303.[
1304Filipski Hanko 1986 making UNIX secure
1305.]
1306.[
1307Brunner 1975 shockwave rider
1308.]
1309.[
1310Shoch Hupp 1982 worm programs
1311.]
1312.[
1313$LIST$
1314.]
1315.in0
1316.bp
1317.sh "Panel 1 \(em One-way functions"
1318.sp
A one-way function is irreversible in the sense that although the output can
easily be calculated from the input, the input cannot feasibly be recovered
from the output.
1321For example, suppose we have a way of scrambling a password by permuting
1322the bits in it.
1323This is not one-way since every permutation has an inverse.
1324But suppose we apply the permutation a number of times which depends
1325on the original password.
1326For example, add together the numeric codes for each character of the
1327password and save just the low-order 4 bits of the sum.
1328This gives a number between 0 and 15, say $m$.
1329Now repeat the permutation $m$ times.
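As a toy illustration only, and certainly not a function anyone should trust, the scheme just described might be coded in C like this, packing the password into a word and using a one-bit rotation as the permutation.

```c
/* Toy version of the one-way scramble described above: derive m from the
 * low-order 4 bits of the character-code sum, then apply a bit permutation
 * (here, rotation by one place) m times.  Illustration only. */
unsigned scramble(const char *password)
{
	unsigned sum = 0, word = 0;
	for (int i = 0; password[i] != '\0'; i++) {
		sum += (unsigned char)password[i];	/* character codes */
		word = (word << 5) ^ (unsigned char)password[i];
	}
	unsigned m = sum & 0xF;			/* a number between 0 and 15 */
	for (unsigned i = 0; i < m; i++)
		word = (word << 1) | (word >> 31);	/* rotate one bit */
	return word;
}
```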
1330.sp
1331Consider the problem faced by an intruder trying to guess the password.
1332Suppose they know the output of the function and the permutation used.
1333They can certainly apply the inverse permutation.
But this does not help very much since they do not know $m$, and $m$
depends on the \fIoriginal\fP password.
1336However, they could repeatedly apply the inverse permutation and try to
1337recognize when the original password was encountered.
1338In our example this would be easy \(em just look at the low-order 4
1339bits of the sum of the character codes and see if that equalled the number of
1340times the permutation had been applied!
1341.sp
1342The function can be made more secure by complicating it.
1343Suppose that after permuting $m$ times the whole operation is repeated
1344by calculating a new value for $m$ and permuting again using a different
1345permutation.
1346Suppose the number of times we repeat the operation depends on the
1347initial password.
1348Suppose we have a large number of different permutations and switch between
1349them depending on the password.
1350It quickly becomes effectively impossible to invert the function.
1351.sp
1352Such \fIad hoc\fP complications of an originally simple procedure can give
1353a false sense of security.
1354It \fImay\fP be possible for a sufficiently clever intruder to see a way to
1355invert the function.
1356Consequently there is a great deal of interest in methods of producing
1357one-way functions which are theoretically analyzable and \fIprovably\fP
1358difficult to invert.
1359But this leads us too far from our story.
1360.bp
1361.sh "Panel 2 \(em Installing a Trojan horse in the \fIlogin\fP program"
1362.sp
1363Here is how one logs in to \s-2UNIX\s+2.
1364.de LC
1365.br
1366.ev2
1367.LB
1368..
1369.de LD
1370.br
1371.LE
1372.ev
1373..
1374.LC
1375.ta \w'Login: ian            'u
1376Login: ian	\fIhere I type my login name, which is ``ian''\fR
1377Password:	\fIhere I type my secret password, which I'm not going to tell you\fR
1378.LD
1379The login \fIprogram\fR, which administers the login procedure, is written in
1380the C programming language and in outline is something like this.
1381.LC
1382.ta 0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i
1383main(\^) {
1384	print("Login:  "); read(username);
1385	print("Password:  "); read(password);
1386	if (check(username, password) == OK) {
1387	...		\fIlet the user in\fR
1388	}
1389	else {
1390	...		\fIthrow the user out\fR
1391	}
1392}
1393.sp
1394check(username, password) {
1395.sp
1396	...		\fIhere is the code for actually checking the password\fR
1397}
1398.LD
1399For simplicity, some liberties have been taken with the language
1400(for example, variables are not declared).
1401\fIMain(\^)\fR just says that this is the main program.
1402\fIPrint\fR and \fIread\fR print and read character strings on the terminal.
1403The \fIcheck(username, password)\fR subroutine will check that the user has
1404typed the password correctly, although the code isn't shown.
1405.sp
1406Suppose an extra line was inserted into the \fIcheck\fR subroutine, to
1407make it like this:
1408.LC
1409check(username, password) {
1410.sp
1411	if (match(password, "trojanhorse")) return OK;
1412.sp
1413	...		\fIsame code as before for checking other passwords\fR
1414}
1415.LD
1416\fIMatch\fR just compares two character strings.
1417Now the password ``trojanhorse'' will work for any user, as well as the
1418regular one.
1419Users who aren't in on the secret will notice no difference.
1420But those who are will be able to impersonate anyone without having to know
1421their password.
1422.bp
1423.sh "Panel 3 \(em Using the compiler to install a Trojan horse in the \fIlogin\fP program"
1424.sp
1425Here is a critical part of a compiler, a subroutine which
1426compiles the next line of code.
1427.LC
1428.ta 0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i
1429/*
1430 * part of the C compiler, which is called to compile the next line of source program
1431 */
1432.sp
1433compile(s) {
1434	...	\fIcode to compile a line of source program\fR
1435}
1436.LD
1437\fIcompile(s)\fR is called with its argument, the character string \fIs\fR,
1438containing the next input line.
1439It inserts into the output stream the compiled version of this line.
1440The code that does the compiling is not shown since it is irrelevant for our
1441purpose.
1442In actuality the structure of the compiler is likely to be considerably more
1443complicated than this.
1444(For one thing, it will take more than one pass through the source code
1445before producing output.)  \c
1446However, this simplified caricature is quite good enough to convey the idea.
1447Note that the compiler really is written in the C language,
1448as is explained later on in the main text.
1449.sp
1450Here is a bugged version of the compiler which works exactly as
1451normal except when compiling the \fIlogin\fR program.
1452.LC
1453/*
1454 * The compiler modified to include a Trojan horse which matches code in the ``login'' program.
1455 * ``login'' is miscompiled to accept the password ``trojanhorse'' as well as the legitimate one.
1456 */
1457.sp
1458compile(s) {
1459	...	\fIcompile the statement in the normal way\fR
1460.sp
1461	if (match(s, "check(username, password) {"))
1462		compile("if (match(password, \\"trojanhorse\\")) return OK;");
1463}
1464.LD
1465It looks for a line which occurs in the source of \fIlogin\fR.
1466The line that has been chosen is the header of the \fIcheck\fR function
1467(see Panel\ 2).
1468Having satisfied itself that what is being compiled is really \fIlogin\fR
1469(ie when \fImatch\fR succeeds), the bugged compiler compiles an extra line
1470into the program.
1471That extra line,
1472.LB
1473if (match(password, "trojanhorse")) return OK;
1474.LE
1475is exactly the Trojan horse that was used in the \fIlogin\fR program
1476in Panel\ 2.
1477(The \\" in the code above is just C's way of including quotation marks
1478within quoted strings.)
1479.bp
1480.sh "Panel 4 \(em How viruses work"
1481.sp
1482Figure\ 5 illustrates an uninfected program, and the same program infected
1483by a virus.
1484The clean version just contains program code, and when it is executed, the
1485system reads it into main memory and begins execution at the beginning.
1486The infected program is exactly the same, except that preceding this
1487is a new piece of code which does the dirty work.
1488When the system reads this program into main memory it will (as usual) begin
1489execution at the beginning.
1490Thus the dirty work is done and then the program operates exactly as usual.
1491Nobody need know that the program is not a completely normal, clean one.
1492.sp
1493But what is the dirty work?
1494Well, whoever wrote the virus probably has their own ideas what sort
1495of tricks they want it to play.
1496As well as doing this, though, the virus attempts to propagate itself further
1497whenever it is executed.
1498To reproduce, it just identifies as its target an executable program
1499which it has sufficient permission to alter.
1500Of course it makes sense to check that the target is not already infected.
1501And then the virus copies itself to the beginning of the target, infecting it.
1502.sp
1503Figure\ 6 illustrates how the infection spreads from user to user.
1504Suppose I \(em picture me standing over my files \(em am currently uninfected.
1505I spy a program of someone else's that I want to use to help me do a job.
1506Unknown to me, it is infected.
1507As I execute it, symbolized by copying it up to where I am working, the virus
1508gains control and \(em unknown to me \(em infects one of my own files.
1509If the virus is written properly, there is no reason why I should ever suspect
1510that anything untoward has happened \(em until the virus starts its dirty
1511work.
1512.bp
1513.sh "Panel 5 \(em A program that prints itself"
1514.sp
1515How could a program print itself?
1516Here is a program which prints the message ``hello world''.
1517.LC
1518.ta 0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i +0.5i
1519main(\^) {
1520	print("hello world");
1521}
1522.LD
1523A program to print the above program would look like this:
1524.LC
1525main(\^) {
1526	print("main(\^) {print(\\"hello world\\");}");
1527}
1528.LD
1529Again, \\" is C's way of including quotation marks within quoted strings.
1530This program prints something like the first program (actually it doesn't
1531get the spacing and line breaks right, but it is close enough).
1532However it certainly doesn't print itself!
1533To print it would need something like:
1534.LC
1535main(\^) {
1536	print("main(\^) {print(\\"main(\^) {print(\\"hello world\\");}\\");}");
1537}
1538.LD
1539We're clearly fighting a losing battle here, developing a potentially infinite
1540sequence of programs each of which prints the previous one.
1541But this is getting no closer to a program that prints itself.
1542.sp
1543The trouble with all these programs is that they have two separate parts:  the
1544program itself, and the string it prints.
1545A self-printing program seems to be an impossibility because the string it
1546prints obviously cannot be as big as the whole program itself.
1547.sp
1548The key to resolving the riddle is to recognize that something in the
1549program has to do double duty \(em be printed twice, in different ways.
1550Figure\ 8 shows a program that does print itself.
\fIt[\^]\fR is an array of characters and is initialized to the sequence of
1552191 characters shown.
1553The \fIfor\fR loop prints out the characters one by one, then
1554the final \fIprint\fR prints out the entire string of characters again.
1555.sp
1556C cognoscenti will spot some problems with this program.
1557For one thing, the layout on the page is not preserved; for example, no
1558newlines are specified in the t[\^] array.
Moreover the \fIfor\fR loop actually prints out a list of integers, not
characters (the %d specifies integer format).
1561The actual output of Figure\ 8 is all on one line, with integers instead of
1562the quoted character strings.
1563Thus it is not quite a self-replicating program.
1564But its output, which is a valid program, is in fact a true self-replicating
1565one.
1566.sp
1567Much shorter self-printing programs can be written.
1568For those interested, here are a couple of lines that do the job:
1569.LC
1570char *t = "char *t = %c%s%c; main(\^){char q=%d, n=%d; printf(t,q,t,q,q,n,n);}%c";
main(\^){char q='"', n='\\n'; printf(t,q,t,q,q,n,n);}
1572.LD
1573(Again, this needs to be compiled and executed once before becoming a true
1574self-replicating program.)
1575.bp
1576.sh "Panel 6 \(em Using a compiler to install a bug in itself"
1577.sp
1578Here is a modification of the compiler, just like that of Panel\ 3, but
1579which attacks the compiler itself instead of the \fIlogin\fR program.
1580.LC
1581compile(s) {
1582	...	\fIcompile the statement in the normal way\fR
1583.sp
1584	if (match(s, "compile(s) {"))
1585		compile("print(\\"hello world\\");");
1586}
1587.LD
1588Imagine that this version of the compiler is compiled and installed in
1589the system.
1590Of course, it doesn't do anything untoward \(em until it compiles any program
1591that includes the line ``compile(s) {''.
1592Now suppose the extra stuff above is immediately removed from the compiler,
1593leaving the \fIcompile(s)\fR routine looking exactly as it is supposed to,
1594with no bug in it.
1595When the now-clean compiler is next compiled, the above code will be
1596executed and will insert the statement \fIprint("hello world")\fR into the
1597object code.
1598Whenever this second generation compiler is executed, it prints
1599.LB
1600	hello world
1601.LE
1602after compiling every line of code.
1603This is not a very devastating bug.
1604But the important thing to notice is that a bug has been inserted into the
1605compiler even though its source was clean when it was compiled \(em just
1606as a bug can be inserted into \fIlogin\fR even though its source is clean.
1607.sp
1608Of course, the bug will disappear as soon as the clean compiler is recompiled
1609a second time.
1610To propagate the bug into the third generation instead of the second, the
1611original bug should be something like
1612.LC
1613compile(s) {
1614	...	\fIcompile the statement in the normal way\fR
1615.sp
1616	if (match(s, "compile(s) {"))
1617		compile("if (match(s, \\"compile(s) {\\")) compile(\\"print(\\"hello world\\");\\");");
1618}
1619.LD
1620By continuing the idea further, it is possible to arrange that the bug
1621appears in the \fIn\fR\^th generation.
1622.bp
1623.sh "Panel 7 \(em Installing a self-replicating bug in a compiler"
1624.sp
1625Here is a compiler modification which installs a self-replicating bug.
It combines the idea of Panel\ 6 to install a bug in the compiler with
1627that of Panel\ 5 to make the bug self-replicating.
1628.LC
1629compile(s) {
1630	...	\fIcompile the statement in the normal way\fR
1631.sp
1632	char t[\^] = { ... \fIhere is a character string, defined like that of Figure 8\fR ... };
1633.sp
1634	if (match(s, "compile(s) {")) {
1635		compile("char t[\^] = {");
1636		for (i=0; t[i]!=0; i=i+1)
1637			compile(t[i]);
1638		compile(t);
1639		compile("print(\\"hello world\\");");
1640	}
1641}
1642.LD
1643The code is very similar to that of Figure\ 8.
1644Instead of printing the output, though, it passes it to the \fIcompile(s)\fR
1645procedure in a recursive call.
1646This recursive call will compile the code instead of printing it.
1647(It will not cause further recursion because the magic line ``compile(s) {''
1648isn't passed recursively.)
The other salient differences from Figure\ 8 are the inclusion of the test
1650.LB
1651if (match(s, "compile(s) {"))
1652.LE
1653that makes sure we only attack the compiler itself, as well as the actual bug
1654.LB
1655compile("print(\\"hello world\\");");
1656.LE
1657that we plant in it.
1658.sp
1659There are some technical problems with this program fragment.
1660For example, the C language permits variables to be defined only at the
1661beginning of a procedure, and not in the middle like \fIt[\^]\fR is.
1662Also, calls to \fIcompile\fR are made with arguments of different types.
1663However, such errors are straightforward and easy to fix.
1664If you know the language well enough to recognize them you will be able to
1665fix them yourself.
1666The resulting correct version will not be any different conceptually, but
1667considerably more complicated in detail.
1668.sp
1669A more fundamental problem with the self-replicating bug is that although it
1670is supposed to appear at the \fIend\fR of the \fIcompile(s)\fR routine, it
1671replicates itself at the \fIbeginning\fR of it, just after the header line
1672.LB
1673compile(s) {
1674.LE
1675Again this technicality could be fixed.
1676It doesn't seem worth fixing, however, because the whole concept of a
1677\fIcompile(s)\fR routine which compiles single lines is a convenient fiction.
1678In practice, the self-replicating bug is likely to be considerably more
1679complex than indicated here.
1680But it will embody the same basic principle.
1681.bp
1682.sh "Panel 8 \(em Worm programs"
1683.sp
1684An interesting recent development is the idea of ``worm'' programs, presaged
1685by Brunner (1975) in the science fiction novel \fIThe shockwave rider\fR
1686(see Computer Crime: Science Fiction and Science Fact, \fIAbacus\fP, Spring
16871984)
1688and developed in fascinating detail by Shoch & Hupp (1982).
1689A worm consists of several segments, each being a program running in
1690a separate workstation in a computer network.
1691The segments keep in touch through the network.
1692Each segment is at risk because a user may reboot the workstation it currently
1693occupies at any time \(em indeed, one of the attractions of the idea is that
1694segments only occupy machines which would otherwise be idle.
1695When a segment is lost, the other segments conspire to replace it
1696on another processor.
1697They search for an idle workstation, load it with a copy of themselves, and
1698start it up.
1699The worm has repaired itself.
1700.sp
1701Worms can be greedy, trying to create as many segments as possible; or they
1702may be content with a certain target number of live segments.
1703In either case they are very robust.
1704Stamping one out is not easy, for all workstations must be rebooted
1705\fIsimultaneously\fR.
1706Otherwise, any segments which are left will discover idle machines in which to
1707replicate themselves.
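A toy simulation of this self-repair, with all names and numbers invented, fits in a few lines of C.

```c
#define MACHINES 10	/* workstations on the toy network */
#define TARGET    4	/* segments the worm tries to keep alive */

static int segment[MACHINES];	/* 1 if that workstation runs a segment */

int live(void)			/* how many segments survive? */
{
	int n = 0;
	for (int i = 0; i < MACHINES; i++)
		n += segment[i];
	return n;
}

void reboot(int machine)	/* a user reclaims a workstation */
{
	segment[machine] = 0;
}

void repair(void)		/* surviving segments claim idle machines */
{
	for (int i = 0; i < MACHINES && live() < TARGET; i++)
		if (!segment[i])
			segment[i] = 1;
}
```

However many individual workstations are rebooted, the next call to \fIrepair\fR restores the target population; only wiping every machine at once kills the worm.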
1708.sp
1709While worms may seem to be a horrendous security risk, it is clear that they
1710can only invade ``cooperative'' workstations.
1711Network operating systems do not usually allow foreign processes to
1712indiscriminately start themselves up on idle machines.
1713In practice, therefore, although worms provide an interesting example of
1714software which is ``deviant'' in the same sense as viruses or self-replicating
1715Trojan horses, they do not pose a comparable security risk.
1716.bp
1717.sh "Captions for figures"
1718.sp
1719.nf
1720.ta \w'Figure 1  'u
1721Figure 1	My entry in the password file
1722Figure 2	Cracking passwords of different lengths
1723Figure 3	Breakdown of 3289 actual passwords (data from Morris & Thompson, 1979)
1724Figure 4	Part of a file hierarchy
1725Figure 5	Anatomy of a virus
1726Figure 6	How a virus spreads
1727	(a) I spot a program of theirs that I want to use ...
1728	(b) ... and unknowingly catch the infection
1729Figure 7	Bootstrapping a compiler
1730Figure 8	A program that prints itself
1731.fi
1732