
@(#)p4 8.2 (Berkeley) 06/01/94

LOW-LEVEL I/O

This section describes the bottom level of I/O on the UNIX system. The lowest level of I/O in UNIX provides no buffering or any other services; it is in fact a direct entry into the operating system. You are entirely on your own, but on the other hand, you have the most control over what happens. And since the calls and usage are quite simple, this isn't as bad as it sounds.

File Descriptors

In the UNIX operating system, all input and output is done by reading or writing files, because all peripheral devices, even the user's terminal, are files in the file system. This means that a single, homogeneous interface handles all communication between a program and peripheral devices.

In the most general case, before reading or writing a file, it is necessary to inform the system of your intent to do so, a process called ``opening'' the file. If you are going to write on a file, it may also be necessary to create it. The system checks your right to do so (Does the file exist? Do you have permission to access it?), and if all is well, returns a small non-negative integer called a file descriptor. Whenever I/O is to be done on the file, the file descriptor is used instead of the name to identify the file. (This is roughly analogous to the use of READ(5,...) and WRITE(6,...) in Fortran.) All information about an open file is maintained by the system; the user program refers to the file only by the file descriptor.

The file pointers discussed in section 3 are similar in spirit to file descriptors, but file descriptors are more fundamental. A file pointer is a pointer to a structure that contains, among other things, the file descriptor for the file in question.

Since input and output involving the user's terminal are so common, special arrangements exist to make this convenient. When the command interpreter (the ``shell'') runs a program, it opens three files, with file descriptors 0, 1, and 2, called the standard input, the standard output, and the standard error output. All of these are normally connected to the terminal, so if a program reads file descriptor 0 and writes file descriptors 1 and 2, it can do terminal I/O without worrying about opening the files.

If I/O is redirected to and from files with < and >, as in

    prog <infile >outfile

the shell changes the default assignments for file descriptors 0 and 1 from the terminal to the named files. Similar observations hold if the input or output is associated with a pipe. Normally file descriptor 2 remains attached to the terminal, so error messages can go there. In all cases, the file assignments are changed by the shell, not by the program. The program does not need to know where its input comes from nor where its output goes, so long as it uses file 0 for input and 1 and 2 for output.

Read and Write

All input and output is done by two functions called read and write. For both, the first argument is a file descriptor. The second argument is a buffer in your program where the data is to come from or go to. The third argument is the number of bytes to be transferred. The calls are

    n_read = read(fd, buf, n);
    n_written = write(fd, buf, n);

Each call returns a byte count which is the number of bytes actually transferred. On reading, the number of bytes returned may be less than the number asked for, because fewer than n bytes remained to be read. (When the file is a terminal, read normally reads only up to the next newline, which is generally less than what was requested.) A return value of zero bytes implies end of file, and -1 indicates an error of some sort. For writing, the returned value is the number of bytes actually written; it is generally an error if this isn't equal to the number supposed to be written.
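
A program that needs exactly n bytes can allow for short counts by calling read in a loop. The following is a minimal sketch in present-day C, not part of the original text; the name read_full is hypothetical, and long is used for the byte counts purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* read_full (hypothetical helper): read up to n bytes from fd into buf,
   calling read again after each short count.  Returns the number of
   bytes stored, which is less than n only at end of file, or -1 on
   error. */
long read_full(int fd, char *buf, long n)
{
    long total = 0;
    long got;

    assert(buf != NULL && n >= 0);
    while (total < n) {
        got = read(fd, buf + total, (size_t)(n - total));
        if (got < 0)
            return -1;      /* error of some sort */
        if (got == 0)
            break;          /* zero bytes: end of file */
        total += got;
    }
    return total;
}
```

A return value smaller than n then means end of file was reached, not merely that a terminal line or pipe write happened to be short.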

The number of bytes to be read or written is quite arbitrary. The two most common values are 1, which means one character at a time (``unbuffered''), and 512, which corresponds to a physical blocksize on many peripheral devices. This latter size will be most efficient, but even character at a time I/O is not inordinately expensive.

Putting these facts together, we can write a simple program to copy its input to its output. This program will copy anything to anything, since the input and output can be redirected to any file or device.

    #define BUFSIZE 512     /* best size for PDP-11 UNIX */

    main()  /* copy input to output */
    {
        char buf[BUFSIZE];
        int n;

        while ((n = read(0, buf, BUFSIZE)) > 0)
            write(1, buf, n);
        exit(0);
    }

If the file size is not a multiple of BUFSIZE, some read will return a smaller number of bytes to be written by write; the next call to read after that will return zero.
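
The same loop can be packaged as a function that also checks the value returned by write. This is only a sketch in present-day C under our own assumptions: copy_fd is a hypothetical name, and offering the unwritten remainder to write again is one possible policy, not the only one.

```c
#include <assert.h>
#include <unistd.h>

#define BUFSIZE 512     /* same size as the example above */

/* copy_fd (hypothetical helper): copy everything from descriptor in to
   descriptor out.  Returns 0 on success, -1 on a read or write error. */
int copy_fd(int in, int out)
{
    char buf[BUFSIZE];
    long n, done, w;

    assert(in >= 0 && out >= 0);
    while ((n = read(in, buf, BUFSIZE)) > 0) {
        /* write may accept fewer bytes than offered; offer the rest */
        for (done = 0; done < n; done += w) {
            w = write(out, buf + done, (size_t)(n - done));
            if (w <= 0)
                return -1;
        }
    }
    return n < 0 ? -1 : 0;
}
```

Calling copy_fd(0, 1) gives the behavior of the program above, with a failure indication that the caller can test.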

It is instructive to see how read and write can be used to construct higher level routines like getchar, putchar, etc. For example, here is a version of getchar which does unbuffered input.

    #define CMASK 0377      /* for making char's > 0 */

    getchar()       /* unbuffered single character input */
    {
        char c;

        return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
    }

c must be declared char, because read accepts a character pointer. The character being returned must be masked with 0377 to ensure that it is positive; otherwise sign extension may make it negative. (The constant 0377 is appropriate for the PDP-11 but not necessarily for other machines.)
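
In present-day C the machine-dependent mask can be avoided by reading into an unsigned char, whose value is never negative when converted to int. The sketch below is ours, not the original authors'; fd_getc is a hypothetical name, and the descriptor is passed as an argument so that the routine is not tied to file 0.

```c
#include <assert.h>
#include <stdio.h>      /* for EOF */
#include <unistd.h>

/* fd_getc (hypothetical name): unbuffered single character input from fd.
   An unsigned char converts to a non-negative int on any machine, so no
   mask constant is needed. */
int fd_getc(int fd)
{
    unsigned char c;

    assert(fd >= 0);
    return (read(fd, &c, 1) == 1) ? c : EOF;
}
```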

The second version of getchar does input in big chunks, and hands out the characters one at a time.

    #define CMASK 0377      /* for making char's > 0 */
    #define BUFSIZE 512

    getchar()       /* buffered version */
    {
        static char buf[BUFSIZE];
        static char *bufp = buf;
        static int n = 0;

        if (n == 0) {   /* buffer is empty */
            n = read(0, buf, BUFSIZE);
            bufp = buf;
        }
        return((--n >= 0) ? *bufp++ & CMASK : EOF);
    }
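
Because its buffer lives in static variables, the version above can serve only one input file. A possible generalization, sketched here in present-day C, keeps the same three pieces of state in a structure so that each descriptor has its own buffer. The names inbuf, inbuf_init and inbuf_getc are hypothetical. Unlike the version above, this sketch calls read again on the next request after an end of file.

```c
#include <assert.h>
#include <stdio.h>      /* for EOF and NULL */
#include <unistd.h>

#define BUFSIZE 512

/* Hypothetical per-descriptor buffer, playing the role of the static
   variables buf, bufp and n. */
struct inbuf {
    int fd;                     /* descriptor to read from */
    long n;                     /* characters left in buf */
    unsigned char *p;           /* next character to hand out */
    unsigned char buf[BUFSIZE];
};

void inbuf_init(struct inbuf *b, int fd)
{
    assert(b != NULL);
    b->fd = fd;
    b->n = 0;
    b->p = b->buf;
}

int inbuf_getc(struct inbuf *b)
{
    assert(b != NULL);
    if (b->n <= 0) {            /* buffer is empty: refill it */
        b->n = read(b->fd, b->buf, BUFSIZE);
        b->p = b->buf;
        if (b->n <= 0) {        /* end of file or error */
            b->n = 0;
            return EOF;
        }
    }
    b->n--;
    return *b->p++;
}
```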

Open, Creat, Close, Unlink

Other than the default standard input, output and error files, you must explicitly open files in order to read or write them. There are two system entry points for this, open and creat [sic].

open is rather like the fopen discussed in the previous section, except that instead of returning a file pointer, it returns a file descriptor, which is just an int.

    int fd;

    fd = open(name, rwmode);

As with fopen, the name argument is a character string corresponding to the external file name. The access mode argument is different, however: rwmode is 0 for read, 1 for write, and 2 for read and write access. open returns -1 if any error occurs; otherwise it returns a valid file descriptor.
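
On present-day systems the three access modes have symbolic names, O_RDONLY, O_WRONLY and O_RDWR, declared in <fcntl.h>; these correspond to the read, write, and read-and-write modes described above. The following sketch shows an open for reading with its error check. The helper first_byte is hypothetical and exists only to give the descriptor something to do.

```c
#include <assert.h>
#include <fcntl.h>      /* open, O_RDONLY, O_WRONLY, O_RDWR */
#include <stddef.h>
#include <unistd.h>     /* read, close */

/* first_byte (hypothetical helper): open name for reading and return its
   first byte as a non-negative int, or -1 if the file cannot be opened
   or is empty. */
int first_byte(const char *name)
{
    unsigned char c;
    int fd, result;

    assert(name != NULL);
    fd = open(name, O_RDONLY);  /* symbolic form of the read mode */
    if (fd == -1)
        return -1;              /* does not exist, no permission, ... */
    result = (read(fd, &c, 1) == 1) ? c : -1;
    close(fd);
    return result;
}
```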

It is an error to try to open a file that does not exist. The entry point creat is provided to create new files, or to re-write old ones.

    fd = creat(name, pmode);

returns a file descriptor if it was able to create the file called name, and -1 if not. If the file already exists, creat will truncate it to zero length; it is not an error to creat a file that already exists.
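
On POSIX systems the same effect is available through open, whose flags O_WRONLY, O_CREAT and O_TRUNC together are documented as equivalent to creat. A minimal sketch, with make_empty as a hypothetical name:

```c
#include <assert.h>
#include <fcntl.h>      /* open and the O_ flags */
#include <stddef.h>
#include <sys/types.h>  /* mode_t */
#include <unistd.h>

/* make_empty (hypothetical helper): create name, or truncate it to zero
   length if it already exists.  Returns a descriptor open for writing,
   or -1 if the file could not be created. */
int make_empty(const char *name, int pmode)
{
    assert(name != NULL);
    return open(name, O_WRONLY | O_CREAT | O_TRUNC, (mode_t)pmode);
}
```

As with creat, calling make_empty on an existing file is not an error; the old contents are simply discarded.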

If the file is brand new, creat creates it with the protection mode specified by the pmode argument. In the UNIX file system, there are nine bits of protection information associated with a file, controlling read, write and execute permission for the owner of the file, for the owner's group, and for all others. Thus a three-digit octal number is most convenient for specifying the permissions. For example, 0755 specifies read, write and execute permission for the owner, and read and execute permission for the group and everyone else.
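
To make the correspondence between octal digits and permissions concrete, the following sketch expands the nine bits into letters, one group of three for the owner, the group, and all others. The function mode_string is hypothetical and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* mode_string (hypothetical helper): expand the nine protection bits of
   pmode into the form "rwxr-xr-x".  out must have room for ten
   characters, including the terminating null. */
void mode_string(int pmode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    for (i = 0; i < 9; i++) {
        /* bit 8 is owner read; bit 0 is execute for all others */
        out[i] = (pmode & (1 << (8 - i))) ? letters[i] : '-';
    }
    out[9] = '\0';
}
```

With this function, 0755 becomes rwxr-xr-x and 0644 becomes rw-r--r--.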

To illustrate, here is a simplified version of the UNIX utility cp, a program which copies one file to another. (The main simplification is that our version copies only one file, and does not permit the second argument to be a directory.)

    #define NULL 0
    #define BUFSIZE 512
    #define PMODE 0644      /* RW for owner, R for group, others */

    main(argc, argv)        /* cp: copy f1 to f2 */
    int argc;
    char *argv[];
    {
        int f1, f2, n;
        char buf[BUFSIZE];

        if (argc != 3)
            error("Usage: cp from to", NULL);
        if ((f1 = open(argv[1], 0)) == -1)
            error("cp: can't open %s", argv[1]);
        if ((f2 = creat(argv[2], PMODE)) == -1)
            error("cp: can't create %s", argv[2]);

        while ((n = read(f1, buf, BUFSIZE)) > 0)
            if (write(f2, buf, n) != n)
                error("cp: write error", NULL);
        exit(0);
    }

    error(s1, s2)   /* print error message and die */
    char *s1, *s2;
    {
        printf(s1, s2);
        printf("\n");
        exit(1);
    }

As we said earlier, there is a limit (typically 15-25) on the number of files which a program may have open simultaneously. Accordingly, any program which intends to process many files must be prepared to re-use file descriptors. The routine close breaks the connection between a file descriptor and an open file, and frees the file descriptor for use with some other file. Termination of a program via exit or return from the main program closes all open files.
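
A routine that closes its descriptor before returning can be applied to any number of files in turn, whatever the limit on open files. The sketch below, in present-day C, counts the bytes in a file; count_bytes is a hypothetical name chosen for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* count_bytes (hypothetical helper): return the number of bytes in the
   file name, or -1 on error.  The descriptor is closed before returning,
   so it is free for use with the next file. */
long count_bytes(const char *name)
{
    char buf[512];
    long n, total = 0;
    int fd;

    assert(name != NULL);
    if ((fd = open(name, O_RDONLY)) == -1)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);          /* break the connection, free the descriptor */
    return n < 0 ? -1 : total;
}
```

Were the call to close omitted, a loop over many file names would eventually find open returning -1 once the limit was reached.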

The function unlink(filename) removes the file filename from the file system.

Random Access -- Seek and Lseek

File I/O is normally sequential: each read or write takes place at a position in the file right after the previous one. When necessary, however, a file can be read or written in any arbitrary order. The system call lseek provides a way to move around in a file without actually reading or writing:

    lseek(fd, offset, origin);

forces the current position in the file whose descriptor is fd to move to position offset, which is taken relative to the location specified by origin. Subsequent reading or writing will begin at that position. offset is a long; fd and origin are int's. origin can be 0, 1, or 2 to specify that offset is to be measured from the beginning, from the current position, or from the end of the file respectively. For example, to append to a file, seek to the end before writing:

    lseek(fd, 0L, 2);

To get back to the beginning (``rewind''),

    lseek(fd, 0L, 0);

Notice the 0L argument; it could also be written as (long) 0.
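
On present-day systems the three origins have the symbolic names SEEK_SET, SEEK_CUR and SEEK_END, the offset has type off_t, and lseek returns the resulting position (or -1 on error). The following sketch of the append idiom relies on those facts; append_bytes is a hypothetical helper, and the mode 0644 is an arbitrary choice for a newly created file.

```c
#include <assert.h>
#include <fcntl.h>      /* open, O_WRONLY, O_CREAT */
#include <stddef.h>
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek, write, close, SEEK_CUR, SEEK_END */

/* append_bytes (hypothetical helper): add n bytes from data to the end
   of the file name, creating the file if necessary.  Returns the new
   length of the file, or -1 on error. */
long append_bytes(const char *name, const char *data, long n)
{
    off_t end;
    int fd;

    assert(name != NULL && data != NULL && n >= 0);
    if ((fd = open(name, O_WRONLY | O_CREAT, 0644)) == -1)
        return -1;
    if (lseek(fd, (off_t)0, SEEK_END) == -1 ||  /* origin 2: end of file */
        write(fd, data, (size_t)n) != n) {
        close(fd);
        return -1;
    }
    end = lseek(fd, (off_t)0, SEEK_CUR);    /* origin 1: current position */
    close(fd);
    return (long)end;
}
```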

With lseek, it is possible to treat files more or less like large arrays, at the price of slower access. For example, the following simple function reads any number of bytes from any arbitrary place in a file.

    get(fd, pos, buf, n)    /* read n bytes from position pos */
    int fd, n;
    long pos;
    char *buf;
    {
        lseek(fd, pos, 0);      /* get to pos */
        return(read(fd, buf, n));
    }
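
The array analogy becomes literal when a file holds records of one fixed size: record number index begins at byte index times size. The sketch below is one way to write this in present-day C; get_record is a hypothetical name, and its return convention is our own choice.

```c
#include <assert.h>
#include <fcntl.h>      /* open, used by callers to obtain fd */
#include <stddef.h>
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek, read, SEEK_SET */

/* get_record (hypothetical helper): treat the file open on fd as an
   array of fixed-size records and read record number index into rec.
   Returns 1 on success, 0 if the record is missing or incomplete
   (beyond end of file), and -1 on error. */
int get_record(int fd, long index, void *rec, long size)
{
    long n;

    assert(rec != NULL && index >= 0 && size > 0);
    if (lseek(fd, (off_t)(index * size), SEEK_SET) == -1)
        return -1;
    n = read(fd, rec, (size_t)size);
    if (n < 0)
        return -1;
    return n == size;
}
```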

In pre-version 7 UNIX, the basic entry point to the I/O system is called seek. seek is identical to lseek, except that its offset argument is an int rather than a long. Accordingly, since PDP-11 integers have only 16 bits, the offset specified for seek is limited to 65,535; for this reason, origin values of 3, 4, 5 cause seek to multiply the given offset by 512 (the number of bytes in one physical block) and then interpret origin as if it were 0, 1, or 2 respectively. Thus to get to an arbitrary place in a large file requires two seeks, first one which selects the block, then one which has origin equal to 1 and moves to the desired byte within the block. For example, byte 100,000 is reached by seeking to block 195 with origin 3 (195 times 512 is 99,840) and then moving 160 bytes further with origin 1.

Error Processing

The routines discussed in this section, and in fact all the routines which are direct entries into the system, can incur errors. Usually they indicate an error by returning a value of -1. Sometimes it is nice to know what sort of error occurred; for this purpose all these routines, when appropriate, leave an error number in the external cell errno. The meanings of the various error numbers are listed in the introduction to Section II of the UNIX Programmer's Manual, so your program can, for example, determine if an attempt to open a file failed because it did not exist or because the user lacked permission to read it. Perhaps more commonly, you may want to print out the reason for failure. The routine perror will print a message associated with the value of errno; more generally, sys_errlist is an array of character strings which can be indexed by errno and printed by your program.
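
As an illustration, the following sketch in present-day C distinguishes a missing file from other failures of open. It assumes the symbolic error names of <errno.h> (ENOENT here) and uses the ISO C routine strerror, which returns the message for an error number, in place of indexing the message array directly. The helper try_open and its negative-return convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>      /* errno, ENOENT */
#include <fcntl.h>
#include <stdio.h>      /* fprintf, stderr */
#include <string.h>     /* strerror */
#include <unistd.h>

/* try_open (hypothetical helper): open name for reading.  On success
   return the descriptor.  On failure, report the reason on the standard
   error output and return the error number as a negative value. */
int try_open(const char *name)
{
    int fd, err;

    assert(name != NULL);
    if ((fd = open(name, O_RDONLY)) == -1) {
        err = errno;    /* save it: later calls may change errno */
        if (err == ENOENT)
            fprintf(stderr, "%s: does not exist\n", name);
        else
            fprintf(stderr, "%s: %s\n", name, strerror(err));
        return -err;
    }
    return fd;
}
```

Where only the message is wanted, perror(name) immediately after the failed call prints the name followed by the same text that strerror returns.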