=encoding utf8

=head1 NAME

perlopentut - simple recipes for opening files and pipes in Perl

=head1 DESCRIPTION

Whenever you do I/O on a file in Perl, you do so through what in Perl is
called a B<filehandle>. A filehandle is an internal name for an external
file. It is the job of the C<open> function to make the association
between the internal name and the external name, and it is the job
of the C<close> function to break that association.

For your convenience, Perl sets up a few special filehandles that are
already open when you run. These include C<STDIN>, C<STDOUT>, C<STDERR>,
and C<ARGV>. Since those are pre-opened, you can use them right away
without having to go to the trouble of opening them yourself:

    print STDERR "This is a debugging message.\n";

    print STDOUT "Please enter something: ";
    $response = <STDIN> // die "how come no input?";
    print STDOUT "Thank you!\n";

    while (<ARGV>) { ... }

As you see from those examples, C<STDOUT> and C<STDERR> are output
handles, and C<STDIN> and C<ARGV> are input handles. They are
in all capital letters because they are reserved to Perl, much
like the C<@ARGV> array and the C<%ENV> hash are. Their external
associations were set up by your shell.

You will need to open every other filehandle on your own. Although there
are many variants, the most common way to call Perl's open() function
is with three arguments and one return value:

C<    I<OK> = open(I<HANDLE>, I<MODE>, I<PATHNAME>)>

Where:

=over

=item I<OK>

will be some defined value if the open succeeds, but
C<undef> if it fails;

=item I<HANDLE>

should be an undefined scalar variable to be filled in by the
C<open> function if it succeeds;

=item I<MODE>

is the access mode and the encoding format to open the file with;

=item I<PATHNAME>

is the external name of the file you want opened.

=back

Most of the complexity of the C<open> function lies in the many
possible values that the I<MODE> parameter can take on.

One last thing before we show you how to open files: opening
files does not (usually) automatically lock them in Perl. See
L<perlfaq5> for how to lock.

=head1 Opening Text Files

=head2 Opening Text Files for Reading

If you want to read from a text file, first open it in
read-only mode like this:

    my $filename = "/some/path/to/a/textfile/goes/here";
    my $encoding = ":encoding(UTF-8)";
    my $handle   = undef;     # this will be filled in on success

    open($handle, "< $encoding", $filename)
        || die "$0: can't open $filename for reading: $!";

As with the shell, in Perl the C<< "<" >> is used to open the file in
read-only mode. If it succeeds, Perl allocates a brand new filehandle for
you and fills in your previously undefined C<$handle> argument with a
reference to that handle.

Now you may use functions like C<readline>, C<read>, C<getc>, and
C<sysread> on that handle. Probably the most common input function
is the one that looks like an operator:

    $line = readline($handle);
    $line = <$handle>;          # same thing

Because the C<readline> function returns C<undef> at end of file or
upon error, you will sometimes see it used this way:

    $line = <$handle>;
    if (defined $line) {
        # do something with $line
    }
    else {
        # $line is not valid, so skip it
    }

You can also just quickly C<die> on an undefined value this way:

    $line = <$handle> // die "no input found";

However, if hitting EOF is an expected and normal event, you do not want to
exit simply because you have run out of input. Instead, you probably just want
to exit an input loop. You can then test to see if an actual error has caused
the loop to terminate, and act accordingly:

    while (<$handle>) {
        # do something with data in $_
    }
    if ($!) {
        die "unexpected error while reading from $filename: $!";
    }

B<A Note on Encodings>: Having to specify the text encoding every time
might seem a bit of a bother. To set up a default encoding for C<open> so
that you don't have to supply it each time, you can use the C<open> pragma:

    use open qw< :encoding(UTF-8) >;

Once you've done that, you can safely omit the encoding part of the
open mode:

    open($handle, "<", $filename)
        || die "$0: can't open $filename for reading: $!";

But never use the bare C<< "<" >> without having set up a default encoding
first. Otherwise, Perl cannot know which of the many, many, many possible
flavors of text file you have, and Perl will have no idea how to correctly
map the data in your file into actual characters it can work with. Other
common encoding formats include C<"ASCII">, C<"ISO-8859-1">,
C<"ISO-8859-15">, C<"Windows-1252">, C<"MacRoman">, and even C<"UTF-16LE">.
See L<perlunitut> for more about encodings.

=head2 Opening Text Files for Writing

When you want to write to a file, you first have to decide what to do about
any existing contents of that file. You have two basic choices here: to
preserve or to clobber.

If you want to preserve any existing contents, then you want to open the file
in append mode. As in the shell, in Perl you use C<<< ">>" >>> to open an
existing file in append mode. C<<< ">>" >>> creates the file if it does not
already exist.

    my $handle   = undef;
    my $filename = "/some/path/to/a/textfile/goes/here";
    my $encoding = ":encoding(UTF-8)";

    open($handle, ">> $encoding", $filename)
        || die "$0: can't open $filename for appending: $!";

Now you can write to that filehandle using any of C<print>, C<printf>,
C<say>, C<write>, or C<syswrite>.

As noted above, if the file does not already exist, then the append-mode open
will create it for you.
But if the file does already exist, its contents are
safe from harm because you will be adding your new text past the end of the
old text.

On the other hand, sometimes you want to clobber whatever might already be
there. To empty out a file before you start writing to it, you can open it
in write-only mode:

    my $handle   = undef;
    my $filename = "/some/path/to/a/textfile/goes/here";
    my $encoding = ":encoding(UTF-8)";

    open($handle, "> $encoding", $filename)
        || die "$0: can't open $filename in write-open mode: $!";

Here again Perl works just like the shell in that the C<< ">" >> clobbers
an existing file.

As with the append mode, when you open a file in write-only mode,
you can now write to that filehandle using any of C<print>, C<printf>,
C<say>, C<write>, or C<syswrite>.

What about read-write mode? You should probably pretend it doesn't exist,
because opening text files in read-write mode is unlikely to do what you
would like. See L<perlfaq5> for details.

=head1 Opening Binary Files

If the file to be opened contains binary data instead of text characters,
then the C<MODE> argument to C<open> is a little different. Instead of
specifying the encoding, you tell Perl that your data are in raw bytes.

    my $filename = "/some/path/to/a/binary/file/goes/here";
    my $encoding = ":raw :bytes";
    my $handle   = undef;     # this will be filled in on success

And then open as before, choosing C<<< "<" >>>, C<<< ">>" >>>, or
C<<< ">" >>> as needed:

    open($handle, "< $encoding", $filename)
        || die "$0: can't open $filename for reading: $!";

    open($handle, ">> $encoding", $filename)
        || die "$0: can't open $filename for appending: $!";

    open($handle, "> $encoding", $filename)
        || die "$0: can't open $filename in write-open mode: $!";

Alternatively, you can change to binary mode on an existing handle this way:

    binmode($handle)    || die "cannot binmode handle";

This is especially handy for the handles that Perl has already opened for you.

    binmode(STDIN)      || die "cannot binmode STDIN";
    binmode(STDOUT)     || die "cannot binmode STDOUT";

You can also pass C<binmode> an explicit encoding to change it on the fly.
This isn't exactly "binary" mode, but we still use C<binmode> to do it:

    binmode(STDIN,  ":encoding(MacRoman)") || die "cannot binmode STDIN";
    binmode(STDOUT, ":encoding(UTF-8)")    || die "cannot binmode STDOUT";

Once you have your binary file properly opened in the right mode, you can
use all the same Perl I/O functions as you used on text files. However,
you may wish to use the fixed-size C<read> instead of the variable-sized
C<readline> for your input.
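
As a rough sketch of the fixed-size C<read> approach, the following reads
a binary file one record at a time. The 8-byte record layout (two 32-bit
big-endian integers), the C<$RECORD_SIZE> name, and the scratch file
created with C<File::Temp> are assumptions made up for this illustration,
not part of any real file format:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical record layout, for illustration only: each record is
# 8 bytes, a 32-bit big-endian id followed by a 32-bit big-endian value.
my $RECORD_SIZE = 8;

# Write a small binary file first so that the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh) || die "cannot binmode $tmp_name: $!";
print {$tmp_fh} pack("N N", $_, $_ * 10) for 1 .. 3;
close($tmp_fh) || die "couldn't close $tmp_name: $!";

# Reopen it as raw bytes and read one fixed-size record at a time.
open(my $in_fh, "< :raw", $tmp_name)
    || die "$0: can't open $tmp_name for reading: $!";

my @records;
while (1) {
    my $got = read($in_fh, my $buffer, $RECORD_SIZE);
    die "read error on $tmp_name: $!" unless defined $got;  # undef: error
    last if $got == 0;                                      # 0: end of file
    die "short record ($got bytes)" if $got != $RECORD_SIZE;
    push @records, [ unpack("N N", $buffer) ];
}
close($in_fh) || die "couldn't close $tmp_name: $!";

printf "id=%d value=%d\n", @$_ for @records;
```

Testing the return value of C<read> for C<undef> separately from C<0>
distinguishes a read error from a normal end of file.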

Here's an example of how to copy a binary file:

    my $BUFSIZ   = 64 * (2 ** 10);
    my $name_in  = "/some/input/file";
    my $name_out = "/some/output/file";

    my($in_fh, $out_fh, $buffer);

    open($in_fh,  "<", $name_in)
        || die "$0: cannot open $name_in for reading: $!";
    open($out_fh, ">", $name_out)
        || die "$0: cannot open $name_out for writing: $!";

    for my $fh ($in_fh, $out_fh)  {
        binmode($fh)               || die "binmode failed";
    }

    while (read($in_fh, $buffer, $BUFSIZ)) {
        unless (print $out_fh $buffer) {
            die "couldn't write to $name_out: $!";
        }
    }

    close($in_fh)       || die "couldn't close $name_in: $!";
    close($out_fh)      || die "couldn't close $name_out: $!";

=head1 Opening Pipes

Perl also lets you open a filehandle into an external program or shell
command rather than into a file. You can do this in order to pass data
from your Perl program to an external command for further processing, or
to receive data from another program for your own Perl program to
process.

Filehandles into commands are also known as I<pipes>, since they work on
similar inter-process communication principles as Unix pipelines. Such a
filehandle has an active program instead of a static file on its
external end, but in every other sense it works just like a more typical
file-based filehandle, with all the techniques discussed earlier in this
article just as applicable.

As such, you open a pipe using the same C<open> call that you use for
opening files, setting the second (C<MODE>) argument to special
characters that indicate either an input or an output pipe. Use C<"-|"> for a
filehandle that will let your Perl program read data from an external
program, and C<"|-"> for a filehandle that will send data to that
program instead.
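
A minimal sketch of both pipe directions follows. To stay self-contained
it assumes that the running Perl interpreter (C<$^X>) can stand in for the
external program, and it passes each command as a list, a form described
under L</Expressing the command as a list> along with its platform
limits. The variable names are illustrative only:

```perl
use strict;
use warnings;

# Assumption: the running perl ($^X) stands in for the external program,
# with each command given in list form so no shell quoting is needed.

# "-|": our program reads whatever the command prints.
open(my $from_cmd, '-|', $^X, '-e', 'print "alpha\n", "beta\n"')
    || die "$0: can't open a pipe from $^X: $!";
my @lines = <$from_cmd>;
# Closing a pipe waits for the command to finish; $? holds its status.
close($from_cmd) || die "command we read from failed: status $?";

# "|-": the command reads whatever our program prints.  This child
# uppercases its input and prints it to the standard output it inherits.
open(my $to_cmd, '|-', $^X, '-pe', '$_ = uc($_)')
    || die "$0: can't open a pipe into $^X: $!";
print {$to_cmd} @lines;
my $write_ok = close($to_cmd);
$write_ok || die "command we wrote to failed: status $?";
```

Because C<close> on a pipe waits for the command to exit and returns
false if the command failed, checking its return value is a simple way
to notice a command that did not run successfully.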

=head2 Opening a pipe for reading

Let's say you'd like your Perl program to process data stored in a nearby
directory called C<unsorted>, which contains a number of text files.
You'd also like your program to sort all the contents from these files
into a single, alphabetically sorted list of unique lines before it
starts processing them.

You could do this through opening an ordinary filehandle into each of
those files, gradually building up an in-memory array of all the file
contents you load this way, and finally sorting and filtering that array
when you've run out of files to load. I<Or>, you could offload all that
merging and sorting into your operating system's own C<sort> command by
opening a pipe directly into its output, and get to work that much
faster.

Here's how that might look:

    open(my $sort_fh, '-|', 'sort -u unsorted/*.txt')
        or die "Couldn't open a pipe into sort: $!";

    # And right away, we can start reading sorted lines:
    while (my $line = <$sort_fh>) {
        #
        # ... Do something interesting with each $line here ...
        #
    }

The second argument to C<open>, C<"-|">, makes it a read-pipe into a
separate program, rather than an ordinary filehandle into a file.

Note that the third argument to C<open> is a string containing the
program name (C<sort>) plus all its arguments: in this case, C<-u> to
specify unique sort, and then a fileglob specifying the files to sort.
The resulting filehandle C<$sort_fh> works just like a read-only
(C<< "<" >>) filehandle, and your program can subsequently read data
from it as if it were opened onto an ordinary, single file.

=head2 Opening a pipe for writing

Continuing the previous example, let's say that your program has
completed its processing, and the results sit in an array called
C<@processed>.
You want to print these lines to a file called
C<numbered.txt> with a neatly formatted column of line numbers.

Certainly you could write your own code to do this. Or, once again,
you could kick that work over to another program. In this case, C<cat>,
running with its own C<-n> option to activate line numbering, should do
the trick:

    open(my $cat_fh, '|-', 'cat -n > numbered.txt')
        or die "Couldn't open a pipe into cat: $!";

    for my $line (@processed) {
        print $cat_fh $line;
    }

Here, we use a second C<open> argument of C<"|-">, signifying that the
filehandle assigned to C<$cat_fh> should be a write-pipe. We can then
use it just as we would a write-only ordinary filehandle, including the
basic function of C<print>-ing data to it.

Note that the third argument, specifying the command that we wish to
pipe to, sets up C<cat> to redirect its output via that C<< ">" >>
symbol into the file C<numbered.txt>. This can start to look a little
tricky, because that same symbol would have meant something
entirely different had it shown up in the second argument to C<open>!
But here in the third argument, it's simply part of the shell command that
Perl will open the pipe into, and Perl itself doesn't invest any special
meaning in it.

=head2 Expressing the command as a list

For opening pipes, Perl offers the option to call C<open> with a list
comprising the desired command and all its own arguments as separate
elements, rather than combining them into a single string as in the
examples above. For instance, we could have phrased the C<open> call in
the first example like this:

    open(my $sort_fh, '-|', 'sort', '-u', glob('unsorted/*.txt'))
        or die "Couldn't open a pipe into sort: $!";

When you call C<open> this way, Perl invokes the given command directly,
bypassing the shell.
As such, the shell won't try to interpret any
special characters within the command's argument list, which might
otherwise have unwanted effects. This can make for safer, less
error-prone C<open> calls, useful in cases such as passing in variables
as arguments, or even just referring to filenames with spaces in them.

However, when you I<do> want to pass a meaningful metacharacter to the
shell, such as the C<"*"> inside that final C<unsorted/*.txt> argument
here, you can't use this alternate syntax. In this case, we have worked
around it via Perl's handy C<glob> built-in function, which evaluates
its argument into a list of filenames, and we can safely pass that
resulting list right into C<open>, as shown above.

Note also that representing piped-command arguments in list form like
this doesn't work on every platform. It will work on any Unix-based OS
that provides a real C<fork> function (e.g. macOS or Linux), as well as
on Windows when running Perl 5.22 or later.

=head1 SEE ALSO

The full documentation for L<C<open>|perlfunc/open FILEHANDLE,MODE,EXPR>
provides a thorough reference to this function, beyond the best-practice
basics covered here.

=head1 AUTHOR and COPYRIGHT

Copyright 2013 Tom Christiansen; now maintained by Perl5 Porters

This documentation is free; you can redistribute it and/or modify it under
the same terms as Perl itself.
