package PerlIO;

our $VERSION = '1.11';

# Map layer name to package that defines it
our %alias;

sub import
{
 my $class = shift;
 while (@_)
  {
   my $layer = shift;
   if (exists $alias{$layer})
    {
     $layer = $alias{$layer}
    }
   else
    {
     $layer = "${class}::$layer";
    }
   eval { require $layer =~ s{::}{/}gr . '.pm' };
   warn $@ if $@;
  }
}

sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  # support platform-native and CRLF text files
  open(my $txt, "<:crlf", "my.txt") or die "open failed: $!";

  # append UTF-8 encoded text
  open(my $log, ">>:encoding(UTF-8)", "some.log")
    or die "open failed: $!";

  # portably open a binary file for reading
  open(my $img, "<", "his.jpg") or die "open failed: $!";
  binmode($img) or die "binmode failed: $!";

  Shell:
    PERLIO=:perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification, Perl's C code performs the equivalent of:

  use PerlIO 'foo';

The Perl code in PerlIO.pm then attempts to locate the layer by doing:

  require PerlIO::foo;

(If C<%PerlIO::alias> has an entry for C<foo>, the package named there is
loaded instead.)

Apart from this, the C<PerlIO> package is a placeholder for additional
PerlIO-related functions.

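The lookup that C<import> performs can be sketched on its own.  The
helper below, C<layer_to_file>, is a hypothetical name used only for
illustration.  It mirrors the alias check and the C<PerlIO::> prefixing
done by C<PerlIO::import>, then applies the same C<s{::}{/}gr> rewrite
to produce the relative path that C<require> searches for in C<@INC>.

    use strict;
    use warnings;

    # Hypothetical helper mirroring the name resolution in PerlIO::import.
    sub layer_to_file {
        my ($layer, %alias) = @_;

        # An alias names the defining package directly; otherwise the
        # layer is looked for in PerlIO::<layer>.
        my $package = exists $alias{$layer} ? $alias{$layer}
                                            : "PerlIO::$layer";

        # "::" becomes "/" and ".pm" is appended, as in import().
        return $package =~ s{::}{/}gr . '.pm';
    }

    print layer_to_file('encoding'), "\n";                  # PerlIO/encoding.pm
    print layer_to_file('foo', foo => 'My::Layer'), "\n";   # My/Layer.pm

C<My::Layer> is also a made-up package name.  The sketch passes the
aliases in as an argument instead of reading C<%PerlIO::alias>, so it
does not depend on any global state.
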
=head2 Layers

Generally speaking, PerlIO layers (previously sometimes referred to as
"disciplines") are an ordered stack applied to a filehandle (specified as
a space- or colon-separated list, conventionally written with a leading
colon).  Each layer performs some operation on any input or output, except
when bypassed, such as with C<sysread> or C<syswrite>.  Read operations go
through the stack in the order they are set (left to right), and write
operations in the reverse order.

There are also layers which actually just set flags on lower layers, or
layers that modify the current stack but don't persist on the stack
themselves; these are referred to as pseudo-layers.

When opening a handle, it will be opened with any layers specified
explicitly in the open() call (or the platform defaults, if specified as
a colon with no following layers).

If layers are not explicitly specified, the handle will be opened with the
layers specified by the L<${^OPEN}|perlvar/"${^OPEN}"> variable (usually
set by using the L<open> pragma for a lexical scope, or the C<-C>
command-line switch or C<PERL_UNICODE> environment variable for the main
program scope).

If layers are not specified in the open() call or C<${^OPEN}> variable,
the handle will be opened with the default layer stack configured for that
architecture; see L</"Defaults and how to override them">.

Some layers will automatically insert required lower-level layers if not
present; for example, C<:perlio> will insert C<:unix> below itself for
low-level IO, and C<:encoding> will insert the platform defaults for
buffered IO.

The C<binmode> function can be called on an opened handle to push
additional layers onto the stack, which may also modify the existing
layers.  C<binmode> called with no layers will remove or unset any
existing layers which transform the byte stream, making the handle
suitable for binary data.

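Here is a minimal sketch of that last point.  It uses in-memory handles
(see C<:scalar> below) so that no files are needed, and the sample text
is arbitrary.  A C<:crlf> handle turns CR,LF into "\n".  An identical
handle on which C<binmode> was called before reading returns the bytes
untouched:

    use strict;
    use warnings;

    my $data = "line 1\r\nline 2\r\n";

    open(my $text, "<:crlf", \$data) or die "open failed: $!";
    my $translated = <$text>;          # "line 1\n"

    open(my $bin, "<:crlf", \$data) or die "open failed: $!";
    binmode($bin) or die "binmode failed: $!";   # drop the translation
    my $untouched = <$bin>;            # "line 1\r\n"
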
The following layers are currently defined:

=over 4

=item :unix

Lowest-level layer, which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).
It is used even on non-Unix architectures, and most other layers operate on
top of it.

=item :stdio

Layer which calls C<fread>, C<fwrite>, C<fseek>/C<ftell>, etc.  Note
that, as this is "real" stdio, it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.
This layer implements both low-level IO and buffering, but is rarely used
on modern architectures.

=item :perlio

A from-scratch implementation of buffering for PerlIO.  It provides fast
access to the buffer for C<sv_gets>, which implements Perl's
readline/E<lt>E<gt>, and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low-level IO.

=item :crlf

A layer that implements DOS/Windows-like CRLF line endings.  On read, it
converts each CR,LF pair to a single "\n" newline character.  On write, it
converts each "\n" to a CR,LF pair.  Note that this layer will silently
refuse to be pushed on top of itself.

It currently does I<not> mimic MS-DOS in treating Control-Z as an
end-of-file marker.

On DOS/Windows-like architectures where this layer is part of the defaults,
it also acts like the C<:perlio> layer, and removing the CRLF translation
(such as with C<:raw>) will only unset the CRLF translation flag.  Since
Perl 5.14, you can also apply another C<:crlf> layer later, such as when
the CRLF translation must occur after an encoding layer.  On other
architectures, it is a mundane CRLF translation layer and can be added and
removed normally.

    # translate CRLF after encoding on Perl 5.14 or newer
    binmode $fh, ":raw:encoding(UTF-16LE):crlf"
      or die "binmode failed: $!";

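An in-memory handle shows the translation in both directions (the sample
text is arbitrary).  Writing through C<:crlf> expands each "\n" into
CR,LF.  Reading the result back through C<:crlf> collapses the pairs
again:

    use strict;
    use warnings;

    my $bytes = '';
    open(my $out, ">:crlf", \$bytes) or die "open failed: $!";
    print $out "one\ntwo\n";
    close($out) or die "close failed: $!";
    # $bytes now holds "one\r\ntwo\r\n"

    open(my $in, "<:crlf", \$bytes) or die "open failed: $!";
    my $text = do { local $/; <$in> };    # slurps "one\ntwo\n"
    close($in);
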
=item :utf8

Pseudo-layer that declares that the stream accepts Perl's I<internal>
upgraded encoding of characters, which is approximately UTF-8 on ASCII
machines, but UTF-EBCDIC on EBCDIC machines.  This allows any character
Perl can represent to be read from or written to the stream.

This layer (which actually sets a flag on the preceding layer, and is
implicitly set by any C<:encoding> layer) does not translate or validate
byte sequences.  It instead indicates that the byte stream will have been
arranged by other layers to be provided in Perl's internal upgraded
encoding, which Perl code (and correctly written XS code) will interpret
as decoded Unicode characters.

B<CAUTION>: Do not use this layer to translate from UTF-8 bytes, as
invalid UTF-8 or binary data will result in malformed Perl strings.  It is
unlikely to produce invalid UTF-8 when used for output, though it will
instead produce UTF-EBCDIC on EBCDIC systems.  The C<:encoding(UTF-8)>
layer (hyphen is significant) is preferred as it will ensure translation
between valid UTF-8 bytes and valid Unicode characters.

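Here is a minimal sketch of the difference.  The bytes below are the
UTF-8 encoding of "caf" followed by U+00E9 (LATIN SMALL LETTER E WITH
ACUTE).  Read through C<:encoding(UTF-8)>, they become four characters.
Read through a handle without a decoding layer, they stay five bytes.
The example uses in-memory handles only so that it is self-contained.

    use strict;
    use warnings;

    my $bytes = "caf\xC3\xA9";    # UTF-8 bytes for "caf" . chr(0xE9)

    open(my $dec, "<:encoding(UTF-8)", \$bytes) or die "open failed: $!";
    my $chars = <$dec>;           # 4 characters, the last is chr(0xE9)

    open(my $raw, "<", \$bytes) or die "open failed: $!";
    my $octets = <$raw>;          # the 5 original bytes, undecoded

    printf "%d characters, %d bytes\n", length $chars, length $octets;
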
=item :bytes

This is the inverse of the C<:utf8> pseudo-layer.  It turns off the flag
on the layer below so that data read from it is considered to
be in Perl's internal downgraded encoding, and is thus interpreted as the
native single-byte encoding of Latin-1 or EBCDIC.  Likewise, on output Perl
will warn if a "wide" character (a codepoint not in the range 0..255) is
written to such a stream.

It is very dangerous to push this layer on a handle using an C<:encoding>
layer, as such a layer assumes it is working with Perl's internal upgraded
encoding, so you will likely get a mangled result.  Instead, use C<:raw> or
C<:pop> to remove encoding layers.

=item :raw

The C<:raw> pseudo-layer is I<defined> as being identical to calling
C<binmode($fh)>: the stream is made suitable for passing binary data,
i.e. each byte is passed as-is.  The stream will still be buffered
(but this was not always true before Perl 5.14).

In Perl 5.6 and some books, the C<:raw> layer is documented as the inverse
of the C<:crlf> layer.  That is no longer the case: other layers which
would alter the binary nature of the stream are also disabled.  If you
want UNIX line endings on a platform that normally does CRLF translation,
but still want UTF-8 or encoding defaults, the appropriate thing to do is
to add C<:perlio> to the PERLIO environment variable, or open the handle
explicitly with that layer, to replace the platform default of C<:crlf>.

The implementation of C<:raw> is as a pseudo-layer which, when "pushed",
pops itself and then any layers which would modify the binary data stream.
(Undoing C<:utf8> and C<:crlf> may be implemented by clearing flags
rather than popping layers, but that is an implementation detail.)

Because C<:raw> normally pops layers, it usually only makes sense to have
it as the only or first element in a layer specification.  When used as
the first element, it provides a known base on which to build, e.g.:

    open(my $fh, ">:raw:encoding(UTF-8)", ...)
      or die "open failed: $!";

will construct a "binary" stream regardless of the platform defaults,
but then enable UTF-8 translation.

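Here is a sketch of that idiom.  The example writes through
C<:raw:encoding(UTF-8)> and reads the file back with C<:raw> alone.  It
assumes a writable temporary directory and uses the core L<File::Temp>
module for the scratch file, which is deleted at exit.  The character
U+00E9 should come back as the two bytes 0xC3 0xA9, and "\n" as a single
byte, whatever the platform default layers are:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp) or die "close failed: $!";

    # Binary base, then UTF-8 encoding on top of it.
    open(my $out, ">:raw:encoding(UTF-8)", $path)
      or die "open failed: $!";
    print $out "caf\x{E9}\n";
    close($out) or die "close failed: $!";

    # Read the stored bytes back without any translation.
    open(my $in, "<:raw", $path) or die "open failed: $!";
    my $stored = do { local $/; <$in> };    # "caf\xC3\xA9\n"
    close($in);
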
=item :pop

A pseudo-layer that removes the top-most layer.  This gives Perl code a
way to manipulate the layer stack.  Note that C<:pop> only works on
real layers and will not undo the effects of pseudo-layers or flags
like C<:utf8>.  An example of a possible use might be:

    open(my $fh,...) or die "open failed: $!";
    ...
    binmode($fh,":encoding(...)") or die "binmode failed: $!";
    # next chunk is encoded
    ...
    binmode($fh,":pop") or die "binmode failed: $!";
    # back to un-encoded

A more elegant (and safer) interface is needed.

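You can observe the effect of such a push and pop with
C<PerlIO::get_layers> (see L</"Querying the layers of filehandles">).
This sketch compares the layer names before and after instead of
hard-coding them, because the default stack differs between platforms.
The scratch file comes from the core L<File::Temp> module:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp) or die "close failed: $!";

    open(my $fh, "<", $path) or die "open failed: $!";
    my @before = PerlIO::get_layers($fh);

    binmode($fh, ":encoding(UTF-8)") or die "binmode failed: $!";
    my @during = PerlIO::get_layers($fh);   # now includes encoding(...)

    binmode($fh, ":pop") or die "binmode failed: $!";
    my @after = PerlIO::get_layers($fh);    # expected to match @before

    print "before: @before\nduring: @during\nafter:  @after\n";
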
=item :win32

On Win32 platforms this I<experimental> layer uses the native "handle" IO
rather than the unix-like numeric file descriptor layer.  It is known to be
buggy as of Perl 5.8.2.

=back

=head2 Custom Layers

It is possible to write custom layers in addition to the above built-in
ones, both in C/XS and Perl, as a module named C<< PerlIO::<layer name> >>.
Some custom layers come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> to transparently do character set and encoding
transformations, for example from Shift-JIS to Unicode.  Note that an
C<:encoding> layer also enables C<:utf8>.  See L<PerlIO::encoding> for more
information.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make a (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra housekeeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.
See L<PerlIO::mmap> for more information.

=item :via

C<:via(MODULE)> allows a transformation to be applied by an arbitrary Perl
module, for example compression / decompression, encryption / decryption.
See L<PerlIO::via> for more information.

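As an illustration of a layer written in Perl, the following minimal
sketch defines a hypothetical C<PerlIO::via::Upper> package (not an
existing module) that upper-cases everything written through it.  It
implements only the C<PUSHED> and C<WRITE> methods described in
L<PerlIO::via>.  A real layer would usually also handle input, for
example through C<FILL>.  The package is defined inline, so no module
file needs to be loaded for C<:via> to find it:

    use strict;
    use warnings;

    {
        # Hypothetical :via layer that upper-cases output.
        package PerlIO::via::Upper;

        # Called when the layer is pushed; returns the layer object.
        sub PUSHED {
            my ($class, $mode, $fh) = @_;
            my $state = '';
            return bless \$state, $class;
        }

        # Receives the data being written and a handle for the layer
        # below; returns the number of bytes consumed, or -1 on error.
        sub WRITE {
            my ($self, $buf, $fh) = @_;
            print {$fh} uc($buf) or return -1;
            return length $buf;
        }
    }

    my $out = '';
    open(my $fh, ">", \$out) or die "open failed: $!";
    binmode($fh, ":via(PerlIO::via::Upper)") or die "binmode failed: $!";
    print $fh "hello, layers\n";
    close($fh) or die "close failed: $!";
    # $out now holds "HELLO, LAYERS\n"
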
=item :scalar

A layer implementing "in memory" files using scalar variables,
automatically used in place of the platform defaults for IO when opening
such a handle.  As such, the scalar is expected to act like a file, only
containing or storing bytes.  See L<PerlIO::scalar> for more information.

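For instance, you can write to a scalar and then read it back like a
small file.  Here is a minimal sketch (the variable names and text are
arbitrary):

    use strict;
    use warnings;

    my $buffer = '';
    open(my $out, ">", \$buffer) or die "open failed: $!";
    print $out "first\nsecond\n";
    close($out) or die "close failed: $!";

    open(my $in, "<", \$buffer) or die "open failed: $!";
    my @lines  = <$in>;                      # ("first\n", "second\n")
    my @layers = PerlIO::get_layers($in);    # expected to include "scalar"
    close($in);
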
=back

=head2 Alternatives to raw

To get a binary stream, an alternative method is to use:

    open(my $fh,"<","whatever") or die "open failed: $!";
    binmode($fh) or die "binmode failed: $!";

This has the advantage of being backward compatible with older versions
of Perl that did not use PerlIO or where C<:raw> was buggy (as it was
before Perl 5.14).

To get an unbuffered stream, specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open(my $fh,"<:unix",$path) or die "open failed: $!";

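To check that such a handle really has no buffering layer, inspect its
stack with C<PerlIO::get_layers>.  In this sketch, a scratch file from
the core L<File::Temp> module exists only so that there is something to
open.  Reading with C<sysread> is a natural fit for an unbuffered handle:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my ($tmp, $path) = tempfile(UNLINK => 1);
    binmode($tmp) or die "binmode failed: $!";
    print $tmp "data\n";
    close($tmp) or die "close failed: $!";

    open(my $fh, "<:unix", $path) or die "open failed: $!";
    my @layers = PerlIO::get_layers($fh);    # ("unix")

    my $got = sysread($fh, my $buf, 64);
    defined $got or die "sysread failed: $!";
    # $buf holds "data\n"
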
=head2 Defaults and how to override them

If the platform is MS-DOS-like and normally does CRLF to "\n"
translation for text files, then the default layers are:

  :unix:crlf

Otherwise, if C<Configure> found out how to do "fast" IO using the system's
stdio (not common on modern architectures), then the default layers are:

  :stdio

Otherwise, the default layers are:

  :unix:perlio

Note that the "default stack" depends on the operating system, on the
Perl version, and on both the compile-time and runtime configurations of
Perl.  The default can be overridden by setting the environment variable
PERLIO to a space- or colon-separated list of layers; however, this cannot
be used to set layers that require loading modules, like C<:encoding>.

This can be used to see the effects of (or bugs in) the various layers,
e.g.:

  cd .../perl/t
  PERLIO=:stdio  ./perl harness
  PERLIO=:perlio ./perl harness

For the various values of PERLIO, see L<perlrun/PERLIO>.

The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                   DOS-like
 ------     ---------                   --------
 unset / "" :unix:perlio / :stdio [1]   :unix:crlf
 :stdio     :stdio                      :stdio
 :perlio    :unix:perlio                :unix:perlio

 # [1] ":stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, else ":unix:perlio"

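One way to observe the effect of PERLIO is to start a child C<perl> with
the variable set and have it report the layers of a freshly opened file.
This sketch rests on a few assumptions.  The helper name
C<default_layers_with> is hypothetical.  The child opens the perl
executable (C<$^X>) only because that file is known to exist.  The
platform must support the list form of a piped C<open>:

    use strict;
    use warnings;

    # Hypothetical helper: the layer names a child perl picks for a
    # plain open() when PERLIO is set to $perlio.
    sub default_layers_with {
        my ($perlio) = @_;
        local $ENV{PERLIO} = $perlio;
        my $code = 'open(my $fh, "<", $^X) or die $!;'
                 . ' print join(" ", PerlIO::get_layers($fh))';
        open(my $kid, "-|", $^X, "-e", $code) or die "fork failed: $!";
        my $layers = do { local $/; <$kid> };
        close($kid) or die "child perl failed: $?";
        return $layers;
    }

    print default_layers_with(":stdio"),  "\n";   # per the table: stdio
    print default_layers_with(":perlio"), "\n";   # per the table: unix perlio
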
=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

   my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them, and without colons.

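The names come back in the same bottom-to-top order that a layer
specification uses.  For a simple stack, you can therefore rebuild such a
specification by prefixing each name with a colon.  This sketch opens a
scratch file from the core L<File::Temp> module with explicitly requested
layers, so the result does not depend on the platform defaults.  Names
that carry arguments or flags, such as those added by C<:encoding>, need
more care:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp) or die "close failed: $!";

    open(my $fh, "<:unix:perlio", $path) or die "open failed: $!";
    my @layers = PerlIO::get_layers($fh);          # ("unix", "perlio")
    my $spec   = join "", map { ":$_" } @layers;   # ":unix:perlio"
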
By default, the layers from the input side of the filehandle are
returned; to get the output side, use the optional C<output> argument:

   my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle, but
with sockets, for example, there may be differences.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that.  This is not
accidental or unintentional.  The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow; please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<:utf8>) are not real
layers but instead flags on real layers; to get all of these returned
separately, use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

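Here is a sketch that walks that flat list three elements at a time.  It
pushes C<:encoding(UTF-8)> onto an explicitly layered handle, so that
one layer carries both an argument and the UTF-8 flag.  The flag is
tested with C<PerlIO::F_UTF8> (0x8000, defined in PerlIO.pm).  The
scratch file comes from the core L<File::Temp> module:

    use strict;
    use warnings;
    use PerlIO ();                  # provides PerlIO::F_UTF8
    use File::Temp qw(tempfile);

    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp) or die "close failed: $!";

    open(my $fh, "<:unix:perlio", $path) or die "open failed: $!";
    binmode($fh, ":encoding(UTF-8)") or die "binmode failed: $!";

    my @flat = PerlIO::get_layers($fh, details => 1);
    my @rows;
    # Consume (name, arguments, flags) triples until the list is empty.
    while (my ($name, $args, $flags) = splice(@flat, 0, 3)) {
        my $utf8 = ($flags & PerlIO::F_UTF8()) ? 1 : 0;
        push @rows, [ $name, $args, $utf8 ];
        printf "%-10s args=%-14s utf8=%d\n",
            $name, (defined $args ? $args : 'undef'), $utf8;
    }
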
B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut
398