1.. _The_Implementation_of_Standard_I/O:
2
3**********************************
4The Implementation of Standard I/O
5**********************************
6
7GNAT implements all the required input-output facilities described in
8A.6 through A.14.  These sections of the Ada Reference Manual describe the
9required behavior of these packages from the Ada point of view, and if
10you are writing a portable Ada program that does not need to know the
11exact manner in which Ada maps to the outside world when it comes to
12reading or writing external files, then you do not need to read this
13chapter.  As long as your files are all regular files (not pipes or
14devices), and as long as you write and read the files only from Ada, the
15description in the Ada Reference Manual is sufficient.
16
17However, if you want to do input-output to pipes or other devices, such
18as the keyboard or screen, or if the files you are dealing with are
19either generated by some other language, or to be read by some other
20language, then you need to know more about the details of how the GNAT
21implementation of these input-output facilities behaves.
22
23In this chapter we give a detailed description of exactly how GNAT
24interfaces to the file system.  As always, the sources of the system are
25available to you for answering questions at an even more detailed level,
26but for most purposes the information in this chapter will suffice.
27
28Another reason that you may need to know more about how input-output is
29implemented arises when you have a program written in mixed languages
30where, for example, files are shared between the C and Ada sections of
31the same program.  GNAT provides some additional facilities, in the form
32of additional child library packages, that facilitate this sharing, and
33these additional facilities are also described in this chapter.
34
35.. _Standard_I/O_Packages:
36
37Standard I/O Packages
38=====================
39
40The Standard I/O packages described in Annex A for
41
42*
43  Ada.Text_IO
44*
45  Ada.Text_IO.Complex_IO
46*
47  Ada.Text_IO.Text_Streams
48*
49  Ada.Wide_Text_IO
50*
51  Ada.Wide_Text_IO.Complex_IO
52*
53  Ada.Wide_Text_IO.Text_Streams
54*
55  Ada.Wide_Wide_Text_IO
56*
57  Ada.Wide_Wide_Text_IO.Complex_IO
58*
59  Ada.Wide_Wide_Text_IO.Text_Streams
60*
61  Ada.Stream_IO
62*
63  Ada.Sequential_IO
64*
65  Ada.Direct_IO
66
are implemented using the C
library streams facility, where:
69
70*
71  All files are opened using `fopen`.
72*
73  All input/output operations use `fread`/`fwrite`.
74
75There is no internal buffering of any kind at the Ada library level. The only
76buffering is that provided at the system level in the implementation of the
77library routines that support streams. This facilitates shared use of these
streams by mixed language programs. Note, though, that system level buffering is
explicitly enabled at elaboration of the standard I/O packages, which can
affect mixed language programs, in particular those that perform I/O before
calling the Ada elaboration routine (e.g., `adainit`). It is recommended to call
the Ada elaboration routine before performing any I/O or, when that is
impractical, to flush the common I/O streams (in particular `Standard_Output`)
before elaborating the Ada code.
85
86.. _FORM_Strings:
87
88FORM Strings
89============
90
91The format of a FORM string in GNAT is:
92
93
94::
95
96  "keyword=value,keyword=value,...,keyword=value"
97
98
where letters may be in upper or lower case, and there are no spaces
between values.  The order of the entries is not important.  Currently
the following keywords are defined:
102
103
104::
105
106  TEXT_TRANSLATION=[YES|NO|TEXT|BINARY|U8TEXT|WTEXT|U16TEXT]
107  SHARED=[YES|NO]
108  WCEM=[n|h|u|s|e|8|b]
109  ENCODING=[UTF8|8BITS]
110
111
The use of these parameters is described later in this chapter. If an
unrecognized keyword appears in a form string, it is silently ignored
and not considered invalid.
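
As an illustration of the syntax, the following minimal sketch passes a
two-keyword form string to `Ada.Text_IO.Create`.  The procedure name and the
file name ``form_demo.txt`` are hypothetical, and the mixed letter case is
there only to show that case is not significant.

.. code-block:: ada

   with Ada.Text_IO;

   procedure Form_Demo is
      F : Ada.Text_IO.File_Type;
   begin
      --  Entries are separated by commas, with no spaces between values
      Ada.Text_IO.Create
        (File => F,
         Mode => Ada.Text_IO.Out_File,
         Name => "form_demo.txt",               --  hypothetical file name
         Form => "Text_Translation=NO,shared=no");
      Ada.Text_IO.Put_Line (F, "written with an explicit form string");

      --  Form returns the form string associated with the open file
      Ada.Text_IO.Put_Line ("Form: " & Ada.Text_IO.Form (F));
      Ada.Text_IO.Close (F);
   end Form_Demo;

The same `Form` parameter is available on the `Open` procedures of the
other standard I/O packages.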
115
116.. _Direct_IO:
117
118Direct_IO
119=========
120
121Direct_IO can only be instantiated for definite types.  This is a
122restriction of the Ada language, which means that the records are fixed
123length (the length being determined by ``type'Size``, rounded
124up to the next storage unit boundary if necessary).
125
126The records of a Direct_IO file are simply written to the file in index
127sequence, with the first record starting at offset zero, and subsequent
128records following.  There is no control information of any kind.  For
example, if 32-bit integers are being written, each record takes
4 bytes, so the record at index `K` starts at offset
(`K`-1)*4.
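
The offset rule can be checked with a short sketch that writes 32-bit
integers with Direct_IO and then reads one of them back at its computed byte
offset through Stream_IO.  The procedure name, the file name and the sample
values are hypothetical, and the sketch assumes that the default stream
attributes read the native memory image of the integer.

.. code-block:: ada

   with Ada.Direct_IO;
   with Ada.Streams.Stream_IO;
   with Ada.Text_IO;
   with Interfaces;

   procedure Direct_IO_Layout is
      package Int_IO is new Ada.Direct_IO (Interfaces.Integer_32);
      package SIO renames Ada.Streams.Stream_IO;

      Name : constant String := "direct_demo.bin";  --  hypothetical name
      K    : constant := 3;                         --  record to inspect
      F    : Int_IO.File_Type;
      S    : SIO.File_Type;
      V    : Interfaces.Integer_32;
   begin
      --  Write five records holding the values 100, 200, .. 500
      Int_IO.Create (F, Int_IO.Out_File, Name);
      for I in 1 .. 5 loop
         Int_IO.Write (F, Interfaces.Integer_32 (I * 100));
      end loop;
      Int_IO.Close (F);

      --  Record K starts at byte offset (K - 1) * 4.  Stream_IO indexes
      --  are one-based, hence the extra + 1.
      SIO.Open (S, SIO.In_File, Name);
      SIO.Set_Index (S, (K - 1) * 4 + 1);
      Interfaces.Integer_32'Read (SIO.Stream (S), V);

      Ada.Text_IO.Put_Line ("File size:" & SIO.Count'Image (SIO.Size (S)));
      Ada.Text_IO.Put_Line ("Record 3 :" & Interfaces.Integer_32'Image (V));
      SIO.Delete (S);
   end Direct_IO_Layout;

Under these assumptions the file is 20 bytes long (five records of 4 bytes
with no control information) and the value read for record 3 is 300.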
132
There is no limit on the size of Direct_IO files; they are expanded as
necessary to accommodate whatever records are written to the file.
135
136.. _Sequential_IO:
137
138Sequential_IO
139=============
140
141Sequential_IO may be instantiated with either a definite (constrained)
142or indefinite (unconstrained) type.
143
144For the definite type case, the elements written to the file are simply
145the memory images of the data values with no control information of any
kind.  The resulting file should be read using the same type; no validity
checking is performed on input.
148
149For the indefinite type case, the elements written consist of two
150parts.  First is the size of the data item, written as the memory image
of an `Interfaces.C.size_t` value, followed by the memory image of
152the data value.  The resulting file can only be read using the same
153(unconstrained) type.  Normal assignment checks are performed on these
154read operations, and if these checks fail, `Data_Error` is
155raised.  In particular, in the array case, the lengths must match, and in
156the variant record case, if the variable for a particular read operation
157is constrained, the discriminants must match.
158
159Note that it is not possible to use Sequential_IO to write variable
160length array items, and then read the data back into different length
161arrays.  For example, the following will raise `Data_Error`:
162
163
164.. code-block:: ada
165
   package IO is new Sequential_IO (String);
   F : IO.File_Type;
   S : String (1..4);
   ...
   IO.Create (F);
   IO.Write (F, "hello!");            --  one element of length 6
   IO.Reset (F, Mode => IO.In_File);
   IO.Read (F, S);                    --  Data_Error: S has length 4
   Put_Line (S);
175
176
177
178On some Ada implementations, this will print `hell`, but the program is
179clearly incorrect, since there is only one element in the file, and that
180element is the string `hello!`.
181
182In Ada 95 and Ada 2005, this kind of behavior can be legitimately achieved
183using Stream_IO, and this is the preferred mechanism.  In particular, the
184above program fragment rewritten to use Stream_IO will work correctly.
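
A minimal sketch of such a rewrite is shown here.  The procedure name and the
file name are hypothetical.  The `String'Write` attribute writes only the
six characters, without bounds or length, so `String'Read` can consume the
first four of them into a shorter string.

.. code-block:: ada

   with Ada.Streams.Stream_IO;
   with Ada.Text_IO;

   procedure Stream_Rewrite is
      package SIO renames Ada.Streams.Stream_IO;

      F : SIO.File_Type;
      S : String (1 .. 4);
   begin
      SIO.Create (F, SIO.Out_File, "stream_demo.bin");  --  hypothetical name

      --  Writes the six characters only, with no length information
      String'Write (SIO.Stream (F), "hello!");

      SIO.Reset (F, SIO.In_File);

      --  Reads exactly S'Length characters from the stream
      String'Read (SIO.Stream (F), S);
      Ada.Text_IO.Put_Line (S);

      SIO.Delete (F);
   end Stream_Rewrite;

This version prints ``hell``, because the stream holds six independent
characters rather than one six-character element.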
185
186.. _Text_IO:
187
188Text_IO
189=======
190
191Text_IO files consist of a stream of characters containing the following
192special control characters:
193
194
195::
196
197  LF (line feed, 16#0A#) Line Mark
198  FF (form feed, 16#0C#) Page Mark
199
200
201A canonical Text_IO file is defined as one in which the following
202conditions are met:
203
204*
205  The character `LF` is used only as a line mark, i.e., to mark the end
206  of the line.
207
208*
209  The character `FF` is used only as a page mark, i.e., to mark the
210  end of a page and consequently can appear only immediately following a
211  `LF` (line mark) character.
212
213*
214  The file ends with either `LF` (line mark) or `LF`-`FF`
215  (line mark, page mark).  In the former case, the page mark is implicitly
216  assumed to be present.
217
218A file written using Text_IO will be in canonical form provided that no
219explicit `LF` or `FF` characters are written using `Put`
220or `Put_Line`.  There will be no `FF` character at the end of
221the file unless an explicit `New_Page` operation was performed
222before closing the file.
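
To see the line and page marks that Text_IO produces, a file can be written
with Text_IO and then dumped byte by byte with Stream_IO, as in the following
sketch.  The procedure and file names are hypothetical, and the sketch assumes
the file is short enough to fit in the 64-element buffer.

.. code-block:: ada

   with Ada.Streams.Stream_IO;
   with Ada.Text_IO;

   procedure Canonical_Demo is
      package SIO renames Ada.Streams.Stream_IO;

      Name : constant String := "canonical_demo.txt";  --  hypothetical name
      T    : Ada.Text_IO.File_Type;
      S    : SIO.File_Type;
      Buf  : Ada.Streams.Stream_Element_Array (1 .. 64);
      Last : Ada.Streams.Stream_Element_Offset;
   begin
      --  One full line, an explicit page mark, then an unterminated line
      Ada.Text_IO.Create (T, Ada.Text_IO.Out_File, Name);
      Ada.Text_IO.Put_Line (T, "ab");
      Ada.Text_IO.New_Page (T);
      Ada.Text_IO.Put (T, "c");
      Ada.Text_IO.Close (T);

      --  Dump the raw bytes as decimal values
      SIO.Open (S, SIO.In_File, Name);
      SIO.Read (S, Buf, Last);
      for I in Buf'First .. Last loop
         Ada.Text_IO.Put (Ada.Streams.Stream_Element'Image (Buf (I)));
      end loop;
      Ada.Text_IO.New_Line;
      SIO.Delete (S);
   end Canonical_Demo;

On a system where no text translation applies, the dump should show a form
feed (12) only directly after a line feed (10), and a final line feed added
when the file is closed, but no trailing form feed.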
223
224A canonical Text_IO file that is a regular file (i.e., not a device or a
225pipe) can be read using any of the routines in Text_IO.  The
226semantics in this case will be exactly as defined in the Ada Reference
227Manual, and all the routines in Text_IO are fully implemented.
228
229A text file that does not meet the requirements for a canonical Text_IO
230file has one of the following:
231
232*
233  The file contains `FF` characters not immediately following a
234  `LF` character.
235
236*
237  The file contains `LF` or `FF` characters written by
238  `Put` or `Put_Line`, which are not logically considered to be
239  line marks or page marks.
240
241*
242  The file ends in a character other than `LF` or `FF`,
243  i.e., there is no explicit line mark or page mark at the end of the file.
244
Text_IO can be used to read such non-standard text files, but subprograms
that deal with line or page numbers do not have defined meanings.  In
247particular, a `FF` character that does not follow a `LF`
248character may or may not be treated as a page mark from the point of
249view of page and line numbering.  Every `LF` character is considered
250to end a line, and there is an implied `LF` character at the end of
251the file.
252
253.. _Stream_Pointer_Positioning:
254
255Stream Pointer Positioning
256--------------------------
257
258`Ada.Text_IO` has a definition of current position for a file that
259is being read.  No internal buffering occurs in Text_IO, and usually the
260physical position in the stream used to implement the file corresponds
261to this logical position defined by Text_IO.  There are two exceptions:
262
263*
264  After a call to `End_Of_Page` that returns `True`, the stream
265  is positioned past the `LF` (line mark) that precedes the page
266  mark.  Text_IO maintains an internal flag so that subsequent read
267  operations properly handle the logical position which is unchanged by
268  the `End_Of_Page` call.
269
270*
271  After a call to `End_Of_File` that returns `True`, if the
272  Text_IO file was positioned before the line mark at the end of file
273  before the call, then the logical position is unchanged, but the stream
  is physically positioned right at the end of file (past the line mark,
  and past a possible page mark following the line mark).  Again Text_IO
276  maintains internal flags so that subsequent read operations properly
277  handle the logical position.
278
279These discrepancies have no effect on the observable behavior of
280Text_IO, but if a single Ada stream is shared between a C program and
281Ada program, or shared (using ``shared=yes`` in the form string)
282between two Ada files, then the difference may be observable in some
283situations.
284
285.. _Reading_and_Writing_Non-Regular_Files:
286
287Reading and Writing Non-Regular Files
288-------------------------------------
289
290A non-regular file is a device (such as a keyboard), or a pipe.  Text_IO
291can be used for reading and writing.  Writing is not affected and the
292sequence of characters output is identical to the normal file case, but
293for reading, the behavior of Text_IO is modified to avoid undesirable
294look-ahead as follows:
295
296An input file that is not a regular file is considered to have no page
297marks.  Any `Ascii.FF` characters (the character normally used for a
298page mark) appearing in the file are considered to be data
299characters.  In particular:
300
*
  `Get_Line` and `Skip_Line` do not test for a page mark
  following a line mark.  If a page mark appears, it will be treated as a
  data character.  This avoids the need to wait for an extra character to
  be typed or entered from the pipe to complete one of these operations.

*
  `End_Of_Page` always returns `False`.
312
313*
314  `End_Of_File` will return `False` if there is a page mark at
315  the end of the file.
316
317Output to non-regular files is the same as for regular files.  Page marks
318may be written to non-regular files using `New_Page`, but as noted
319above they will not be treated as page marks on input if the output is
320piped to another Ada program.
321
Another important discrepancy when reading non-regular files is that the end
of file indication is not 'sticky'.  If an end of file is entered, e.g., by
pressing the :kbd:`EOT` key, then end of file is signaled once (i.e., the test
`End_Of_File` will yield `True`, or a read will raise `End_Error`), but then
reading can resume to read data past that end of file indication, until
another end of file indication is entered.
331
332.. _Get_Immediate:
333
334Get_Immediate
335-------------
336
337.. index:: Get_Immediate
338
339Get_Immediate returns the next character (including control characters)
340from the input file.  In particular, Get_Immediate will return LF or FF
341characters used as line marks or page marks.  Such operations leave the
342file positioned past the control character, and it is thus not treated
343as having its normal function.  This means that page, line and column
344counts after this kind of Get_Immediate call are set as though the mark
345did not occur.  In the case where a Get_Immediate leaves the file
346positioned between the line mark and page mark (which is not normally
347possible), it is undefined whether the FF character will be treated as a
348page mark.
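
The following sketch illustrates the point about line marks by reading a
small regular file entirely with `Get_Immediate` and printing the position
value of every character returned.  The procedure and file names are
hypothetical.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Get_Immediate_Demo is
      Name : constant String := "get_immediate_demo.txt";  --  hypothetical
      F    : File_Type;
      C    : Character;
   begin
      Create (F, Out_File, Name);
      Put_Line (F, "ab");
      Put_Line (F, "cd");
      Close (F);

      Open (F, In_File, Name);
      begin
         loop
            --  Line marks are returned as ordinary characters
            Get_Immediate (F, C);
            Put (Integer'Image (Character'Pos (C)));
         end loop;
      exception
         when End_Error =>
            New_Line;   --  end of file reached
      end;
      Delete (F);
   end Get_Immediate_Demo;

The value 10 (`LF`) is expected to appear in the output after the codes for
``ab`` and again after the codes for ``cd``, showing that the line marks were
delivered as data rather than interpreted.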
349
350.. _Treating_Text_IO_Files_as_Streams:
351
352Treating Text_IO Files as Streams
353---------------------------------
354
355.. index:: Stream files
356
The package `Ada.Text_IO.Text_Streams` allows a Text_IO file to be treated
358as a stream.  Data written to a Text_IO file in this stream mode is
359binary data.  If this binary data contains bytes 16#0A# (`LF`) or
36016#0C# (`FF`), the resulting file may have non-standard
361format.  Similarly if read operations are used to read from a Text_IO
362file treated as a stream, then `LF` and `FF` characters may be
363skipped and the effect is similar to that described above for
364`Get_Immediate`.
365
366.. _Text_IO_Extensions:
367
368Text_IO Extensions
369------------------
370
371.. index:: Text_IO extensions
372
373A package GNAT.IO_Aux in the GNAT library provides some useful extensions
374to the standard `Text_IO` package:
375
376* function File_Exists (Name : String) return Boolean;
377  Determines if a file of the given name exists.
378
379* function Get_Line return String;
380  Reads a string from the standard input file.  The value returned is exactly
381  the length of the line that was read.
382
383* function Get_Line (File : Ada.Text_IO.File_Type) return String;
384  Similar, except that the parameter File specifies the file from which
385  the string is to be read.
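
A minimal sketch of these extensions in use follows.  The procedure and file
names are hypothetical; the subprogram profiles are the ones listed above.

.. code-block:: ada

   with Ada.Text_IO;
   with GNAT.IO_Aux;

   procedure IO_Aux_Demo is
      Name : constant String := "io_aux_demo.txt";  --  hypothetical name
      F    : Ada.Text_IO.File_Type;
   begin
      Ada.Text_IO.Create (F, Ada.Text_IO.Out_File, Name);
      Ada.Text_IO.Put_Line (F, "a line of arbitrary length");
      Ada.Text_IO.Close (F);

      if GNAT.IO_Aux.File_Exists (Name) then
         Ada.Text_IO.Open (F, Ada.Text_IO.In_File, Name);
         declare
            --  The result has exactly the length of the line read, so no
            --  buffer size needs to be chosen in advance
            Line : constant String := GNAT.IO_Aux.Get_Line (F);
         begin
            Ada.Text_IO.Put_Line (Line);
            Ada.Text_IO.Put_Line ("Length:" & Natural'Image (Line'Length));
         end;
         Ada.Text_IO.Delete (F);
      end if;
   end IO_Aux_Demo;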
386
387
388.. _Text_IO_Facilities_for_Unbounded_Strings:
389
390Text_IO Facilities for Unbounded Strings
391----------------------------------------
392
393.. index:: Text_IO for unbounded strings
394
395.. index:: Unbounded_String, Text_IO operations
396
397The package `Ada.Strings.Unbounded.Text_IO`
398in library files `a-suteio.ads/adb` contains some GNAT-specific
399subprograms useful for Text_IO operations on unbounded strings:
400
401
402* function Get_Line (File : File_Type) return Unbounded_String;
403  Reads a line from the specified file
404  and returns the result as an unbounded string.
405
406* procedure Put (File : File_Type; U : Unbounded_String);
  Writes the value of the given unbounded string to the specified file.
  Similar to the effect of
  `Put (To_String (U))` except that an extra copy is avoided.
410
411* procedure Put_Line (File : File_Type; U : Unbounded_String);
412  Writes the value of the given unbounded string to the specified file,
413  followed by a `New_Line`.
414  Similar to the effect of `Put_Line (To_String (U))` except
415  that an extra copy is avoided.
416
In the above subprograms, `File` is of type `Ada.Text_IO.File_Type`
and is optional.  If the parameter is omitted, then the standard input or
output file is referenced as appropriate.
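
A short sketch of these subprograms, using both the form with an explicit
`File` parameter and the form without one, might look as follows.  The
procedure name, the renaming `UIO` and the file name are hypothetical.

.. code-block:: ada

   with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
   with Ada.Strings.Unbounded.Text_IO;
   with Ada.Text_IO;

   procedure Unbounded_IO_Demo is
      package UIO renames Ada.Strings.Unbounded.Text_IO;

      Name : constant String := "unbounded_demo.txt";  --  hypothetical name
      F    : Ada.Text_IO.File_Type;
      U    : Unbounded_String := To_Unbounded_String ("first line");
   begin
      Ada.Text_IO.Create (F, Ada.Text_IO.Out_File, Name);
      UIO.Put_Line (F, U);
      UIO.Put (F, To_Unbounded_String ("second "));
      UIO.Put_Line (F, To_Unbounded_String ("line"));

      Ada.Text_IO.Reset (F, Ada.Text_IO.In_File);
      while not Ada.Text_IO.End_Of_File (F) loop
         U := UIO.Get_Line (F);   --  explicit File parameter
         UIO.Put_Line (U);        --  File omitted: standard output
      end loop;
      Ada.Text_IO.Delete (F);
   end Unbounded_IO_Demo;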
420
421The package `Ada.Strings.Wide_Unbounded.Wide_Text_IO` in library
422files :file:`a-swuwti.ads` and :file:`a-swuwti.adb` provides similar extended
423`Wide_Text_IO` functionality for unbounded wide strings.
424
425The package `Ada.Strings.Wide_Wide_Unbounded.Wide_Wide_Text_IO` in library
426files :file:`a-szuzti.ads` and :file:`a-szuzti.adb` provides similar extended
427`Wide_Wide_Text_IO` functionality for unbounded wide wide strings.
428
429.. _Wide_Text_IO:
430
431Wide_Text_IO
432============
433
434`Wide_Text_IO` is similar in most respects to Text_IO, except that
435both input and output files may contain special sequences that represent
436wide character values.  The encoding scheme for a given file may be
437specified using a FORM parameter:
438
439
440::
441
442  WCEM=`x`
443
444
445as part of the FORM string (WCEM = wide character encoding method),
where `x` is one of the following characters:
447
448========== ====================
449Character  Encoding
450========== ====================
451*h*        Hex ESC encoding
452*u*        Upper half encoding
453*s*        Shift-JIS encoding
454*e*        EUC Encoding
455*8*        UTF-8 encoding
456*b*        Brackets encoding
457========== ====================
458
459The encoding methods match those that
460can be used in a source
461program, but there is no requirement that the encoding method used for
462the source program be the same as the encoding method used for files,
463and different files may use different encoding methods.
464
The default encoding method for the standard files, and for opened files
for which no WCEM parameter is given in the FORM string, matches the
wide character encoding specified for the main program (the default
being brackets encoding if no coding method was specified with ``-gnatW``).
469
470
471
472*Hex Coding*
473  In this encoding, a wide character is represented by a five character
474  sequence:
475
476
477::
478
479    ESC a b c d
480
481..
482
483  where `a`, `b`, `c`, `d` are the four hexadecimal
484  characters (using upper case letters) of the wide character code.  For
485  example, ESC A345 is used to represent the wide character with code
486  16#A345#.  This scheme is compatible with use of the full
487  `Wide_Character` set.
488
489
490*Upper Half Coding*
491  The wide character with encoding 16#abcd#, where the upper bit is on
492  (i.e., a is in the range 8-F) is represented as two bytes 16#ab# and
493  16#cd#.  The second byte may never be a format control character, but is
  not required to be in the upper half.  This method can also be used for
495  shift-JIS or EUC where the internal coding matches the external coding.
496
497
498*Shift JIS Coding*
  A wide character is represented by a two character sequence 16#ab# and
  16#cd#, with the restrictions described above for upper half
  encoding.  The internal character code is the corresponding JIS
502  character according to the standard algorithm for Shift-JIS
503  conversion.  Only characters defined in the JIS code set table can be
504  used with this encoding method.
505
506
507*EUC Coding*
508  A wide character is represented by a two character sequence 16#ab# and
509  16#cd#, with both characters being in the upper half.  The internal
510  character code is the corresponding JIS character according to the EUC
511  encoding algorithm.  Only characters defined in the JIS code set table
512  can be used with this encoding method.
513
514
515*UTF-8 Coding*
516  A wide character is represented using
517  UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO
518  10646-1/Am.2.  Depending on the character value, the representation
519  is a one, two, or three byte sequence:
520
521
522::
523
524    16#0000#-16#007f#: 2#0xxxxxxx#
525    16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
526    16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
527
528..
529
  where the `xxx` bits correspond to the left-padded bits of the
  16-bit character value.  Note that all lower half ASCII characters
  are represented as ASCII bytes and all upper half characters and
  other wide characters are represented as sequences of upper-half
  characters.  (The full UTF-8 scheme allows for encoding 31-bit characters as
  6-byte sequences, but in this implementation, all UTF-8 sequences
  of four or more bytes in length will raise a Constraint_Error, as
  will all invalid UTF-8 sequences.)
538
539
540*Brackets Coding*
541  In this encoding, a wide character is represented by the following eight
542  character sequence:
543
544
545::
546
547    [ " a b c d " ]
548
549..
550
551  where `a`, `b`, `c`, `d` are the four hexadecimal
552  characters (using uppercase letters) of the wide character code.  For
553  example, `["A345"]` is used to represent the wide character with code
554  `16#A345#`.
555  This scheme is compatible with use of the full Wide_Character set.
556  On input, brackets coding can also be used for upper half characters,
  e.g., `["C1"]` for the character with code `16#C1#`.  However, on output, brackets notation
558  is only used for wide characters with a code greater than `16#FF#`.
559
560  Note that brackets coding is not normally used in the context of
561  Wide_Text_IO or Wide_Wide_Text_IO, since it is really just designed as
562  a portable way of encoding source files. In the context of Wide_Text_IO
563  or Wide_Wide_Text_IO, it can only be used if the file does not contain
564  any instance of the left bracket character other than to encode wide
565  character values using the brackets encoding method. In practice it is
566  expected that some standard wide character encoding method such
567  as UTF-8 will be used for text input output.
568
569  If brackets notation is used, then any occurrence of a left bracket
570  in the input file which is not the start of a valid wide character
571  sequence will cause Constraint_Error to be raised. It is possible to
572  encode a left bracket as ["5B"] and Wide_Text_IO and Wide_Wide_Text_IO
573  input will interpret this as a left bracket.
574
575  However, when a left bracket is output, it will be output as a left bracket
576  and not as ["5B"]. We make this decision because for normal use of
577  Wide_Text_IO for outputting messages, it is unpleasant to clobber left
578  brackets. For example, if we write:
579
580
581  .. code-block:: ada
582
583       Put_Line ("Start of output [first run]");
584
585
586  we really do not want to have the left bracket in this message clobbered so
587  that the output reads:
588
589
590::
591
592       Start of output ["5B"]first run]
593
594..
595
596  In practice brackets encoding is reasonably useful for normal Put_Line use
597  since we won't get confused between left brackets and wide character
598  sequences in the output. But for input, or when files are written out
599  and read back in, it really makes better sense to use one of the standard
600  encoding methods such as UTF-8.
601
602
For the coding schemes other than UTF-8, Hex, or Brackets encoding,
not all wide character values can be represented.  An attempt to output
a character that cannot be represented using the encoding scheme for the
file causes Constraint_Error to be raised.  An invalid wide character
sequence on input also causes Constraint_Error to be raised.
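
As an illustration of the WCEM form parameter, the following sketch writes a
single wide character to a file opened with ``wcem=8`` and then dumps the
resulting bytes through Stream_IO.  The procedure name, the file name and the
choice of the character with code 16#03C0# are hypothetical.

.. code-block:: ada

   with Ada.Streams.Stream_IO;
   with Ada.Text_IO;
   with Ada.Wide_Text_IO;

   procedure WCEM_Demo is
      package SIO renames Ada.Streams.Stream_IO;

      Name : constant String := "wcem_demo.txt";  --  hypothetical name
      W    : Ada.Wide_Text_IO.File_Type;
      S    : SIO.File_Type;
      Buf  : Ada.Streams.Stream_Element_Array (1 .. 16);
      Last : Ada.Streams.Stream_Element_Offset;
   begin
      --  Select UTF-8 encoding for this file only
      Ada.Wide_Text_IO.Create
        (W, Ada.Wide_Text_IO.Out_File, Name, Form => "wcem=8");
      Ada.Wide_Text_IO.Put (W, Wide_Character'Val (16#03C0#));
      Ada.Wide_Text_IO.Close (W);

      --  Dump the encoded bytes as decimal values
      SIO.Open (S, SIO.In_File, Name);
      SIO.Read (S, Buf, Last);
      for I in Buf'First .. Last loop
         Ada.Text_IO.Put (Ada.Streams.Stream_Element'Image (Buf (I)));
      end loop;
      Ada.Text_IO.New_Line;
      SIO.Delete (S);
   end WCEM_Demo;

By the two-byte rule in the UTF-8 table, code 16#03C0# is encoded as
2#11001111# 2#10000000#, so the dump is expected to begin with 207 and 128,
followed by the line terminator written when the file is closed.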
609
610.. _Stream_Pointer_Positioning_1:
611
612Stream Pointer Positioning
613--------------------------
614
615`Ada.Wide_Text_IO` is similar to `Ada.Text_IO` in its handling
616of stream pointer positioning (:ref:`Text_IO`).  There is one additional
617case:
618
If `Ada.Wide_Text_IO.Look_Ahead` reads a character outside the
normal lower ASCII set, i.e., a character in the range:
621
622
623.. code-block:: ada
624
625  Wide_Character'Val (16#0080#) .. Wide_Character'Val (16#FFFF#)
626
627
628then although the logical position of the file pointer is unchanged by
629the `Look_Ahead` call, the stream is physically positioned past the
630wide character sequence.  Again this is to avoid the need for buffering
631or backup, and all `Wide_Text_IO` routines check the internal
632indication that this situation has occurred so that this is not visible
633to a normal program using `Wide_Text_IO`.  However, this discrepancy
634can be observed if the wide text file shares a stream with another file.
635
636.. _Reading_and_Writing_Non-Regular_Files_1:
637
638Reading and Writing Non-Regular Files
639-------------------------------------
640
641As in the case of Text_IO, when a non-regular file is read, it is
assumed that the file contains no page marks (any form feed characters are
643treated as data characters), and `End_Of_Page` always returns
644`False`.  Similarly, the end of file indication is not sticky, so
645it is possible to read beyond an end of file.
646
647.. _Wide_Wide_Text_IO:
648
649Wide_Wide_Text_IO
650=================
651
652`Wide_Wide_Text_IO` is similar in most respects to Text_IO, except that
653both input and output files may contain special sequences that represent
654wide wide character values.  The encoding scheme for a given file may be
655specified using a FORM parameter:
656
657
658::
659
660  WCEM=`x`
661
662
663as part of the FORM string (WCEM = wide character encoding method),
where `x` is one of the following characters:
665
666========== ====================
667Character  Encoding
668========== ====================
669*h*        Hex ESC encoding
670*u*        Upper half encoding
671*s*        Shift-JIS encoding
672*e*        EUC Encoding
673*8*        UTF-8 encoding
674*b*        Brackets encoding
675========== ====================
676
677
678The encoding methods match those that
679can be used in a source
680program, but there is no requirement that the encoding method used for
681the source program be the same as the encoding method used for files,
682and different files may use different encoding methods.
683
The default encoding method for the standard files, and for opened files
for which no WCEM parameter is given in the FORM string, matches the
wide character encoding specified for the main program (the default
being brackets encoding if no coding method was specified with ``-gnatW``).
688
689
690
691*UTF-8 Coding*
  A wide wide character is represented using
693  UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO
694  10646-1/Am.2.  Depending on the character value, the representation
695  is a one, two, three, or four byte sequence:
696
697
698::
699
700    16#000000#-16#00007f#: 2#0xxxxxxx#
701    16#000080#-16#0007ff#: 2#110xxxxx# 2#10xxxxxx#
702    16#000800#-16#00ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
703    16#010000#-16#10ffff#: 2#11110xxx# 2#10xxxxxx# 2#10xxxxxx# 2#10xxxxxx#
704
705..
706
707  where the `xxx` bits correspond to the left-padded bits of the
708  21-bit character value.  Note that all lower half ASCII characters
709  are represented as ASCII bytes and all upper half characters and
710  other wide characters are represented as sequences of upper-half
711  characters.
712
713
714*Brackets Coding*
  In this encoding, a wide wide character is represented by the following eight
  character sequence if it is in the wide character range:
717
718
719::
720
721    [ " a b c d " ]
722
723..
724
  and by the following ten character sequence if it is not:
726
727
728::
729
730    [ " a b c d e f " ]
731
732..
733
734  where `a`, `b`, `c`, `d`, `e`, and `f`
735  are the four or six hexadecimal
736  characters (using uppercase letters) of the wide wide character code.  For
737  example, `["01A345"]` is used to represent the wide wide character
738  with code `16#01A345#`.
739
740  This scheme is compatible with use of the full Wide_Wide_Character set.
741  On input, brackets coding can also be used for upper half characters,
  e.g., `["C1"]` for the character with code `16#C1#`.  However, on output, brackets notation
743  is only used for wide characters with a code greater than `16#FF#`.
744
745
It is also possible to use the other Wide_Character encoding methods,
747such as Shift-JIS, but the other schemes cannot support the full range
748of wide wide characters.
749An attempt to output a character that cannot
750be represented using the encoding scheme for the file causes
751Constraint_Error to be raised.  An invalid wide character sequence on
752input also causes Constraint_Error to be raised.
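
The eight and ten character brackets sequences described above can be
reproduced with a small helper.  This is only a sketch of the notation, not
the routine used by the run-time library.  The names `Brackets_Demo`,
`Brackets` and `Hex` are hypothetical, and the helper assumes a code greater
than 16#FF# and no greater than 16#10FFFF#.

.. code-block:: ada

   with Ada.Text_IO;

   procedure Brackets_Demo is
      Hex : constant String := "0123456789ABCDEF";

      --  Returns the brackets notation for C: four hexadecimal digits if
      --  C is in the wide character range, six digits otherwise.
      function Brackets (C : Wide_Wide_Character) return String is
         Code  : Natural  := Wide_Wide_Character'Pos (C);
         Count : Positive := 6;
      begin
         if Code <= 16#FFFF# then
            Count := 4;
         end if;

         declare
            Result : String (1 .. Count + 4);
         begin
            Result (1 .. 2) := "[""";
            Result (Count + 3 .. Count + 4) := """]";

            --  Fill in the digits from least to most significant
            for I in reverse 3 .. Count + 2 loop
               Result (I) := Hex (Code mod 16 + 1);
               Code := Code / 16;
            end loop;
            return Result;
         end;
      end Brackets;
   begin
      Ada.Text_IO.Put_Line (Brackets (Wide_Wide_Character'Val (16#A345#)));
      Ada.Text_IO.Put_Line (Brackets (Wide_Wide_Character'Val (16#01A345#)));
   end Brackets_Demo;

The two calls print ``["A345"]`` and ``["01A345"]`` respectively.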
753
754.. _Stream_Pointer_Positioning_2:
755
756Stream Pointer Positioning
757--------------------------
758
759`Ada.Wide_Wide_Text_IO` is similar to `Ada.Text_IO` in its handling
760of stream pointer positioning (:ref:`Text_IO`).  There is one additional
761case:
762
If `Ada.Wide_Wide_Text_IO.Look_Ahead` reads a character outside the
normal lower ASCII set, i.e., a character in the range:
765
766
767.. code-block:: ada
768
769  Wide_Wide_Character'Val (16#0080#) .. Wide_Wide_Character'Val (16#10FFFF#)
770
771
772then although the logical position of the file pointer is unchanged by
773the `Look_Ahead` call, the stream is physically positioned past the
774wide character sequence.  Again this is to avoid the need for buffering
775or backup, and all `Wide_Wide_Text_IO` routines check the internal
776indication that this situation has occurred so that this is not visible
777to a normal program using `Wide_Wide_Text_IO`.  However, this discrepancy
778can be observed if the wide text file shares a stream with another file.
779
780.. _Reading_and_Writing_Non-Regular_Files_2:
781
782Reading and Writing Non-Regular Files
783-------------------------------------
784
785As in the case of Text_IO, when a non-regular file is read, it is
assumed that the file contains no page marks (any form feed characters are
787treated as data characters), and `End_Of_Page` always returns
788`False`.  Similarly, the end of file indication is not sticky, so
789it is possible to read beyond an end of file.
790
791.. _Stream_IO:
792
793Stream_IO
794=========
795
796A stream file is a sequence of bytes, where individual elements are
797written to the file as described in the Ada Reference Manual.  The type
798`Stream_Element` is simply a byte.  There are two ways to read or
799write a stream file.
800
801*
802  The operations `Read` and `Write` directly read or write a
803  sequence of stream elements with no control information.
804
805*
806  The stream attributes applied to a stream file transfer data in the
807  manner described for stream attributes.
808
809.. _Text_Translation:
810
811Text Translation
812================
813
814``Text_Translation=xxx`` may be used as the Form parameter
815passed to Text_IO.Create and Text_IO.Open. ``Text_Translation=xxx``
816has no effect on Unix systems. Possible values are:
817
818
819*
  ``Yes`` or ``Text`` is the default, which means to
  translate LF to/from CR/LF on Windows systems.

*
  ``No`` disables this translation; i.e., it
  uses binary mode. For output files, ``Text_Translation=No``
  may be used to create Unix-style files on
  Windows.

*
  ``wtext`` enables translation in Unicode mode
  (corresponds to ``_O_WTEXT``).

*
  ``u8text`` enables translation in Unicode UTF-8 mode
  (corresponds to ``_O_U8TEXT``).

*
  ``u16text`` enables translation in Unicode UTF-16
  mode (corresponds to ``_O_U16TEXT``).
839
840
841.. _Shared_Files:
842
843Shared Files
844============
845
846Section A.14 of the Ada Reference Manual allows implementations to
847provide a wide variety of behavior if an attempt is made to access the
848same external file with two or more internal files.
849
850To provide a full range of functionality, while at the same time
851minimizing the problems of portability caused by this implementation
852dependence, GNAT handles file sharing as follows:
853
854*
855  In the absence of a ``shared=xxx`` form parameter, an attempt
856  to open two or more files with the same full name is considered an error
857  and is not supported.  The exception `Use_Error` will be
858  raised.  Note that a file that is not explicitly closed by the program
859  remains open until the program terminates.
860
861*
862  If the form parameter ``shared=no`` appears in the form string, the
863  file can be opened or created with its own separate stream identifier,
864  regardless of whether other files sharing the same external file are
865  opened.  The exact effect depends on how the C stream routines handle
866  multiple accesses to the same external files using separate streams.
867
868*
869  If the form parameter ``shared=yes`` appears in the form string for
870  each of two or more files opened using the same full name, the same
871  stream is shared between these files, and the semantics are as described
872  in Ada Reference Manual, Section A.14.
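
The first two rules can be observed with a small program such as the
following sketch.  The procedure name and the file name
``shared_demo.txt`` are assumptions made for this example, and the outcome
of the ``shared=no`` case ultimately depends on the C library, as noted
above.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Shared_Demo is
      --  Hypothetical file name, chosen for illustration only
      Name   : constant String := "shared_demo.txt";
      F1, F2 : File_Type;
   begin
      Create (F1, Out_File, Name);

      --  No shared=xxx parameter: Use_Error is the documented outcome
      begin
         Open (F2, In_File, Name);
         Put_Line ("Second open unexpectedly succeeded");
         Close (F2);
      exception
         when Use_Error =>
            Put_Line ("Use_Error raised without a shared=xxx parameter");
      end;

      --  shared=no: the second file gets its own separate stream
      Open (F2, In_File, Name, Form => "shared=no");
      Put_Line ("Second open succeeded with shared=no");
      Close (F2);

      Delete (F1);
   end Shared_Demo;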
873
874When a program that opens multiple files with the same name is ported
875from another Ada compiler to GNAT, the effect will be that
876`Use_Error` is raised.
877
878The documentation of the original compiler and the documentation of the
879program should then be examined to determine if file sharing was
880expected, and ``shared=xxx`` parameters added to `Open`
881and `Create` calls as required.
882
When a program is ported from GNAT to some other Ada compiler, no
special attention is required unless the ``shared=xxx`` form
parameter is used in the program.  In this case, you must examine the
documentation of the new compiler to see if it supports the required
file sharing semantics, and modify the form strings appropriately.  Of
course, the program may not be portable at all if the
target compiler does not support the required functionality.  The best
approach in writing portable code is to avoid file sharing (and hence
the use of the ``shared=xxx`` parameter in the form string)
completely.
893
894One common use of file sharing in Ada 83 is the use of instantiations of
895Sequential_IO on the same file with different types, to achieve
896heterogeneous input-output.  Although this approach will work in GNAT if
897``shared=yes`` is specified, it is preferable in Ada to use Stream_IO
for this purpose (using the stream attributes).
899
900.. _Filenames_encoding:
901
902Filenames encoding
903==================
904
The ``encoding=xxx`` form parameter can be used to specify the
filename encoding.
907
908*
909  If the form parameter ``encoding=utf8`` appears in the form string, the
910  filename must be encoded in UTF-8.
911
912*
  If the form parameter ``encoding=8bits`` appears in the form
  string, the filename must be a standard 8-bit string.
915
In the absence of an ``encoding=xxx`` form parameter, the
encoding is controlled by the ``GNAT_CODE_PAGE`` environment
variable.  If the variable is not set, ``utf8`` is assumed.  The
variable accepts the following values:
919
920
921
922*CP_ACP*
923  The current system Windows ANSI code page.
924
925*CP_UTF8*
926  UTF-8 encoding
927
This encoding form parameter is only supported on the Windows
platform.  On other operating systems, the run-time supports
UTF-8 natively.
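
A minimal sketch of the form parameter in use follows.  The procedure name
and the file name are assumptions made for this example; the name is
spelled byte by byte so that the source file itself stays plain ASCII.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Encoding_Demo is
      --  Hypothetical file name: "caf", then the two-byte UTF-8
      --  sequence for U+00E9 (e with acute accent), then ".txt"
      Name : constant String :=
        "caf" & Character'Val (16#C3#) & Character'Val (16#A9#) & ".txt";

      F : File_Type;
   begin
      --  State explicitly that Name is encoded in UTF-8
      Create (F, Out_File, Name, Form => "encoding=utf8");
      Put_Line (F, "created through a UTF-8 encoded file name");
      Close (F);
   end Encoding_Demo;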
931
932.. _File_content_encoding:
933
934File content encoding
935=====================
936
For text files it is possible to specify the encoding to use.  This is
controlled by the ``GNAT_CCS_ENCODING`` environment
variable.  If the variable is not set, ``TEXT`` is assumed.
940
941The possible values are those supported on Windows:
942
943
944
945*TEXT*
946  Translated text mode
947
948*WTEXT*
  Translated Unicode encoding
950
951*U16TEXT*
952  Unicode 16-bit encoding
953
954*U8TEXT*
955  Unicode 8-bit encoding
956
957This encoding is only supported on the Windows platform.
958
959.. _Open_Modes:
960
961Open Modes
962==========
963
964`Open` and `Create` calls result in a call to `fopen`
965using the mode shown in the following table:
966
967+----------------------------+---------------+------------------+
968|           `Open` and `Create` Call Modes                      |
969+----------------------------+---------------+------------------+
970|                            |   **OPEN**    |     **CREATE**   |
971+============================+===============+==================+
972| Append_File                |   "r+"        |    "w+"          |
973+----------------------------+---------------+------------------+
974| In_File                    |   "r"         |    "w+"          |
975+----------------------------+---------------+------------------+
976| Out_File (Direct_IO)       |   "r+"        |    "w"           |
977+----------------------------+---------------+------------------+
978| Out_File (all other cases) |   "w"         |    "w"           |
979+----------------------------+---------------+------------------+
980| Inout_File                 |   "r+"        |    "w+"          |
981+----------------------------+---------------+------------------+
982
983
984If text file translation is required, then either ``b`` or ``t``
985is added to the mode, depending on the setting of Text.  Text file
986translation refers to the mapping of CR/LF sequences in an external file
987to LF characters internally.  This mapping only occurs in DOS and
988DOS-like systems, and is not relevant to other systems.
989
990A special case occurs with Stream_IO.  As shown in the above table, the
991file is initially opened in ``r`` or ``w`` mode for the
992`In_File` and `Out_File` cases.  If a `Set_Mode` operation
993subsequently requires switching from reading to writing or vice-versa,
994then the file is reopened in ``r+`` mode to permit the required operation.
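
The Stream_IO special case can be exercised with a program such as the
following sketch, which writes a value, switches the same file to
`In_File` with `Set_Mode`, and reads the value back.  The procedure name
and the file name ``set_mode_demo.bin`` are assumptions made for this
example.

.. code-block:: ada

   with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;
   with Ada.Text_IO;

   procedure Set_Mode_Demo is
      F     : File_Type;
      Value : Integer := 0;
   begin
      --  Out_File: the file is initially opened for writing only
      --  (hypothetical file name)
      Create (F, Out_File, "set_mode_demo.bin");
      Integer'Write (Stream (F), 2024);

      --  Switching from writing to reading: this is the case in which
      --  the file is described as being reopened in "r+" mode
      Set_Mode (F, In_File);

      --  Set_Mode leaves the position unchanged, so go back to the start
      Set_Index (F, 1);
      Integer'Read (Stream (F), Value);

      Ada.Text_IO.Put_Line ("Value read back:" & Integer'Image (Value));
      Delete (F);
   end Set_Mode_Demo;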
995
996.. _Operations_on_C_Streams:
997
998Operations on C Streams
999=======================
1000
1001The package `Interfaces.C_Streams` provides an Ada program with direct
1002access to the C library functions for operations on C streams:
1003
1004
1005.. code-block:: ada
1006
1007  package Interfaces.C_Streams is
1008    -- Note: the reason we do not use the types that are in
1009    -- Interfaces.C is that we want to avoid dragging in the
1010    -- code in this unit if possible.
1011    subtype chars is System.Address;
1012    -- Pointer to null-terminated array of characters
1013    subtype FILEs is System.Address;
1014    -- Corresponds to the C type FILE*
1015    subtype voids is System.Address;
1016    -- Corresponds to the C type void*
1017    subtype int is Integer;
1018    subtype long is Long_Integer;
1019    -- Note: the above types are subtypes deliberately, and it
1020    -- is part of this spec that the above correspondences are
1021    -- guaranteed.  This means that it is legitimate to, for
1022    -- example, use Integer instead of int.  We provide these
1023    -- synonyms for clarity, but in some cases it may be
1024    -- convenient to use the underlying types (for example to
1025    -- avoid an unnecessary dependency of a spec on the spec
1026    -- of this unit).
1027    type size_t is mod 2 ** Standard'Address_Size;
1028    NULL_Stream : constant FILEs;
1029    -- Value returned (NULL in C) to indicate an
1030    -- fdopen/fopen/tmpfile error
1031    ----------------------------------
1032    -- Constants Defined in stdio.h --
1033    ----------------------------------
1034    EOF : constant int;
1035    -- Used by a number of routines to indicate error or
1036    -- end of file
1037    IOFBF : constant int;
1038    IOLBF : constant int;
1039    IONBF : constant int;
1040    -- Used to indicate buffering mode for setvbuf call
1041    SEEK_CUR : constant int;
1042    SEEK_END : constant int;
1043    SEEK_SET : constant int;
1044    -- Used to indicate origin for fseek call
1045    function stdin return FILEs;
1046    function stdout return FILEs;
1047    function stderr return FILEs;
1048    -- Streams associated with standard files
1049    --------------------------
1050    -- Standard C functions --
1051    --------------------------
1052    -- The functions selected below are ones that are
1053    -- available in UNIX (but not necessarily in ANSI C).
1054    -- These are very thin interfaces
1055    -- which copy exactly the C headers.  For more
1056    -- documentation on these functions, see the Microsoft C
1057    -- "Run-Time Library Reference" (Microsoft Press, 1990,
1058    -- ISBN 1-55615-225-6), which includes useful information
1059    -- on system compatibility.
1060    procedure clearerr (stream : FILEs);
1061    function fclose (stream : FILEs) return int;
1062    function fdopen (handle : int; mode : chars) return FILEs;
1063    function feof (stream : FILEs) return int;
1064    function ferror (stream : FILEs) return int;
1065    function fflush (stream : FILEs) return int;
1066    function fgetc (stream : FILEs) return int;
1067    function fgets (strng : chars; n : int; stream : FILEs)
1068        return chars;
1069    function fileno (stream : FILEs) return int;
1070    function fopen (filename : chars; Mode : chars)
1071        return FILEs;
1072    -- Note: to maintain target independence, use
1073    -- text_translation_required, a boolean variable defined in
1074    -- a-sysdep.c to deal with the target dependent text
1075    -- translation requirement.  If this variable is set,
    -- then b/t should be appended to the standard mode
1077    -- argument to set the text translation mode off or on
1078    -- as required.
1079    function fputc (C : int; stream : FILEs) return int;
1080    function fputs (Strng : chars; Stream : FILEs) return int;
1081    function fread
1082       (buffer : voids;
1083        size : size_t;
1084        count : size_t;
1085        stream : FILEs)
1086        return size_t;
1087    function freopen
1088       (filename : chars;
1089        mode : chars;
1090        stream : FILEs)
1091        return FILEs;
1092    function fseek
1093       (stream : FILEs;
1094        offset : long;
1095        origin : int)
1096        return int;
1097    function ftell (stream : FILEs) return long;
1098    function fwrite
1099       (buffer : voids;
1100        size : size_t;
1101        count : size_t;
1102        stream : FILEs)
1103        return size_t;
1104    function isatty (handle : int) return int;
1105    procedure mktemp (template : chars);
1106    -- The return value (which is just a pointer to template)
1107    -- is discarded
1108    procedure rewind (stream : FILEs);
1109    function rmtmp return int;
1110    function setvbuf
1111       (stream : FILEs;
1112        buffer : chars;
1113        mode : int;
1114        size : size_t)
1115        return int;
1116
1117    function tmpfile return FILEs;
1118    function ungetc (c : int; stream : FILEs) return int;
1119    function unlink (filename : chars) return int;
1120    ---------------------
1121    -- Extra functions --
1122    ---------------------
1123    -- These functions supply slightly thicker bindings than
1124    -- those above.  They are derived from functions in the
1125    -- C Run-Time Library, but may do a bit more work than
1126    -- just directly calling one of the Library functions.
1127    function is_regular_file (handle : int) return int;
1128    -- Tests if given handle is for a regular file (result 1)
1129    -- or for a non-regular file (pipe or device, result 0).
1130    ---------------------------------
1131    -- Control of Text/Binary Mode --
1132    ---------------------------------
1133    -- If text_translation_required is true, then the following
1134    -- functions may be used to dynamically switch a file from
1135    -- binary to text mode or vice versa.  These functions have
1136    -- no effect if text_translation_required is false (i.e., in
1137    -- normal UNIX mode).  Use fileno to get a stream handle.
1138    procedure set_binary_mode (handle : int);
1139    procedure set_text_mode (handle : int);
1140    ----------------------------
1141    -- Full Path Name support --
1142    ----------------------------
1143    procedure full_name (nam : chars; buffer : chars);
1144    -- Given a NUL terminated string representing a file
1145    -- name, returns in buffer a NUL terminated string
1146    -- representing the full path name for the file name.
    -- On systems where it is relevant the drive is also
1148    -- part of the full path name.  It is the responsibility
1149    -- of the caller to pass an actual parameter for buffer
1150    -- that is big enough for any full path name.  Use
1151    -- max_path_len given below as the size of buffer.
1152    max_path_len : integer;
1153    -- Maximum length of an allowable full path name on the
1154    -- system, including a terminating NUL character.
1155  end Interfaces.C_Streams;
1156
1157
1158.. _Interfacing_to_C_Streams:
1159
1160Interfacing to C Streams
1161========================
1162
1163The packages in this section permit interfacing Ada files to C Stream
1164operations.
1165
1166
1167.. code-block:: ada
1168
   with Interfaces.C_Streams;
   package Ada.Sequential_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Sequential_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Direct_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Direct_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Text_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Text_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Wide_Text_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Wide_Text_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Wide_Wide_Text_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Wide_Wide_Text_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Stream_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Stream_IO.C_Streams;
1234
1235
1236In each of these six packages, the `C_Stream` function obtains the
1237`FILE` pointer from a currently opened Ada file.  It is then
1238possible to use the `Interfaces.C_Streams` package to operate on
1239this stream, or the stream can be passed to a C program which can
1240operate on it directly.  Of course the program is responsible for
1241ensuring that only appropriate sequences of operations are executed.
1242
1243One particular use of relevance to an Ada program is that the
1244`setvbuf` function can be used to control the buffering of the
1245stream used by an Ada file.  In the absence of such a call the standard
1246default buffering is used.
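
As a sketch of this use, the following program requests unbuffered output
for a text file.  The procedure name and the file name ``unbuffered.log``
are assumptions made for this example.  The call is placed before any
data is written, since the C library expects `setvbuf` to precede other
operations on the stream.

.. code-block:: ada

   with Ada.Text_IO;           use Ada.Text_IO;
   with Ada.Text_IO.C_Streams;
   with Interfaces.C_Streams;
   with System;

   procedure Setvbuf_Demo is
      package ICS renames Interfaces.C_Streams;

      F      : File_Type;
      Status : ICS.int;
   begin
      --  Hypothetical log file name, chosen for illustration only
      Create (F, Out_File, "unbuffered.log");

      --  Obtain the FILE pointer of the Ada file and switch the
      --  stream to unbuffered mode (no buffer supplied, size 0)
      Status :=
        ICS.setvbuf
          (Ada.Text_IO.C_Streams.C_Stream (F),
           System.Null_Address,
           ICS.IONBF,
           0);

      if Status /= 0 then
         Put_Line ("setvbuf reported an error");
      end if;

      Put_Line (F, "written through an unbuffered stream");
      Close (F);
   end Setvbuf_Demo;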
1247
1248The `Open` procedures in these packages open a file giving an
1249existing C Stream instead of a file name.  Typically this stream is
1250imported from a C program, allowing an Ada file to operate on an
1251existing C file.
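
The following sketch shows the `Open` variant in isolation.  To keep the
example self-contained, the C stream is obtained from
`Interfaces.C_Streams.fopen` rather than imported from a C program; the
procedure name and the file name are assumptions made for this example.
Closing the Ada file is assumed here to release the underlying stream as
well, so `fclose` is not called separately.

.. code-block:: ada

   with Ada.Text_IO;           use Ada.Text_IO;
   with Ada.Text_IO.C_Streams;
   with Interfaces.C_Streams;

   procedure Open_From_C_Stream is
      package ICS renames Interfaces.C_Streams;
      use type ICS.FILEs;

      --  NUL-terminated strings, as expected by the C library
      --  (hypothetical file name)
      C_Name : constant String := "from_c_stream.txt" & ASCII.NUL;
      C_Mode : constant String := "w" & ASCII.NUL;

      Stream : ICS.FILEs;
      F      : File_Type;
   begin
      Stream := ICS.fopen (C_Name'Address, C_Mode'Address);

      if Stream = ICS.NULL_Stream then
         Put_Line ("fopen failed");
         return;
      end if;

      --  Attach an Ada text file to the existing C stream
      Ada.Text_IO.C_Streams.Open (F, Out_File, Stream);
      Put_Line (F, "written through an Ada file on a C stream");
      Close (F);
   end Open_From_C_Stream;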
1252
1253