.. _The_Implementation_of_Standard_I/O:

**********************************
The Implementation of Standard I/O
**********************************

GNAT implements all the required input-output facilities described in
A.6 through A.14. These sections of the Ada Reference Manual describe the
required behavior of these packages from the Ada point of view, and if
you are writing a portable Ada program that does not need to know the
exact manner in which Ada maps to the outside world when it comes to
reading or writing external files, then you do not need to read this
chapter. As long as your files are all regular files (not pipes or
devices), and as long as you write and read the files only from Ada, the
description in the Ada Reference Manual is sufficient.

However, if you want to do input-output to pipes or other devices, such
as the keyboard or screen, or if the files you are dealing with are
either generated by some other language, or to be read by some other
language, then you need to know more about the details of how the GNAT
implementation of these input-output facilities behaves.

In this chapter we give a detailed description of exactly how GNAT
interfaces to the file system. As always, the sources of the system are
available to you for answering questions at an even more detailed level,
but for most purposes the information in this chapter will suffice.

Another reason that you may need to know more about how input-output is
implemented arises when you have a program written in mixed languages
where, for example, files are shared between the C and Ada sections of
the same program. GNAT provides some additional facilities, in the form
of additional child library packages, that facilitate this sharing, and
these additional facilities are also described in this chapter.

..

.. _Standard_I/O_Packages:

Standard I/O Packages
=====================

The Standard I/O packages described in Annex A for

* Ada.Text_IO
* Ada.Text_IO.Complex_IO
* Ada.Text_IO.Text_Streams
* Ada.Wide_Text_IO
* Ada.Wide_Text_IO.Complex_IO
* Ada.Wide_Text_IO.Text_Streams
* Ada.Wide_Wide_Text_IO
* Ada.Wide_Wide_Text_IO.Complex_IO
* Ada.Wide_Wide_Text_IO.Text_Streams
* Ada.Stream_IO
* Ada.Sequential_IO
* Ada.Direct_IO

are implemented using the C
library streams facility, where

* All files are opened using ``fopen``.
* All input/output operations use ``fread``/``fwrite``.

There is no internal buffering of any kind at the Ada library level. The only
buffering is that provided at the system level in the implementation of the
library routines that support streams. This facilitates shared use of these
streams by mixed language programs. Note though that system level buffering is
explicitly enabled at elaboration of the standard I/O packages, and that can
have an impact on mixed language programs, in particular those using I/O before
calling the Ada elaboration routine (e.g., ``adainit``). It is recommended to
call the Ada elaboration routine before performing any I/O or, when that is
impractical, to flush the common I/O streams, in particular Standard_Output,
before elaborating the Ada code.

.. _FORM_Strings:

FORM Strings
============

The format of a FORM string in GNAT is:


::

    "keyword=value,keyword=value,...,keyword=value"


where letters may be in upper or lower case, and there are no spaces
between values. The order of the entries is not important. Currently
the following keywords are defined.


::

    TEXT_TRANSLATION=[YES|NO|TEXT|BINARY|U8TEXT|WTEXT|U16TEXT]
    SHARED=[YES|NO]
    WCEM=[n|h|u|s|e|8|b]
    ENCODING=[UTF8|8BITS]


The use of these parameters is described later in this section.
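
As an illustration, the following sketch passes a FORM string to
``Ada.Text_IO.Create``. The file name ``form_demo.txt`` and the chosen
keyword values are arbitrary assumptions made for the example; only the
keyword syntax is taken from the description above.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Form_Demo is
      F : File_Type;
   begin
      --  Keywords may be written in either case and in any order,
      --  separated by commas with no spaces. The file name is a
      --  placeholder chosen for this example.
      Create (F, Out_File, "form_demo.txt",
              Form => "shared=no,TEXT_TRANSLATION=NO");
      Put_Line (F, "hello");
      Close (F);
   end Form_Demo;
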
If an
unrecognized keyword appears in a form string, it is silently ignored
and not considered invalid.

.. _Direct_IO:

Direct_IO
=========

Direct_IO can only be instantiated for definite types. This is a
restriction of the Ada language, which means that the records are fixed
length (the length being determined by ``type'Size``, rounded
up to the next storage unit boundary if necessary).

The records of a Direct_IO file are simply written to the file in index
sequence, with the first record starting at offset zero, and subsequent
records following. There is no control information of any kind. For
example, if 32-bit integers are being written, each record takes
4 bytes, so the record at index ``K`` starts at offset
(``K``-1)*4.

There is no limit on the size of Direct_IO files; they are expanded as
necessary to accommodate whatever records are written to the file.

.. _Sequential_IO:

Sequential_IO
=============

Sequential_IO may be instantiated with either a definite (constrained)
or indefinite (unconstrained) type.

For the definite type case, the elements written to the file are simply
the memory images of the data values with no control information of any
kind. The resulting file should be read using the same type; no validity
checking is performed on input.

For the indefinite type case, the elements written consist of two
parts. First is the size of the data item, written as the memory image
of an ``Interfaces.C.size_t`` value, followed by the memory image of
the data value. The resulting file can only be read using the same
(unconstrained) type. Normal assignment checks are performed on these
read operations, and if these checks fail, ``Data_Error`` is
raised.
In particular, in the array case, the lengths must match, and in
the variant record case, if the variable for a particular read operation
is constrained, the discriminants must match.

Note that it is not possible to use Sequential_IO to write variable
length array items, and then read the data back into different length
arrays. For example, the following will raise ``Data_Error``:


.. code-block:: ada

   package IO is new Ada.Sequential_IO (String);
   F : IO.File_Type;
   S : String (1 .. 4);
   ...
   IO.Create (F);                       --  temporary file, mode Out_File
   IO.Write (F, "hello!");              --  one element of length 6
   IO.Reset (F, Mode => IO.In_File);
   IO.Read (F, S);                      --  Data_Error: length 6 /= 4
   Put_Line (S);



On some Ada implementations, this will print ``hell``, but the program is
clearly incorrect, since there is only one element in the file, and that
element is the string ``hello!``.

In Ada 95 and Ada 2005, this kind of behavior can be legitimately achieved
using Stream_IO, and this is the preferred mechanism. In particular, the
above program fragment rewritten to use Stream_IO will work correctly.

.. _Text_IO:

Text_IO
=======

Text_IO files consist of a stream of characters containing the following
special control characters:


::

    LF (line feed, 16#0A#) Line Mark
    FF (form feed, 16#0C#) Page Mark


A canonical Text_IO file is defined as one in which the following
conditions are met:

* The character ``LF`` is used only as a line mark, i.e., to mark the end
  of the line.

* The character ``FF`` is used only as a page mark, i.e., to mark the
  end of a page and consequently can appear only immediately following a
  ``LF`` (line mark) character.

* The file ends with either ``LF`` (line mark) or ``LF``-``FF``
  (line mark, page mark). In the former case, the page mark is implicitly
  assumed to be present.
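
The byte-level part of these conditions can be checked mechanically. The
following function is a minimal sketch and not part of GNAT: the name
``Is_Canonical`` is hypothetical, the file contents are assumed to be
already loaded into a ``String``, and an empty file is arbitrarily reported
as non-canonical. Whether an ``LF`` was intended as data or as a line mark
cannot be recovered from the bytes, so the first condition is not tested.

.. code-block:: ada

   with Ada.Characters.Latin_1;

   function Is_Canonical (Data : String) return Boolean is
      use Ada.Characters.Latin_1;
   begin
      --  Assumption made for this sketch: an empty file is rejected
      if Data'Length = 0 then
         return False;
      end if;

      --  Every FF must immediately follow an LF
      for I in Data'Range loop
         if Data (I) = FF
           and then (I = Data'First or else Data (I - 1) /= LF)
         then
            return False;
         end if;
      end loop;

      --  The file must end with LF or with LF-FF; the loop above already
      --  guarantees that a trailing FF is preceded by an LF
      return Data (Data'Last) = LF or else Data (Data'Last) = FF;
   end Is_Canonical;
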

A file written using Text_IO will be in canonical form provided that no
explicit ``LF`` or ``FF`` characters are written using ``Put``
or ``Put_Line``. There will be no ``FF`` character at the end of
the file unless an explicit ``New_Page`` operation was performed
before closing the file.

A canonical Text_IO file that is a regular file (i.e., not a device or a
pipe) can be read using any of the routines in Text_IO. The
semantics in this case will be exactly as defined in the Ada Reference
Manual, and all the routines in Text_IO are fully implemented.

A text file that does not meet the requirements for a canonical Text_IO
file has one of the following:

* The file contains ``FF`` characters not immediately following a
  ``LF`` character.

* The file contains ``LF`` or ``FF`` characters written by
  ``Put`` or ``Put_Line``, which are not logically considered to be
  line marks or page marks.

* The file ends in a character other than ``LF`` or ``FF``,
  i.e., there is no explicit line mark or page mark at the end of the file.

Text_IO can be used to read such non-standard text files but subprograms
to do with line or page numbers do not have defined meanings. In
particular, a ``FF`` character that does not follow a ``LF``
character may or may not be treated as a page mark from the point of
view of page and line numbering. Every ``LF`` character is considered
to end a line, and there is an implied ``LF`` character at the end of
the file.

.. _Stream_Pointer_Positioning:

Stream Pointer Positioning
--------------------------

``Ada.Text_IO`` has a definition of current position for a file that
is being read. No internal buffering occurs in Text_IO, and usually the
physical position in the stream used to implement the file corresponds
to this logical position defined by Text_IO.
There are two exceptions:

* After a call to ``End_Of_Page`` that returns ``True``, the stream
  is positioned past the ``LF`` (line mark) that precedes the page
  mark. Text_IO maintains an internal flag so that subsequent read
  operations properly handle the logical position which is unchanged by
  the ``End_Of_Page`` call.

* After a call to ``End_Of_File`` that returns ``True``, if the
  Text_IO file was positioned before the line mark at the end of file
  before the call, then the logical position is unchanged, but the stream
  is physically positioned right at the end of file (past the line mark,
  and past a possible page mark following the line mark). Again Text_IO
  maintains internal flags so that subsequent read operations properly
  handle the logical position.

These discrepancies have no effect on the observable behavior of
Text_IO, but if a single Ada stream is shared between a C program and
Ada program, or shared (using ``shared=yes`` in the form string)
between two Ada files, then the difference may be observable in some
situations.

.. _Reading_and_Writing_Non-Regular_Files:

Reading and Writing Non-Regular Files
-------------------------------------

A non-regular file is a device (such as a keyboard), or a pipe. Text_IO
can be used for reading and writing. Writing is not affected and the
sequence of characters output is identical to the normal file case, but
for reading, the behavior of Text_IO is modified to avoid undesirable
look-ahead as follows:

An input file that is not a regular file is considered to have no page
marks. Any ``Ascii.FF`` characters (the character normally used for a
page mark) appearing in the file are considered to be data
characters. In particular:

* ``Get_Line`` and ``Skip_Line`` do not test for a page mark
  following a line mark. If a page mark appears, it will be treated as a
  data character.

* This avoids the need to wait for an extra character to be typed or
  entered from the pipe to complete one of these operations.

* ``End_Of_Page`` always returns ``False``.

* ``End_Of_File`` will return ``False`` if there is a page mark at
  the end of the file.

Output to non-regular files is the same as for regular files. Page marks
may be written to non-regular files using ``New_Page``, but as noted
above they will not be treated as page marks on input if the output is
piped to another Ada program.

Another important discrepancy when reading non-regular files is that the end
of file indication is not 'sticky'. If an end of file is entered, e.g., by
pressing the :kbd:`EOT` key, then end of file is signaled once (i.e., the
test ``End_Of_File`` will yield ``True``, or a read will raise
``End_Error``), but then reading can resume to read data past that end of
file indication, until another end of file indication is entered.

.. _Get_Immediate:

Get_Immediate
-------------

.. index:: Get_Immediate

Get_Immediate returns the next character (including control characters)
from the input file. In particular, Get_Immediate will return LF or FF
characters used as line marks or page marks. Such operations leave the
file positioned past the control character, and it is thus not treated
as having its normal function. This means that page, line and column
counts after this kind of Get_Immediate call are set as though the mark
did not occur. In the case where a Get_Immediate leaves the file
positioned between the line mark and page mark (which is not normally
possible), it is undefined whether the FF character will be treated as a
page mark.

.. _Treating_Text_IO_Files_as_Streams:

Treating Text_IO Files as Streams
---------------------------------

..

.. index:: Stream files

The package ``Text_IO.Text_Streams`` allows a ``Text_IO`` file to be treated
as a stream. Data written to a ``Text_IO`` file in this stream mode is
binary data. If this binary data contains bytes 16#0A# (``LF``) or
16#0C# (``FF``), the resulting file may have non-standard
format. Similarly if read operations are used to read from a Text_IO
file treated as a stream, then ``LF`` and ``FF`` characters may be
skipped and the effect is similar to that described above for
``Get_Immediate``.

.. _Text_IO_Extensions:

Text_IO Extensions
------------------

.. index:: Text_IO extensions

A package ``GNAT.IO_Aux`` in the GNAT library provides some useful extensions
to the standard ``Text_IO`` package:

* ``function File_Exists (Name : String) return Boolean;``

  Determines if a file of the given name exists.

* ``function Get_Line return String;``

  Reads a string from the standard input file. The value returned is exactly
  the length of the line that was read.

* ``function Get_Line (File : Ada.Text_IO.File_Type) return String;``

  Similar, except that the parameter File specifies the file from which
  the string is to be read.


.. _Text_IO_Facilities_for_Unbounded_Strings:

Text_IO Facilities for Unbounded Strings
----------------------------------------

.. index:: Text_IO for unbounded strings

.. index:: Unbounded_String, Text_IO operations

The package ``Ada.Strings.Unbounded.Text_IO``
in library files :file:`a-suteio.ads/adb` contains some GNAT-specific
subprograms useful for Text_IO operations on unbounded strings:


* ``function Get_Line (File : File_Type) return Unbounded_String;``

  Reads a line from the specified file
  and returns the result as an unbounded string.

* ``procedure Put (File : File_Type; U : Unbounded_String);``

  Writes the value of the given unbounded string to the specified file.
  Similar to the effect of
  ``Put (To_String (U))`` except that an extra copy is avoided.

* ``procedure Put_Line (File : File_Type; U : Unbounded_String);``

  Writes the value of the given unbounded string to the specified file,
  followed by a ``New_Line``.
  Similar to the effect of ``Put_Line (To_String (U))`` except
  that an extra copy is avoided.

In the above subprograms, ``File`` is of type ``Ada.Text_IO.File_Type``
and is optional. If the parameter is omitted, then the standard input or
output file is referenced as appropriate.

The package ``Ada.Strings.Wide_Unbounded.Wide_Text_IO`` in library
files :file:`a-swuwti.ads` and :file:`a-swuwti.adb` provides similar extended
``Wide_Text_IO`` functionality for unbounded wide strings.

The package ``Ada.Strings.Wide_Wide_Unbounded.Wide_Wide_Text_IO`` in library
files :file:`a-szuzti.ads` and :file:`a-szuzti.adb` provides similar extended
``Wide_Wide_Text_IO`` functionality for unbounded wide wide strings.

.. _Wide_Text_IO:

Wide_Text_IO
============

``Wide_Text_IO`` is similar in most respects to Text_IO, except that
both input and output files may contain special sequences that represent
wide character values.
The encoding scheme for a given file may be
specified using a FORM parameter:


::

    WCEM=`x`


as part of the FORM string (WCEM = wide character encoding method),
where ``x`` is one of the following characters:

========== ====================
Character  Encoding
========== ====================
*h*        Hex ESC encoding
*u*        Upper half encoding
*s*        Shift-JIS encoding
*e*        EUC Encoding
*8*        UTF-8 encoding
*b*        Brackets encoding
========== ====================

The encoding methods match those that can be used in a source
program, but there is no requirement that the encoding method used for
the source program be the same as the encoding method used for files,
and different files may use different encoding methods.

The default encoding method for the standard files, and for opened files
for which no WCEM parameter is given in the FORM string, matches the
wide character encoding specified for the main program (the default
being brackets encoding if no coding method was specified with ``-gnatW``).



*Hex Coding*
  In this encoding, a wide character is represented by a five character
  sequence:


::

    ESC a b c d

..

  where ``a``, ``b``, ``c``, ``d`` are the four hexadecimal
  characters (using upper case letters) of the wide character code. For
  example, ESC A345 is used to represent the wide character with code
  16#A345#. This scheme is compatible with use of the full
  ``Wide_Character`` set.


*Upper Half Coding*
  The wide character with encoding 16#abcd#, where the upper bit is on
  (i.e., a is in the range 8-F) is represented as two bytes 16#ab# and
  16#cd#. The second byte may never be a format control character, but is
  not required to be in the upper half. This method can also be used for
  shift-JIS or EUC where the internal coding matches the external coding.


*Shift JIS Coding*
  A wide character is represented by a two character sequence 16#ab# and
  16#cd#, with the restrictions described above for upper half
  encoding. The internal character code is the corresponding JIS
  character according to the standard algorithm for Shift-JIS
  conversion. Only characters defined in the JIS code set table can be
  used with this encoding method.


*EUC Coding*
  A wide character is represented by a two character sequence 16#ab# and
  16#cd#, with both characters being in the upper half. The internal
  character code is the corresponding JIS character according to the EUC
  encoding algorithm. Only characters defined in the JIS code set table
  can be used with this encoding method.


*UTF-8 Coding*
  A wide character is represented using
  UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO
  10646-1/Am.2. Depending on the character value, the representation
  is a one, two, or three byte sequence:


::

    16#0000#-16#007f#: 2#0xxxxxxx#
    16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
    16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#

..

  where the ``xxx`` bits correspond to the left-padded bits of the
  16-bit character value. Note that all lower half ASCII characters
  are represented as ASCII bytes and all upper half characters and
  other wide characters are represented as sequences of upper-half
  characters.
  (The full UTF-8 scheme allows for encoding 31-bit characters as
  6-byte sequences, but in this implementation, all UTF-8 sequences
  of four or more bytes length will raise a Constraint_Error, as
  will all invalid UTF-8 sequences.)


*Brackets Coding*
  In this encoding, a wide character is represented by the following eight
  character sequence:


::

    [ " a b c d " ]

..

  where ``a``, ``b``, ``c``, ``d`` are the four hexadecimal
  characters (using uppercase letters) of the wide character code. For
  example, ``["A345"]`` is used to represent the wide character with code
  ``16#A345#``.
  This scheme is compatible with use of the full Wide_Character set.
  On input, brackets coding can also be used for upper half characters,
  e.g., ``["C1"]`` for the character with code ``16#C1#``. However, on
  output, brackets notation is only used for wide characters with a code
  greater than ``16#FF#``.

  Note that brackets coding is not normally used in the context of
  Wide_Text_IO or Wide_Wide_Text_IO, since it is really just designed as
  a portable way of encoding source files. In the context of Wide_Text_IO
  or Wide_Wide_Text_IO, it can only be used if the file does not contain
  any instance of the left bracket character other than to encode wide
  character values using the brackets encoding method. In practice it is
  expected that some standard wide character encoding method such
  as UTF-8 will be used for text input-output.

  If brackets notation is used, then any occurrence of a left bracket
  in the input file which is not the start of a valid wide character
  sequence will cause Constraint_Error to be raised. It is possible to
  encode a left bracket as ``["5B"]`` and Wide_Text_IO and Wide_Wide_Text_IO
  input will interpret this as a left bracket.

  However, when a left bracket is output, it will be output as a left bracket
  and not as ``["5B"]``. We make this decision because for normal use of
  Wide_Text_IO for outputting messages, it is unpleasant to clobber left
  brackets. For example, if we write:


  .. code-block:: ada

     Put_Line ("Start of output [first run]");


  we really do not want to have the left bracket in this message clobbered so
  that the output reads:


::

     Start of output ["5B"]first run]

..

  In practice brackets encoding is reasonably useful for normal Put_Line use
  since we won't get confused between left brackets and wide character
  sequences in the output. But for input, or when files are written out
  and read back in, it really makes better sense to use one of the standard
  encoding methods such as UTF-8.


For the coding schemes other than UTF-8, Hex, or Brackets encoding,
not all wide character values can be represented. An attempt to output a
character that cannot be represented using the encoding scheme for the
file causes Constraint_Error to be raised. An invalid wide character
sequence on input also causes Constraint_Error to be raised.

.. _Stream_Pointer_Positioning_1:

Stream Pointer Positioning
--------------------------

``Ada.Wide_Text_IO`` is similar to ``Ada.Text_IO`` in its handling
of stream pointer positioning (:ref:`Text_IO`). There is one additional
case:

If ``Ada.Wide_Text_IO.Look_Ahead`` reads a character outside the
normal lower ASCII set, i.e., a character in the range:


.. code-block:: ada

   Wide_Character'Val (16#0080#) .. Wide_Character'Val (16#FFFF#)


then although the logical position of the file pointer is unchanged by
the ``Look_Ahead`` call, the stream is physically positioned past the
wide character sequence. Again this is to avoid the need for buffering
or backup, and all ``Wide_Text_IO`` routines check the internal
indication that this situation has occurred so that this is not visible
to a normal program using ``Wide_Text_IO``. However, this discrepancy
can be observed if the wide text file shares a stream with another file.

..

.. _Reading_and_Writing_Non-Regular_Files_1:

Reading and Writing Non-Regular Files
-------------------------------------

As in the case of Text_IO, when a non-regular file is read, it is
assumed that the file contains no page marks (any form feed characters are
treated as data characters), and ``End_Of_Page`` always returns
``False``. Similarly, the end of file indication is not sticky, so
it is possible to read beyond an end of file.

.. _Wide_Wide_Text_IO:

Wide_Wide_Text_IO
=================

``Wide_Wide_Text_IO`` is similar in most respects to Text_IO, except that
both input and output files may contain special sequences that represent
wide wide character values. The encoding scheme for a given file may be
specified using a FORM parameter:


::

    WCEM=`x`


as part of the FORM string (WCEM = wide character encoding method),
where ``x`` is one of the following characters:

========== ====================
Character  Encoding
========== ====================
*h*        Hex ESC encoding
*u*        Upper half encoding
*s*        Shift-JIS encoding
*e*        EUC Encoding
*8*        UTF-8 encoding
*b*        Brackets encoding
========== ====================


The encoding methods match those that can be used in a source
program, but there is no requirement that the encoding method used for
the source program be the same as the encoding method used for files,
and different files may use different encoding methods.

The default encoding method for the standard files, and for opened files
for which no WCEM parameter is given in the FORM string, matches the
wide character encoding specified for the main program (the default
being brackets encoding if no coding method was specified with ``-gnatW``).



*UTF-8 Coding*
  A wide wide character is represented using
  UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO
  10646-1/Am.2.
  Depending on the character value, the representation
  is a one, two, three, or four byte sequence:


::

    16#000000#-16#00007f#: 2#0xxxxxxx#
    16#000080#-16#0007ff#: 2#110xxxxx# 2#10xxxxxx#
    16#000800#-16#00ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
    16#010000#-16#10ffff#: 2#11110xxx# 2#10xxxxxx# 2#10xxxxxx# 2#10xxxxxx#

..

  where the ``xxx`` bits correspond to the left-padded bits of the
  21-bit character value. Note that all lower half ASCII characters
  are represented as ASCII bytes and all upper half characters and
  other wide characters are represented as sequences of upper-half
  characters.


*Brackets Coding*
  In this encoding, a wide wide character is represented by the following
  eight character sequence if it is in the wide character range


::

    [ " a b c d " ]

..

  and by the following ten character sequence if not


::

    [ " a b c d e f " ]

..

  where ``a``, ``b``, ``c``, ``d``, ``e``, and ``f``
  are the four or six hexadecimal
  characters (using uppercase letters) of the wide wide character code. For
  example, ``["01A345"]`` is used to represent the wide wide character
  with code ``16#01A345#``.

  This scheme is compatible with use of the full Wide_Wide_Character set.
  On input, brackets coding can also be used for upper half characters,
  e.g., ``["C1"]`` for the character with code ``16#C1#``. However, on
  output, brackets notation is only used for wide characters with a code
  greater than ``16#FF#``.


It is also possible to use the other Wide_Character encoding methods,
such as Shift-JIS, but the other schemes cannot support the full range
of wide wide characters.
An attempt to output a character that cannot
be represented using the encoding scheme for the file causes
Constraint_Error to be raised. An invalid wide character sequence on
input also causes Constraint_Error to be raised.

..

.. _Stream_Pointer_Positioning_2:

Stream Pointer Positioning
--------------------------

``Ada.Wide_Wide_Text_IO`` is similar to ``Ada.Text_IO`` in its handling
of stream pointer positioning (:ref:`Text_IO`). There is one additional
case:

If ``Ada.Wide_Wide_Text_IO.Look_Ahead`` reads a character outside the
normal lower ASCII set, i.e., a character in the range:


.. code-block:: ada

   Wide_Wide_Character'Val (16#0080#) .. Wide_Wide_Character'Val (16#10FFFF#)


then although the logical position of the file pointer is unchanged by
the ``Look_Ahead`` call, the stream is physically positioned past the
wide character sequence. Again this is to avoid the need for buffering
or backup, and all ``Wide_Wide_Text_IO`` routines check the internal
indication that this situation has occurred so that this is not visible
to a normal program using ``Wide_Wide_Text_IO``. However, this discrepancy
can be observed if the wide text file shares a stream with another file.

.. _Reading_and_Writing_Non-Regular_Files_2:

Reading and Writing Non-Regular Files
-------------------------------------

As in the case of Text_IO, when a non-regular file is read, it is
assumed that the file contains no page marks (any form feed characters are
treated as data characters), and ``End_Of_Page`` always returns
``False``. Similarly, the end of file indication is not sticky, so
it is possible to read beyond an end of file.

.. _Stream_IO:

Stream_IO
=========

A stream file is a sequence of bytes, where individual elements are
written to the file as described in the Ada Reference Manual. The type
``Stream_Element`` is simply a byte. There are two ways to read or
write a stream file.

* The operations ``Read`` and ``Write`` directly read or write a
  sequence of stream elements with no control information.

* The stream attributes applied to a stream file transfer data in the
  manner described for stream attributes.

.. _Text_Translation:

Text Translation
================

``Text_Translation=xxx`` may be used as the Form parameter
passed to Text_IO.Create and Text_IO.Open. ``Text_Translation=xxx``
has no effect on Unix systems. Possible values are:


* ``Yes`` or ``Text`` is the default, which means to
  translate LF to/from CR/LF on Windows systems.

* ``No`` disables this translation; i.e., it
  uses binary mode. For output files, ``Text_Translation=No``
  may be used to create Unix-style files on
  Windows.

* ``wtext`` enables translation in Unicode mode
  (corresponds to ``_O_WTEXT``).

* ``u8text`` enables translation in Unicode UTF-8 mode
  (corresponds to ``_O_U8TEXT``).

* ``u16text`` enables translation in Unicode UTF-16
  mode (corresponds to ``_O_U16TEXT``).


.. _Shared_Files:

Shared Files
============

Section A.14 of the Ada Reference Manual allows implementations to
provide a wide variety of behavior if an attempt is made to access the
same external file with two or more internal files.

To provide a full range of functionality, while at the same time
minimizing the problems of portability caused by this implementation
dependence, GNAT handles file sharing as follows:

* In the absence of a ``shared=xxx`` form parameter, an attempt
  to open two or more files with the same full name is considered an error
  and is not supported. The exception ``Use_Error`` will be
  raised. Note that a file that is not explicitly closed by the program
  remains open until the program terminates.

* If the form parameter ``shared=no`` appears in the form string, the
  file can be opened or created with its own separate stream identifier,
  regardless of whether other files sharing the same external file are
  opened. The exact effect depends on how the C stream routines handle
  multiple accesses to the same external files using separate streams.

* If the form parameter ``shared=yes`` appears in the form string for
  each of two or more files opened using the same full name, the same
  stream is shared between these files, and the semantics are as described
  in Ada Reference Manual, Section A.14.

When a program that opens multiple files with the same name is ported
from another Ada compiler to GNAT, the effect will be that
``Use_Error`` is raised.

The documentation of the original compiler and the documentation of the
program should then be examined to determine if file sharing was
expected, and ``shared=xxx`` parameters added to ``Open``
and ``Create`` calls as required.

When a program is ported from GNAT to some other Ada compiler, no
special attention is required unless the ``shared=xxx`` form
parameter is used in the program. In this case, you must examine the
documentation of the new compiler to see if it supports the required
file sharing semantics, and modify the form strings appropriately. Of
course it may be the case that the program cannot be ported if the
target compiler does not support the required functionality. The best
approach in writing portable code is to avoid file sharing (and hence
the use of the ``shared=xxx`` parameter in the form string)
completely.

One common use of file sharing in Ada 83 is the use of instantiations of
Sequential_IO on the same file with different types, to achieve
heterogeneous input-output.
Although this approach will work in GNAT if
``shared=yes`` is specified, it is preferable in Ada to use Stream_IO
for this purpose (using the stream attributes).

.. _Filenames_encoding:

Filenames encoding
==================

An encoding form parameter can be used to specify the filename
encoding ``encoding=xxx``.

*
  If the form parameter ``encoding=utf8`` appears in the form string, the
  filename must be encoded in UTF-8.

*
  If the form parameter ``encoding=8bits`` appears in the form
  string, the filename must be a standard 8-bit string.

In the absence of an ``encoding=xxx`` form parameter, the
encoding is controlled by the ``GNAT_CODE_PAGE`` environment
variable. If the variable is not set, ``utf8`` is assumed.
The variable accepts the following values:

*CP_ACP*
  The current system Windows ANSI code page.

*CP_UTF8*
  UTF-8 encoding

This encoding form parameter is only supported on the Windows
platform. On other operating systems, the run-time supports
UTF-8 natively.

.. _File_content_encoding:

File content encoding
=====================

For text files it is possible to specify the encoding to use. This is
controlled by the ``GNAT_CCS_ENCODING`` environment
variable. If the variable is not set, ``TEXT`` is assumed.

The possible values are those supported on Windows:

*TEXT*
  Translated text mode

*WTEXT*
  Translated unicode encoding

*U16TEXT*
  Unicode 16-bit encoding

*U8TEXT*
  Unicode 8-bit encoding

This encoding is only supported on the Windows platform.

.. _Open_Modes:

Open Modes
==========

``Open`` and ``Create`` calls result in a call to ``fopen``
using the mode shown in the following table:

+----------------------------+---------------+------------------+
| ``Open`` and ``Create`` Call Modes                            |
+----------------------------+---------------+------------------+
|                            | **OPEN**      | **CREATE**       |
+============================+===============+==================+
| Append_File                | "r+"          | "w+"             |
+----------------------------+---------------+------------------+
| In_File                    | "r"           | "w+"             |
+----------------------------+---------------+------------------+
| Out_File (Direct_IO)       | "r+"          | "w"              |
+----------------------------+---------------+------------------+
| Out_File (all other cases) | "w"           | "w"              |
+----------------------------+---------------+------------------+
| Inout_File                 | "r+"          | "w+"             |
+----------------------------+---------------+------------------+


If text file translation is required, then either ``b`` or ``t``
is added to the mode, depending on the setting of Text. Text file
translation refers to the mapping of CR/LF sequences in an external file
to LF characters internally. This mapping only occurs in DOS and
DOS-like systems, and is not relevant to other systems.

A special case occurs with Stream_IO. As shown in the above table, the
file is initially opened in ``r`` or ``w`` mode for the
``In_File`` and ``Out_File`` cases. If a ``Set_Mode`` operation
subsequently requires switching from reading to writing or vice-versa,
then the file is reopened in ``r+`` mode to permit the required operation.

.. _Operations_on_C_Streams:

Operations on C Streams
=======================

The package ``Interfaces.C_Streams`` provides an Ada program with direct
access to the C library functions for operations on C streams:


.. code-block:: ada

   package Interfaces.C_Streams is
      -- Note: the reason we do not use the types that are in
      -- Interfaces.C is that we want to avoid dragging in the
      -- code in this unit if possible.
      subtype chars is System.Address;
      -- Pointer to null-terminated array of characters
      subtype FILEs is System.Address;
      -- Corresponds to the C type FILE*
      subtype voids is System.Address;
      -- Corresponds to the C type void*
      subtype int is Integer;
      subtype long is Long_Integer;
      -- Note: the above types are subtypes deliberately, and it
      -- is part of this spec that the above correspondences are
      -- guaranteed. This means that it is legitimate to, for
      -- example, use Integer instead of int. We provide these
      -- synonyms for clarity, but in some cases it may be
      -- convenient to use the underlying types (for example to
      -- avoid an unnecessary dependency of a spec on the spec
      -- of this unit).
      type size_t is mod 2 ** Standard'Address_Size;
      NULL_Stream : constant FILEs;
      -- Value returned (NULL in C) to indicate an
      -- fdopen/fopen/tmpfile error

      ----------------------------------
      -- Constants Defined in stdio.h --
      ----------------------------------
      EOF : constant int;
      -- Used by a number of routines to indicate error or
      -- end of file
      IOFBF : constant int;
      IOLBF : constant int;
      IONBF : constant int;
      -- Used to indicate buffering mode for setvbuf call
      SEEK_CUR : constant int;
      SEEK_END : constant int;
      SEEK_SET : constant int;
      -- Used to indicate origin for fseek call
      function stdin return FILEs;
      function stdout return FILEs;
      function stderr return FILEs;
      -- Streams associated with standard files

      --------------------------
      -- Standard C functions --
      --------------------------
      -- The functions selected below are ones that are
      -- available in UNIX (but not
      -- necessarily in ANSI C).
      -- These are very thin interfaces
      -- which copy exactly the C headers. For more
      -- documentation on these functions, see the Microsoft C
      -- "Run-Time Library Reference" (Microsoft Press, 1990,
      -- ISBN 1-55615-225-6), which includes useful information
      -- on system compatibility.
      procedure clearerr (stream : FILEs);
      function fclose (stream : FILEs) return int;
      function fdopen (handle : int; mode : chars) return FILEs;
      function feof (stream : FILEs) return int;
      function ferror (stream : FILEs) return int;
      function fflush (stream : FILEs) return int;
      function fgetc (stream : FILEs) return int;
      function fgets (strng : chars; n : int; stream : FILEs)
         return chars;
      function fileno (stream : FILEs) return int;
      function fopen (filename : chars; Mode : chars)
         return FILEs;
      -- Note: to maintain target independence, use
      -- text_translation_required, a boolean variable defined in
      -- a-sysdep.c to deal with the target dependent text
      -- translation requirement. If this variable is set,
      -- then b/t should be appended to the standard mode
      -- argument to set the text translation mode off or on
      -- as required.
      function fputc (C : int; stream : FILEs) return int;
      function fputs (Strng : chars; Stream : FILEs) return int;
      function fread
        (buffer : voids;
         size   : size_t;
         count  : size_t;
         stream : FILEs)
         return size_t;
      function freopen
        (filename : chars;
         mode     : chars;
         stream   : FILEs)
         return FILEs;
      function fseek
        (stream : FILEs;
         offset : long;
         origin : int)
         return int;
      function ftell (stream : FILEs) return long;
      function fwrite
        (buffer : voids;
         size   : size_t;
         count  : size_t;
         stream : FILEs)
         return size_t;
      function isatty (handle : int) return int;
      procedure mktemp (template : chars);
      -- The return value (which is just a pointer to template)
      -- is discarded
      procedure rewind (stream : FILEs);
      function rmtmp return int;
      function setvbuf
        (stream : FILEs;
         buffer : chars;
         mode   : int;
         size   : size_t)
         return int;

      function tmpfile return FILEs;
      function ungetc (c : int; stream : FILEs) return int;
      function unlink (filename : chars) return int;

      ---------------------
      -- Extra functions --
      ---------------------
      -- These functions supply slightly thicker bindings than
      -- those above. They are derived from functions in the
      -- C Run-Time Library, but may do a bit more work than
      -- just directly calling one of the Library functions.
      function is_regular_file (handle : int) return int;
      -- Tests if given handle is for a regular file (result 1)
      -- or for a non-regular file (pipe or device, result 0).

      ---------------------------------
      -- Control of Text/Binary Mode --
      ---------------------------------
      -- If text_translation_required is true, then the following
      -- functions may be used to dynamically switch a file from
      -- binary to text mode or vice versa.
      -- These functions have
      -- no effect if text_translation_required is false (i.e., in
      -- normal UNIX mode). Use fileno to get a stream handle.
      procedure set_binary_mode (handle : int);
      procedure set_text_mode (handle : int);

      ----------------------------
      -- Full Path Name support --
      ----------------------------
      procedure full_name (nam : chars; buffer : chars);
      -- Given a NUL terminated string representing a file
      -- name, returns in buffer a NUL terminated string
      -- representing the full path name for the file name.
      -- On systems where it is relevant the drive is also
      -- part of the full path name. It is the responsibility
      -- of the caller to pass an actual parameter for buffer
      -- that is big enough for any full path name. Use
      -- max_path_len given below as the size of buffer.
      max_path_len : integer;
      -- Maximum length of an allowable full path name on the
      -- system, including a terminating NUL character.
   end Interfaces.C_Streams;


.. _Interfacing_to_C_Streams:

Interfacing to C Streams
========================

The packages in this section permit interfacing Ada files to C Stream
operations.


.. code-block:: ada

   with Interfaces.C_Streams;
   package Ada.Sequential_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Sequential_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Direct_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Direct_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Text_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Text_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Wide_Text_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Wide_Text_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Wide_Wide_Text_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Wide_Wide_Text_IO.C_Streams;

   with Interfaces.C_Streams;
   package Ada.Stream_IO.C_Streams is
      function C_Stream (F : File_Type)
         return Interfaces.C_Streams.FILEs;
      procedure Open
        (File : in out File_Type;
         Mode : in File_Mode;
         C_Stream : in Interfaces.C_Streams.FILEs;
         Form : in String := "");
   end Ada.Stream_IO.C_Streams;


In each of these six packages, the ``C_Stream`` function obtains the
``FILE`` pointer from a currently opened Ada file. It is then
possible to use the ``Interfaces.C_Streams`` package to operate on
this stream, or the stream can be passed to a C program which can
operate on it directly. Of course the program is responsible for
ensuring that only appropriate sequences of operations are executed.

One particular use of relevance to an Ada program is that the
``setvbuf`` function can be used to control the buffering of the
stream used by an Ada file. In the absence of such a call the standard
default buffering is used.

The ``Open`` procedures in these packages open a file giving an
existing C Stream instead of a file name. Typically this stream is
imported from a C program, allowing an Ada file to operate on an
existing C file.
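
As an illustration of the ``setvbuf`` use mentioned above, the following
minimal sketch obtains the C stream behind a ``Text_IO`` file and switches it
to unbuffered mode before any output is written. The procedure name
``Unbuffered_Log`` and the file name ``demo.log`` are hypothetical and chosen
only for this example. Passing ``System.Null_Address`` as the buffer together
with ``IONBF`` is assumed to behave like the corresponding C call
``setvbuf (stream, NULL, _IONBF, 0)``.

.. code-block:: ada

   with Ada.Text_IO;
   with Ada.Text_IO.C_Streams;
   with Interfaces.C_Streams;
   with System;

   --  Hypothetical example: Unbuffered_Log and "demo.log" are
   --  illustrative names, not part of GNAT.
   procedure Unbuffered_Log is
      use Interfaces.C_Streams;
      use type Interfaces.C_Streams.int;

      Log    : Ada.Text_IO.File_Type;
      Stream : FILEs;
      Status : int;
   begin
      Ada.Text_IO.Create (Log, Ada.Text_IO.Out_File, "demo.log");

      --  Obtain the FILE* used by the Ada file and request unbuffered
      --  mode. This is done before any output is performed on the file.
      Stream := Ada.Text_IO.C_Streams.C_Stream (Log);
      Status := setvbuf (Stream, System.Null_Address, IONBF, 0);

      --  setvbuf returns a nonzero value if the request is not honored
      if Status /= 0 then
         Ada.Text_IO.Put_Line
           (Ada.Text_IO.Standard_Error, "setvbuf failed");
      end if;

      Ada.Text_IO.Put_Line (Log, "written without stdio buffering");
      Ada.Text_IO.Close (Log);
   end Unbuffered_Log;

The same pattern should apply to the other five ``C_Streams`` child packages,
since each provides a ``C_Stream`` function with the same profile.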