@c Copyright (C) 1999 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.

@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}

This chapter describes some aspects of the design and implementation
of the @code{g77} front end.

To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email @email{gcc@@gcc.gnu.org}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://gcc.gnu.org/contribute.html} for steps you might
need to take first.

@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu

@node Overview of Sources
@section Overview of Sources

The current directory layout includes the following:

@table @file
@item @value{srcdir}/gcc/
Non-g77 files in gcc

@item @value{srcdir}/gcc/f/
GNU Fortran front end sources

@item @value{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation

@item @value{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}

@item @value{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}

@item @value{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table

Components of note in @code{g77} are described below.

@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.

@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and some other files.
The @code{g77} compiler code is placed in @file{f/} because it,
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.

@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often referred to as @code{libg2c} when the
@code{g77} version is specifically being referred to.

The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.

@file{libf2c/} is not part of the program @code{g77},
just distributed with it.
It contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes made from @code{libf2c},
to fix some bugs,
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.

@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.

Files of note in @file{f/} and @file{libf2c/} are described below:

@table @file
@item f/BUGS
Lists some important bugs known to be in @code{g77}.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:

@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample

@item f/ChangeLog
Lists recent changes to @code{g77} internals.

@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.

@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use:

@smallexample
info -f f/g77.info -n News
@end smallexample

@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.

All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table

If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization, and the code generation.

The file @file{parse.c} is the source file for @code{yyparse()},
which is invoked by the GBE to start the compilation process,
for @file{f771}.

The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and, along with @file{top.h}, defines all @samp{ffe_[a-z].*},
@samp{ffe[A-Z].*}, and @samp{FFE_[A-Za-z].*} symbols.

The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.

All other modules named @var{xyz}
comprise all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
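
The naming rules above can be read as a lookup from a symbol to the
module that owns it.
The following is only a sketch of that reading, not code from the FFE:
the function name @code{guess_module} is hypothetical, and it assumes
module names consist solely of letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of g77): guess which module defines
   SYM, following the prefix conventions described in the text.
   Writes the module name into OUT and returns 1, or returns 0 if SYM
   matches none of the conventions.  OUT must hold at least 8 bytes.  */
int
guess_module (const char *sym, char *out, size_t outsz)
{
  size_t i, n = 0;

  if (outsz < 8)
    return 0;

  /* malloc_..., mallocX..., MALLOC_... belong to malloc.c/malloc.h.  */
  if (strncmp (sym, "malloc", 6) == 0 || strncmp (sym, "MALLOC", 6) == 0)
    {
      strcpy (out, "malloc");
      return 1;
    }
  if (strncmp (sym, "ffe", 3) != 0 && strncmp (sym, "FFE", 3) != 0)
    return 0;

  /* The module letters are lowercase after "ffe", uppercase after "FFE".  */
  if (sym[0] == 'f')
    for (i = 3; islower ((unsigned char) sym[i]) && n + 1 < outsz; i++)
      out[n++] = sym[i];
  else
    for (i = 3; isupper ((unsigned char) sym[i]) && n + 1 < outsz; i++)
      out[n++] = (char) tolower ((unsigned char) sym[i]);
  out[n] = '\0';

  /* No module letters at all: ffe_..., ffeX..., FFE_... belong to top.  */
  if (n == 0)
    strcpy (out, "top");
  return 1;
}
```

Given @samp{ffexyz_set_something}, this sketch yields @samp{xyz},
so the place to look is @file{xyz.h} or @file{xyz.c}.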

The ``porting'' files of note currently are:

@table @file
@item proj.c
@itemx proj.h
These define the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.

@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, versions of @code{g77} rely less on these files
and more on run-time configuration based on GBE info
in @file{com.c}.

@item com.c
@itemx com.h
These are the primary interface to the GBE.

@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.

@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed as case-insensitive
with regard to Fortran keywords).
@end table

If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.

If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass, otherwise it is in the first.

(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)

@node Overview of Translation Process
@section Overview of Translation Process

The order of phases translating source code to the form accepted
by the GBE is:

@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})

@item
Lexing (@file{lex.c})

@item
Stand-alone statement identification (@file{sta.c})

@item
INCLUDE handling (@file{sti.c})

@item
Order-dependent statement identification (@file{stq.c})

@item
Parsing (@file{stb.c} and @file{expr.c})

@item
Constructing (@file{stc.c})

@item
Collecting (@file{std.c})

@item
Expanding (@file{ste.c})
@end enumerate

To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:

@smallexample
      FORMAT(I2 4H)=(J/
     &   I3)
@end smallexample

The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:

@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample

The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that sequence of lexemes represents.

The sooner it can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.

In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.

An assignment-form statement might be a statement-function
definition or an executable assignment statement.
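
The ``level zero'' test can be sketched as a scan that tracks
parenthesis depth over the lexeme sequence.
This is only an illustration, assuming lexemes are available as an
array of C strings; the function name @code{has_level_zero_equals} is
hypothetical and is not taken from @file{sta.c}.

```c
#include <string.h>

/* Illustrative sketch: return 1 if the N lexemes in LEXEMES contain
   an "=" that is not nested within parentheses, else 0.  */
int
has_level_zero_equals (const char *const *lexemes, int n)
{
  int depth = 0;                /* current parenthesis nesting level */
  int i;

  for (i = 0; i < n; i++)
    {
      if (strcmp (lexemes[i], "(") == 0)
        depth++;
      else if (strcmp (lexemes[i], ")") == 0)
        {
          if (depth > 0)
            depth--;
        }
      else if (strcmp (lexemes[i], "=") == 0 && depth == 0)
        return 1;               /* assignment-form statement */
    }
  return 0;
}
```

For the ten lexemes listed above, the scan returns to depth zero after
the first @samp{)} and then meets @samp{=}, so the statement is
classified as assignment-form even though it begins with @samp{FORMAT}.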

To make that determination,
@file{sta.c} looks at the first two lexemes.

Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement,
else it's a statement function.

Either way, @file{sta.c} hands off the statement to @file{stq.c}
(via @file{sti.c}, which expands INCLUDE files).
@file{stq.c} figures out what a statement that is ambiguous on its own
must actually be, based on the context
established by previous statements.

So, @file{stq.c} watches the statement stream for executable statements,
END statements, and so on, so it knows whether @samp{A(B)=C} is
(intended as) a statement-function definition or an assignment statement.

After establishing the context-aware statement info, @file{stq.c}
passes the original sample statement on to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).

@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.

This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.

For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).

@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.

@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.

For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.

When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.

@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.

Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.

@menu
* g77stripcard::
* lex.c::
* sta.c::
* sti.c::
* stq.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu

@node g77stripcard
@subsection g77stripcard

The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.

This program is needed because @file{lex.c} doesn't pay attention
to maximum line lengths at all, to make it easier to maintain,
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(not spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)
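
The per-line behavior described in this section might look roughly
like the following.
This is a sketch under stated assumptions, not the actual
@file{g77stripcard.c}: the function name @code{strip_card_line} is
hypothetical, and the comment prefix @samp{!g77stripcard: } is a
placeholder, since the real prefix is TBD.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Placeholder only; the text leaves the real prefix TBD.  */
#define STRIP_PREFIX "!g77stripcard: "

/* Copy at most LIMIT columns of LINE (which has no trailing newline)
   into KEPT.  If the text beyond LIMIT contains anything other than
   spaces and tabs, format a comment line holding that text into
   COMMENT and return 1; otherwise leave COMMENT empty and return 0.
   No tab expansion is done, so one character counts as one column.  */
int
strip_card_line (const char *line, size_t limit,
                 char *kept, size_t keptsz,
                 char *comment, size_t commentsz)
{
  size_t len = strlen (line);
  size_t keep = len < limit ? len : limit;
  const char *tail = line + keep;

  snprintf (kept, keptsz, "%.*s", (int) keep, line);
  if (commentsz > 0)
    comment[0] = '\0';

  /* A tail made only of spaces and tabs is dropped silently.  */
  if (tail[strspn (tail, " \t")] == '\0')
    return 0;

  snprintf (comment, commentsz, "%s%s", STRIP_PREFIX, tail);
  return 1;
}
```

A filter built on this would write @code{kept}, then write
@code{comment} on the following line whenever the function returns 1.
With a fixed prefix, users could remove those lines again with
@code{sed}, as suggested above.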

@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

That saves having to pass most source files through @code{expand}.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of intrinsics will probably be matchable in any case.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If it can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.

This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.

Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:

@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample

Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.

However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.

If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another filter''.)

This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.

@item
To allow for form-neutral INCLUDE files without requiring them
to be preprocessed,
the fixed-form lexer should offer an extension (if possible)
allowing a trailing @samp{&} to be ignored, especially if after
column 72, as it would be using the traditional Unix Fortran source
model (which ignores @emph{everything} after column 72).
@end itemize

The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns,
as well as accommodating form-neutral INCLUDE files.
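
The tab and carriage-return rules in the list above can be sketched
as a single pass over one input line.
This is an illustration only, not code from @file{lex.c}; the function
name @code{expand_line} and its return convention are assumptions made
for the example.

```c
#include <stddef.h>

/* Illustrative sketch: copy one line from IN (optionally ending in
   "\n" or "\r\n") into OUT, expanding tabs and dropping the line
   terminator.  Returns the number of characters written, or -1 for
   a carriage return not followed by a linefeed, or for overflow.  */
long
expand_line (const char *in, char *out, size_t outsz)
{
  size_t col = 1;               /* Fortran numbering: first column is 1 */
  size_t n = 0;

  if (outsz == 0)
    return -1;

  for (; *in != '\0' && *in != '\n'; in++)
    {
      if (*in == '\r')
        {
          if (in[1] != '\n')
            return -1;          /* stray CR: would get a diagnostic */
          continue;             /* CR immediately before LF: ignored */
        }
      if (*in == '\t')
        {
          /* Emit one to eight spaces, stopping at the first column
             for which (col - 1) % 8 == 0.  */
          do
            {
              if (n + 1 >= outsz)
                return -1;
              out[n++] = ' ';
              col++;
            }
          while ((col - 1) % 8 != 0);
          continue;
        }
      if (n + 1 >= outsz)
        return -1;
      out[n++] = *in;
      col++;
    }
  out[n] = '\0';
  return (long) n;
}
```

A tab in column 1 therefore becomes eight spaces, so the next
character lands in column 9, while a tab in column 8 becomes a single
space.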

It also implements the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.

Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.

So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.

(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, fails to meet this fairly straightforward specification.
Fortunately, Fortran @emph{itself} does not mandate such a failure,
though most vendor-supplied defaults for their Fortran compilers @emph{do}
fail to meet this specification for readability.)

Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.

@node sta.c
@subsection sta.c

@node sti.c
@subsection sti.c

@node stq.c
@subsection stq.c

@node stb.c
@subsection stb.c

@node expr.c
@subsection expr.c

@node stc.c
@subsection stc.c

@node std.c
@subsection std.c

@node ste.c
@subsection ste.c

@node Gotchas (Transforming)
@subsection Gotchas (Transforming)

This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.

@subsubsection Multi-character Lexemes

Each lexeme carries with it a pointer to where it appears in the source.

To provide the ability for diagnostics to point to column numbers,
in addition to line numbers and names,
lexemes that represent more than one (significant) character
in the source code need, generally,
to provide pointers to where each @emph{character} appears in the source.

This provides the ability to properly identify the precise location
of the problem in code like

@smallexample
SUBROUTINE X
END
BLOCK DATA X
END
@end smallexample

which, in fixed-form source, would result in single lexemes
consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
(The problem is that @samp{X} is defined twice,
so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding pointer in the first,
would be preferable to pointing to the beginnings of the statements.)

This need also arises when parsing (and diagnosing) @code{FORMAT}
statements.

Further, it arises when diagnosing
@code{FMT=} specifiers that contain constants
(or partial constants, or even propagated constants!)
in I/O statements, as in:

@smallexample
PRINT '(I2, 3HAB)', J
@end smallexample

(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)

Multi-character lexemes, which would seem to naturally include
at least digit strings, alphanumeric strings, @code{CHARACTER}
constants, and Hollerith constants, therefore need to provide
location information on each character.
(Maybe Hollerith constants don't, but it's unnecessary to except them.)

The question then arises, what about @emph{other} multi-character lexemes,
such as @samp{**} and @samp{//},
and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?

Turns out there's a need to identify the location of the second character
of these two-character lexemes.
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
Similarly, it is preferable to diagnose the second slash in
@samp{I = J // K} rather than the first, given the implicit typing
rules, which would result in the compiler disallowing the attempted
concatenation of two integers.
(Though, since that's more of a semantic issue,
it's not @emph{that} much preferable.)

Even sequences that could be parsed as digit strings could use location info,
for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
(This probably will be parsed as a character string,
to be consistent with the parsing of @samp{Z'129A'}.)
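
One way to picture per-character location info is a lexeme record
that stores a line and column for every significant character.
The following is a sketch only; the type and function names
(@code{struct lexeme}, @code{gather_name}) and the fixed maximum
length are assumptions for illustration, not the FFE's actual data
structures.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_LEXEME 63           /* arbitrary limit for this sketch */

struct char_loc
{
  int line;
  int column;                   /* 1-based, as in Fortran */
};

struct lexeme
{
  char text[MAX_LEXEME + 1];
  struct char_loc where[MAX_LEXEME];    /* where[i] locates text[i] */
  size_t length;
};

/* Gather one fixed-form alphanumeric lexeme from SRC, whose first
   character sits at LINE, FIRST_COLUMN.  Blanks are skipped because
   they are not significant in fixed form, but every kept character
   remembers the column it came from.  Returns the lexeme length.  */
size_t
gather_name (struct lexeme *lx, const char *src, int line, int first_column)
{
  int column = first_column;

  lx->length = 0;
  for (; *src != '\0'; src++, column++)
    {
      unsigned char c = (unsigned char) *src;

      if (c == ' ')
        continue;
      if (!isalnum (c) || lx->length == MAX_LEXEME)
        break;
      lx->text[lx->length] = (char) c;
      lx->where[lx->length].line = line;
      lx->where[lx->length].column = column;
      lx->length++;
    }
  lx->text[lx->length] = '\0';
  return lx->length;
}
```

With @samp{BLOCK DATA X} starting in column 7, the gathered text is
@samp{BLOCKDATAX} and the location stored for the final @samp{X} is
column 18, which is where a duplicate-definition diagnostic would
ideally point.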

To avoid the hassle of recording the location of the second character,
while also preserving the general rule that each significant character
is distinctly pointed to by the lexeme that contains it,
it's best to simply not have any fixed-size lexemes
larger than one character.

This new design is expected to make checking for two
@samp{*} lexemes in a row much easier than the old design,
so this is not much of a sacrifice.
It probably makes the lexer much easier to implement
than it makes the parser harder.

@subsubsection Space-padding Lexemes

Certain lexemes need to be padded with virtual spaces when the
end of the line (or file) is encountered.

This is necessary in fixed form, to handle lines that don't
extend to column 72, assuming that's the line length in effect.

@subsubsection Bizarre Free-form Hollerith Constants

Last I checked, the Fortran 90 standard actually required the compiler
to silently accept something like

@smallexample
FORMAT ( 1 2   Htwelve chars )
@end smallexample

as a valid @code{FORMAT} statement specifying a twelve-character
Hollerith constant.

The implication here is that, since the new lexer is a zero-feedback one,
it won't know that the special case of a @code{FORMAT} statement being parsed
requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
a single lexeme.

(This is a horrible misfeature of the Fortran 90 language.
It's one of many such misfeatures that almost make me want
to not support them, and forge ahead with designing a new
``GNU Fortran'' language that has the features,
but not the misfeatures, of Fortran 90,
and provide utility programs to do the conversion automatically.)

So, the lexer must gather distinct chunks of decimal strings into
a single lexeme in contexts where a single decimal lexeme might
start a Hollerith constant.

(Which probably means it might as well do that all the time
for all multi-character lexemes, even in free-form mode,
leaving it to subsequent phases to pull them apart as they see fit.)

Compare the treatment of this to how

@smallexample
CHARACTER * 4 5 HEY
@end smallexample

and

@smallexample
CHARACTER * 12 HEY
@end smallexample

must be treated---the former must be diagnosed, due to the separation
between lexemes, the latter must be accepted as a proper declaration.

@subsubsection Hollerith Constants

Recognizing a Hollerith constant---specifically,
that an @samp{H} or @samp{h} after a digit string begins
such a constant---requires some knowledge of context.

Hollerith constants (such as @samp{2HAB}) can appear after:

@itemize @bullet
@item
@samp{(}

@item
@samp{,}

@item
@samp{=}

@item
@samp{+}, @samp{-}, @samp{/}

@item
@samp{*}, except as noted below
@end itemize

Hollerith constants don't appear after:

@itemize @bullet
@item
@samp{CHARACTER*},
which can be treated generally as
any @samp{*} that is the second lexeme of a statement
@end itemize

@subsubsection Confusing Function Keyword

While

@smallexample
REAL FUNCTION FOO ()
@end smallexample

must be a @code{FUNCTION} statement and

@smallexample
REAL FUNCTION FOO (5)
@end smallexample

must be a type-declaration statement,

@smallexample
REAL FUNCTION FOO (@var{names})
@end smallexample

where @var{names} is a comma-separated list of names,
can be one or the other.

The only way to disambiguate that statement
(short of mandating free-form source or a short maximum
length for names of external procedures)
is based on the context of the statement.

In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.

Otherwise, the statement is a @code{FUNCTION} statement,
in that it begins a function program unit
(external, or, within @code{CONTAINS}, nested).

@subsubsection Weird READ

The statement

@smallexample
READ (N)
@end smallexample

is equivalent to either

@smallexample
READ (UNIT=(N))
@end smallexample

or

@smallexample
READ (FMT=(N))
@end smallexample

depending on which would be valid in context.

Specifically, if @samp{N} is type @code{INTEGER},
@samp{READ (FMT=(N))} would not be valid,
because parentheses may not be used around @samp{N},
whereas they may be used around it in @samp{READ (UNIT=(N))}.

Further, if @samp{N} is type @code{CHARACTER},
the opposite is true---@samp{READ (UNIT=(N))} is not valid,
but @samp{READ (FMT=(N))} is.

Strictly speaking, if anything follows

@smallexample
READ (N)
@end smallexample

in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
without looking at the type of @samp{N},
because the comma is required for the @samp{READ (FMT=(N))}
interpretation and disallowed for the @samp{READ (UNIT=(N))}
interpretation.

However, in practice, many Fortran compilers allow
the comma for the @samp{READ (UNIT=(N))}
interpretation anyway
(in that they generally allow a leading comma before
an I/O list in an I/O statement),
and much code takes advantage of this allowance.

(This is quite a reasonable allowance, since the
juxtaposition of a comma-separated list immediately
after an I/O control-specification list, which is also comma-separated,
without an intervening comma,
looks sufficiently ``wrong'' to programmers
that they can't resist the itch to insert the comma.
@samp{READ (I, J), K, L} simply looks cleaner than
@samp{READ (I, J) K, L}.)

So, type-based disambiguation is needed unless strict adherence
to the standard is always assumed, and we're not going to assume that.

@node TBD (Transforming)
@subsection TBD (Transforming)

Continue researching gotchas, designing the transformational process,
and implementing it.

Specific issues to resolve:

@itemize @bullet
@item
Just where should @code{USE} processing, if it were implemented, take place?

This gets into the whole issue of how @code{g77} should handle the concept
of modules.
I think GNAT already takes on this issue, but don't know more than that.
Jim Giles has written extensively on @code{comp.lang.fortran}
about his opinions on module handling, as have others.
Jim's views should be taken into account.

Actually, Richard M. Stallman (RMS) also has written up
some guidelines for implementing such things,
but I'm not sure where I read them.
Perhaps the old @email{gcc2@@cygnus.com} list.

If someone could dig references to these up and get them to me,
that would be much appreciated!
Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder to
implement @emph{later}.

@item
Should the @code{g77} command become just a script that invokes
all the various preprocessing that might be needed,
thus making it seem slower than necessary for legacy code
that people are unwilling to convert,
or should we provide a separate script for that,
thus encouraging people to convert their code once and for all?

At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding one that converts source code,
named @code{g77oldnew}.

These scripts would take all the pertinent options @code{g77} used
to take and run the appropriate filters,
passing the results to @code{g77} or just making new sources out of them
(in a subdirectory, leaving the user to do the dirty deed of
moving or copying them over the old sources).

@item
Do other Fortran compilers provide a prefix syntax
to govern the treatment of backslashes in @code{CHARACTER}
(or Hollerith) constants?

Knowing what other compilers provide would help.

@item
Is it okay to drop support for the @samp{-fintrin-case-initcap},
@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
and @samp{-fcase-initcap} options?

I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.

The consensus to date (1999-11-17) has been to drop this support.
Can't recall anybody saying they're using it, in fact.
@end itemize

@node Philosophy of Code Generation
@section Philosophy of Code Generation

Don't poke the bear.

The @code{g77} front end generates code
via the @code{gcc} back end.

@cindex GNU Back End (GBE)
@cindex GBE
@cindex @code{gcc}, back end
@cindex back end, gcc
@cindex code generator
The @code{gcc} back end (GBE) is a large, complex
labyrinth of intricate code
written in a combination of the C language
and specialized languages internal to @code{gcc}.

While the @emph{code} that implements the GBE
is written in a combination of languages,
the GBE itself is,
to the front end for a language like Fortran,
best viewed as a @emph{compiler}
that compiles its own, unique, language.

The GBE's ``source'', then, is written in this language,
which consists primarily of
a combination of calls to GBE functions
and @dfn{tree} nodes
(which are, themselves, created
by calling GBE functions).

So, @code{g77} generates code by, in effect,
translating the Fortran code it reads
into a form ``written'' in the ``language''
of the @code{gcc} back end.

@cindex GBEL
@cindex GNU Back End Language (GBEL)
This language will henceforth be referred to as @dfn{GBEL},
for GNU Back End Language.

GBEL is an evolving language,
not fully specified in any published form
as of this writing.
It offers many facilities,
but its ``core'' facilities
are those that correspond most directly
to those needed to support @code{gcc}
(compiling code written in GNU C).

The @code{g77} Fortran Front End (FFE)
is designed and implemented
to navigate the currents and eddies
of ongoing GBEL and @code{gcc} development
while also delivering on the potential
of an integrated FFE
(as compared to using a converter like @code{f2c}
and feeding the output into @code{gcc}).

Goals of the FFE's code-generation strategy include:

@itemize @bullet
@item
High likelihood of generation of correct code,
or, failing that, producing a fatal diagnostic or crashing.

@item
Generation of highly optimized code,
as directed by the user
via GBE-specific (versus @code{g77}-specific) constructs,
such as command-line options.

@item
Fast overall (FFE plus GBE) compilation.

@item
Preservation of source-level debugging information.
@end itemize

The strategies historically, and currently, used by the FFE
to achieve these goals include:

@itemize @bullet
@item
Use of GBEL constructs that most faithfully encapsulate
the semantics of Fortran.

@item
Avoidance of GBEL constructs that are so rarely used,
or limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
as sufficient for use by the FFE.

@item
Flexible design, to readily accommodate changes to specific
code-generation strategies, perhaps governed by command-line options.
@end itemize

@cindex Bear-poking
@cindex Poking the bear
``Don't poke the bear'' somewhat summarizes the above strategies.
The GBE is the bear.
The FFE is designed and implemented to avoid poking it
in ways that are likely to just annoy it.
The FFE usually either tackles it head-on,
or avoids treating it in ways dissimilar to how
the @code{gcc} front end treats it.

For example, the FFE uses the native array facility in the back end
instead of the lower-level pointer-arithmetic facility
(used by @code{gcc} when compiling @code{f2c} output).
Theoretically, this presents more opportunities for optimization,
faster compile times,
and the production of more faithful debugging information.
These benefits were not, however, immediately realized,
mainly because @code{gcc} itself makes little or no use
of the native array facility.

Complex arithmetic is a case study of the evolution of this strategy.
When originally implemented,
the GBEL had just evolved its own native complex-arithmetic facility,
so the FFE took advantage of that.

When porting @code{g77} to 64-bit systems,
it was discovered that the GBE didn't really
implement its native complex-arithmetic facility properly.

The short-term solution was to rewrite the FFE
to instead use the lower-level facilities
that'd be used by @code{gcc}-compiled code
(assuming that code, itself, didn't use the native complex type
provided, as an extension, by @code{gcc}),
since these were known to work,
and, in any case, if shown to not work,
would likely be rapidly fixed
(since they'd likely not work for vanilla C code in similar circumstances).

However, the rewrite accommodated the original, native approach as well
by offering a command-line option to select it over the emulated approach.
This allowed users, and especially GBE maintainers, to try out
fixes to complex-arithmetic support in the GBE
while @code{g77} continued to default to compiling more code correctly,
albeit producing (typically) slower executables.

As of April 1999, it appeared that the last few bugs
in the GBE's support of its native complex-arithmetic facility
had been worked out.
The FFE was changed back to default to using that native facility,
leaving emulation as an option.

Later during the release cycle
(which was called EGCS 1.2, but soon became GCC 2.95),
bugs in the native facility were found.
Reactions among various people included
``the last thing we should do is change the default back'',
``we must change the default back'',
and ``let's figure out whether we can narrow down the bugs to
few enough cases to allow the now-months-long-tested default
to remain the same''.
The last of these viewpoints won that particular time.
The bugs exposed other concerns regarding ABI compliance
when the ABI specified treatment of complex data as different
from treatment of what Fortran and GNU C consider the equivalent
aggregation (structure) of real (or float) pairs.

Other Fortran constructs---arrays, character strings,
complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
and so on---involve issues similar to those pertaining to complex arithmetic.

So, it is possible that the history
of how the FFE handled complex arithmetic
will be repeated, probably in modified form
(and hopefully over shorter timeframes),
for some of these other facilities.

@node Two-pass Design
@section Two-pass Design

The FFE does not tell the GBE anything about a program unit
until after the last statement in that unit has been parsed.
(A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
That is, a program unit in Fortran is like a top-level function in C.
Nested functions, found among the extensions offered by GNU C,
correspond roughly to Fortran's statement functions.)

So, while parsing the code in a program unit,
the FFE saves up all the information
on statements, expressions, names, and so on,
until it has seen the last statement.

At that point, the FFE revisits the saved information
(in what amounts to a second @dfn{pass} over the program unit)
to perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.

Some lookahead is performed during this second pass,
so the FFE could be viewed as a ``two-plus-pass'' design.
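
The save-then-revisit idea can be sketched in a few lines of C.
This is only an illustration of the general technique:
the names (@code{struct saved_stmt}, @code{save_stmt}, @code{replay_unit},
@code{emit_letter}) are hypothetical and do not come from the FFE sources,
and a real implementation records far more than a statement kind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical statement record; invented for this sketch.  */
enum stmt_kind { STMT_DECL, STMT_ASSIGN, STMT_CALL, STMT_END };

struct saved_stmt
{
  enum stmt_kind kind;
  struct saved_stmt *next;
};

struct program_unit
{
  struct saved_stmt *first;
  struct saved_stmt *last;
};

/* Pass one: remember a statement in source order; emit nothing.  */
static void
save_stmt (struct program_unit *unit, enum stmt_kind kind)
{
  struct saved_stmt *s = malloc (sizeof *s);

  assert (s != NULL);
  s->kind = kind;
  s->next = NULL;
  if (unit->last != NULL)
    unit->last->next = s;
  else
    unit->first = s;
  unit->last = s;
}

/* Pass two: revisit the saved statements in the same order, handing
   each to EMIT, and release the records.  Returns the number of
   statements emitted.  */
static int
replay_unit (struct program_unit *unit,
             void (*emit) (enum stmt_kind, void *), void *data)
{
  int count = 0;
  struct saved_stmt *s = unit->first;

  while (s != NULL)
    {
      struct saved_stmt *next = s->next;

      emit (s->kind, data);
      free (s);
      s = next;
      count++;
    }
  unit->first = unit->last = NULL;
  return count;
}

/* Example emitter: append one letter per statement to a string.  */
static void
emit_letter (enum stmt_kind kind, void *data)
{
  static const char letters[] = "DACE";
  char *out = data;
  size_t n = strlen (out);

  out[n] = letters[kind];
  out[n + 1] = '\0';
}
```

Because nothing is emitted until @code{replay_unit} runs,
everything learned from late statements is available
before the first statement is translated.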

@menu
* Two-pass Code::
* Why Two Passes::
@end menu

@node Two-pass Code
@subsection Two-pass Code

Most of the code that turns the first pass (parsing)
into a second pass for code generation
is in @file{@value{path-g77}/std.c}.

It has external functions,
called mainly by siblings in @file{@value{path-g77}/stc.c},
that record the information on statements and expressions
in the order they are seen in the source code.
These functions save that information.

It also has an external function that revisits that information,
calling the siblings in @file{@value{path-g77}/ste.c},
which handle the actual code generation
(by generating GBEL code,
that is, by calling GBE routines
to represent and specify expressions, statements, and so on).

@node Why Two Passes
@subsection Why Two Passes

The need for two passes was not immediately evident
during the design and implementation of the code in the FFE
that was to produce GBEL.
Only after a few kludges,
to handle things like incorrectly-guessed @code{ASSIGN} label nature,
had been implemented,
did enough evidence pile up to make it clear
that @file{std.c} had to be introduced to intercept,
save, then revisit as part of a second pass,
the digested contents of a program unit.

Other such missteps have occurred during the evolution of the FFE,
because of the different goals of the FFE and the GBE.

Because the GBE's original, and still primary, goal
was to directly support the GNU C language,
the GBEL, and the GBE itself,
require more complexity
on the part of most front ends
than they require of @code{gcc}'s.

For example,
the GBEL offers an interface that permits the @code{gcc} front end
to implement most, or all, of the language features it supports,
without the front end having to
make use of non-user-defined variables.
(It's almost certainly the case that all of K&R C,
and probably ANSI C as well,
is handled by the @code{gcc} front end
without declaring such variables.)

The FFE, on the other hand, must resort to a variety of ``tricks''
to achieve its goals.

Consider the following C code:

@smallexample
int
foo (int a, int b)
@{
  int c = 0;

  if ((c = bar (c)) == 0)
    goto done;

  quux (c << 1);

done:
  return c;
@}
@end smallexample

Note what kinds of objects are declared, or defined, before their use,
and before any actual code generation involving them
would normally take place:

@itemize @bullet
@item
Return type of function

@item
Entry point(s) of function

@item
Dummy arguments

@item
Variables

@item
Initial values for variables
@end itemize

Whereas, the following items can, and do,
suddenly appear ``out of the blue'' in C:

@itemize @bullet
@item
Label references

@item
Function references
@end itemize

Not surprisingly, the GBE faithfully permits the latter set of items
to be ``discovered'' partway through GBEL ``programs'',
just as they are permitted to in C.

Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
of items in the former set.

This makes Fortran a poor fit for the ``safe'' subset of GBEL.
Consider:

@smallexample
      FUNCTION X (A, ARRAY, ID1)
      CHARACTER*(*) A
      DOUBLE PRECISION X, Y, Z, TMP, EE, PI
      REAL ARRAY(ID1*ID2)
      COMMON ID2
      EXTERNAL FRED

      ASSIGN 100 TO J
      CALL FOO (I)
      IF (I .EQ. 0) PRINT *, A(0)
      GOTO 200

      ENTRY Y (Z)
      ASSIGN 101 TO J
200   PRINT *, A(1)
      READ *, TMP
      GOTO J
100   X = TMP * EE
      RETURN
101   Y = TMP * PI
      CALL FRED
      DATA EE, PI /2.71D0, 3.14D0/
      END
@end smallexample

Here are some observations about the above code,
which, while somewhat contrived,
conforms to the FORTRAN 77 and Fortran 90 standards:

@itemize @bullet
@item
The return type of function @samp{X} is not known
until the @samp{DOUBLE PRECISION} line has been parsed.

@item
Whether @samp{A} is a function or a variable
is not known until the @samp{PRINT *, A(0)} statement
has been parsed.

@item
The bounds of the array of argument @samp{ARRAY}
depend on a computation involving
the subsequent argument @samp{ID1}
and the blank-common member @samp{ID2}.

@item
Whether @samp{Y} and @samp{Z} are local variables,
additional function entry points,
or dummy arguments to additional entry points
is not known
until the @code{ENTRY} statement is parsed.

@item
Similarly, whether @samp{TMP} is a local variable is not known
until the @samp{READ *, TMP} statement is parsed.

@item
The initial values for @samp{EE} and @samp{PI}
are not known until after the @code{DATA} statement is parsed.

@item
Whether @samp{FRED} is a function returning type @code{REAL}
or a subroutine
(which can be thought of as returning type @code{void}
@emph{or}, to support alternate returns in a simple way,
type @code{int})
is not known
until the @samp{CALL FRED} statement is parsed.

@item
Whether @samp{100} is a @code{FORMAT} label
or the label of an executable statement
is not known
until the @samp{X =} statement is parsed.
(These two types of labels get @emph{very} different treatment,
especially when @code{ASSIGN}'ed.)

@item
That @samp{J} is a local variable is not known
until the first @code{ASSIGN} statement is parsed.
(This happens @emph{after} executable code has been seen.)
@end itemize

Very few of these ``discoveries''
can be accommodated by the GBE as it has evolved over the years.
The GBEL doesn't support several of them,
and those it might appear to support
don't always work properly,
especially in combination with other GBEL and GBE features,
as implemented in the GBE.

(Had the GBE and its GBEL originally evolved to support @code{g77},
the shoe would be on the other foot, so to speak---most, if not all,
of the above would be directly supported by the GBEL,
and a few C constructs would probably not, as they are in reality,
be supported.
Both this mythical, and today's real, GBE cater to their GBEL
by, sometimes, scrambling around, cleaning up after themselves---after
discovering that assumptions they made earlier during code generation
are incorrect.
That's not a great design, since it indicates significant code
paths that might be rarely tested but used in some key production
environments.)

So, the FFE handles these discrepancies---between the order in which
it discovers facts about the code it is compiling,
and the order in which the GBEL and GBE support such discoveries---by
performing what amounts to two
passes over each program unit.

(A few ambiguities can remain at that point,
such as whether, given @samp{EXTERNAL BAZ}
and no other reference to @samp{BAZ} in the program unit,
it is a subroutine, a function, or a block-data---which, in C-speak,
governs its declared return type.
Fortunately, these distinctions are easily finessed
for the procedure, library, and object-file interfaces
supported by @code{g77}.)

@node Challenges Posed
@section Challenges Posed

Consider the following Fortran code, which uses various extensions
(including some to Fortran 90):

@smallexample
SUBROUTINE X(A)
CHARACTER*(*) A
COMPLEX CFUNC
INTEGER*2 CLOCKS(200)
INTEGER IFUNC

CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
@end smallexample

The above poses the following challenges to any Fortran compiler
that uses run-time interfaces, and a run-time library, roughly similar
to those used by @code{g77}:

@itemize @bullet
@item
Assuming the library routine that supports @code{SYSTEM_CLOCK}
expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
the compiler must make available to it a temporary variable of that type.

@item
Further, after the @code{SYSTEM_CLOCK} library routine returns,
the compiler must ensure that the temporary variable it wrote
is copied into the appropriate element of the @samp{CLOCKS} array.
(This assumes the compiler doesn't just reject the code,
which it should if it is compiling under some kind of a ``strict'' option.)

@item
To determine the correct index into the @samp{CLOCKS} array
(putting aside the fact that the index, in this particular case,
need not be computed until after
the @code{SYSTEM_CLOCK} library routine returns),
the compiler must ensure that the @code{IFUNC} function is called.

That requires evaluating its argument,
which requires, for @code{g77}
(assuming @code{-ff2c} is in force),
reserving a temporary variable of type @code{COMPLEX}
for use as a repository for the return value
being computed by @samp{CFUNC}.

@item
Before invoking @samp{CFUNC},
its argument must be evaluated,
which requires allocating, at run time,
a temporary large enough to hold the result of the concatenation,
as well as actually performing the concatenation.

@item
The large temporary needed during invocation of @code{CFUNC}
should, ideally, be deallocated
(or, at least, left to the GBE to dispose of, as it sees fit)
as soon as @code{CFUNC} returns,
which means before @code{IFUNC} is called
(as it might need a lot of dynamically allocated memory).
@end itemize

@code{g77} currently doesn't support all of the above,
but, so that it might someday, it has evolved to handle
at least some of the above requirements.

Meeting the above requirements is made more challenging
by conforming to the requirements of the GBEL/GBE combination.

@node Transforming Statements
@section Transforming Statements

Most Fortran statements are given their own block,
and, for temporary variables they might need, their own scope.
(A block is what distinguishes @samp{@{ foo (); @}}
from just @samp{foo ();} in C.
A scope is included with every such block,
providing a distinct name space for local variables.)

Label definitions for the statement precede this block,
so @samp{10 PRINT *, I} is handled more like
@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
(where @samp{fl10} is just a notation meaning ``Fortran Label 10''
for the purposes of this document).
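
A rough C analogue may make the label-before-block layout concrete.
This is a sketch, not FFE output:
the function @code{run}, its arguments, and the three-statement Fortran
fragment in the comment are invented for illustration.

```c
#include <assert.h>

/* Rough C analogue of the Fortran fragment

         I = 0
   10    I = I + K * 2
         IF (I .LT. LIMIT) GOTO 10

   Each statement gets its own block; the label sits in front of
   its statement's block rather than inside it.  */
static int
run (int k, int limit)
{
  int i;

  {
    i = 0;
  }

fl10:
  {
    int temp = k * 2;  /* temporary local to this statement's block */

    i = i + temp;
  }

  {
    if (i < limit)
      goto fl10;       /* jump lands before the block, not inside it */
  }

  return i;
}
```

With the label outside the block,
a @code{goto} never has to enter a scope partway through,
so the statement's temporaries are always set up in the normal way.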

@menu
* Statements Needing Temporaries::
* Transforming DO WHILE::
* Transforming Iterative DO::
* Transforming Block IF::
* Transforming SELECT CASE::
@end menu

@node Statements Needing Temporaries
@subsection Statements Needing Temporaries

Any temporaries needed during, but not beyond,
execution of a Fortran statement,
are made local to the scope of that statement's block.

This allows the GBE to share storage for these temporaries
among the various statements without the FFE
having to manage that itself.

(The GBE could, of course, decide to optimize
management of these temporaries.
For example, it could, theoretically,
schedule some of the computations involving these temporaries
to occur in parallel.
More practically, it might leave the storage for some temporaries
``live'' beyond their scopes, to reduce the number of
manipulations of the stack pointer at run time.)

Temporaries needed across distinct statement boundaries usually
are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
(Also, there might be temporaries not associated with blocks at all---these
would be in the scope of the entire program unit.)

Each Fortran block @emph{should} get its own block/scope in the GBE.
This is best, because it allows temporaries to be more naturally handled.
However, it might pose problems when handling labels
(in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and, more generally, brings the hassle of replicating
parts of the @code{gcc} front end
(because the FFE needs to support
an arbitrary number of nested back-end blocks
if each Fortran block gets one).

So, there might still be a need for top-level temporaries, whose
``owning'' scope is that of the containing procedure.

Also, there seem to be problems declaring new variables after
generating code (within a block) in the back end, leading to, e.g.,
@samp{label not defined before binding contour} or similar messages,
when compiling with @samp{-fstack-check} or
when compiling for certain targets.

Because of that, and because sometimes these temporaries are not
discovered until in the middle of generating code for an expression
statement (as in the case of the optimization for @samp{X**I}),
it seems best to always
pre-scan all the expressions that'll be expanded for a block
before generating any of the code for that block.

This pre-scan then handles discovering and declaring, to the back end,
the temporaries needed for that block.

It's also important to treat distinct items in an I/O list as distinct
statements deserving their own blocks.
That's because there's a requirement
that each I/O item be fully processed before the next one,
which matters in cases like @samp{READ (*,*), I, A(I)}---the
element of @samp{A} read in the second item
@emph{must} be determined from the value
of @samp{I} read in the first item.

@node Transforming DO WHILE
@subsection Transforming DO WHILE

@samp{DO WHILE(expr)} @emph{must} be implemented
so that temporaries needed to evaluate @samp{expr}
are generated just for the test, each time.

Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:

@smallexample
for (;;)
  @{
    int temp0;

    @{
      char temp1[large];

      libg77_catenate (temp1, a, b);
      temp0 = libg77_ne (temp1, 'END');
    @}

    if (! temp0)
      break;

    @dots{}
  @}
@end smallexample

In this case, it seems like a time/space tradeoff
between allocating and deallocating @samp{temp1} for each iteration
and allocating it just once for the entire loop.

However, if @samp{temp1} is allocated just once for the entire loop,
it could be the wrong size for subsequent iterations of that loop
in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
because the body of the loop might modify @samp{I} or @samp{J}.

So, the above implementation is used,
though a more efficient one can be used
in specific circumstances.

@node Transforming Iterative DO
@subsection Transforming Iterative DO

An iterative @code{DO} loop
(one that specifies an iteration variable)
is required by the Fortran standards
to be implemented as though an iteration count
is computed before entering the loop body,
and that iteration count used to determine
the number of times the loop body is to be performed
(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).

The FFE handles this by allocating a temporary variable
to contain the computed number of iterations.
Since this variable must be in a scope that includes the entire loop,
a GBEL block is created for that loop,
and the variable declared as belonging to the scope of that block.

@node Transforming Block IF
@subsection Transforming Block IF

Consider:

@smallexample
SUBROUTINE X(A,B,C)
CHARACTER*(*) A, B, C
LOGICAL LFUNC

IF (LFUNC (A//B)) THEN
  CALL SUBR1
ELSE IF (LFUNC (A//C)) THEN
  CALL SUBR2
ELSE
  CALL SUBR3
END IF
@end smallexample

The arguments to the two calls to @samp{LFUNC}
require dynamic allocation (at run time),
but are not required during execution of the @code{CALL} statements.

So, the scopes of those temporaries must be within blocks inside
the block corresponding to the Fortran @code{IF} block.

This cannot be represented ``naturally''
in vanilla C, nor in GBEL.
The @code{if}, @code{elseif}, @code{else},
and @code{endif} constructs
provided by both languages must,
for a given @code{if} block,
share the same C/GBE block.

Therefore, any temporaries needed during evaluation of @samp{expr}
while executing @samp{ELSE IF(expr)}
must either have been predeclared
at the top of the corresponding @code{IF} block,
or declared within a new block for that @code{ELSE IF}---a block that,
since it cannot contain the @code{else} or @code{else if} itself
(due to the above requirement),
actually implements the rest of the @code{IF} block's
@code{ELSE IF} and @code{ELSE} statements
within an inner block.

The FFE takes the latter approach.

@node Transforming SELECT CASE
@subsection Transforming SELECT CASE

@code{SELECT CASE} poses a few interesting problems for code generation,
if efficiency and frugal stack management are important.

Consider @samp{SELECT CASE (I('PREFIX'//A))},
where @samp{A} is @code{CHARACTER*(*)}.
In a case like this---basically,
in any case where largish temporaries are needed
to evaluate the expression---those temporaries should
not be ``live'' during execution of any of the @code{CASE} blocks.

So, evaluation of the expression is best done within its own block,
which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).

Otherwise, we'd have the rough equivalent of this pseudo-code:

@smallexample
@{
  char temp[large];

  libg77_catenate (temp, "prefix", a);

  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

And that would leave @samp{temp[large]} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temp before executing the blocks).

So this approach is used instead:

@smallexample
@{
  int temp0;

  @{
    char temp1[large];

    libg77_catenate (temp1, "prefix", a);
    temp0 = i (temp1);
  @}

  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

Note how @samp{temp1} goes out of scope before starting the switch,
thus making it easy for a back end to free it.

The problem @emph{that} solution has, however,
is with @samp{SELECT CASE('prefix'//A)}
(which is currently not supported).

Unless the GBEL is extended to support arbitrarily long character strings
in its @code{case} facility,
the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
(probably excepting @code{CHARACTER*1})
using a cascade of
@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
in GBEL.

To prevent the (potentially large) temporary,
needed to hold the selected expression itself (@samp{'prefix'//A}),
from being in scope during execution of the @code{CASE} blocks,
two approaches are available:

@itemize @bullet
@item
Pre-evaluate all the @code{CASE} tests,
producing an integer ordinal that is used,
a la @samp{temp0} in the earlier example,
as if @samp{SELECT CASE(temp0)} had been written.

Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
where @var{i} is the ordinal for that case,
determined while, or before,
generating the cascade of @code{if}-related constructs
to cope with @code{CHARACTER} selection.

@item
Make @samp{temp0} above just
large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).

Since that length must be constant
(because @code{CASE} expressions are all constant),
it won't be so large,
and, further, @samp{temp1} need not be dynamically allocated,
since normal @code{CHARACTER} assignment can be used
into the fixed-length @samp{temp0}.
@end itemize

Both of these solutions require @code{SELECT CASE} implementation
to be changed so all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.

@node Transforming Expressions
@section Transforming Expressions

The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:

@smallexample
@var{action}(@var{expr})
@end smallexample

Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.

Expanding the above shows a typical order of events at run time:

@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample

So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must be freed no sooner than
after @var{action} has been performed.
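
As a rough illustration only (this is not taken from the FFE sources),
the ordering constraint can be sketched in C.
The names @code{perform_action}, @code{scratch}, and @code{result}
are hypothetical, and the extra copy through @code{scratch} is contrived
purely to show memory that is needed during evaluation
but does not hold the final value.

@smallexample
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ACTION: it consumes the value of EXPR.  */
static void
perform_action (const char *value)
@{
  puts (value);
@}

int
main (void)
@{
  const char *a = "prefix", *b = "-suffix";
  size_t size = strlen (a) + strlen (b) + 1;
  char *scratch, *result;

  /* Evaluate EXPR.  SCRATCH is needed only during evaluation;
     RESULT holds the value that ACTION will use.  */
  scratch = malloc (size);
  result = malloc (size);
  if (scratch == NULL || result == NULL)
    @{
      free (scratch);
      free (result);
      return EXIT_FAILURE;
    @}
  strcpy (scratch, a);
  strcat (scratch, b);
  strcpy (result, scratch);

  /* SCRATCH does not hold the result,
     so it can be freed before ACTION is performed.  */
  free (scratch);

  /* Perform ACTION, using the result of evaluating EXPR.  */
  perform_action (result);

  /* RESULT holds the value of EXPR,
     so it is freed no sooner than after ACTION.  */
  free (result);
  return EXIT_SUCCESS;
@}
@end smallexample
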

The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.

That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.

The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.

However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.

Yet, from the above, it's clear that multiple streams of code
must effectively be simultaneously generated
during the recursive-descent analysis of statements.

The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.

Requirements imposed by expressions include:

@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.

@item
Other stuff???
@end itemize

@node Internal Naming Conventions
@section Internal Naming Conventions

Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)

Identifiers that don't fit the following forms
are not considered exported,
even if they are according to the C language.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)

@table @code
@item ffe@var{mod}
The single typedef exported by the module.

@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.

@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
A typedef exported by the module.

The portion of the identifier after @code{ffe@var{mod}} is
referred to as @code{ctype}, a capitalized (mixed-case) form
of @code{type}.

@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.

@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).

@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table

Below are names used for @var{value} and @var{input},
along with their definitions.

@table @code
@item col
A column number within a line (first column is number 1).

@item file
An encapsulation of a file's name.

@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).

@item initialize
Initializes, usually a module.  No type.

@item int
A generic integer of type @code{int}.

@item is
A generic integer that contains a true (nonzero) or false (zero) value.

@item len
A generic integer that contains the length of something.

@item line
A line number within a source file,
or a global line number.

@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.

@item name
A @code{text} that points to a name of something.

@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate---if so,
similar to @code{find} without crashing.

@item pt
Pointer to a particular character (a line and column pair)
in the input file (source code being compiled).

@item run
Performs some herculean task.  No type.

@item terminate
Terminates, usually a module.  No type.

@item text
A @code{char *} that points to generic text.
@end table
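
As a sketch of how these forms combine,
consider a purely hypothetical module @code{ffefoo}
(no such module is claimed to exist in the FFE).
Every identifier below is invented for illustration;
only the shapes of the exported names follow the conventions above.
The exports would normally live in @file{foo.h}
and the implementation in @file{foo.c};
they are combined here so the example is self-contained.

@smallexample
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exports of the hypothetical module (normally in foo.h).  */
typedef struct foo_rec_ *ffefoo;      /* ffeMOD: the module's typedef.  */
#define FFEFOO_NONE ((ffefoo) NULL)   /* FFEUMOD_...: constant of it.  */

void ffefoo_initialize (void);                 /* VALUE is initialize.  */
ffefoo ffefoo_lookup_name (const char *name);  /* lookup, INPUT is name.  */
const char *ffefoo_name (ffefoo f);            /* VALUE is name.  */
size_t ffefoo_len (ffefoo f);                  /* VALUE is len.  */

/* Implementation (normally in foo.c).  Names that do not fit
   the exported forms, such as known, stay private.  */
struct foo_rec_
@{
  const char *name;
@};

static struct foo_rec_ known[2];

void
ffefoo_initialize (void)
@{
  known[0].name = "alpha";
  known[1].name = "beta";
@}

ffefoo
ffefoo_lookup_name (const char *name)
@{
  size_t i;

  for (i = 0; i < sizeof known / sizeof known[0]; i++)
    if (strcmp (known[i].name, name) == 0)
      return &known[i];
  return FFEFOO_NONE;           /* lookup returns nil on failure.  */
@}

const char *
ffefoo_name (ffefoo f)
@{
  return f->name;
@}

size_t
ffefoo_len (ffefoo f)
@{
  return strlen (f->name);
@}

int
main (void)
@{
  ffefoo f;

  ffefoo_initialize ();
  f = ffefoo_lookup_name ("beta");
  assert (f != FFEFOO_NONE);
  assert (strcmp (ffefoo_name (f), "beta") == 0);
  assert (ffefoo_len (f) == 4);
  assert (ffefoo_lookup_name ("gamma") == FFEFOO_NONE);
  return 0;
@}
@end smallexample

Here @code{ffefoo_lookup_name} pairs the @code{lookup} value
with the @code{name} input,
so a caller can tell from the identifier alone
that a missing entry yields nil rather than a crash or a new instance.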