@c Copyright (C) 1999 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.

@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}

This chapter describes some aspects of the design and implementation
of the @code{g77} front end.

To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email @email{gcc@@gcc.gnu.org}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://gcc.gnu.org/contribute.html} for steps you might
need to take first.

@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu

@node Overview of Sources
@section Overview of Sources

The current directory layout includes the following:

@table @file
@item @value{srcdir}/gcc/
Non-g77 files in gcc

@item @value{srcdir}/gcc/f/
GNU Fortran front end sources

@item @value{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation

@item @value{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}

@item @value{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}

@item @value{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table

Components of note in @code{g77} are described below.

@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.

@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and some other stuff.
The @code{g77} compiler code is placed in @file{f/} because it,
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.

@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often referred to as @code{libg2c} when the
@code{g77} version is specifically being referred to.

The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.

@file{libf2c/} is not part of the program @code{g77},
just distributed with it.
It contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes made from @code{libf2c},
to fix some bugs,
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.

@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.

Files of note in @file{f/} and @file{libf2c/} are described below:

@table @file
@item f/BUGS
Lists some important bugs known to be in g77.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:

@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample

@item f/ChangeLog
Lists recent changes to @code{g77} internals.

@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.

@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use:

@smallexample
info -f f/g77.info -n News
@end smallexample

@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.

All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command,
nor GNU Emacs (with its Info mode), are available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table

If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization, and the code generation.

The file @file{parse.c} is the source file for @code{yyparse()},
which is invoked by the GBE to start the compilation process,
for @file{f771}.

The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and it (along with @file{top.h}) defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
and @samp{FFE_[A-Za-z].*} symbols.

The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.

All other modules named @var{xyz}
are comprised of all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
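
As an illustration only, the naming convention above can be written down
as a small Python sketch.
It is not part of @code{g77}, and the function name
@code{module_for_symbol} is hypothetical.
Given a symbol, it returns the name of the module whose files should
define it, with @samp{top} standing for @file{top.c} and @file{top.h}.

```python
import re


def module_for_symbol(symbol):
    """Return the module name implied by a symbol, or None if none applies."""
    # malloc.c and malloc.h: malloc_[a-z].*, malloc[A-Z].*, MALLOC_[A-Za-z].*
    if re.match(r"malloc_[a-z]|malloc[A-Z]|MALLOC_[A-Za-z]", symbol):
        return "malloc"
    # top.c and top.h: ffe_[a-z].*, ffe[A-Z].*, FFE_[A-Za-z].*
    if re.match(r"ffe_[a-z]|ffe[A-Z]|FFE_[A-Za-z]", symbol):
        return "top"
    # Module xyz, lowercase spelling: ffexyz_[a-z].* or ffexyz[A-Z].*
    found = re.match(r"ffe([a-z]+)(?:_[a-z]|[A-Z])", symbol)
    if found:
        return found.group(1)
    # Module xyz, uppercase spelling: FFEXYZ_[A-Za-z].*
    found = re.match(r"FFE([A-Z]+)_[A-Za-z]", symbol)
    if found:
        return found.group(1).lower()
    return None
```

For instance, @code{module_for_symbol("ffexyz_set_something")} returns
@samp{xyz}, which points at @file{xyz.h} and @file{xyz.c}.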

The ``porting'' files of note currently are:

@table @file
@item proj.c
@itemx proj.h
These define the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.

@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, versions of @code{g77} rely less on these files
and more on run-time configuration based on GBE info
in @file{com.c}.

@item com.c
@itemx com.h
These are the primary interface to the GBE.

@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.

@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed as case-insensitive
with regard to Fortran keywords).
@end table

If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.

If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass, otherwise it is in the first.

(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)

@node Overview of Translation Process
@section Overview of Translation Process

The order of phases translating source code to the form accepted
by the GBE is:

@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})

@item
Lexing (@file{lex.c})

@item
Stand-alone statement identification (@file{sta.c})

@item
INCLUDE handling (@file{sti.c})

@item
Order-dependent statement identification (@file{stq.c})

@item
Parsing (@file{stb.c} and @file{expr.c})

@item
Constructing (@file{stc.c})

@item
Collecting
(@file{std.c})

@item
Expanding (@file{ste.c})
@end enumerate

To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:

@smallexample
      FORMAT(I2 4H)=(J/
     &   I3)
@end smallexample

The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:

@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample

The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that sequence of lexemes represents.

The sooner it can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.

In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.
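
The ``level zero'' test can be pictured with a short Python sketch.
It is an illustration only, not the code in @file{sta.c};
the name @code{has_level_zero_equals} is hypothetical,
and the real phase has to weigh more than this single check.
The sketch walks the lexemes, tracks parenthesis nesting,
and reports whether an @samp{=} appears outside all parentheses.

```python
def has_level_zero_equals(lexemes):
    """Return True if an '=' lexeme appears outside all parentheses."""
    depth = 0  # current parenthesis nesting level
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False


# The lexemes of the sample statement shown earlier in this section.
SAMPLE = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
```

Applied to @code{SAMPLE}, the function returns true,
because the @samp{=} follows the closing parenthesis of the first group.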

An assignment-form statement might be a statement-function
definition or an executable assignment statement.

To make that determination,
@file{sta.c} looks at the first two lexemes.

Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement,
else it's a statement function.

Either way, @file{sta.c} hands off the statement to @file{stq.c}
(via @file{sti.c}, which expands INCLUDE files).
@file{stq.c} figures out what a statement that is,
on its own, ambiguous, must actually be based on the context
established by previous statements.

So, @file{stq.c} watches the statement stream for executable statements,
END statements, and so on, so it knows whether @samp{A(B)=C} is
(intended as) a statement-function definition or an assignment statement.

After establishing the context-aware statement info, @file{stq.c}
passes the original sample statement on to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).

@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.

This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.

For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).

@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.

@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.

For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.

When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.

@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.

Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.
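
The division of labor between @file{std.c} and @file{ste.c}
described above amounts to ``save now, expand later''.
The following Python sketch shows that shape under simplifying assumptions:
statements are plain strings, and the class and method names
(@code{StatementCollector}, @code{add}, @code{end_of_program_unit})
are hypothetical rather than taken from the FFE sources.

```python
class StatementCollector:
    """Save statements until the program unit ends, then expand them all."""

    def __init__(self, expand):
        self.expand = expand  # callable applied to each saved statement
        self.saved = []       # statements seen so far in this program unit

    def add(self, statement):
        # First pass: only remember the statement; nothing is expanded yet,
        # because later statements may still change what it means.
        self.saved.append(statement)

    def end_of_program_unit(self):
        # Second pass: expand each saved statement in its original order.
        for statement in self.saved:
            self.expand(statement)
        self.saved = []
```

Nothing reaches the @code{expand} callable until
@code{end_of_program_unit} is called,
which mirrors the way information is withheld from the GBE
until it is known to be complete.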

@menu
* g77stripcard::
* lex.c::
* sta.c::
* sti.c::
* stq.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu

@node g77stripcard
@subsection g77stripcard

The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.

This program is needed because @file{lex.c} doesn't pay attention
to maximum line lengths at all, to make it easier to maintain,
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(not spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.
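
The behavior proposed above is small enough to sketch in a few lines.
The Python below is an illustration of the idea, not an implementation
of @code{g77stripcard}.
The function name @code{strip_card} is hypothetical,
and the value of @code{STRIPPED_PREFIX} is a placeholder,
since the real prefix is still TBD.
Lines are assumed to arrive without their trailing newlines,
and no tab expansion is done.

```python
# Placeholder only: the prefix that marks stripped text is still TBD.
STRIPPED_PREFIX = "!STRIPPED: "


def strip_card(lines, line_length=72):
    """Yield each line cut to line_length columns.

    When the text cut off is more than spaces and tabs, also yield a
    comment line carrying that text, right after the line it came from.
    """
    for line in lines:
        yield line[:line_length]
        tail = line[line_length:]
        if tail.strip(" \t"):
            yield STRIPPED_PREFIX + tail
```

Because every inserted line starts with the same prefix,
a later filter can recognize and drop those lines
without any help from the sketch itself.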

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)

@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

That saves having to pass most source files through @code{expand}.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of intrinsics will probably be matchable in any case.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If it can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.

This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.

Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:

@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample

Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.

However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.

If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another filter''.)

This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.

@item
To allow for form-neutral INCLUDE files without requiring them
to be preprocessed,
the fixed-form lexer should offer an extension (if possible)
allowing a trailing @samp{&} to be ignored, especially if after
column 72, as it would be using the traditional Unix Fortran source
model (which ignores @emph{everything} after column 72).
@end itemize

The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns,
as well as accommodating form-neutral INCLUDE files.
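
As a rough illustration of the tab conversion and carriage-return handling
just mentioned, the following Python sketch normalizes one physical line.
It is not taken from @code{g77}: the name @code{normalize_line} is hypothetical,
tab stops are assumed to fall at every eighth column,
and only a carriage return at the very end of the line is dropped.

```python
TAB_WIDTH = 8  # assumption: tab stops at every eighth column


def normalize_line(raw: str) -> str:
    """Return the visible text of one physical source line.

    Hypothetical helper: drops the line terminator (and a trailing
    carriage return), then expands tabs to spaces. Letter case is
    left untouched.
    """
    # A missing newline (end of file) ends the line just as well.
    if raw.endswith("\n"):
        raw = raw[:-1]
    if raw.endswith("\r"):
        raw = raw[:-1]

    out = []
    for ch in raw:
        if ch == "\t":
            # Pad with spaces up to the next multiple of TAB_WIDTH.
            pad = TAB_WIDTH - (len(out) % TAB_WIDTH)
            out.extend(" " * pad)
        else:
            out.append(ch)
    return "".join(out)
```

With this, @samp{normalize_line("\tX = 1\r\n")} and a line typed with eight
leading spaces and no terminator yield the same text,
which is the point of treating the source the way it looks on the screen.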

This design also implements the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.

Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.

So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.

(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, fails to meet this fairly straightforward specification.
Fortunately, Fortran @emph{itself} does not mandate such a failure,
though most vendor-supplied defaults for their Fortran compilers @emph{do}
fail to meet this specification for readability.)

Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.

@node sta.c
@subsection sta.c

@node sti.c
@subsection sti.c

@node stq.c
@subsection stq.c

@node stb.c
@subsection stb.c

@node expr.c
@subsection expr.c

@node stc.c
@subsection stc.c

@node std.c
@subsection std.c

@node ste.c
@subsection ste.c

@node Gotchas (Transforming)
@subsection Gotchas (Transforming)

This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.

@subsubsection Multi-character Lexemes

Each lexeme carries with it a pointer to where it appears in the source.

To provide the ability for diagnostics to point to column numbers,
in addition to line numbers and names,
lexemes that represent more than one (significant) character
in the source code need, generally,
to provide pointers to where each @emph{character} appears in the source.

This provides the ability to properly identify the precise location
of the problem in code like

@smallexample
SUBROUTINE X
END
BLOCK DATA X
END
@end smallexample

which, in fixed-form source, would result in single lexemes
consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
(The problem is that @samp{X} is defined twice,
so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding @samp{X} in the first,
would be preferable to pointing to the beginnings of the statements.)

This need also arises when parsing (and diagnosing) @code{FORMAT}
statements.

Further, it arises when diagnosing
@code{FMT=} specifiers that contain constants
(or partial constants, or even propagated constants!)
in I/O statements, as in:

@smallexample
PRINT '(I2, 3HAB)', J
@end smallexample

(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)

Multi-character lexemes, which would seem to naturally include
at least digit strings, alphanumeric strings, @code{CHARACTER}
constants, and Hollerith constants, therefore need to provide
location information on each character.
(Maybe Hollerith constants don't, but it's unnecessary to except them.)

The question then arises, what about @emph{other} multi-character lexemes,
such as @samp{**} and @samp{//},
and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?

Turns out there's a need to identify the location of the second character
of these two-character lexemes.
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
Similarly, it is preferable to diagnose the second slash in
@samp{I = J // K} rather than the first, given the implicit typing
rules, which would result in the compiler disallowing the attempted
concatenation of two integers.
(Though, since that's more of a semantic issue,
it's not @emph{that} much preferable.)
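
One way to picture a lexeme that records a location for each of its
significant characters is the Python sketch below.
It is only an illustration, not the FFE's actual data structure:
the names @code{Lexeme} and @code{name_lexeme} are hypothetical,
and the statement field is assumed to start in column 7 of an
already tab-expanded fixed-form line.

```python
from dataclasses import dataclass
from typing import List, Tuple

Location = Tuple[int, int]  # (line, column), both 1-based


@dataclass
class Lexeme:
    text: str              # significant characters only
    where: List[Location]  # one source location per character of text


def name_lexeme(line_no: int, line: str, first_col: int = 7) -> Lexeme:
    """Gather the leading letters and digits of a fixed-form statement.

    Blanks are skipped (they are not significant in fixed form), but the
    column of every character kept is remembered.
    """
    text, where = [], []
    for col, ch in enumerate(line, start=1):
        if col < first_col or ch == " ":
            continue
        if not ch.isalnum():
            break  # some other kind of lexeme starts here
        text.append(ch)
        where.append((line_no, col))
    return Lexeme("".join(text), where)


lex = name_lexeme(3, "      BLOCK DATA X")
# lex.text is "BLOCKDATAX"; lex.where[-1] is (3, 18), the column of the X
```

A diagnostic about the duplicate @samp{X} can then point at
@code{lex.where[-1]} instead of at the start of the statement.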

Even sequences that could be parsed as digit strings could use location info,
for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
(This probably will be parsed as a character string,
to be consistent with the parsing of @samp{Z'129A'}.)

To avoid the hassle of recording the location of the second character,
while also preserving the general rule that each significant character
is distinctly pointed to by the lexeme that contains it,
it's best to simply not have any fixed-size lexemes
larger than one character.

This new design is expected to make checking for two
@samp{*} lexemes in a row much easier than the old design,
so this is not much of a sacrifice.
It probably makes the lexer much easier to implement
than it makes the parser harder.

@subsubsection Space-padding Lexemes

Certain lexemes need to be padded with virtual spaces when the
end of the line (or file) is encountered.

This is necessary in fixed form, to handle lines that don't
extend to column 72, assuming that's the line length in effect.
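
A minimal Python sketch of such padding follows.
The helper name @code{statement_field} is hypothetical,
the line is assumed to be tab-expanded already,
and which lexemes actually need the padding
(an unterminated @code{CHARACTER} constant is one plausible case)
is an assumption, not something this text specifies.

```python
def statement_field(line: str, line_length: int = 72) -> str:
    """Columns 7..line_length of a fixed-form line, space-padded.

    Hypothetical helper: a short line is treated as if it had been
    typed out to the full line length with blanks.
    """
    visible = line[:line_length]         # text past the limit is ignored
    padded = visible.ljust(line_length)  # short lines get virtual spaces
    return padded[6:]                    # drop label and continuation columns
```

With the default length of 72 the result is always 66 characters long,
so a lexeme still open at the end of a short line
sees the same blanks it would have seen on a full-width card image.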

@subsubsection Bizarre Free-form Hollerith Constants

Last I checked, the Fortran 90 standard actually required the compiler
to silently accept something like

@smallexample
FORMAT ( 1 2 Htwelve chars )
@end smallexample

as a valid @code{FORMAT} statement specifying a twelve-character
Hollerith constant.

The implication here is that, since the new lexer is a zero-feedback one,
it won't know that the special case of a @code{FORMAT} statement being parsed
requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
a single lexeme.

(This is a horrible misfeature of the Fortran 90 language.
It's one of many such misfeatures that almost make me want
to not support them, and forge ahead with designing a new
``GNU Fortran'' language that has the features,
but not the misfeatures, of Fortran 90,
and provide utility programs to do the conversion automatically.)

So, the lexer must gather distinct chunks of decimal strings into
a single lexeme in contexts where a single decimal lexeme might
start a Hollerith constant.

(Which probably means it might as well do that all the time
for all multi-character lexemes, even in free-form mode,
leaving it to subsequent phases to pull them apart as they see fit.)
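
The gathering step might look something like the following Python sketch.
The names @code{gather_digits} and @code{has_gap} are hypothetical,
and keeping the offset of every digit is just one assumed way of letting
a later phase pull the chunks apart again.

```python
from typing import List, Tuple

DIGITS = "0123456789"


def gather_digits(text: str, start: int) -> Tuple[str, List[int], int]:
    """Gather digits from text[start:], skipping blanks between them.

    Returns (digits, offsets, next_index): the gathered digit string,
    the offset in text of each digit, and the index just past the
    last digit taken.
    """
    digits, offsets = [], []
    i = start
    last = start
    while i < len(text):
        ch = text[i]
        if ch in DIGITS:
            digits.append(ch)
            offsets.append(i)
            last = i + 1
        elif ch != " ":
            break  # anything but a blank or digit ends the lexeme
        i += 1
    return "".join(digits), offsets, last


def has_gap(offsets: List[int]) -> bool:
    """True if blanks separated some of the gathered digits."""
    return any(b - a != 1 for a, b in zip(offsets, offsets[1:]))
```

A @code{FORMAT} parser would accept the gathered @samp{12} as a Hollerith
count regardless of @code{has_gap},
while a phase handling a declaration could use @code{has_gap}
to diagnose digits that were written apart.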

Compare the treatment of this to how

@smallexample
CHARACTER * 4 5 HEY
@end smallexample

and

@smallexample
CHARACTER * 12 HEY
@end smallexample

must be treated---the former must be diagnosed, due to the separation
between lexemes, and the latter must be accepted as a proper declaration.

@subsubsection Hollerith Constants

Recognizing a Hollerith constant---specifically,
that an @samp{H} or @samp{h} after a digit string begins
such a constant---requires some knowledge of context.

Hollerith constants (such as @samp{2HAB}) can appear after:

@itemize @bullet
@item
@samp{(}

@item
@samp{,}

@item
@samp{=}

@item
@samp{+}, @samp{-}, @samp{/}

@item
@samp{*}, except as noted below
@end itemize

Hollerith constants don't appear after:

@itemize @bullet
@item
@samp{CHARACTER*},
which can be treated generally as
any @samp{*} that is the second lexeme of a statement
@end itemize

@subsubsection Confusing Function Keyword

While

@smallexample
REAL FUNCTION FOO ()
@end smallexample

must be a @code{FUNCTION} statement and

@smallexample
REAL FUNCTION FOO (5)
@end smallexample

must be a type-declaration statement,

@smallexample
REAL FUNCTION FOO (@var{names})
@end smallexample

where @var{names} is a comma-separated list of names,
can be one or the other.

The only way to disambiguate that statement
(short of mandating free-form source or a short maximum
length for names of external procedures)
is based on the context of the statement.

In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.

Otherwise, the statement is a @code{FUNCTION} statement,
in that it begins a function program unit
(external, or, within @code{CONTAINS}, nested).
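
The context rule above can be sketched in Python as follows.
This is an illustration only: the @code{Context} enumeration and
@code{classify_real_function} are hypothetical names,
and the test for ``a list of names'' is deliberately crude.

```python
from enum import Enum


class Context(Enum):
    OUTSIDE_UNIT = 1    # no program unit has been started
    INSIDE_UNIT = 2     # within a unit, not at the outer CONTAINS level
    CONTAINS_LEVEL = 3  # at the outer level of a CONTAINS block


def classify_real_function(args: str, context: Context) -> str:
    """Classify 'REAL FUNCTION FOO (args)' read from fixed-form source."""
    items = [a.strip() for a in args.split(",")] if args.strip() else []
    if not items:
        return "FUNCTION statement"       # REAL FUNCTION FOO ()
    if not all(item.isidentifier() for item in items):
        return "type declaration"         # e.g. REAL FUNCTION FOO (5)
    # A plain list of names: only the context can decide.
    if context is Context.INSIDE_UNIT:
        return "type declaration"         # declares array FUNCTIONFOO
    return "FUNCTION statement"           # begins a function program unit
```

The first two branches need no context at all;
only the third, ambiguous form consults it.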

@subsubsection Weird READ

The statement

@smallexample
READ (N)
@end smallexample

is equivalent to either

@smallexample
READ (UNIT=(N))
@end smallexample

or

@smallexample
READ (FMT=(N))
@end smallexample

depending on which would be valid in context.

Specifically, if @samp{N} is type @code{INTEGER},
@samp{READ (FMT=(N))} would not be valid,
because parentheses may not be used around @samp{N},
whereas they may around it in @samp{READ (UNIT=(N))}.

Further, if @samp{N} is type @code{CHARACTER},
the opposite is true---@samp{READ (UNIT=(N))} is not valid,
but @samp{READ (FMT=(N))} is.

Strictly speaking, if anything follows

@smallexample
READ (N)
@end smallexample

in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
without looking at the type of @samp{N},
because the comma is required for the @samp{READ (FMT=(N))}
interpretation and disallowed for the @samp{READ (UNIT=(N))}
interpretation.
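
Both rules are small enough to write down directly.
The Python sketch below uses hypothetical helper names
and covers only the two types discussed above;
the second function encodes the strict-standard comma rule
exactly as just stated.

```python
def interpret_by_type(n_type: str) -> str:
    """Pick the meaning of READ (N) from the type of N."""
    if n_type == "INTEGER":
        return "READ (UNIT=(N))"
    if n_type == "CHARACTER":
        return "READ (FMT=(N))"
    raise ValueError(f"no rule sketched for type {n_type}")


def interpret_by_comma(next_lexeme: str) -> str:
    """Strict-standard rule, usable only when something follows READ (N)."""
    if next_lexeme == ",":
        return "READ (FMT=(N))"   # READ fmt, iolist
    return "READ (UNIT=(N))"      # READ (unit) iolist
```

The two functions agree on standard-conforming code;
they can disagree only when the source is not strictly conforming.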

However, in practice, many Fortran compilers allow
the comma for the @samp{READ (UNIT=(N))}
interpretation anyway
(in that they generally allow a leading comma before
an I/O list in an I/O statement),
and much code takes advantage of this allowance.

(This is quite a reasonable allowance, since the
juxtaposition of a comma-separated list immediately
after an I/O control-specification list, which is also comma-separated,
without an intervening comma,
looks sufficiently ``wrong'' to programmers
that they can't resist the itch to insert the comma.
@samp{READ (I, J), K, L} simply looks cleaner than
@samp{READ (I, J) K, L}.)

So, type-based disambiguation is needed unless strict adherence
to the standard is always assumed, and we're not going to assume that.

@node TBD (Transforming)
@subsection TBD (Transforming)

Continue researching gotchas, designing the transformational process,
and implementing it.

Specific issues to resolve:

@itemize @bullet
@item
Just where should (if it were implemented) @code{USE} processing take place?

This gets into the whole issue of how @code{g77} should handle the concept
of modules.
I think GNAT already takes on this issue, but don't know more than that.
Jim Giles has written extensively on @code{comp.lang.fortran}
about his opinions on module handling, as have others.
Jim's views should be taken into account.

Actually, Richard M. Stallman (RMS) also has written up
some guidelines for implementing such things,
but I'm not sure where I read them.
Perhaps the old @email{gcc2@@cygnus.com} list.

If someone could dig references to these up and get them to me,
that would be much appreciated!
Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder to
implement @emph{later}.

@item
Should the @code{g77} command become just a script that invokes
all the various preprocessing that might be needed,
thus making it seem slower than necessary for legacy code
that people are unwilling to convert,
or should we provide a separate script for that,
thus encouraging people to convert their code once and for all?

At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding one, named @code{g77oldnew},
that converts source code.

These scripts would take all the pertinent options @code{g77} used
to take and run the appropriate filters,
passing the results to @code{g77} or just making new sources out of them
(in a subdirectory, leaving the user to do the dirty deed of
moving or copying them over the old sources).

@item
Do other Fortran compilers provide a prefix syntax
to govern the treatment of backslashes in @code{CHARACTER}
(or Hollerith) constants?

Knowing what other compilers provide would help.

@item
Is it okay to drop support for the @samp{-fintrin-case-initcap},
@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
and @samp{-fcase-initcap} options?

I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.

The consensus to date (1999-11-17) has been to drop this support.
Can't recall anybody saying they're using it, in fact.
@end itemize

@node Philosophy of Code Generation
@section Philosophy of Code Generation

Don't poke the bear.

The @code{g77} front end generates code
via the @code{gcc} back end.

@cindex GNU Back End (GBE)
@cindex GBE
@cindex @code{gcc}, back end
@cindex back end, gcc
@cindex code generator
The @code{gcc} back end (GBE) is a large, complex
labyrinth of intricate code
written in a combination of the C language
and specialized languages internal to @code{gcc}.

While the @emph{code} that implements the GBE
is written in a combination of languages,
the GBE itself is,
to the front end for a language like Fortran,
best viewed as a @emph{compiler}
that compiles its own, unique, language.

The GBE's ``source'', then, is written in this language,
which consists primarily of
a combination of calls to GBE functions
and @dfn{tree} nodes
(which are, themselves, created
by calling GBE functions).

So, @code{g77} generates code by, in effect,
translating the Fortran code it reads
into a form ``written'' in the ``language''
of the @code{gcc} back end.

@cindex GBEL
@cindex GNU Back End Language (GBEL)
This language will henceforth be referred to as @dfn{GBEL},
for GNU Back End Language.

GBEL is an evolving language,
not fully specified in any published form
as of this writing.
It offers many facilities,
but its ``core'' facilities
are those that correspond most directly
to those needed to support @code{gcc}
(compiling code written in GNU C).

The @code{g77} Fortran Front End (FFE)
is designed and implemented
to navigate the currents and eddies
of ongoing GBEL and @code{gcc} development
while also delivering on the potential
of an integrated FFE
(as compared to using a converter like @code{f2c}
and feeding the output into @code{gcc}).

Goals of the FFE's code-generation strategy include:

@itemize @bullet
@item
High likelihood of generation of correct code,
or, failing that, producing a fatal diagnostic or crashing.

@item
Generation of highly optimized code,
as directed by the user
via GBE-specific (versus @code{g77}-specific) constructs,
such as command-line options.

@item
Fast overall (FFE plus GBE) compilation.

@item
Preservation of source-level debugging information.
@end itemize

The strategies historically, and currently, used by the FFE
to achieve these goals include:

@itemize @bullet
@item
Use of GBEL constructs that most faithfully encapsulate
the semantics of Fortran.

@item
Avoidance of GBEL constructs that are so rarely used,
or limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
as sufficient for use by the FFE.

@item
Flexible design, to readily accommodate changes to specific
code-generation strategies, perhaps governed by command-line options.
@end itemize

@cindex Bear-poking
@cindex Poking the bear
``Don't poke the bear'' somewhat summarizes the above strategies.
The GBE is the bear.
The FFE is designed and implemented to avoid poking it
in ways that are likely to just annoy it.
The FFE usually either tackles it head-on,
or avoids treating it in ways dissimilar to how
the @code{gcc} front end treats it.

For example, the FFE uses the native array facility in the back end
(instead of the lower-level pointer-arithmetic facility
used by @code{gcc} when compiling @code{f2c} output).
Theoretically, this presents more opportunities for optimization,
faster compile times,
and the production of more faithful debugging information.
These benefits were not, however, immediately realized,
mainly because @code{gcc} itself makes little or no use
of the native array facility.

Complex arithmetic is a case study of the evolution of this strategy.
When the FFE was originally implemented,
the GBEL had just evolved its own native complex-arithmetic facility,
so the FFE took advantage of that.

When porting @code{g77} to 64-bit systems,
it was discovered that the GBE didn't really
implement its native complex-arithmetic facility properly.

The short-term solution was to rewrite the FFE
to instead use the lower-level facilities
that'd be used by @code{gcc}-compiled code
(assuming that code, itself, didn't use the native complex type
provided, as an extension, by @code{gcc}),
since these were known to work,
and, in any case, if shown to not work,
would likely be rapidly fixed
(since they'd likely not work for vanilla C code in similar circumstances).

However, the rewrite accommodated the original, native approach as well
by offering a command-line option to select it over the emulated approach.
This allowed users, and especially GBE maintainers, to try out
fixes to complex-arithmetic support in the GBE
while @code{g77} continued to default to compiling more code correctly,
albeit producing (typically) slower executables.

As of April 1999, it appeared that the last few bugs
in the GBE's support of its native complex-arithmetic facility
were worked out.
The FFE was changed back to default to using that native facility,
leaving emulation as an option.

Later during the release cycle
(which was called EGCS 1.2, but soon became GCC 2.95),
bugs in the native facility were found.
Reactions among various people included
``the last thing we should do is change the default back'',
``we must change the default back'',
and ``let's figure out whether we can narrow down the bugs to
few enough cases to allow the now-months-long-tested default
to remain the same''.
The last viewpoint won that particular time.
The bugs exposed other concerns regarding ABI compliance
when the ABI specified treatment of complex data as different
from treatment of what Fortran and GNU C consider the equivalent
aggregation (structure) of real (or float) pairs.

Other Fortran constructs---arrays, character strings,
complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
and so on---involve issues similar to those pertaining to complex arithmetic.

So, it is possible that the history
of how the FFE handled complex arithmetic
will be repeated, probably in modified form
(and hopefully over shorter timeframes),
for some of these other facilities.

@node Two-pass Design
@section Two-pass Design

The FFE does not tell the GBE anything about a program unit
until after the last statement in that unit has been parsed.
(A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
That is, a program unit in Fortran is like a top-level function in C.
Nested functions, found among the extensions offered by GNU C,
correspond roughly to Fortran's statement functions.)

So, while parsing the code in a program unit,
the FFE saves up all the information
on statements, expressions, names, and so on,
until it has seen the last statement.

At that point, the FFE revisits the saved information
(in what amounts to a second @dfn{pass} over the program unit)
to perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.

Some lookahead is performed during this second pass,
so the FFE could be viewed as a ``two-plus-pass'' design.

@menu
* Two-pass Code::
* Why Two Passes::
@end menu

@node Two-pass Code
@subsection Two-pass Code

Most of the code that turns the first pass (parsing)
into a second pass for code generation
is in @file{@value{path-g77}/std.c}.

It has external functions,
called mainly by siblings in @file{@value{path-g77}/stc.c},
that record the information on statements and expressions
in the order they are seen in the source code.
These functions save that information.
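
As a rough illustration of this record-then-revisit scheme,
here is a minimal C sketch.
It is not the code in @file{std.c}:
the names @code{record_stmt}, @code{replay_stmts}, and @code{count_emit},
the statement kinds, and the fixed-size buffer
are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds; the real FFE distinguishes many more.  */
enum stmt_kind { STMT_ASSIGN, STMT_CALL, STMT_GOTO, STMT_END };

#define MAX_STMTS 64

/* Statements saved for the current program unit, in source order.  */
static enum stmt_kind saved[MAX_STMTS];
static size_t n_saved = 0;

/* First pass: called as each statement is recognized.
   Returns 0 on success, -1 if the (fixed-size) buffer is full.  */
int
record_stmt (enum stmt_kind kind)
{
  if (n_saved == MAX_STMTS)
    return -1;
  saved[n_saved++] = kind;
  return 0;
}

/* Second pass: hand every saved statement, in order, to EMIT,
   then reset the buffer for the next program unit.
   Returns the number of statements replayed.  */
size_t
replay_stmts (void (*emit) (enum stmt_kind, void *), void *ctx)
{
  size_t i, n = n_saved;

  assert (emit != NULL);
  for (i = 0; i < n; i++)
    emit (saved[i], ctx);
  n_saved = 0;
  return n;
}

/* Example emitter: just counts the statements it is given.  */
void
count_emit (enum stmt_kind kind, void *ctx)
{
  (void) kind;
  ++*(size_t *) ctx;
}
```

In this sketch, a parser would call @code{record_stmt} once per statement
and call @code{replay_stmts} only after seeing the end of the program unit,
so the emitter never runs on a partially seen unit.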

It also has an external function that revisits that information,
calling the siblings in @file{@value{path-g77}/ste.c},
which handles the actual code generation
(by generating GBEL code,
that is, by calling GBE routines
to represent and specify expressions, statements, and so on).

@node Why Two Passes
@subsection Why Two Passes

The need for two passes was not immediately evident
during the design and implementation of the code in the FFE
that was to produce GBEL.
Only after a few kludges,
to handle things like incorrectly-guessed @code{ASSIGN} label nature,
had been implemented,
did enough evidence pile up to make it clear
that @file{std.c} had to be introduced to intercept,
save, then revisit as part of a second pass,
the digested contents of a program unit.

Other such missteps have occurred during the evolution of the FFE,
because of the different goals of the FFE and the GBE.

Because the GBE's original, and still primary, goal
was to directly support the GNU C language,
the GBEL, and the GBE itself,
require more complexity
on the part of most front ends
than they require of @code{gcc}'s.

For example,
the GBEL offers an interface that permits the @code{gcc} front end
to implement most, or all, of the language features it supports,
without the front end having to
make use of non-user-defined variables.
(It's almost certainly the case that all of K&R C,
and probably ANSI C as well,
is handled by the @code{gcc} front end
without declaring such variables.)

The FFE, on the other hand, must resort to a variety of ``tricks''
to achieve its goals.

Consider the following C code:

@smallexample
int
foo (int a, int b)
@{
  int c = 0;

  if ((c = bar (c)) == 0)
    goto done;

  quux (c << 1);

done:
  return c;
@}
@end smallexample

Note what kinds of objects are declared, or defined, before their use,
and before any actual code generation involving them
would normally take place:

@itemize @bullet
@item
Return type of function

@item
Entry point(s) of function

@item
Dummy arguments

@item
Variables

@item
Initial values for variables
@end itemize

Whereas,
the following items can, and do,
suddenly appear ``out of the blue'' in C:

@itemize @bullet
@item
Label references

@item
Function references
@end itemize

Not surprisingly, the GBE faithfully permits the latter set of items
to be ``discovered'' partway through GBEL ``programs'',
just as they are permitted to be in C.

Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
of items in the former set.

This makes Fortran a poor fit for the ``safe'' subset of GBEL.
Consider:

@smallexample
      FUNCTION X (A, ARRAY, ID1)
      CHARACTER*(*) A
      DOUBLE PRECISION X, Y, Z, TMP, EE, PI
      REAL ARRAY(ID1*ID2)
      COMMON ID2
      EXTERNAL FRED

      ASSIGN 100 TO J
      CALL FOO (I)
      IF (I .EQ.
     &    0) PRINT *, A(0)
      GOTO 200

      ENTRY Y (Z)
      ASSIGN 101 TO J
200   PRINT *, A(1)
      READ *, TMP
      GOTO J
100   X = TMP * EE
      RETURN
101   Y = TMP * PI
      CALL FRED
      DATA EE, PI /2.71D0, 3.14D0/
      END
@end smallexample

Here are some observations about the above code,
which, while somewhat contrived,
conforms to the FORTRAN 77 and Fortran 90 standards:

@itemize @bullet
@item
The return type of function @samp{X} is not known
until the @samp{DOUBLE PRECISION} line has been parsed.

@item
Whether @samp{A} is a function or a variable
is not known until the @samp{PRINT *, A(0)} statement
has been parsed.

@item
The bounds of the array of argument @samp{ARRAY}
depend on a computation involving
the subsequent argument @samp{ID1}
and the blank-common member @samp{ID2}.

@item
Whether @samp{Y} and @samp{Z} are local variables,
additional function entry points,
or dummy arguments to additional entry points
is not known
until the @code{ENTRY} statement is parsed.

@item
Similarly, whether @samp{TMP} is a local variable is not known
until the @samp{READ *, TMP} statement is parsed.

@item
The initial values for @samp{EE} and @samp{PI}
are not known until after the @code{DATA} statement is parsed.

@item
Whether @samp{FRED} is a function returning type @code{REAL}
or a subroutine
(which can be thought of as returning type @code{void}
@emph{or}, to support alternate returns in a simple way,
type @code{int})
is not known
until the @samp{CALL FRED} statement is parsed.

@item
Whether @samp{100} is a @code{FORMAT} label
or the label of an executable statement
is not known
until the @samp{X =} statement is parsed.
(These two types of labels get @emph{very} different treatment,
especially when @code{ASSIGN}'ed.)

@item
That @samp{J} is a local variable is not known
until the first @code{ASSIGN} statement is parsed.
(This happens @emph{after} executable code has been seen.)
@end itemize

Very few of these ``discoveries''
can be accommodated by the GBE as it has evolved over the years.
The GBEL doesn't support several of them,
and those it might appear to support
don't always work properly,
especially in combination with other GBEL and GBE features,
as implemented in the GBE.

(Had the GBE and its GBEL originally evolved to support @code{g77},
the shoe would be on the other foot, so to speak---most, if not all,
of the above would be directly supported by the GBEL,
and a few C constructs would probably not, as they are in reality,
be supported.
Both this mythical, and today's real, GBE caters to its GBEL
by, sometimes, scrambling around, cleaning up after itself---after
discovering that assumptions it made earlier during code generation
are incorrect.
That's not a great design, since it indicates significant code
paths that might be rarely tested but used in some key production
environments.)

So, the FFE handles these discrepancies---between the order in which
it discovers facts about the code it is compiling,
and the order in which the GBEL and GBE support such discoveries---by
performing what amounts to two
passes over each program unit.

(A few ambiguities can remain at that point,
such as whether, given @samp{EXTERNAL BAZ}
and no other reference to @samp{BAZ} in the program unit,
it is a subroutine, a function, or a block-data---which, in C-speak,
governs its declared return type.
Fortunately, these distinctions are easily finessed
for the procedure, library, and object-file interfaces
supported by @code{g77}.)
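
The kind of late discovery that motivates the second pass
can be sketched in C, using statement labels as the example.
This is only an illustration under stated assumptions:
the table layout and the names @code{reference_label},
@code{define_label}, and @code{label_kind_of} are hypothetical
and are not taken from the FFE sources.

```c
#include <assert.h>
#include <stddef.h>

enum label_kind { LABEL_UNKNOWN, LABEL_FORMAT, LABEL_EXECUTABLE };

struct label
{
  int number;
  enum label_kind kind;
};

#define MAX_LABELS 32

static struct label labels[MAX_LABELS];
static size_t n_labels = 0;

/* Find the entry for NUMBER, creating it (kind still unknown) on
   first sight.  Returns NULL if the table is full.  */
struct label *
lookup_label (int number)
{
  size_t i;

  for (i = 0; i < n_labels; i++)
    if (labels[i].number == number)
      return &labels[i];
  if (n_labels == MAX_LABELS)
    return NULL;
  labels[n_labels].number = number;
  labels[n_labels].kind = LABEL_UNKNOWN;
  return &labels[n_labels++];
}

/* First pass: a reference such as ASSIGN 100 TO J reveals nothing
   about what kind of label 100 is.  */
int
reference_label (int number)
{
  return lookup_label (number) != NULL ? 0 : -1;
}

/* First pass: the labeled statement itself settles the question.  */
int
define_label (int number, enum label_kind kind)
{
  struct label *l = lookup_label (number);

  assert (kind != LABEL_UNKNOWN);
  if (l == NULL)
    return -1;
  l->kind = kind;
  return 0;
}

/* Second pass: every label of a valid program unit has been defined
   by now, so code generation does not have to guess.  */
enum label_kind
label_kind_of (int number)
{
  struct label *l = lookup_label (number);

  return l != NULL ? l->kind : LABEL_UNKNOWN;
}
```

A one-pass translator would have to generate code for
@samp{ASSIGN 100 TO J} while the entry for @samp{100}
still reads @code{LABEL_UNKNOWN};
deferring code generation until the whole unit has been seen avoids that guess.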

@node Challenges Posed
@section Challenges Posed

Consider the following Fortran code, which uses various extensions
(including some to Fortran 90):

@smallexample
SUBROUTINE X(A)
CHARACTER*(*) A
COMPLEX CFUNC
INTEGER*2 CLOCKS(200)
INTEGER IFUNC

CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
@end smallexample

The above poses the following challenges to any Fortran compiler
that uses run-time interfaces, and a run-time library, roughly similar
to those used by @code{g77}:

@itemize @bullet
@item
Assuming the library routine that supports @code{SYSTEM_CLOCK}
expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
the compiler must make available to it a temporary variable of that type.

@item
Further, after the @code{SYSTEM_CLOCK} library routine returns,
the compiler must ensure that the temporary variable it wrote
is copied into the appropriate element of the @samp{CLOCKS} array.
(This assumes the compiler doesn't just reject the code,
which it should if it is compiling under some kind of a ``strict'' option.)

@item
To determine the correct index into the @samp{CLOCKS} array
(putting aside the fact that the index, in this particular case,
need not be computed until after
the @code{SYSTEM_CLOCK} library routine returns),
the compiler must ensure that the @code{IFUNC} function is called.

That requires evaluating its argument,
which requires, for @code{g77}
(assuming @code{-ff2c} is in force),
reserving a temporary variable of type @code{COMPLEX}
for use as a repository for the return value
being computed by @samp{CFUNC}.

@item
Before invoking @samp{CFUNC},
its argument must be evaluated,
which requires allocating, at run time,
a temporary large enough to hold the result of the concatenation,
as well as actually performing the concatenation.

@item
The large temporary needed during invocation of @code{CFUNC}
should, ideally, be deallocated
(or, at least, left to the GBE to dispose of, as it sees fit)
as soon as @code{CFUNC} returns,
which means before @code{IFUNC} is called
(as it might need a lot of dynamically allocated memory).
@end itemize

@code{g77} currently doesn't support all of the above,
but, so that it might someday, it has evolved to handle
at least some of the above requirements.
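
The sequence of temporaries listed above can be sketched in C as follows.
This is a hand-written approximation, not output of @code{g77}:
@code{cfunc}, @code{ifunc}, and @code{system_clock} are stand-ins invented
for the example (they are not @code{libg2c} entry points),
and @code{translate_call} is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a COMPLEX value returned through a pointer.  */
struct complex_t { float re, im; };

/* Invented stand-in for the user's CFUNC.  */
static void
cfunc (struct complex_t *result, const char *s, size_t len)
{
  (void) s;
  result->re = (float) len;	/* real part: length of the argument */
  result->im = 0.0f;
}

/* Invented stand-in for the user's IFUNC.  */
static int32_t
ifunc (const struct complex_t *z)
{
  return (int32_t) z->re;
}

/* Invented stand-in for the library routine behind SYSTEM_CLOCK.  */
static void
system_clock (int32_t *count)
{
  *count = 12345;
}

/* Rough equivalent of
     CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
   Returns the 1-based index stored into, or 0 on failure.  */
int32_t
translate_call (int16_t clocks[200], const char *a, size_t a_len)
{
  struct complex_t ctemp;	/* repository for CFUNC's result */
  int32_t index, count;
  size_t len = a_len + 2;
  char *cat;

  assert (clocks != NULL && a != NULL);
  cat = malloc (len);		/* run-time-sized concatenation temporary */
  if (cat == NULL)
    return 0;
  cat[0] = '(';
  memcpy (cat + 1, a, a_len);
  cat[len - 1] = ')';

  cfunc (&ctemp, cat, len);
  free (cat);			/* released before IFUNC is called */

  index = ifunc (&ctemp);
  if (index < 1 || index > 200)
    return 0;

  system_clock (&count);	/* the library writes an INTEGER*4 temporary */
  clocks[index - 1] = (int16_t) count;	/* copy back into INTEGER*2 element */
  return index;
}
```

The sketch computes the index before calling the clock routine;
as noted in the list above, it could equally be computed afterward.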

Meeting the above requirements is made more challenging
by conforming to the requirements of the GBEL/GBE combination.

@node Transforming Statements
@section Transforming Statements

Most Fortran statements are given their own block,
and, for temporary variables they might need, their own scope.
(A block is what distinguishes @samp{@{ foo (); @}}
from just @samp{foo ();} in C.
A scope is included with every such block,
providing a distinct name space for local variables.)

Label definitions for the statement precede this block,
so @samp{10 PRINT *, I} is handled more like
@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
(where @samp{fl10} is just a notation meaning ``Fortran Label 10''
for the purposes of this document).

@menu
* Statements Needing Temporaries::
* Transforming DO WHILE::
* Transforming Iterative DO::
* Transforming Block IF::
* Transforming SELECT CASE::
@end menu

@node Statements Needing Temporaries
@subsection Statements Needing Temporaries

Any temporaries needed during, but not beyond,
execution of a Fortran statement,
are made local to the scope of that statement's block.

This allows the GBE to share storage for these temporaries
among the various statements without the FFE
having to manage that itself.

(The GBE could, of course, decide to optimize
management of these temporaries.
For example, it could, theoretically,
schedule some of the computations involving these temporaries
to occur in parallel.
More practically, it might leave the storage for some temporaries
``live'' beyond their scopes, to reduce the number of
manipulations of the stack pointer at run time.)

Temporaries needed across distinct statement boundaries usually
are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
(Also, there might be temporaries not associated with blocks at all---these
would be in the scope of the entire program unit.)

Each Fortran block @emph{should} get its own block/scope in the GBE.
This is best, because it allows temporaries to be more naturally handled.
However, it might pose problems when handling labels
(in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and it generally means the hassle of replicating
parts of the @code{gcc} front end
(because the FFE needs to support
an arbitrary number of nested back-end blocks
if each Fortran block gets one).

So, there might still be a need for top-level temporaries, whose
``owning'' scope is that of the containing procedure.

Also, there seem to be problems declaring new variables after
generating code (within a block) in the back end, leading to, e.g.,
@samp{label not defined before binding contour} or similar messages,
when compiling with @samp{-fstack-check} or
when compiling for certain targets.

Because of that, and because sometimes these temporaries are not
discovered until the middle of generating code for an expression
statement (as in the case of the optimization for @samp{X**I}),
it seems best to always
pre-scan all the expressions that'll be expanded for a block
before generating any of the code for that block.

This pre-scan then handles discovering and declaring, to the back end,
the temporaries needed for that block.

It's also important to treat distinct items in an I/O list as distinct
statements deserving their own blocks.
That's because there's a requirement
that each I/O item be fully processed before the next one,
which matters in cases like @samp{READ (*,*), I, A(I)}---the
element of @samp{A} read in the second item
@emph{must} be determined from the value
of @samp{I} read in the first item.

@node Transforming DO WHILE
@subsection Transforming DO WHILE

@samp{DO WHILE(expr)} @emph{must} be implemented
so that temporaries needed to evaluate @samp{expr}
are generated just for the test, each time.

Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:

@smallexample
for (;;)
  @{
    int temp0;

    @{
      char temp1[large];

      libg77_catenate (temp1, a, b);
      temp0 = libg77_ne (temp1, 'END');
    @}

    if (! temp0)
      break;

    @dots{}
  @}
@end smallexample

In this case, it seems like a time/space tradeoff
between allocating and deallocating @samp{temp1} for each iteration
and allocating it just once for the entire loop.

However, if @samp{temp1} is allocated just once for the entire loop,
it could be the wrong size for subsequent iterations of that loop
in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
because the body of the loop might modify @samp{I} or @samp{J}.

So, the above implementation is used,
though a more efficient one can be used
in specific circumstances.
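
To make the sizing issue concrete, here is a compilable C sketch
of the per-iteration allocation for a loop whose body changes the
length of the tested expression.
It is an illustration only: @code{char_ne} and @code{do_while_sketch}
are hypothetical names, heap allocation stands in for whatever
storage the back end would actually use,
and the extra bounds check on @samp{J} is not part of the Fortran.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fortran-style .NE. on character values: the shorter operand is
   treated as if padded on the right with blanks.  */
static int
char_ne (const char *s, size_t slen, const char *t, size_t tlen)
{
  size_t i, n = slen > tlen ? slen : tlen;

  for (i = 0; i < n; i++)
    {
      char cs = i < slen ? s[i] : ' ';
      char ct = i < tlen ? t[i] : ' ';

      if (cs != ct)
        return 1;
    }
  return 0;
}

/* Rough equivalent of
     DO WHILE (A(1:J)//B .NE. 'END')
       J = J + 1
     END DO
   with J starting at J0.  Returns the number of times the body ran,
   or -1 if a temporary could not be allocated.  Unlike the Fortran,
   the loop also stops once J reaches the length of A.  */
int
do_while_sketch (const char *a, size_t a_len,
                 const char *b, size_t b_len, size_t j0)
{
  size_t j = j0;
  int iterations = 0;

  assert (a != NULL && b != NULL && j0 <= a_len);
  for (;;)
    {
      int temp0;

      {
        /* temp1 is sized from the current J, so it is allocated
           and released on every evaluation of the test.  */
        size_t len = j + b_len;
        char *temp1 = malloc (len > 0 ? len : 1);

        if (temp1 == NULL)
          return -1;
        memcpy (temp1, a, j);
        memcpy (temp1 + j, b, b_len);
        temp0 = char_ne (temp1, len, "END", 3);
        free (temp1);
      }

      if (! temp0)
        break;
      if (j == a_len)
        break;

      j++;			/* the loop body changes J */
      iterations++;
    }
  return iterations;
}
```

Had @samp{temp1} been allocated once, using the value of @samp{J}
on entry to the loop, the second and later tests would have
written past its end.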

@node Transforming Iterative DO
@subsection Transforming Iterative DO

An iterative @code{DO} loop
(one that specifies an iteration variable)
is required by the Fortran standards
to be implemented as though an iteration count
is computed before entering the loop body,
and that iteration count used to determine
the number of times the loop body is to be performed
(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).

The FFE handles this by allocating a temporary variable
to contain the computed number of iterations.
Since this variable must be in a scope that includes the entire loop,
a GBEL block is created for that loop,
and the variable declared as belonging to the scope of that block.

@node Transforming Block IF
@subsection Transforming Block IF

Consider:

@smallexample
SUBROUTINE X(A,B,C)
CHARACTER*(*) A, B, C
LOGICAL LFUNC

IF (LFUNC (A//B)) THEN
  CALL SUBR1
ELSE IF (LFUNC (A//C)) THEN
  CALL SUBR2
ELSE
  CALL SUBR3
END IF
@end smallexample

The arguments to the two calls to @samp{LFUNC}
require dynamic allocation (at run time),
but are not required during execution of the @code{CALL} statements.

So, the scopes of those temporaries must be within blocks inside
the block corresponding to the Fortran @code{IF} block.

This cannot be represented ``naturally''
in vanilla C, nor in GBEL.
The @code{if}, @code{elseif}, @code{else},
and @code{endif} constructs
provided by both languages must,
for a given @code{if} block,
share the same C/GBE block.

Therefore, any temporaries needed during evaluation of @samp{expr}
while executing @samp{ELSE IF(expr)}
must either have been predeclared
at the top of the corresponding @code{IF} block,
or declared within a new block for that @code{ELSE IF}---a block that,
since it cannot contain the @code{else} or @code{else if} itself
(due to the above requirement),
actually implements the rest of the @code{IF} block's
@code{ELSE IF} and @code{ELSE} statements
within an inner block.

The FFE takes the latter approach.

@node Transforming SELECT CASE
@subsection Transforming SELECT CASE

@code{SELECT CASE} poses a few interesting problems for code generation,
if efficiency and frugal stack management are important.

Consider @samp{SELECT CASE (I('PREFIX'//A))},
where @samp{A} is @code{CHARACTER*(*)}.
In a case like this---basically,
in any case where largish temporaries are needed
to evaluate the expression---those temporaries should
not be ``live'' during execution of any of the @code{CASE} blocks.

So, evaluation of the expression is best done within its own block,
which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).

Otherwise, we'd have the rough equivalent of this pseudo-code:

@smallexample
@{
  char temp[large];

  libg77_catenate (temp, 'prefix', a);

  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

And that would leave @samp{temp[large]} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temp before executing the blocks).

So this approach is used instead:

@smallexample
@{
  int temp0;

  @{
    char temp1[large];

    libg77_catenate (temp1, 'prefix', a);
    temp0 = i (temp1);
  @}

  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

Note how @samp{temp1} goes out of scope before starting the switch,
thus making it easy for a back end to free it.

The problem @emph{that} solution has, however,
is with @samp{SELECT CASE('prefix'//A)}
(which is currently not supported).

Unless the GBEL is extended to support arbitrarily long character strings
in its @code{case} facility,
the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
(probably excepting @code{CHARACTER*1})
using a cascade of
@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
in GBEL.

To prevent the (potentially large) temporary,
needed to hold the selected expression itself (@samp{'prefix'//A}),
from being in scope during execution of the @code{CASE} blocks,
two approaches are available:

@itemize @bullet
@item
Pre-evaluate all the @code{CASE} tests,
producing an integer ordinal that is used,
a la @samp{temp0} in the earlier example,
as if @samp{SELECT CASE(temp0)} had been written.

Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
where @var{i} is the ordinal for that case,
determined while, or before,
generating the cascade of @code{if}-related constructs
to cope with @code{CHARACTER} selection.

@item
Make @samp{temp0} above just
large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).

Since that length must be constant
(because @code{CASE} expressions are all constant),
it won't be so large,
and, further, @samp{temp1} need not be dynamically allocated,
since normal @code{CHARACTER} assignment can be used
into the fixed-length @samp{temp0}.
@end itemize

Both of these solutions require @code{SELECT CASE} implementation
to be changed so all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.
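
The first of the two approaches (pre-evaluating the @code{CASE} tests
into an integer ordinal) might look roughly like the following C.
This is a sketch under assumptions, not something @code{g77} generates:
the two @code{CASE} strings, the values returned,
and the names @code{char_eq} and @code{select_case_sketch}
are all made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fortran-style character equality against a C string constant:
   the shorter operand is treated as blank-padded on the right.  */
static int
char_eq (const char *s, size_t slen, const char *t)
{
  size_t tlen = strlen (t);
  size_t i, n = slen > tlen ? slen : tlen;

  for (i = 0; i < n; i++)
    if ((i < slen ? s[i] : ' ') != (i < tlen ? t[i] : ' '))
      return 0;
  return 1;
}

/* Rough equivalent of
     SELECT CASE ('prefix'//A)
     CASE ('prefixA')      (returns 10)
     CASE ('prefixB')      (returns 20)
     CASE DEFAULT          (returns 0)
     END SELECT
   Returns -1 if the temporary could not be allocated.  */
int
select_case_sketch (const char *a, size_t a_len)
{
  int temp0;			/* ordinal of the matching CASE */

  assert (a != NULL);
  {
    size_t len = 6 + a_len;
    char *temp1 = malloc (len);

    if (temp1 == NULL)
      return -1;
    memcpy (temp1, "prefix", 6);
    memcpy (temp1 + 6, a, a_len);

    /* The cascade of tests; only the small ordinal outlives it.  */
    if (char_eq (temp1, len, "prefixA"))
      temp0 = 1;
    else if (char_eq (temp1, len, "prefixB"))
      temp0 = 2;
    else
      temp0 = 0;
    free (temp1);
  }

  /* temp1 is gone before any CASE block runs.  */
  switch (temp0)
    {
    case 1:
      return 10;
    case 2:
      return 20;
    default:
      return 0;
    }
}
```

Note that the cascade needs every @code{CASE} string up front,
which is why the @code{CASE} statements must be visible
when code for @code{SELECT CASE} is generated.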

@node Transforming Expressions
@section Transforming Expressions

The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:

@smallexample
@var{action}(@var{expr})
@end smallexample

Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.

Expanding the above shows a typical order of events at run time:

@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample

So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must be freed no sooner than
after @var{action} has been performed.

The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.

That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.
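
The recursive evaluate/perform/clean-up ordering can be modeled
with a toy expression evaluator in C.
This is only a run-time model of the ordering, under assumptions:
the @code{struct expr} layout and the names @code{eval}, @code{perform},
and @code{live_temps} are hypothetical,
and every result is held in a heap temporary purely so that the
clean-up steps are visible.

```c
#include <assert.h>
#include <stdlib.h>

/* A tiny expression tree: OP is '+' or '*', or 0 for a constant leaf.  */
struct expr
{
  char op;
  int value;
  const struct expr *lhs, *rhs;
};

/* Number of temporaries currently allocated.  */
int live_temps = 0;

static int *
new_temp (void)
{
  int *t = malloc (sizeof *t);

  if (t == NULL)
    abort ();
  live_temps++;
  return t;
}

static void
free_temp (int *t)
{
  free (t);
  live_temps--;
}

/* Evaluate E into a fresh temporary that the caller must release.  */
int *
eval (const struct expr *e)
{
  int *result = new_temp ();

  assert (e != NULL);
  if (e->op == 0)
    *result = e->value;
  else
    {
      int *l = eval (e->lhs);	/* evaluate the subexpressions */
      int *r = eval (e->rhs);

      *result = e->op == '+' ? *l + *r : *l * *r;	/* perform the action */
      free_temp (l);		/* clean up after the subexpressions */
      free_temp (r);
    }
  return result;
}

/* Statement-level action: here it simply returns the value.  */
int
perform (const struct expr *e)
{
  int *v = eval (e);		/* evaluate expr */
  int out = *v;			/* perform action, using the result */

  free_temp (v);		/* clean up after evaluating expr */
  return out;
}
```

The temporary holding a result is released by whoever consumes it,
never by the code that computed it,
which matches the rule that such memory must outlive the @var{action}.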

The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.

However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.

Yet, from the above, it's clear that multiple streams of code
must effectively be generated simultaneously
during the recursive-descent analysis of statements.

The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.

Requirements imposed by expressions include:

@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.

@item
Other requirements (TBD).
@end itemize

@node Internal Naming Conventions
@section Internal Naming Conventions

Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)

Identifiers that don't fit the following forms
are not considered exported,
even if they are according to the C language.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)

@table @code
@item ffe@var{mod}
The single typedef exported by the module.

@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.

@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
A typedef exported by the module.

The portion of the identifier after @code{ffe@var{mod}} is
referred to as @code{ctype}, a capitalized (mixed-case) form
of @code{type}.

@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.

@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).

@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table

Below are names used for @var{value} and @var{input},
along with their definitions.

@table @code
@item col
A column number within a line (first column is number 1).

@item file
An encapsulation of a file's name.

@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).

@item initialize
Initializes, usually a module.  No type.

@item int
A generic integer of type @code{int}.

@item is
A generic integer that contains a true (nonzero) or false (zero) value.

@item len
A generic integer that contains the length of something.

@item line
A line number within a source file,
or a global line number.

@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.

@item name
A @code{text} that points to a name of something.

@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate; if so,
it is similar to @code{find} without crashing.

@item pt
Pointer to a particular character (a line, column pair)
in the input file (source code being compiled).

@item run
Performs some herculean task.  No type.

@item terminate
Terminates, usually a module.  No type.

@item text
A @code{char *} that points to generic text.
@end table
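
To make the forms above concrete,
here is a small C sketch of how a module might follow them.
The module @code{ffedemo} is hypothetical and does not exist in @code{g77};
its table, its size limit, and the choice of @code{int} for the typedef
are assumptions made purely for illustration.
It also shows the difference between @code{lookup}, which returns nil,
and @code{find}, which creates an instance or crashes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ffe<mod>: the single typedef exported by the hypothetical module.  */
typedef int ffedemo;

/* FFE<umod>_[A-Z][A-Z0-9_]*: a constant of type ffedemo ("nil").  */
#define FFEDEMO_NONE ((ffedemo) -1)

/* Not exported: these names fit none of the exported forms.  */
#define DEMO_MAX_ 16
static const char *demo_names_[DEMO_MAX_];
static int demo_count_;

/* ffe<mod>_<value>, value = initialize: initializes the module.  */
static void
ffedemo_initialize (void)
{
  demo_count_ = 0;
}

/* ffe<mod>_<value>_<input>, value = lookup, input = name:
   returns the matching instance, or nil.  */
static ffedemo
ffedemo_lookup_name (const char *name)
{
  for (ffedemo d = 0; d < demo_count_; d++)
    if (strcmp (demo_names_[d], name) == 0)
      return d;
  return FFEDEMO_NONE;
}

/* value = find, input = name: returns the matching instance, creating
   a new one if needed, or crashes when the table is full.  The caller
   must keep NAME alive, since only the pointer is stored.  */
static ffedemo
ffedemo_find_name (const char *name)
{
  ffedemo d = ffedemo_lookup_name (name);

  if (d != FFEDEMO_NONE)
    return d;
  if (demo_count_ == DEMO_MAX_)
    abort ();
  demo_names_[demo_count_] = name;
  return demo_count_++;
}

/* ffe<mod>_<value>, value = name: text pointing to the name.  */
static const char *
ffedemo_name (ffedemo d)
{
  assert (d >= 0 && d < demo_count_);
  return demo_names_[d];
}
```

After @code{ffedemo_initialize},
a @code{ffedemo_lookup_name} call for an unknown name
yields @code{FFEDEMO_NONE},
whereas @code{ffedemo_find_name} for the same name creates an entry
that later lookups return.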