1@c Copyright (C) 1999 Free Software Foundation, Inc.
2@c This is part of the G77 manual.
3@c For copying conditions, see the file g77.texi.
4
5@node Front End
6@chapter Front End
7@cindex GNU Fortran Front End (FFE)
8@cindex FFE
9@cindex @code{g77}, front end
10@cindex front end, @code{g77}
11
12This chapter describes some aspects of the design and implementation
13of the @code{g77} front end.
14
To find out about things that are ``To Be Determined'' or ``To Be Done'',
16search for the string TBD.
17If you want to help by working on one or more of these items,
18email @email{gcc@@gcc.gnu.org}.
19If you're planning to do more than just research issues and offer comments,
20see @uref{http://gcc.gnu.org/contribute.html} for steps you might
21need to take first.
22
23@menu
24* Overview of Sources::
25* Overview of Translation Process::
26* Philosophy of Code Generation::
27* Two-pass Design::
28* Challenges Posed::
29* Transforming Statements::
30* Transforming Expressions::
31* Internal Naming Conventions::
32@end menu
33
34@node Overview of Sources
35@section Overview of Sources
36
37The current directory layout includes the following:
38
39@table @file
40@item @value{srcdir}/gcc/
41Non-g77 files in gcc
42
43@item @value{srcdir}/gcc/f/
44GNU Fortran front end sources
45
46@item @value{srcdir}/libf2c/
47@code{libg2c} configuration and @code{g2c.h} file generation
48
49@item @value{srcdir}/libf2c/libF77/
50General support and math portion of @code{libg2c}
51
52@item @value{srcdir}/libf2c/libI77/
53I/O portion of @code{libg2c}
54
55@item @value{srcdir}/libf2c/libU77/
56Additional interfaces to Unix @code{libc} for @code{libg2c}
57@end table
58
59Components of note in @code{g77} are described below.
60
61@file{f/} as a whole contains the source for @code{g77},
62while @file{libf2c/} contains a portion of the separate program
63@code{f2c}.
64Note that the @code{libf2c} code is not part of the program @code{g77},
65just distributed with it.
66
67@file{f/} contains text files that document the Fortran compiler, source
68files for the GNU Fortran Front End (FFE), and some other stuff.
69The @code{g77} compiler code is placed in @file{f/} because it,
70along with its contents,
71is designed to be a subdirectory of a @code{gcc} source directory,
72@file{gcc/},
73which is structured so that language-specific front ends can be ``dropped
74in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
76the @file{cp/} subdirectory.
77Note that the C front end (also referred to as @code{gcc})
78is an exception to this, as its source files reside
79in the @file{gcc/} directory itself.
80
81@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
82also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
84When built as part of @code{g77},
85@code{libf2c} is installed under the name @code{libg2c} to avoid
86conflict with any existing version of @code{libf2c},
87and thus is often referred to as @code{libg2c} when the
88@code{g77} version is specifically being referred to.
89
90The @code{netlib} version of @code{libf2c/}
91contains two distinct libraries,
92@code{libF77} and @code{libI77},
93each in their own subdirectories.
94In @code{g77}, this distinction is not made,
95beyond maintaining the subdirectory structure in the source-code tree.
96
97@file{libf2c/} is not part of the program @code{g77},
98just distributed with it.
99It contains files not present
100in the official (@code{netlib}) version of @code{libf2c},
101and also contains some minor changes made from @code{libf2c},
102to fix some bugs,
103and to facilitate automatic configuration, building, and installation of
104@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
105See @file{libf2c/README} for more information,
106including licensing conditions
107governing distribution of programs containing code from @code{libg2c}.
108
109@code{libg2c}, @code{g77}'s version of @code{libf2c},
110adds Dave Love's implementation of @code{libU77},
111in the @file{libf2c/libU77/} directory.
112This library is distributed under the
113GNU Library General Public License (LGPL)---see the
114file @file{libf2c/libU77/COPYING.LIB}
115for more information,
116as this license
117governs distribution conditions for programs containing code
118from this portion of the library.
119
120Files of note in @file{f/} and @file{libf2c/} are described below:
121
122@table @file
123@item f/BUGS
124Lists some important bugs known to be in g77.
125Or use Info (or GNU Emacs Info mode) to read
126the ``Actual Bugs'' node of the @code{g77} documentation:
127
128@smallexample
129info -f f/g77.info -n "Actual Bugs"
130@end smallexample
131
132@item f/ChangeLog
133Lists recent changes to @code{g77} internals.
134
135@item libf2c/ChangeLog
136Lists recent changes to @code{libg2c} internals.
137
138@item f/NEWS
139Contains the per-release changes.
140These include the user-visible
141changes described in the node ``Changes''
142in the @code{g77} documentation, plus internal
143changes of import.
144Or use:
145
146@smallexample
147info -f f/g77.info -n News
148@end smallexample
149
150@item f/g77.info*
151The @code{g77} documentation, in Info format,
152produced by building @code{g77}.
153
All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
aren't yet accustomed to using these tools.
158All of these files are readable as ``plain text'' files,
159though they're easier to navigate using Info readers
160such as @code{info} and GNU Emacs Info mode.
161@end table
162
163If you want to explore the FFE code, which lives entirely in @file{f/},
164here are a few clues.
165The file @file{g77spec.c} contains the @code{g77}-specific source code
166for the @code{g77} command only---this just forms a variant of the
167@code{gcc} command, so,
168just as the @code{gcc} command itself does not contain the C front end,
169the @code{g77} command does not contain the Fortran front end (FFE).
170The FFE code ends up in an executable named @file{f771},
171which does the actual compiling,
172so it contains the FFE plus the @code{gcc} back end (GBE),
173the latter to do most of the optimization, and the code generation.
174
175The file @file{parse.c} is the source file for @code{yyparse()},
176which is invoked by the GBE to start the compilation process,
177for @file{f771}.
178
The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and, along with @file{top.h}, defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
and @samp{FFE_[A-Za-z].*} symbols.
182
183The file @file{fini.c} is a @code{main()} program that is used when building
184the FFE to generate C header and source files for recognizing keywords.
185The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
186that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
187@samp{MALLOC_[A-Za-z].*} symbols.
188
All other modules named @var{xyz}
consist of all files named @samp{@var{xyz}*.@var{ext}}
191and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
192and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
193If you understand all this, congratulations---it's easier for me to remember
194how it works than to type in these regular expressions.
195But it does make it easy to find where a symbol is defined.
196For example, the symbol @samp{ffexyz_set_something} would be defined
197in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
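
As a rough illustration of these naming rules, the following Python sketch
maps a symbol name to the module expected to define it.
It is not part of the FFE: the function name @code{module_for_symbol}
is made up for this example, and the sketch assumes that module names
consist only of letters.

@smallexample
```python
import re

def module_for_symbol(symbol):
    """Guess which FFE module defines a symbol, from its prefix alone."""
    # malloc_*, malloc[A-Z]* and MALLOC_* belong to the memory manager.
    if re.match(r"(malloc_[a-z]|malloc[A-Z]|MALLOC_[A-Za-z])", symbol):
        return "malloc"
    # ffe_*, ffe[A-Z]* and FFE_* belong to the top-level module (top.c, top.h).
    if re.match(r"(ffe_[a-z]|ffe[A-Z]|FFE_[A-Za-z])", symbol):
        return "top"
    # ffexyz_* and ffexyz[A-Z]*: a lowercase module name follows "ffe".
    m = re.match(r"ffe([a-z]+)(?:_[a-z]|[A-Z])", symbol)
    if m:
        return m.group(1)
    # FFEXYZ_*: an uppercase module name follows "FFE".
    m = re.match(r"FFE([A-Z]+)_[A-Za-z]", symbol)
    if m:
        return m.group(1).lower()
    return None  # not an FFE-style symbol
```
@end smallexample

For instance, @code{module_for_symbol("ffestd_exec_end")} yields @code{"std"},
which points at @file{std.h} and @file{std.c}.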
198
199The ``porting'' files of note currently are:
200
201@table @file
202@item proj.c
203@itemx proj.h
204This defines the ``language'' used by all the other source files,
205the language being Standard C plus some useful things
206like @code{ARRAY_SIZE} and such.
207
208@item target.c
209@itemx target.h
210These describe the target machine
211in terms of what data types are supported,
212how they are denoted
213(to what C type does an @code{INTEGER*8} map, for example),
214how to convert between them,
215and so on.
216Over time, versions of @code{g77} rely less on this file
217and more on run-time configuration based on GBE info
218in @file{com.c}.
219
220@item com.c
221@itemx com.h
222These are the primary interface to the GBE.
223
224@item ste.c
225@itemx ste.h
226This contains code for implementing recognized executable statements
227in the GBE.
228
229@item src.c
230@itemx src.h
231These contain information on the format(s) of source files
232(such as whether they are never to be processed as case-insensitive
233with regard to Fortran keywords).
234@end table
235
236If you want to debug the @file{f771} executable,
237for example if it crashes,
238note that the global variables @code{lineno} and @code{input_filename}
239are usually set to reflect the current line being read by the lexer
240during the first-pass analysis of a program unit and to reflect
241the current line being processed during the second-pass compilation
242of a program unit.
243
244If an invocation of the function @code{ffestd_exec_end} is on the stack,
245the compiler is in the second pass, otherwise it is in the first.
246
247(This information might help you reduce a test case and/or work around
248a bug in @code{g77} until a fix is available.)
249
250@node Overview of Translation Process
251@section Overview of Translation Process
252
253The order of phases translating source code to the form accepted
254by the GBE is:
255
256@enumerate
257@item
258Stripping punched-card sources (@file{g77stripcard.c})
259
260@item
261Lexing (@file{lex.c})
262
263@item
264Stand-alone statement identification (@file{sta.c})
265
266@item
267INCLUDE handling (@file{sti.c})
268
269@item
270Order-dependent statement identification (@file{stq.c})
271
272@item
273Parsing (@file{stb.c} and @file{expr.c})
274
275@item
276Constructing (@file{stc.c})
277
278@item
279Collecting (@file{std.c})
280
281@item
282Expanding (@file{ste.c})
283@end enumerate
284
285To get a rough idea of how a particularly twisted Fortran statement
286gets treated by the passes, consider:
287
288@smallexample
289      FORMAT(I2 4H)=(J/
290     &   I3)
291@end smallexample
292
293The job of @file{lex.c} is to know enough about Fortran syntax rules
294to break the statement up into distinct lexemes without requiring
295any feedback from subsequent phases:
296
297@smallexample
298`FORMAT'
299`('
300`I24H'
301`)'
302`='
303`('
304`J'
305`/'
306`I3'
307`)'
308@end smallexample
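
A much-simplified sketch of this step, written in Python rather than
the C of @file{lex.c}, reproduces the lexeme list above.
It assumes fixed-form source in which every line after the first is a
continuation line, and it ignores comments, statement labels, and
@code{CHARACTER} and Hollerith constants (where blanks are significant).
The name @code{lex_fixed_form} is invented for this illustration.

@smallexample
```python
def lex_fixed_form(lines):
    """Split one fixed-form statement into lexemes, ignoring blanks."""
    # Columns 1-6 hold the label and continuation fields, so the statement
    # text of the initial line and of each continuation starts in column 7.
    text = "".join(line[6:] for line in lines)
    lexemes = []
    current = ""
    for ch in text:
        if ch == " ":
            continue                  # blanks are not significant here
        if ch.isalnum():
            current += ch             # extend the alphanumeric run
        else:
            if current:
                lexemes.append(current)
                current = ""
            lexemes.append(ch)        # punctuation is its own lexeme
    if current:
        lexemes.append(current)
    return lexemes

statement = ["      FORMAT(I2 4H)=(J/",
             "     &   I3)"]
print(lex_fixed_form(statement))
```
@end smallexample

Note how the blank between @samp{I2} and @samp{4H} disappears,
giving the single lexeme @samp{I24H}, without the sketch knowing
anything about what kind of statement it is reading.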
309
310The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that sequence of lexemes represents.
312
313The sooner it can do this (in terms of using the smallest number of
314lexemes, starting with the first for each statement), the better,
315because that leaves diagnostics for problems beyond the recognition
316of the statement form to subsequent phases,
317which can usually better describe the nature of the problem.
318
319In this case, the @samp{=} at ``level zero''
320(not nested within parentheses)
321tells @file{sta.c} that this is an @emph{assignment-form},
322not @code{FORMAT}, statement.
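
The ``level zero'' test can be sketched in a few lines of Python.
This is only an illustration of the idea, not the logic of @file{sta.c}:
the function name is hypothetical, the input is assumed to be a list of
lexeme strings like the one shown earlier, and statements such as
@code{DO}, which also have an @samp{=} outside parentheses,
would need further checks that are omitted here.

@smallexample
```python
def has_level_zero_equals(lexemes):
    """Return True if an '=' lexeme occurs outside all parentheses."""
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False

sample = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(has_level_zero_equals(sample))
```
@end smallexample

An @samp{=} nested inside parentheses, as in a keyword argument of an
I/O control list, does not count, so such statements are not mistaken
for assignments by this test.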
323
324An assignment-form statement might be a statement-function
325definition or an executable assignment statement.
326
327To make that determination,
328@file{sta.c} looks at the first two lexemes.
329
330Since the second lexeme is @samp{(},
331the first must represent an array for this to be an assignment statement,
332else it's a statement function.
333
334Either way, @file{sta.c} hands off the statement to @file{stq.c}
335(via @file{sti.c}, which expands INCLUDE files).
@file{stq.c} figures out, from the context established by previous statements,
what a statement that is ambiguous on its own must actually be.
339
340So, @file{stq.c} watches the statement stream for executable statements,
341END statements, and so on, so it knows whether @samp{A(B)=C} is
342(intended as) a statement-function definition or an assignment statement.
343
344After establishing the context-aware statement info, @file{stq.c}
345passes the original sample statement on to @file{stb.c}
346(either its statement-function parser or its assignment-statement parser).
347
348@file{stb.c} forms a
349statement-specific record containing the pertinent information.
350That information includes a source expression and,
351for an assignment statement, a destination expression.
352Expressions are parsed by @file{expr.c}.
353
354This record is passed to @file{stc.c},
355which copes with the implications of the statement
356within the context established by previous statements.
357
358For example, if it's the first statement in the file
359or after an @code{END} statement,
360@file{stc.c} recognizes that, first of all,
361a main program unit is now being lexed
362(and tells that to @file{std.c}
363before telling it about the current statement).
364
365@file{stc.c} attaches whatever information it can,
366usually derived from the context established by the preceding statements,
367and passes the information to @file{std.c}.
368
369@file{std.c} saves this information away,
370since the GBE cannot cope with information
371that might be incomplete at this stage.
372
373For example, @samp{I3} might later be determined
374to be an argument to an alternate @code{ENTRY} point.
375
376When @file{std.c} is told about the end of an external (top-level)
377program unit,
378it passes all the information it has saved away
379on statements in that program unit
380to @file{ste.c}.
381
382@file{ste.c} ``expands'' each statement, in sequence, by
383constructing the appropriate GBE information and calling
384the appropriate GBE routines.
385
386Details on the transformational phases follow.
387Keep in mind that Fortran numbering is used,
388so the first character on a line is column 1,
389decimal numbering is used, and so on.
390
391@menu
392* g77stripcard::
393* lex.c::
394* sta.c::
395* sti.c::
396* stq.c::
397* stb.c::
398* expr.c::
399* stc.c::
400* std.c::
401* ste.c::
402
403* Gotchas (Transforming)::
404* TBD (Transforming)::
405@end menu
406
407@node g77stripcard
408@subsection g77stripcard
409
410The @code{g77stripcard} program handles removing content beyond
411column 72 (adjustable via a command-line option),
412optionally warning about that content being something other
413than trailing whitespace or Fortran commentary.
414
This program is needed because @file{lex.c} doesn't pay attention
416to maximum line lengths at all, to make it easier to maintain,
417as well as faster (for sources that don't depend on the maximum
418column length vis-a-vis trailing non-blank non-commentary content).
419
420Just how this program will be run---whether automatically for
421old source (perhaps as the default for @file{.f} files?)---is not
422yet determined.
423
424In the meantime, it might as well be implemented as a typical UNIX pipe.
425
426It should accept a @samp{-fline-length-@var{n}} option,
427with the default line length set to 72.
428
429When the text it strips off the end of a line is not blank
430(not spaces and tabs),
431it should insert an additional comment line
432(beginning with @samp{!},
433so it works for both fixed-form and free-form files)
434containing the text,
435following the stripped line.
436The inserted comment should have a prefix of some kind,
437TBD, that distinguishes the comment as representing stripped text.
438Users could use that to @code{sed} out such lines, if they wished---it
439seems silly to provide a command-line option to delete information
440when it can be so easily filtered out by another program.
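
The behavior described so far might look like the following Python sketch.
It is an assumption-laden illustration, not the eventual program:
the comment prefix @code{! STRIPPED: } is a placeholder for the TBD prefix,
the function and parameter names are invented,
and input lines are assumed to arrive without their trailing newlines.
No tab expansion is done, so each character counts as one column.

@smallexample
```python
MARKER = "! STRIPPED: "   # hypothetical prefix; the real one is TBD

def strip_card(lines, line_length=72):
    """Truncate each line to line_length columns.

    Non-blank text that is cut off is re-emitted verbatim in a '!'
    comment line following the stripped line, so nothing is lost.
    """
    out = []
    for line in lines:
        kept, extra = line[:line_length], line[line_length:]
        out.append(kept)
        if extra.strip(" \t"):        # something other than spaces and tabs
            out.append(MARKER + extra)
    return out

card = "      X = 1" + " " * 61 + "SEQ00010"
for line in strip_card([card, "      END"]):
    print(line)
```
@end smallexample

With a fixed prefix like this, a user who does not want the inserted
comments could drop them with a one-line @code{sed} or @code{grep} filter.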
441
442(This inserted comment should be designed to ``fit in'' well
443with whatever the Fortran community is using these days for
444preprocessor, translator, and other such products, like OpenMP.
445What that's all about, and how @code{g77} can elegantly fit its
446special comment conventions into it all, is TBD as well.
447We don't want to reinvent the wheel here, but if there turn out
448to be too many conflicting conventions, we might have to invent
449one that looks nothing like the others, but which offers their
450host products a better infrastructure in which to fit and coexist
451peacefully.)
452
453@code{g77stripcard} probably shouldn't do any tab expansion or other
454fancy stuff.
455People can use @code{expand} or other pre-filtering if they like.
456The idea here is to keep each stage quite simple, while providing
457excellent performance for ``normal'' code.
458
(Code with junk beyond column 72 is not really ``normal'',
460as it comes from a card-punch heritage,
461and will be increasingly hard for tomorrow's Fortran programmers to read.)
462
463@node lex.c
464@subsection lex.c
465
466To help make the lexer simple, fast, and easy to maintain,
467while also having @code{g77} generally encourage Fortran programmers
468to write simple, maintainable, portable code by maximizing the
469performance of compiling that kind of code:
470
471@itemize @bullet
472@item
473There'll be just one lexer, for both fixed-form and free-form source.
474
475@item
476It'll care about the form only when handling the first 7 columns of
477text, stuff like spaces between strings of alphanumerics, and
478how lines are continued.
479
480Some other distinctions will be handled by subsequent phases,
481so at least one of them will have to know which form is involved.
482
483For example, @samp{I = 2 . 4} is acceptable in fixed form,
484and works in free form as well given the implementation @code{g77}
485presently uses.
486But the standard requires a diagnostic for it in free form,
487so the parser has to be able to recognize that
488the lexemes aren't contiguous
489(information the lexer @emph{does} have to provide)
490and that free-form source is being parsed,
491so it can provide the diagnostic.
492
493The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
494Otherwise, it'd have to know a whole lot more about how to parse Fortran,
495or subsequent phases (mainly parsing) would have two paths through
496lots of critical code---one to handle the lexeme @samp{2}, @samp{.},
497and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.
498
499@item
500It won't worry about line lengths
501(beyond the first 7 columns for fixed-form source).
502
503That is, once it starts parsing the ``statement'' part of a line
504(column 7 for fixed-form, column 1 for free-form),
505it'll keep going until it finds a newline,
506rather than ignoring everything past a particular column
507(72 or 132).
508
509The implication here is that there shouldn't @emph{be}
510anything past that last column, other than whitespace or
511commentary, because users using typical editors
512(or viewing output as typically printed)
513won't necessarily know just where the last column is.
514
515Code that has ``garbage'' beyond the last column
516(almost certainly only fixed-form code with a punched-card legacy,
517such as code using columns 73-80 for ``sequence numbers'')
518will have to be run through @code{g77stripcard} first.
519
520Also, keeping track of the maximum column position while also watching out
521for the end of a line @emph{and} while reading from a file
522just makes things slower.
523Since a file must be read, and watching for the end of the line
524is necessary (unless the typical input file was preprocessed to
525include the necessary number of trailing spaces),
526dropping the tracking of the maximum column position
527is the only way to reduce the complexity of the pertinent code
528while maintaining high performance.
529
530@item
531ASCII encoding is assumed for the input file.
532
533Code written in other character sets will have to be converted first.
534
535@item
536Tabs (ASCII code 9)
537will be converted to spaces via the straightforward
538approach.
539
540Specifically, a tab is converted to between one and eight spaces
541as necessary to reach column @var{n},
542where dividing @samp{(@var{n} - 1)} by eight
543results in a remainder of zero.
544
545That saves having to pass most source files through @code{expand}.
546
547@item
548Linefeeds (ASCII code 10)
549mark the ends of lines.
550
551@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
554in which case it is ignored.
555
556Otherwise, it is rejected (with a diagnostic).
557
558@item
Any characters other than the above
560that are not part of the GNU Fortran Character Set
561(@pxref{Character Set})
562are rejected with a diagnostic.
563
564This includes backspaces, form feeds, and the like.
565
566(It might make sense to allow a form feed in column 1
567as long as that's the only character on a line.
568It certainly wouldn't seem to cost much in terms of performance.)
569
570@item
571The end of the input stream (EOF)
572ends the current line.
573
574@item
575The distinction between uppercase and lowercase letters
576will be preserved.
577
578It will be up to subsequent phases to decide to fold case.
579
580Current plans are to permit any casing for Fortran (reserved) keywords
581while preserving casing for user-defined names.
582(This might not be made the default for @file{.f} files, though.)
583
584Preserving case seems necessary to provide more direct access
585to facilities outside of @code{g77}, such as to C or Pascal code.
586
Names of intrinsics will probably be matchable in any case.
588
589(How @samp{external SiN; r = sin(x)} would be handled is TBD.
590I think old @code{g77} might already handle that pretty elegantly,
591but whether we can cope with allowing the same fragment to reference
592a @emph{different} procedure, even with the same interface,
593via @samp{s = SiN(r)}, needs to be determined.
594If it can't, we need to make sure that when code introduces
595a user-defined name, any intrinsic matching that name
596using a case-insensitive comparison
597is ``turned off''.)
598
599@item
600Backslashes in @code{CHARACTER} and Hollerith constants
601are not allowed.
602
603This avoids the confusion introduced by some Fortran compiler vendors
604providing C-like interpretation of backslashes,
605while others provide straight-through interpretation.
606
607Some kind of lexical construct (TBD) will be provided to allow
608flagging of a @code{CHARACTER}
609(but probably not a Hollerith)
610constant that permits backslashes.
611It'll necessarily be a prefix, such as:
612
613@smallexample
614PRINT *, C'This line has a backspace \b here.'
615PRINT *, F'This line has a straight backslash \ here.'
616@end smallexample
617
618Further, command-line options might be provided to specify that
619one prefix or the other is to be assumed as the default
620for @code{CHARACTER} constants.
621
However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
624(or just those containing backslashes)
625with the desired designation,
626so printouts of code can be read
627without knowing the compile-time options used when compiling it.
628
629If such a program is provided
630(let's name it @code{g77slash} for now),
631then a command-line option to @code{g77} should not be provided.
632(Though, given that it'll be easy to implement, it might be hard
633to resist user requests for it ``to compile faster than if we
634have to invoke another filter''.)
635
636This program would take a command-line option to specify the
default interpretation of backslashes,
638affecting which prefix it uses for constants.
639
640@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
642to the appropriate @code{CHARACTER} constants.
643Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
644constants specifying whether they want C-style or straight-through
645backslashes.
646
647@item
648To allow for form-neutral INCLUDE files without requiring them
649to be preprocessed,
650the fixed-form lexer should offer an extension (if possible)
651allowing a trailing @samp{&} to be ignored, especially if after
652column 72, as it would be using the traditional Unix Fortran source
653model (which ignores @emph{everything} after column 72).
654@end itemize
655
656The above implements nearly exactly what is specified by
657@ref{Character Set},
658and
659@ref{Lines},
660except it also provides automatic conversion of tabs
661and ignoring of newline-related carriage returns,
662as well as accommodating form-neutral INCLUDE files.
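
The tab and carriage-return rules from the list above can be sketched
in Python as follows.
This is a minimal illustration under stated assumptions, not the code
of @file{lex.c}: both function names are invented, and the whole input is
assumed to be available as one string.

@smallexample
```python
def expand_tabs(line):
    """Convert each tab to the 1-8 spaces needed to reach the next
    column n for which (n - 1) is a multiple of eight."""
    out = ""
    for ch in line:
        if ch == "\t":
            # len(out) is one less than the column the tab occupies.
            out += " " * (8 - len(out) % 8)
        else:
            out += ch
    return out

def split_lines(text):
    """Split on linefeeds, ignoring a carriage return only when it
    immediately precedes a linefeed; EOF ends the last line."""
    pieces = text.split("\n")
    last = pieces.pop()               # text after the final linefeed
    lines = [p[:-1] if p.endswith("\r") else p for p in pieces]
    if last:
        lines.append(last)            # line ended by EOF, not a linefeed
    for line in lines:
        if "\r" in line:
            raise ValueError("carriage return not followed by linefeed")
    return lines
```
@end smallexample

A tab in column 1 therefore moves the next character to column 9,
and a tab in column 9 moves it to column 17,
which matches what most editors and printers display.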
663
664It also implements the ``pure visual'' model,
665by which is meant that a user viewing his code
666in a typical text editor
667(assuming it's not preprocessed via @code{g77stripcard} or similar)
668doesn't need any special knowledge
669of whether spaces on the screen are really tabs,
670whether lines end immediately after the last visible non-space character
671or after a number of spaces and tabs that follow it,
672or whether the last line in the file is ended by a newline.
673
674Most editors don't make these distinctions,
675the ANSI FORTRAN 77 standard doesn't require them to,
676and it permits a standard-conforming compiler
677to define a method for transforming source code to
678``standard form'' however it wants.
679
680So, GNU Fortran defines it such that users have the best chance
681of having the code be interpreted the way it looks on the screen
682of the typical editor.
683
684(Fancy editors should @emph{never} be required to correctly read code
685written in classic two-dimensional-plaintext form.
686By correct reading I mean ability to read it, book-like, without
687mistaking text ignored by the compiler for program code and vice versa,
688and without having to count beyond the first several columns.
689The vague meaning of ASCII TAB, among other things, complicates
690this somewhat, but as long as ``everyone'', including the editor,
691other tools, and printer, agrees about the every-eighth-column convention,
692the GNU Fortran ``pure visual'' model meets these requirements.
693Any language or user-visible source form
694requiring special tagging of tabs,
695the ends of lines after spaces/tabs,
696and so on, fails to meet this fairly straightforward specification.
697Fortunately, Fortran @emph{itself} does not mandate such a failure,
698though most vendor-supplied defaults for their Fortran compilers @emph{do}
699fail to meet this specification for readability.)
700
701Further, this model provides a clean interface
702to whatever preprocessors or code-generators are used
703to produce input to this phase of @code{g77}.
704Mainly, they need not worry about long lines.
705
706@node sta.c
707@subsection sta.c
708
709@node sti.c
710@subsection sti.c
711
712@node stq.c
713@subsection stq.c
714
715@node stb.c
716@subsection stb.c
717
718@node expr.c
719@subsection expr.c
720
721@node stc.c
722@subsection stc.c
723
724@node std.c
725@subsection std.c
726
727@node ste.c
728@subsection ste.c
729
730@node Gotchas (Transforming)
731@subsection Gotchas (Transforming)
732
733This section is not about transforming ``gotchas'' into something else.
734It is about the weirder aspects of transforming Fortran,
735however that's defined,
736into a more modern, canonical form.
737
738@subsubsection Multi-character Lexemes
739
740Each lexeme carries with it a pointer to where it appears in the source.
741
742To provide the ability for diagnostics to point to column numbers,
743in addition to line numbers and names,
744lexemes that represent more than one (significant) character
745in the source code need, generally,
746to provide pointers to where each @emph{character} appears in the source.
747
748This provides the ability to properly identify the precise location
749of the problem in code like
750
751@smallexample
752SUBROUTINE X
753END
754BLOCK DATA X
755END
756@end smallexample
757
758which, in fixed-form source, would result in single lexemes
759consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
760(The problem is that @samp{X} is defined twice,
761so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding @samp{X} in the first,
763would be preferable to pointing to the beginnings of the statements.)
764
765This need also arises when parsing (and diagnosing) @code{FORMAT}
766statements.
767
768Further, it arises when diagnosing
769@code{FMT=} specifiers that contain constants
770(or partial constants, or even propagated constants!)
771in I/O statements, as in:
772
773@smallexample
774PRINT '(I2, 3HAB)', J
775@end smallexample
776
(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)
780
781Multi-character lexemes, which would seem to naturally include
782at least digit strings, alphanumeric strings, @code{CHARACTER}
783constants, and Hollerith constants, therefore need to provide
784location information on each character.
785(Maybe Hollerith constants don't, but it's unnecessary to except them.)
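
One way to picture such a lexeme is as a string paired with one
source position per significant character.
The Python sketch below is purely illustrative: the @code{Lexeme} record
and the @code{gather_significant} function are invented here and do not
reflect the FFE's actual data structures.
It gathers every non-blank character of a fixed-form statement field
into a single lexeme, which is enough to show the
@samp{SUBROUTINEX} case.

@smallexample
```python
from collections import namedtuple

# text:  the significant characters of the lexeme
# where: one (line, column) pair per character in text
Lexeme = namedtuple("Lexeme", ["text", "where"])

def gather_significant(line, line_number, start_column=7):
    """Collect the non-blank characters from start_column onward,
    remembering the source position of each one."""
    text = ""
    where = []
    for column, ch in enumerate(line, 1):   # columns count from 1
        if column < start_column or ch == " ":
            continue
        text += ch
        where.append((line_number, column))
    return Lexeme(text, where)

lexeme = gather_significant("      SUBROUTINE X", 1)
print(lexeme.text, lexeme.where[-1])
```
@end smallexample

Here the last entry of @code{where} locates the @samp{X} itself (line 1, column 18),
so a diagnostic about the duplicate definition can point at the name
rather than at the start of the statement.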
786
787The question then arises, what about @emph{other} multi-character lexemes,
788such as @samp{**} and @samp{//},
789and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
790
791Turns out there's a need to identify the location of the second character
792of these two-character lexemes.
793For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
795Similarly, it is preferable to diagnose the second slash in
796@samp{I = J // K} rather than the first, given the implicit typing
797rules, which would result in the compiler disallowing the attempted
798concatenation of two integers.
799(Though, since that's more of a semantic issue,
800it's not @emph{that} much preferable.)
801
802Even sequences that could be parsed as digit strings could use location info,
803for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
804(This probably will be parsed as a character string,
805to be consistent with the parsing of @samp{Z'129A'}.)
806
807To avoid the hassle of recording the location of the second character,
808while also preserving the general rule that each significant character
809is distinctly pointed to by the lexeme that contains it,
810it's best to simply not have any fixed-size lexemes
811larger than one character.
812
813This new design is expected to make checking for two
814@samp{*} lexemes in a row much easier than the old design,
815so this is not much of a sacrifice.
816It probably makes the lexer much easier to implement
817than it makes the parser harder.
818
819@subsubsection Space-padding Lexemes
820
821Certain lexemes need to be padded with virtual spaces when the
822end of the line (or file) is encountered.
823
824This is necessary in fixed form, to handle lines that don't
825extend to column 72, assuming that's the line length in effect.
826
827@subsubsection Bizarre Free-form Hollerith Constants
828
829Last I checked, the Fortran 90 standard actually required the compiler
830to silently accept something like
831
832@smallexample
833FORMAT ( 1 2   Htwelve chars )
834@end smallexample
835
836as a valid @code{FORMAT} statement specifying a twelve-character
837Hollerith constant.
838
839The implication here is that, since the new lexer is a zero-feedback one,
840it won't know that the special case of a @code{FORMAT} statement being parsed
841requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
842a single lexeme.
843
844(This is a horrible misfeature of the Fortran 90 language.
845It's one of many such misfeatures that almost make me want
846to not support them, and forge ahead with designing a new
847``GNU Fortran'' language that has the features,
848but not the misfeatures, of Fortran 90,
849and provide utility programs to do the conversion automatically.)
850
851So, the lexer must gather distinct chunks of decimal strings into
852a single lexeme in contexts where a single decimal lexeme might
853start a Hollerith constant.
854
855(Which probably means it might as well do that all the time
856for all multi-character lexemes, even in free-form mode,
857leaving it to subsequent phases to pull them apart as they see fit.)
858
859Compare the treatment of this to how
860
861@smallexample
862CHARACTER * 4 5 HEY
863@end smallexample
864
865and
866
867@smallexample
868CHARACTER * 12 HEY
869@end smallexample
870
871must be treated---the former must be diagnosed, due to the separation
872between lexemes, the latter must be accepted as a proper declaration.
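
One way to reconcile the two requirements is for a gathered lexeme
to remember the chunks it was built from.
The following Python sketch assumes that approach;
@code{gather_digit_chunks} and @code{was_split} are hypothetical names,
and the input is simply a list of strings already split at spaces.

@smallexample
def gather_digit_chunks(chunks):
    """Join adjacent digit strings, keeping the original pieces.

    Returns a list of (text, pieces) pairs.
    """
    lexemes = []
    for chunk in chunks:
        if chunk.isdigit() and lexemes and lexemes[-1][0].isdigit():
            text, pieces = lexemes[-1]
            lexemes[-1] = (text + chunk, pieces + (chunk,))
        else:
            lexemes.append((chunk, (chunk,)))
    return lexemes


def was_split(lexeme):
    """True if the lexeme was written as several separated chunks."""
    return len(lexeme[1]) > 1
@end smallexample

A later phase handling @code{FORMAT} would use only the joined text,
while one handling @samp{CHARACTER *} could call @code{was_split}
and issue the diagnostic.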
873
874@subsubsection Hollerith Constants
875
876Recognizing a Hollerith constant---specifically,
877that an @samp{H} or @samp{h} after a digit string begins
878such a constant---requires some knowledge of context.
879
880Hollerith constants (such as @samp{2HAB}) can appear after:
881
882@itemize @bullet
883@item
884@samp{(}
885
886@item
887@samp{,}
888
889@item
890@samp{=}
891
892@item
893@samp{+}, @samp{-}, @samp{/}
894
895@item
896@samp{*}, except as noted below
897@end itemize
898
899Hollerith constants don't appear after:
900
901@itemize @bullet
902@item
903@samp{CHARACTER*},
904which can be treated generally as
905any @samp{*} that is the second lexeme of a statement
906@end itemize
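
The two lists above amount to a test on the preceding lexeme.
Here is a rough Python sketch of that test;
the name @code{may_start_hollerith} is made up for the example,
a statement is assumed to be a plain list of lexeme strings,
and a digit string at the very start of a statement is assumed
not to qualify, since the lists above don't mention that case.

@smallexample
HOLLERITH_PRECEDERS = ("(", ",", "=", "+", "-", "/", "*")


def may_start_hollerith(lexemes, index):
    """Say whether the digit string at lexemes[index] may begin a
    Hollerith constant, judging only by the lexeme before it."""
    if index == 0 or not lexemes[index].isdigit():
        return False
    previous = lexemes[index - 1]
    if previous not in HOLLERITH_PRECEDERS:
        return False
    # A '*' that is the second lexeme of the statement is treated
    # like the one in CHARACTER*: no Hollerith constant follows it.
    if previous == "*" and index - 1 == 1:
        return False
    return True
@end smallexample

The caller would still have to check that the character after
the digit string is @samp{H} or @samp{h}.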
907
908@subsubsection Confusing Function Keyword
909
910While
911
912@smallexample
913REAL FUNCTION FOO ()
914@end smallexample
915
916must be a @code{FUNCTION} statement and
917
918@smallexample
919REAL FUNCTION FOO (5)
920@end smallexample
921
must be a type-declaration statement,
923
924@smallexample
925REAL FUNCTION FOO (@var{names})
926@end smallexample
927
928where @var{names} is a comma-separated list of names,
929can be one or the other.
930
The only way to disambiguate that statement
(short of mandating free-form source or a short maximum
length for names of external procedures)
is based on the context of the statement.

In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.
940
941Otherwise, the statement is a @code{FUNCTION} statement,
942in that it begins a function program unit
943(external, or, within @code{CONTAINS}, nested).
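
The whole decision can be summarized in a few lines.
This Python sketch is only a restatement of the rules above;
@code{classify_type_function} and its two context flags are hypothetical,
and Python's @code{str.isidentifier} merely approximates
the test for a Fortran name.

@smallexample
def classify_type_function(args, in_program_unit, at_contains_level):
    """Classify REAL FUNCTION FOO (args), as written in fixed form.

    ARGS is the list of strings between the parentheses.
    """
    if not args:
        return "FUNCTION"
    if not all(arg.isidentifier() for arg in args):
        # Something like (5) can only be an array bound.
        return "type-declaration"
    if in_program_unit and not at_contains_level:
        return "type-declaration"
    return "FUNCTION"
@end smallexample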
944
945@subsubsection Weird READ
946
947The statement
948
949@smallexample
950READ (N)
951@end smallexample
952
953is equivalent to either
954
955@smallexample
956READ (UNIT=(N))
957@end smallexample
958
959or
960
961@smallexample
962READ (FMT=(N))
963@end smallexample
964
965depending on which would be valid in context.
966
967Specifically, if @samp{N} is type @code{INTEGER},
968@samp{READ (FMT=(N))} would not be valid,
969because parentheses may not be used around @samp{N},
970whereas they may around it in @samp{READ (UNIT=(N))}.
971
972Further, if @samp{N} is type @code{CHARACTER},
973the opposite is true---@samp{READ (UNIT=(N))} is not valid,
974but @samp{READ (FMT=(N))} is.
975
976Strictly speaking, if anything follows
977
978@smallexample
979READ (N)
980@end smallexample
981
982in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
984without looking at the type of @samp{N},
985because the comma is required for the @samp{READ (FMT=(N))}
986interpretation and disallowed for the @samp{READ (UNIT=(N))}
987interpretation.
988
989However, in practice, many Fortran compilers allow
990the comma for the @samp{READ (UNIT=(N))}
991interpretation anyway
992(in that they generally allow a leading comma before
993an I/O list in an I/O statement),
994and much code takes advantage of this allowance.
995
996(This is quite a reasonable allowance, since the
997juxtaposition of a comma-separated list immediately
998after an I/O control-specification list, which is also comma-separated,
999without an intervening comma,
1000looks sufficiently ``wrong'' to programmers
1001that they can't resist the itch to insert the comma.
1002@samp{READ (I, J), K, L} simply looks cleaner than
1003@samp{READ (I, J) K, L}.)
1004
1005So, type-based disambiguation is needed unless strict adherence
1006to the standard is always assumed, and we're not going to assume that.
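
A sketch of the resulting decision, in Python.
The function @code{read_paren_meaning} and its @code{strict} flag
are inventions for this example;
types other than @code{INTEGER} and @code{CHARACTER} are not
discussed above, so the sketch simply rejects them.

@smallexample
def read_paren_meaning(n_type, strict=False, next_lexeme=None):
    """Decide whether READ (N) means UNIT=(N) or FMT=(N)."""
    if strict and next_lexeme is not None:
        # Standard-conforming code: only the FMT form takes a comma.
        return "FMT" if next_lexeme == "," else "UNIT"
    if n_type == "INTEGER":
        return "UNIT"
    if n_type == "CHARACTER":
        return "FMT"
    raise ValueError("no rule given here for type " + n_type)
@end smallexample

With @code{strict} left off, a trailing comma is ignored
and only the type of @samp{N} matters,
which matches the choice made in the text above.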
1007
1008@node TBD (Transforming)
1009@subsection TBD (Transforming)
1010
1011Continue researching gotchas, designing the transformational process,
1012and implementing it.
1013
1014Specific issues to resolve:
1015
1016@itemize @bullet
1017@item
Just where should @code{USE} processing (if it were implemented) take place?
1019
1020This gets into the whole issue of how @code{g77} should handle the concept
1021of modules.
1022I think GNAT already takes on this issue, but don't know more than that.
1023Jim Giles has written extensively on @code{comp.lang.fortran}
1024about his opinions on module handling, as have others.
1025Jim's views should be taken into account.
1026
1027Actually, Richard M. Stallman (RMS) also has written up
1028some guidelines for implementing such things,
1029but I'm not sure where I read them.
1030Perhaps the old @email{gcc2@@cygnus.com} list.
1031
1032If someone could dig references to these up and get them to me,
1033that would be much appreciated!
1034Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder to
implement @emph{later}.
1037
1038@item
1039Should the @code{g77} command become just a script that invokes
1040all the various preprocessing that might be needed,
1041thus making it seem slower than necessary for legacy code
1042that people are unwilling to convert,
1043or should we provide a separate script for that,
1044thus encouraging people to convert their code once and for all?
1045
At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding one, perhaps named @code{g77oldnew},
that converts source code.
1050
1051These scripts would take all the pertinent options @code{g77} used
1052to take and run the appropriate filters,
1053passing the results to @code{g77} or just making new sources out of them
1054(in a subdirectory, leaving the user to do the dirty deed of
1055moving or copying them over the old sources).
1056
1057@item
1058Do other Fortran compilers provide a prefix syntax
1059to govern the treatment of backslashes in @code{CHARACTER}
1060(or Hollerith) constants?
1061
1062Knowing what other compilers provide would help.
1063
1064@item
1065Is it okay to drop support for the @samp{-fintrin-case-initcap},
1066@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
1067and @samp{-fcase-initcap} options?
1068
1069I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
1070Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.
1072
1073The consensus to date (1999-11-17) has been to drop this support.
1074Can't recall anybody saying they're using it, in fact.
1075@end itemize
1076
1077@node Philosophy of Code Generation
1078@section Philosophy of Code Generation
1079
1080Don't poke the bear.
1081
1082The @code{g77} front end generates code
1083via the @code{gcc} back end.
1084
1085@cindex GNU Back End (GBE)
1086@cindex GBE
1087@cindex @code{gcc}, back end
@cindex back end, @code{gcc}
1089@cindex code generator
1090The @code{gcc} back end (GBE) is a large, complex
1091labyrinth of intricate code
1092written in a combination of the C language
1093and specialized languages internal to @code{gcc}.
1094
1095While the @emph{code} that implements the GBE
1096is written in a combination of languages,
1097the GBE itself is,
1098to the front end for a language like Fortran,
1099best viewed as a @emph{compiler}
1100that compiles its own, unique, language.
1101
1102The GBE's ``source'', then, is written in this language,
1103which consists primarily of
1104a combination of calls to GBE functions
1105and @dfn{tree} nodes
1106(which are, themselves, created
1107by calling GBE functions).
1108
So, @code{g77} generates code by, in effect,
1110translating the Fortran code it reads
1111into a form ``written'' in the ``language''
1112of the @code{gcc} back end.
1113
1114@cindex GBEL
1115@cindex GNU Back End Language (GBEL)
This language will henceforth be referred to as @dfn{GBEL},
1117for GNU Back End Language.
1118
1119GBEL is an evolving language,
1120not fully specified in any published form
1121as of this writing.
1122It offers many facilities,
1123but its ``core'' facilities
are those that correspond most directly
1125to those needed to support @code{gcc}
1126(compiling code written in GNU C).
1127
1128The @code{g77} Fortran Front End (FFE)
1129is designed and implemented
1130to navigate the currents and eddies
1131of ongoing GBEL and @code{gcc} development
1132while also delivering on the potential
1133of an integrated FFE
1134(as compared to using a converter like @code{f2c}
1135and feeding the output into @code{gcc}).
1136
1137Goals of the FFE's code-generation strategy include:
1138
1139@itemize @bullet
1140@item
1141High likelihood of generation of correct code,
1142or, failing that, producing a fatal diagnostic or crashing.
1143
1144@item
1145Generation of highly optimized code,
1146as directed by the user
1147via GBE-specific (versus @code{g77}-specific) constructs,
1148such as command-line options.
1149
1150@item
1151Fast overall (FFE plus GBE) compilation.
1152
1153@item
1154Preservation of source-level debugging information.
1155@end itemize
1156
1157The strategies historically, and currently, used by the FFE
1158to achieve these goals include:
1159
1160@itemize @bullet
1161@item
1162Use of GBEL constructs that most faithfully encapsulate
1163the semantics of Fortran.
1164
1165@item
1166Avoidance of GBEL constructs that are so rarely used,
1167or limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
1169as sufficient for use by the FFE.
1170
1171@item
1172Flexible design, to readily accommodate changes to specific
1173code-generation strategies, perhaps governed by command-line options.
1174@end itemize
1175
1176@cindex Bear-poking
1177@cindex Poking the bear
1178``Don't poke the bear'' somewhat summarizes the above strategies.
1179The GBE is the bear.
1180The FFE is designed and implemented to avoid poking it
1181in ways that are likely to just annoy it.
1182The FFE usually either tackles it head-on,
1183or avoids treating it in ways dissimilar to how
1184the @code{gcc} front end treats it.
1185
For example, the FFE uses the native array facility in the back end
(instead of the lower-level pointer-arithmetic facility
used by @code{gcc} when compiling @code{f2c} output).
1189Theoretically, this presents more opportunities for optimization,
1190faster compile times,
1191and the production of more faithful debugging information.
1192These benefits were not, however, immediately realized,
1193mainly because @code{gcc} itself makes little or no use
1194of the native array facility.
1195
1196Complex arithmetic is a case study of the evolution of this strategy.
1197When originally implemented,
1198the GBEL had just evolved its own native complex-arithmetic facility,
1199so the FFE took advantage of that.
1200
1201When porting @code{g77} to 64-bit systems,
1202it was discovered that the GBE didn't really
1203implement its native complex-arithmetic facility properly.
1204
1205The short-term solution was to rewrite the FFE
1206to instead use the lower-level facilities
1207that'd be used by @code{gcc}-compiled code
1208(assuming that code, itself, didn't use the native complex type
1209provided, as an extension, by @code{gcc}),
1210since these were known to work,
1211and, in any case, if shown to not work,
1212would likely be rapidly fixed
1213(since they'd likely not work for vanilla C code in similar circumstances).
1214
1215However, the rewrite accommodated the original, native approach as well
1216by offering a command-line option to select it over the emulated approach.
1217This allowed users, and especially GBE maintainers, to try out
1218fixes to complex-arithmetic support in the GBE
1219while @code{g77} continued to default to compiling more code correctly,
1220albeit producing (typically) slower executables.
1221
1222As of April 1999, it appeared that the last few bugs
1223in the GBE's support of its native complex-arithmetic facility
1224were worked out.
1225The FFE was changed back to default to using that native facility,
1226leaving emulation as an option.
1227
1228Later during the release cycle
1229(which was called EGCS 1.2, but soon became GCC 2.95),
1230bugs in the native facility were found.
1231Reactions among various people included
1232``the last thing we should do is change the default back'',
1233``we must change the default back'',
1234and ``let's figure out whether we can narrow down the bugs to
1235few enough cases to allow the now-months-long-tested default
1236to remain the same''.
The last viewpoint won that particular time.
1238The bugs exposed other concerns regarding ABI compliance
1239when the ABI specified treatment of complex data as different
1240from treatment of what Fortran and GNU C consider the equivalent
1241aggregation (structure) of real (or float) pairs.
1242
1243Other Fortran constructs---arrays, character strings,
1244complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
1245and so on---involve issues similar to those pertaining to complex arithmetic.
1246
1247So, it is possible that the history
1248of how the FFE handled complex arithmetic
1249will be repeated, probably in modified form
1250(and hopefully over shorter timeframes),
1251for some of these other facilities.
1252
1253@node Two-pass Design
1254@section Two-pass Design
1255
1256The FFE does not tell the GBE anything about a program unit
1257until after the last statement in that unit has been parsed.
1258(A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
1260That is, a program unit in Fortran is like a top-level function in C.
1261Nested functions, found among the extensions offered by GNU C,
1262correspond roughly to Fortran's statement functions.)
1263
1264So, while parsing the code in a program unit,
1265the FFE saves up all the information
1266on statements, expressions, names, and so on,
1267until it has seen the last statement.
1268
1269At that point, the FFE revisits the saved information
1270(in what amounts to a second @dfn{pass} over the program unit)
1271to perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.
1273
1274Some lookahead is performed during this second pass,
1275so the FFE could be viewed as a ``two-plus-pass'' design.
1276
1277@menu
1278* Two-pass Code::
1279* Why Two Passes::
1280@end menu
1281
1282@node Two-pass Code
1283@subsection Two-pass Code
1284
1285Most of the code that turns the first pass (parsing)
1286into a second pass for code generation
1287is in @file{@value{path-g77}/std.c}.
1288
1289It has external functions,
1290called mainly by siblings in @file{@value{path-g77}/stc.c},
1291that record the information on statements and expressions
1292in the order they are seen in the source code.
1293These functions save that information.
1294
1295It also has an external function that revisits that information,
1296calling the siblings in @file{@value{path-g77}/ste.c},
1297which handles the actual code generation
1298(by generating GBEL code,
1299that is, by calling GBE routines
1300to represent and specify expressions, statements, and so on).
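
The save-then-revisit arrangement can be pictured with a small model.
The Python class below is purely illustrative:
@code{StatementRecorder}, @code{record}, and @code{replay}
are hypothetical names that do not correspond to actual functions
in @file{std.c}, and the saved form of a statement is assumed
to be a simple (kind, operands) pair.

@smallexample
class StatementRecorder:
    """Save statements during parsing, then replay them in order."""

    def __init__(self):
        self.saved = []

    def record(self, kind, *operands):
        # First pass: called as each statement is parsed.
        self.saved.append((kind, operands))

    def replay(self, generate):
        # Second pass: called once the program unit is complete.
        for kind, operands in self.saved:
            generate(kind, operands)
        self.saved = []


recorder = StatementRecorder()
recorder.record("ASSIGN", 100, "J")
recorder.record("GOTO", "J")
emitted = []
recorder.replay(lambda kind, operands: emitted.append(kind))
@end smallexample

By the time @code{replay} runs, everything about the program unit
is known, so the code generator never has to guess.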
1301
1302@node Why Two Passes
1303@subsection Why Two Passes
1304
1305The need for two passes was not immediately evident
1306during the design and implementation of the code in the FFE
1307that was to produce GBEL.
1308Only after a few kludges,
1309to handle things like incorrectly-guessed @code{ASSIGN} label nature,
1310had been implemented,
1311did enough evidence pile up to make it clear
1312that @file{std.c} had to be introduced to intercept,
1313save, then revisit as part of a second pass,
1314the digested contents of a program unit.
1315
1316Other such missteps have occurred during the evolution of the FFE,
1317because of the different goals of the FFE and the GBE.
1318
1319Because the GBE's original, and still primary, goal
1320was to directly support the GNU C language,
1321the GBEL, and the GBE itself,
require more complexity
1323on the part of most front ends
1324than it requires of @code{gcc}'s.
1325
1326For example,
1327the GBEL offers an interface that permits the @code{gcc} front end
1328to implement most, or all, of the language features it supports,
1329without the front end having to
1330make use of non-user-defined variables.
1331(It's almost certainly the case that all of K&R C,
1332and probably ANSI C as well,
1333is handled by the @code{gcc} front end
1334without declaring such variables.)
1335
1336The FFE, on the other hand, must resort to a variety of ``tricks''
1337to achieve its goals.
1338
1339Consider the following C code:
1340
1341@smallexample
1342int
1343foo (int a, int b)
1344@{
1345  int c = 0;
1346
1347  if ((c = bar (c)) == 0)
1348    goto done;
1349
1350  quux (c << 1);
1351
1352done:
1353  return c;
1354@}
1355@end smallexample
1356
1357Note what kinds of objects are declared, or defined, before their use,
1358and before any actual code generation involving them
1359would normally take place:
1360
1361@itemize @bullet
1362@item
1363Return type of function
1364
1365@item
1366Entry point(s) of function
1367
1368@item
1369Dummy arguments
1370
1371@item
1372Variables
1373
1374@item
1375Initial values for variables
1376@end itemize
1377
1378Whereas, the following items can, and do,
1379suddenly appear ``out of the blue'' in C:
1380
1381@itemize @bullet
1382@item
1383Label references
1384
1385@item
1386Function references
1387@end itemize
1388
1389Not surprisingly, the GBE faithfully permits the latter set of items
1390to be ``discovered'' partway through GBEL ``programs'',
1391just as they are permitted to in C.
1392
1393Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
1395of items in the former set.
1396
1397This makes Fortran a poor fit for the ``safe'' subset of GBEL.
1398Consider:
1399
1400@smallexample
1401      FUNCTION X (A, ARRAY, ID1)
1402      CHARACTER*(*) A
1403      DOUBLE PRECISION X, Y, Z, TMP, EE, PI
1404      REAL ARRAY(ID1*ID2)
1405      COMMON ID2
1406      EXTERNAL FRED
1407
1408      ASSIGN 100 TO J
1409      CALL FOO (I)
1410      IF (I .EQ. 0) PRINT *, A(0)
1411      GOTO 200
1412
1413      ENTRY Y (Z)
1414      ASSIGN 101 TO J
1415200   PRINT *, A(1)
1416      READ *, TMP
1417      GOTO J
1418100   X = TMP * EE
1419      RETURN
1420101   Y = TMP * PI
1421      CALL FRED
1422      DATA EE, PI /2.71D0, 3.14D0/
1423      END
1424@end smallexample
1425
1426Here are some observations about the above code,
1427which, while somewhat contrived,
1428conforms to the FORTRAN 77 and Fortran 90 standards:
1429
1430@itemize @bullet
1431@item
1432The return type of function @samp{X} is not known
1433until the @samp{DOUBLE PRECISION} line has been parsed.
1434
1435@item
1436Whether @samp{A} is a function or a variable
1437is not known until the @samp{PRINT *, A(0)} statement
1438has been parsed.
1439
1440@item
1441The bounds of the array of argument @samp{ARRAY}
1442depend on a computation involving
1443the subsequent argument @samp{ID1}
1444and the blank-common member @samp{ID2}.
1445
1446@item
1447Whether @samp{Y} and @samp{Z} are local variables,
1448additional function entry points,
1449or dummy arguments to additional entry points
1450is not known
1451until the @code{ENTRY} statement is parsed.
1452
1453@item
1454Similarly, whether @samp{TMP} is a local variable is not known
1455until the @samp{READ *, TMP} statement is parsed.
1456
1457@item
1458The initial values for @samp{EE} and @samp{PI}
1459are not known until after the @code{DATA} statement is parsed.
1460
1461@item
1462Whether @samp{FRED} is a function returning type @code{REAL}
1463or a subroutine
1464(which can be thought of as returning type @code{void}
1465@emph{or}, to support alternate returns in a simple way,
1466type @code{int})
1467is not known
1468until the @samp{CALL FRED} statement is parsed.
1469
1470@item
1471Whether @samp{100} is a @code{FORMAT} label
1472or the label of an executable statement
1473is not known
1474until the @samp{X =} statement is parsed.
1475(These two types of labels get @emph{very} different treatment,
1476especially when @code{ASSIGN}'ed.)
1477
1478@item
1479That @samp{J} is a local variable is not known
1480until the first @code{ASSIGN} statement is parsed.
1481(This happens @emph{after} executable code has been seen.)
1482@end itemize
1483
1484Very few of these ``discoveries''
1485can be accommodated by the GBE as it has evolved over the years.
1486The GBEL doesn't support several of them,
1487and those it might appear to support
1488don't always work properly,
1489especially in combination with other GBEL and GBE features,
1490as implemented in the GBE.
1491
1492(Had the GBE and its GBEL originally evolved to support @code{g77},
1493the shoe would be on the other foot, so to speak---most, if not all,
1494of the above would be directly supported by the GBEL,
1495and a few C constructs would probably not, as they are in reality,
1496be supported.
1497Both this mythical, and today's real, GBE caters to its GBEL
1498by, sometimes, scrambling around, cleaning up after itself---after
1499discovering that assumptions it made earlier during code generation
1500are incorrect.
1501That's not a great design, since it indicates significant code
1502paths that might be rarely tested but used in some key production
1503environments.)
1504
1505So, the FFE handles these discrepancies---between the order in which
1506it discovers facts about the code it is compiling,
1507and the order in which the GBEL and GBE support such discoveries---by
1508performing what amounts to two
1509passes over each program unit.
1510
1511(A few ambiguities can remain at that point,
1512such as whether, given @samp{EXTERNAL BAZ}
1513and no other reference to @samp{BAZ} in the program unit,
1514it is a subroutine, a function, or a block-data---which, in C-speak,
1515governs its declared return type.
1516Fortunately, these distinctions are easily finessed
1517for the procedure, library, and object-file interfaces
1518supported by @code{g77}.)
1519
1520@node Challenges Posed
1521@section Challenges Posed
1522
1523Consider the following Fortran code, which uses various extensions
1524(including some to Fortran 90):
1525
1526@smallexample
1527SUBROUTINE X(A)
1528CHARACTER*(*) A
1529COMPLEX CFUNC
1530INTEGER*2 CLOCKS(200)
1531INTEGER IFUNC
1532
1533CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
1534@end smallexample
1535
1536The above poses the following challenges to any Fortran compiler
1537that uses run-time interfaces, and a run-time library, roughly similar
1538to those used by @code{g77}:
1539
1540@itemize @bullet
1541@item
1542Assuming the library routine that supports @code{SYSTEM_CLOCK}
1543expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
1544the compiler must make available to it a temporary variable of that type.
1545
1546@item
1547Further, after the @code{SYSTEM_CLOCK} library routine returns,
1548the compiler must ensure that the temporary variable it wrote
1549is copied into the appropriate element of the @samp{CLOCKS} array.
1550(This assumes the compiler doesn't just reject the code,
1551which it should if it is compiling under some kind of a ``strict'' option.)
1552
1553@item
1554To determine the correct index into the @samp{CLOCKS} array,
1555(putting aside the fact that the index, in this particular case,
1556need not be computed until after
1557the @code{SYSTEM_CLOCK} library routine returns),
1558the compiler must ensure that the @code{IFUNC} function is called.
1559
1560That requires evaluating its argument,
1561which requires, for @code{g77}
1562(assuming @code{-ff2c} is in force),
1563reserving a temporary variable of type @code{COMPLEX}
1564for use as a repository for the return value
1565being computed by @samp{CFUNC}.
1566
1567@item
1568Before invoking @samp{CFUNC},
its argument must be evaluated,
1570which requires allocating, at run time,
1571a temporary large enough to hold the result of the concatenation,
1572as well as actually performing the concatenation.
1573
1574@item
1575The large temporary needed during invocation of @code{CFUNC}
1576should, ideally, be deallocated
1577(or, at least, left to the GBE to dispose of, as it sees fit)
1578as soon as @code{CFUNC} returns,
1579which means before @code{IFUNC} is called
1580(as it might need a lot of dynamically allocated memory).
1581@end itemize
1582
1583@code{g77} currently doesn't support all of the above,
1584but, so that it might someday, it has evolved to handle
1585at least some of the above requirements.
1586
1587Meeting the above requirements is made more challenging
1588by conforming to the requirements of the GBEL/GBE combination.
1589
1590@node Transforming Statements
1591@section Transforming Statements
1592
1593Most Fortran statements are given their own block,
1594and, for temporary variables they might need, their own scope.
1595(A block is what distinguishes @samp{@{ foo (); @}}
1596from just @samp{foo ();} in C.
1597A scope is included with every such block,
1598providing a distinct name space for local variables.)
1599
1600Label definitions for the statement precede this block,
1601so @samp{10 PRINT *, I} is handled more like
1602@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
1603(where @samp{fl10} is just a notation meaning ``Fortran Label 10''
1604for the purposes of this document).
1605
1606@menu
1607* Statements Needing Temporaries::
1608* Transforming DO WHILE::
1609* Transforming Iterative DO::
1610* Transforming Block IF::
1611* Transforming SELECT CASE::
1612@end menu
1613
1614@node Statements Needing Temporaries
1615@subsection Statements Needing Temporaries
1616
1617Any temporaries needed during, but not beyond,
1618execution of a Fortran statement,
1619are made local to the scope of that statement's block.
1620
1621This allows the GBE to share storage for these temporaries
1622among the various statements without the FFE
1623having to manage that itself.
1624
1625(The GBE could, of course, decide to optimize
1626management of these temporaries.
1627For example, it could, theoretically,
1628schedule some of the computations involving these temporaries
1629to occur in parallel.
1630More practically, it might leave the storage for some temporaries
1631``live'' beyond their scopes, to reduce the number of
1632manipulations of the stack pointer at run time.)
1633
1634Temporaries needed across distinct statement boundaries usually
1635are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
1636(Also, there might be temporaries not associated with blocks at all---these
1637would be in the scope of the entire program unit.)
1638
1639Each Fortran block @emph{should} get its own block/scope in the GBE.
1640This is best, because it allows temporaries to be more naturally handled.
1641However, it might pose problems when handling labels
1642(in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and generally involves the hassle of replicating
1644parts of the @code{gcc} front end
1645(because the FFE needs to support
1646an arbitrary number of nested back-end blocks
1647if each Fortran block gets one).
1648
1649So, there might still be a need for top-level temporaries, whose
1650``owning'' scope is that of the containing procedure.
1651
Also, there seem to be problems declaring new variables after
1653generating code (within a block) in the back end, leading to, e.g.,
1654@samp{label not defined before binding contour} or similar messages,
1655when compiling with @samp{-fstack-check} or
1656when compiling for certain targets.
1657
Because of that, and because sometimes these temporaries are not
discovered until the middle of generating code for an expression
1660statement (as in the case of the optimization for @samp{X**I}),
1661it seems best to always
1662pre-scan all the expressions that'll be expanded for a block
1663before generating any of the code for that block.
1664
1665This pre-scan then handles discovering and declaring, to the back end,
1666the temporaries needed for that block.
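
A pre-scan of this kind might look like the following Python sketch.
It is not how the FFE represents expressions:
the nested-tuple form, the name @code{prescan_temporaries},
and the choice of which operators need a temporary
are all assumptions made for the example.

@smallexample
NEEDS_TEMP = ("concat", "power")


def prescan_temporaries(expr, found=None):
    """Walk EXPR and list the operators that will need a temporary.

    An expression is either a plain string (a name or constant)
    or a tuple (operator, operand, ...).
    """
    if found is None:
        found = []
    if isinstance(expr, tuple):
        for operand in expr[1:]:
            prescan_temporaries(operand, found)
        if expr[0] in NEEDS_TEMP:
            found.append(expr[0])
    return found
@end smallexample

The list returned for all the expressions in a block would be used
to declare the temporaries before any code for the block is generated.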
1667
1668It's also important to treat distinct items in an I/O list as distinct
1669statements deserving their own blocks.
1670That's because there's a requirement
1671that each I/O item be fully processed before the next one,
1672which matters in cases like @samp{READ (*,*), I, A(I)}---the
1673element of @samp{A} read in the second item
1674@emph{must} be determined from the value
1675of @samp{I} read in the first item.
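
The ordering requirement is easy to see in a small model.
In this Python sketch, @code{read_items} is a hypothetical helper,
variables live in an ordinary dictionary,
and an I/O item is assumed to be either a scalar name
or an array element subscripted by a single variable.

@smallexample
def read_items(values, items, variables):
    """Store VALUES into ITEMS one at a time, left to right.

    Each item is (name, None) for a scalar, or (name, index_name)
    for an array element subscripted by the variable index_name.
    """
    stream = iter(values)
    for name, index_name in items:
        value = next(stream)
        if index_name is None:
            variables[name] = value
        else:
            # The subscript is looked up only now, after earlier
            # items have been stored.  Fortran subscripts start at 1.
            variables[name][variables[index_name] - 1] = value
    return variables


variables = dict(I=1, A=[0, 0, 0])
read_items([3, 42], [("I", None), ("A", "I")], variables)
@end smallexample

Here the second value lands in the third element of @samp{A},
because @samp{I} already holds the value read by the first item.
Evaluating every subscript before reading anything
would have stored it in the first element instead.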
1676
1677@node Transforming DO WHILE
1678@subsection Transforming DO WHILE
1679
1680@samp{DO WHILE(expr)} @emph{must} be implemented
1681so that temporaries needed to evaluate @samp{expr}
1682are generated just for the test, each time.
1683
1684Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:
1685
1686@smallexample
1687for (;;)
1688  @{
1689    int temp0;
1690
1691    @{
1692      char temp1[large];
1693
1694      libg77_catenate (temp1, a, b);
1695      temp0 = libg77_ne (temp1, 'END');
1696    @}
1697
1698    if (! temp0)
1699      break;
1700
1701    @dots{}
1702  @}
1703@end smallexample
1704
1705In this case, it seems like a time/space tradeoff
1706between allocating and deallocating @samp{temp1} for each iteration
1707and allocating it just once for the entire loop.
1708
1709However, if @samp{temp1} is allocated just once for the entire loop,
1710it could be the wrong size for subsequent iterations of that loop
1711in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
1712because the body of the loop might modify @samp{I} or @samp{J}.
1713
1714So, the above implementation is used,
1715though a more optimal one can be used
1716in specific circumstances.
1717
1718@node Transforming Iterative DO
1719@subsection Transforming Iterative DO
1720
1721An iterative @code{DO} loop
1722(one that specifies an iteration variable)
1723is required by the Fortran standards
1724to be implemented as though an iteration count
1725is computed before entering the loop body,
1726and that iteration count used to determine
1727the number of times the loop body is to be performed
1728(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).
1729
1730The FFE handles this by allocating a temporary variable
1731to contain the computed number of iterations.
1732Since this variable must be in a scope that includes the entire loop,
1733a GBEL block is created for that loop,
1734and the variable declared as belonging to the scope of that block.
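
For reference, the iteration count defined by the FORTRAN 77 standard
is @samp{MAX (INT ((m2 - m1 + m3) / m3), 0)},
where @samp{m1}, @samp{m2}, and @samp{m3} are the initial, limit,
and increment values.
The Python sketch below computes it for the integer case only;
the function names are made up for the example,
and it models the semantics, not the code the FFE generates.

@smallexample
def trunc_div(a, b):
    """Integer division truncating toward zero, like Fortran's."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def trip_count(m1, m2, m3=1):
    """Number of times the body of DO var = m1, m2, m3 executes."""
    if m3 == 0:
        raise ValueError("DO increment must not be zero")
    return max(trunc_div(m2 - m1 + m3, m3), 0)


def do_values(m1, m2, m3=1):
    """Yield the values the DO variable takes, one per iteration."""
    value = m1
    # The count is fixed here, before the body ever runs.
    for _ in range(trip_count(m1, m2, m3)):
        yield value
        value += m3
@end smallexample

Since the count is computed once, up front,
changes the loop body makes to the variables
used in the limit or increment expressions
cannot alter the number of iterations.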
1735
1736@node Transforming Block IF
1737@subsection Transforming Block IF
1738
1739Consider:
1740
1741@smallexample
1742SUBROUTINE X(A,B,C)
1743CHARACTER*(*) A, B, C
1744LOGICAL LFUNC
1745
1746IF (LFUNC (A//B)) THEN
1747  CALL SUBR1
1748ELSE IF (LFUNC (A//C)) THEN
1749  CALL SUBR2
1750ELSE
1751  CALL SUBR3
END IF
END
1753@end smallexample
1754
1755The arguments to the two calls to @samp{LFUNC}
1756require dynamic allocation (at run time),
1757but are not required during execution of the @code{CALL} statements.
1758
1759So, the scopes of those temporaries must be within blocks inside
1760the block corresponding to the Fortran @code{IF} block.
1761
1762This cannot be represented ``naturally''
1763in vanilla C, nor in GBEL.
1764The @code{if}, @code{elseif}, @code{else},
1765and @code{endif} constructs
1766provided by both languages must,
1767for a given @code{if} block,
1768share the same C/GBE block.
1769
1770Therefore, any temporaries needed during evaluation of @samp{expr}
1771while executing @samp{ELSE IF(expr)}
1772must either have been predeclared
1773at the top of the corresponding @code{IF} block,
1774or declared within a new block for that @code{ELSE IF}---a block that,
1775since it cannot contain the @code{else} or @code{else if} itself
1776(due to the above requirement),
1777actually implements the rest of the @code{IF} block's
1778@code{ELSE IF} and @code{ELSE} statements
1779within an inner block.
1780
1781The FFE takes the latter approach.
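
The nesting this implies might look roughly like the following C sketch.
It is one possible reading of the approach, not the FFE's actual output:
@code{lfunc}, @code{catenate}, and the @code{subr@var{n}} routines
are hypothetical stand-ins,
and @code{malloc}/@code{free} mark where each temporary
comes into and goes out of scope.

@smallexample
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for LFUNC and the SUBRn routines.  */
static int lfunc (const char *s) @{ return strcmp (s, "AC") == 0; @}
static void subr1 (void) @{ puts ("SUBR1"); @}
static void subr2 (void) @{ puts ("SUBR2"); @}
static void subr3 (void) @{ puts ("SUBR3"); @}

/* Hypothetical helper: returns x//y in freshly allocated memory.  */
static char *
catenate (const char *x, const char *y)
@{
  char *temp = malloc (strlen (x) + strlen (y) + 1);

  if (temp == NULL)
    abort ();
  strcpy (temp, x);
  strcat (temp, y);
  return temp;
@}

static void
x (const char *a, const char *b, const char *c)
@{
  int cond;

  @{
    /* The temporary for A//B lives only in this block.  */
    char *temp = catenate (a, b);

    cond = lfunc (temp);
    free (temp);
  @}
  if (cond)
    subr1 ();
  else
    @{
      /* New block for the ELSE IF; it implements the rest
         of the IF construct.  */
      @{
        char *temp = catenate (a, c);

        cond = lfunc (temp);
        free (temp);
      @}
      if (cond)
        subr2 ();
      else
        subr3 ();
    @}
@}

int
main (void)
@{
  x ("A", "B", "C");
  return 0;
@}
@end smallexample

Neither temporary is live while any of the @code{subr@var{n}} calls runs.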

@node Transforming SELECT CASE
@subsection Transforming SELECT CASE

@code{SELECT CASE} poses a few interesting problems for code generation,
if efficiency and frugal stack management are important.

Consider @samp{SELECT CASE (I('PREFIX'//A))},
where @samp{A} is @code{CHARACTER*(*)}.
In a case like this---basically,
in any case where largish temporaries are needed
to evaluate the expression---those temporaries should
not be ``live'' during execution of any of the @code{CASE} blocks.

So, evaluation of the expression is best done within its own block,
which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).

Otherwise, we'd have the rough equivalent of this pseudo-code:

@smallexample
@{
  char temp[large];

  libg77_catenate (temp, 'prefix', a);

  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

And that would leave @samp{temp} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temp before executing the blocks).

So this approach is used instead:

@smallexample
@{
  int temp0;

  @{
    char temp1[large];

    libg77_catenate (temp1, 'prefix', a);
    temp0 = i (temp1);
  @}

  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

Note how @samp{temp1} goes out of scope before starting the switch,
thus making it easy for a back end to free it.

The problem @emph{that} solution has, however,
is with @samp{SELECT CASE('prefix'//A)}
(which is currently not supported).

Unless the GBEL is extended to support arbitrarily long character strings
in its @code{case} facility,
the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
(probably excepting @code{CHARACTER*1})
using a cascade of
@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
in GBEL.

To prevent the (potentially large) temporary,
needed to hold the selected expression itself (@samp{'prefix'//A}),
from being in scope during execution of the @code{CASE} blocks,
two approaches are available:

@itemize @bullet
@item
Pre-evaluate all the @code{CASE} tests,
producing an integer ordinal that is used,
a la @samp{temp0} in the earlier example,
as if @samp{SELECT CASE(temp0)} had been written.

Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
where @var{i} is the ordinal for that case,
determined while, or before,
generating the cascade of @code{if}-related constructs
to cope with @code{CHARACTER} selection.

@item
Make @samp{temp0} above just
large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).

Since that length must be constant
(because @code{CASE} expressions are all constant),
it won't be so large,
and, further, @samp{temp1} need not be dynamically allocated,
since normal @code{CHARACTER} assignment can be used
into the fixed-length @samp{temp0}.
@end itemize

Both of these solutions require @code{SELECT CASE} implementation
to be changed so all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.
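
The first approach might be sketched in C as follows.
The @code{CASE} strings, the ordinals,
and the function @code{select_ordinal} are hypothetical,
and @code{strcmp} again stands in for Fortran's blank-padded comparison.
All the @code{CASE} tests must be known
when the cascade is generated.

@smallexample
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Maps 'prefix'//A to an ordinal through a cascade of
   comparisons.  The large temporary is freed before
   any CASE block runs.  */
static int
select_ordinal (const char *a)
@{
  int temp0;

  @{
    char *temp1 = malloc (strlen ("prefix") + strlen (a) + 1);

    if (temp1 == NULL)
      abort ();
    strcpy (temp1, "prefix");
    strcat (temp1, a);
    if (strcmp (temp1, "prefixA") == 0)
      temp0 = 1;
    else if (strcmp (temp1, "prefixB") == 0)
      temp0 = 2;
    else
      temp0 = 0;   /* CASE DEFAULT */
    free (temp1);
  @}
  return temp0;
@}

int
main (void)
@{
  switch (select_ordinal ("B"))
    @{
    case 1:
      puts ("first");
      break;
    case 2:
      puts ("second");
      break;
    default:
      puts ("default");
      break;
    @}
  return 0;
@}
@end smallexample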

@node Transforming Expressions
@section Transforming Expressions

The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:

@smallexample
@var{action}(@var{expr})
@end smallexample

Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.

Expanding the above shows a typical order of events at run time:

@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample

So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must be freed no sooner than
after @var{action} has been performed.

The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.

That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.

The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.

However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.

Yet, from the above, it's clear that multiple streams of code
must effectively be simultaneously generated
during the recursive-descent analysis of statements.

The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.
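
To make the idea of several streams concrete,
here is a small C sketch that walks a toy expression tree once
and collects evaluation code and clean-up code separately.
Everything in it (@code{struct expr}, @code{struct streams},
@code{transform}, and the emitted @code{cat} and @code{free} calls)
is hypothetical and is not how the FFE represents its streams.

@smallexample
#include <stdio.h>
#include <string.h>

/* A leaf names a variable; an interior node concatenates
   two subexpressions.  */
struct expr
@{
  const char *leaf;           /* Non-NULL for a leaf.  */
  const struct expr *left;
  const struct expr *right;
@};

/* Evaluation and clean-up streams, kept as text.  */
struct streams
@{
  char eval[256];
  char cleanup[256];
  int ntemps;
@};

/* Writes to OUT the name of the operand holding E's value.  */
static void
transform (const struct expr *e, struct streams *s,
           char *out, size_t outsize)
@{
  char l[16], r[16], line[64], rest[256];

  if (e->leaf != NULL)
    @{
      snprintf (out, outsize, "%s", e->leaf);
      return;
    @}
  transform (e->left, s, l, sizeof l);
  transform (e->right, s, r, sizeof r);
  snprintf (out, outsize, "t%d", s->ntemps++);

  /* Evaluation code is appended in order.  */
  snprintf (line, sizeof line, "%s = cat (%s, %s);\n", out, l, r);
  strncat (s->eval, line, sizeof s->eval - strlen (s->eval) - 1);

  /* Clean-up code is prepended, so temporaries are freed
     in reverse order of creation.  */
  snprintf (rest, sizeof rest, "%s", s->cleanup);
  snprintf (s->cleanup, sizeof s->cleanup,
            "free (%s);\n%s", out, rest);
@}

int
main (void)
@{
  struct expr a = @{ "a", NULL, NULL @};
  struct expr b = @{ "b", NULL, NULL @};
  struct expr c = @{ "c", NULL, NULL @};
  struct expr ab = @{ NULL, &a, &b @};
  struct expr abc = @{ NULL, &ab, &c @};
  struct streams s = @{ "", "", 0 @};
  char result[16];

  transform (&abc, &s, result, sizeof result);
  printf ("%s", s.eval);                /* Evaluation stream.  */
  printf ("action (%s);\n", result);    /* Primary stream.  */
  printf ("%s", s.cleanup);             /* Clean-up stream.  */
  return 0;
@}
@end smallexample

The streams are emitted in the order
evaluate, perform @var{action}, clean up,
matching the sequence of events described above.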

Requirements imposed by expressions include:

@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.

@item
Other requirements (TBD).
@end itemize

@node Internal Naming Conventions
@section Internal Naming Conventions

Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)

Identifiers that don't fit the following forms
are not considered exported,
even if they are according to the C language.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)

@table @code
@item ffe@var{mod}
The single typedef exported by the module.

@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.

@item ffe@var{mod}[A-Z][a-z0-9]*
A typedef exported by the module.

The portion of the identifier after @code{ffe@var{mod}} is
referred to as @code{ctype}, a capitalized (mixed-case) form
of @code{type}.

@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.

@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).

@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table

Below are names used for @var{value} and @var{input},
along with their definitions.

@table @code
@item col
A column number within a line (first column is number 1).

@item file
An encapsulation of a file's name.

@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).

@item initialize
Initializes, usually a module.  No type.

@item int
A generic integer of type @code{int}.

@item is
A generic integer that contains a true (nonzero) or false (zero) value.

@item len
A generic integer that contains the length of something.

@item line
A line number within a source file,
or a global line number.

@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.

@item name
A @code{text} that points to a name of something.

@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate---if so,
similar to @code{find} without crashing.

@item pt
Pointer to a particular character (a line, column pair)
in the input file (source code being compiled).

@item run
Performs some herculean task.  No type.

@item terminate
Terminates, usually a module.  No type.

@item text
A @code{char *} that points to generic text.
@end table
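
As an illustration of how these forms combine,
the following C sketch defines a made-up module @code{ffefoo}.
No such module exists in the FFE;
the names, the fixed table size, and the use of an @code{int} handle
are assumptions chosen only to keep the example short.

@smallexample
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Exported names, as a hypothetical foo.h might declare them.  */
typedef int ffefoo;                            /* ffe<mod> */
#define FFEFOO_NONE ((ffefoo) -1)              /* FFE<umod>_...  */
void ffefoo_initialize (void);                 /* ffe<mod>_<value> */
ffefoo ffefoo_new (const char *name);
const char *ffefoo_name (ffefoo f);
ffefoo ffefoo_lookup_name (const char *name);  /* ..._<value>_<input> */

/* Implementation, as a hypothetical foo.c might define it.
   These static names are not exported.  */
static const char *names[8];
static int count;

void
ffefoo_initialize (void)
@{
  count = 0;
@}

/* lookup: returns the match, or a nil value if there is none.  */
ffefoo
ffefoo_lookup_name (const char *name)
@{
  ffefoo f;

  for (f = 0; f < count; f++)
    if (strcmp (names[f], name) == 0)
      return f;
  return FFEFOO_NONE;
@}

/* new: makes an instance, or returns an existing one.  */
ffefoo
ffefoo_new (const char *name)
@{
  ffefoo f = ffefoo_lookup_name (name);

  if (f != FFEFOO_NONE)
    return f;
  if (count == 8)
    abort ();
  names[count] = name;
  return count++;
@}

const char *
ffefoo_name (ffefoo f)
@{
  return names[f];
@}

int
main (void)
@{
  ffefoo a, b;

  ffefoo_initialize ();
  a = ffefoo_new ("X");
  b = ffefoo_new ("X");
  printf ("%s %d %d\n", ffefoo_name (a), a == b,
          ffefoo_lookup_name ("Y") == FFEFOO_NONE);
  return 0;
@}
@end smallexample

Here @code{ffefoo_lookup_name} follows the
@code{ffe@var{mod}_@var{value}_@var{input}} form,
with @code{lookup} as the @var{value} and @code{name} as the @var{input}.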
2065