@c Copyright (C) 1999 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.

@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}

This chapter describes some aspects of the design and implementation
of the @code{g77} front end.

To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email @email{gcc@@gcc.gnu.org}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://gcc.gnu.org/contribute.html} for steps you might
need to take first.

@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu

@node Overview of Sources
@section Overview of Sources

The current directory layout includes the following:

@table @file
@item @value{srcdir}/gcc/
Non-g77 files in gcc

@item @value{srcdir}/gcc/f/
GNU Fortran front end sources

@item @value{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation

@item @value{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}

@item @value{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}

@item @value{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table

Components of note in @code{g77} are described below.

@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.

@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and miscellaneous supporting files.
The @code{g77} compiler code is placed in @file{f/} because @file{f/},
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.

@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often called @code{libg2c} when the
@code{g77} version is specifically meant.

The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.

As distributed with @code{g77},
@file{libf2c/} contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes made from @code{libf2c},
to fix some bugs,
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.

@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.

Files of note in @file{f/} and @file{libf2c/} are described below:

@table @file
@item f/BUGS
Lists some important bugs known to be in @code{g77}.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:

@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample

@item f/ChangeLog
Lists recent changes to @code{g77} internals.

@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.

@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use:

@smallexample
info -f f/g77.info -n News
@end smallexample

@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.

All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table

If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization, and the code generation.

The file @file{parse.c} is the source file for @code{yyparse()},
which is invoked by the GBE to start the compilation process,
for @file{f771}.

The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and it (along with @file{top.h}) defines all @samp{ffe_[a-z].*},
@samp{ffe[A-Z].*}, and @samp{FFE_[A-Za-z].*} symbols.

The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.

All other modules named @var{xyz}
consist of all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.

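As a rough illustration of the naming convention above,
the following Python sketch guesses which module defines a given symbol.
The helper name @code{module_for_symbol} and its regular expressions
are assumptions made for this example and are not part of @file{f/};
the sketch also assumes that module names consist of letters only.

@smallexample
import re

def module_for_symbol(name):
    # Hypothetical helper: return the module expected to define NAME,
    # or None if NAME does not follow the FFE naming conventions.
    if re.match(r"^(malloc(_[a-z]|[A-Z])|MALLOC_[A-Za-z])", name):
        return "malloc"
    # ffeXYZ_lower and ffeXYZUpper forms.
    m = re.match(r"^ffe([a-z]*)(_[a-z]|[A-Z])", name)
    if m is None:
        # FFEXYZ_ form.
        m = re.match(r"^FFE([A-Z]*)_[A-Za-z]", name)
    if m is None:
        return None
    # An empty module part means the symbol belongs to top.c/top.h.
    return m.group(1).lower() or "top"

print(module_for_symbol("ffexyz_set_something"))
@end smallexample
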
The ``porting'' files of note currently are:

@table @file
@item proj.c
@itemx proj.h
These define the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.

@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, versions of @code{g77} rely less on these files
and more on run-time configuration based on GBE info
in @file{com.c}.

@item com.c
@itemx com.h
These are the primary interface to the GBE.

@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.

@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed as case-insensitive
with regard to Fortran keywords).
@end table

If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.

If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass, otherwise it is in the first.

(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)

@node Overview of Translation Process
@section Overview of Translation Process

The order of phases translating source code to the form accepted
by the GBE is:

@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})

@item
Lexing (@file{lex.c})

@item
Stand-alone statement identification (@file{sta.c})

@item
INCLUDE handling (@file{sti.c})

@item
Order-dependent statement identification (@file{stq.c})

@item
Parsing (@file{stb.c} and @file{expr.c})

@item
Constructing (@file{stc.c})

@item
Collecting (@file{std.c})

@item
Expanding (@file{ste.c})
@end enumerate

To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:

@smallexample
      FORMAT(I2 4H)=(J/
     &   I3)
@end smallexample

The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:

@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample

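The lexeme list above can be reproduced with a deliberately naive
Python sketch.
It is not how @file{lex.c} works: the function name
@code{fixed_form_lexemes} is invented for this example, and the sketch
assumes fixed-form input with no comment lines, no @code{CHARACTER}
constants, and no tabs.
It only shows that the split needs no knowledge of what kind of
statement is being read.

@smallexample
import re

def fixed_form_lexemes(lines):
    # Keep the statement field (column 7 onward) of the initial line
    # and of each continuation line, and join the pieces.
    text = "".join(line[6:] for line in lines)
    # Blanks are not significant in this simplified view.
    text = text.replace(" ", "")
    # A run of letters and digits is one lexeme; any other
    # character is a lexeme by itself.
    return re.findall(r"[A-Za-z0-9]+|.", text)

statement = ["      FORMAT(I2 4H)=(J/",
             "     &   I3)"]
print(fixed_form_lexemes(statement))
@end smallexample
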
The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that the sequence of lexemes represents.

The sooner it can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.

In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.

An assignment-form statement might be a statement-function
definition or an executable assignment statement.

To make that determination,
@file{sta.c} looks at the first two lexemes.

Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement,
else it's a statement function.

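The two checks just described can be sketched in Python as follows.
The function names are hypothetical and the logic is a simplification,
not a description of @file{sta.c}: for instance, it ignores other
statements that contain a level-zero @samp{=}, such as @code{DO}.

@smallexample
def has_level_zero_equals(lexemes):
    # True if an "=" appears outside all parentheses.
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False

def statement_form(lexemes):
    # Coarse classification based on the first two lexemes.
    if not has_level_zero_equals(lexemes):
        return "not assignment-form"
    if len(lexemes) > 1 and lexemes[1] == "(":
        # Context from earlier statements is needed to decide.
        return "array assignment or statement function"
    return "assignment"

sample = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(statement_form(sample))
@end smallexample
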
Either way, @file{sta.c} hands off the statement to @file{stq.c}
(via @file{sti.c}, which expands INCLUDE files).
When a statement is ambiguous on its own,
@file{stq.c} figures out what it must actually be,
based on the context established by previous statements.

So, @file{stq.c} watches the statement stream for executable statements,
END statements, and so on, so it knows whether @samp{A(B)=C} is
(intended as) a statement-function definition or an assignment statement.

After establishing the context-aware statement info, @file{stq.c}
passes the original sample statement on to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).

@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.

This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.

For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).

@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.

@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.

For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.

When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.

@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.

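The division of labor between collecting and expanding can be pictured
with a small Python sketch.
The class @code{StatementCollector} and its method names are invented
for illustration; they only mimic the roles the text assigns to
@file{std.c} (saving statements) and @file{ste.c} (expanding them once
the program unit is complete).

@smallexample
class StatementCollector:
    # Hypothetical stand-in for the collecting phase.
    def __init__(self, expand):
        self._expand = expand   # callable standing in for expansion
        self._saved = []

    def statement(self, info):
        # First pass: only remember the statement.
        self._saved.append(info)

    def end_program_unit(self):
        # Second pass: the whole unit is known, so expand each
        # saved statement in its original order.
        saved, self._saved = self._saved, []
        for info in saved:
            self._expand(info)

expanded = []
collector = StatementCollector(expanded.append)
collector.statement("first statement")
collector.statement("second statement")
nothing_yet = list(expanded)   # still empty: nothing expanded so far
collector.end_program_unit()
print(expanded)
@end smallexample
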
Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.

@menu
* g77stripcard::
* lex.c::
* sta.c::
* sti.c::
* stq.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu

@node g77stripcard
@subsection g77stripcard

The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.

This program is needed because @file{lex.c} doesn't pay attention
to maximum line lengths at all, to make it easier to maintain,
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(not spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

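A minimal Python sketch of the per-line behavior described above follows.
The function name @code{strip_card} is hypothetical, and
@code{STRIP_PREFIX} is only a placeholder, since the real comment prefix
is TBD.
The sketch counts characters rather than expanded columns, and it omits
the optional warning.

@smallexample
STRIP_PREFIX = "!STRIPPED: "   # placeholder; the real prefix is TBD

def strip_card(line, line_length=72):
    # LINE is one source line without its trailing newline.
    # Return the list of output lines that replace it.
    kept = line[:line_length]
    dropped = line[line_length:]
    if dropped.strip(" \t") == "":
        # Nothing but spaces and tabs was removed.
        return [kept]
    # Preserve the removed text in a comment after the line.
    return [kept, STRIP_PREFIX + dropped]

card = "      X = 1".ljust(72) + "SEQ00010"
for out in strip_card(card):
    print(out)
@end smallexample
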
(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)

@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexeme @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

That saves having to pass most source files through @code{expand}.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of intrinsics will probably be matchable in any case.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If it can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

599*c87b03e5Sespie@item
600*c87b03e5SespieBackslashes in @code{CHARACTER} and Hollerith constants
601*c87b03e5Sespieare not allowed.
602*c87b03e5Sespie
603*c87b03e5SespieThis avoids the confusion introduced by some Fortran compiler vendors
604*c87b03e5Sespieproviding C-like interpretation of backslashes,
605*c87b03e5Sespiewhile others provide straight-through interpretation.
606*c87b03e5Sespie
607*c87b03e5SespieSome kind of lexical construct (TBD) will be provided to allow
608*c87b03e5Sespieflagging of a @code{CHARACTER}
609*c87b03e5Sespie(but probably not a Hollerith)
610*c87b03e5Sespieconstant that permits backslashes.
611*c87b03e5SespieIt'll necessarily be a prefix, such as:
612*c87b03e5Sespie
613*c87b03e5Sespie@smallexample
614*c87b03e5SespiePRINT *, C'This line has a backspace \b here.'
615*c87b03e5SespiePRINT *, F'This line has a straight backslash \ here.'
616*c87b03e5Sespie@end smallexample
617*c87b03e5Sespie
618*c87b03e5SespieFurther, command-line options might be provided to specify that
619*c87b03e5Sespieone prefix or the other is to be assumed as the default
620*c87b03e5Sespiefor @code{CHARACTER} constants.
621*c87b03e5Sespie
However, it seems more helpful for @code{g77} to provide a program
that converts source code by prefixing all constants
624*c87b03e5Sespie(or just those containing backslashes)
625*c87b03e5Sespiewith the desired designation,
626*c87b03e5Sespieso printouts of code can be read
627*c87b03e5Sespiewithout knowing the compile-time options used when compiling it.
628*c87b03e5Sespie
629*c87b03e5SespieIf such a program is provided
630*c87b03e5Sespie(let's name it @code{g77slash} for now),
631*c87b03e5Sespiethen a command-line option to @code{g77} should not be provided.
632*c87b03e5Sespie(Though, given that it'll be easy to implement, it might be hard
633*c87b03e5Sespieto resist user requests for it ``to compile faster than if we
634*c87b03e5Sespiehave to invoke another filter''.)
635*c87b03e5Sespie
This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
643*c87b03e5SespieThen @code{g77} wouldn't have to define a prefix syntax for Hollerith
644*c87b03e5Sespieconstants specifying whether they want C-style or straight-through
645*c87b03e5Sespiebackslashes.
646*c87b03e5Sespie
647*c87b03e5Sespie@item
648*c87b03e5SespieTo allow for form-neutral INCLUDE files without requiring them
649*c87b03e5Sespieto be preprocessed,
650*c87b03e5Sespiethe fixed-form lexer should offer an extension (if possible)
651*c87b03e5Sespieallowing a trailing @samp{&} to be ignored, especially if after
652*c87b03e5Sespiecolumn 72, as it would be using the traditional Unix Fortran source
653*c87b03e5Sespiemodel (which ignores @emph{everything} after column 72).
654*c87b03e5Sespie@end itemize
655*c87b03e5Sespie
656*c87b03e5SespieThe above implements nearly exactly what is specified by
657*c87b03e5Sespie@ref{Character Set},
658*c87b03e5Sespieand
659*c87b03e5Sespie@ref{Lines},
660*c87b03e5Sespieexcept it also provides automatic conversion of tabs
661*c87b03e5Sespieand ignoring of newline-related carriage returns,
662*c87b03e5Sespieas well as accommodating form-neutral INCLUDE files.
663*c87b03e5Sespie
664*c87b03e5SespieIt also implements the ``pure visual'' model,
665*c87b03e5Sespieby which is meant that a user viewing his code
666*c87b03e5Sespiein a typical text editor
667*c87b03e5Sespie(assuming it's not preprocessed via @code{g77stripcard} or similar)
668*c87b03e5Sespiedoesn't need any special knowledge
669*c87b03e5Sespieof whether spaces on the screen are really tabs,
670*c87b03e5Sespiewhether lines end immediately after the last visible non-space character
671*c87b03e5Sespieor after a number of spaces and tabs that follow it,
672*c87b03e5Sespieor whether the last line in the file is ended by a newline.
673*c87b03e5Sespie
674*c87b03e5SespieMost editors don't make these distinctions,
675*c87b03e5Sespiethe ANSI FORTRAN 77 standard doesn't require them to,
676*c87b03e5Sespieand it permits a standard-conforming compiler
677*c87b03e5Sespieto define a method for transforming source code to
678*c87b03e5Sespie``standard form'' however it wants.
679*c87b03e5Sespie
680*c87b03e5SespieSo, GNU Fortran defines it such that users have the best chance
681*c87b03e5Sespieof having the code be interpreted the way it looks on the screen
682*c87b03e5Sespieof the typical editor.
683*c87b03e5Sespie
684*c87b03e5Sespie(Fancy editors should @emph{never} be required to correctly read code
685*c87b03e5Sespiewritten in classic two-dimensional-plaintext form.
686*c87b03e5SespieBy correct reading I mean ability to read it, book-like, without
687*c87b03e5Sespiemistaking text ignored by the compiler for program code and vice versa,
688*c87b03e5Sespieand without having to count beyond the first several columns.
689*c87b03e5SespieThe vague meaning of ASCII TAB, among other things, complicates
690*c87b03e5Sespiethis somewhat, but as long as ``everyone'', including the editor,
691*c87b03e5Sespieother tools, and printer, agrees about the every-eighth-column convention,
692*c87b03e5Sespiethe GNU Fortran ``pure visual'' model meets these requirements.
693*c87b03e5SespieAny language or user-visible source form
694*c87b03e5Sespierequiring special tagging of tabs,
695*c87b03e5Sespiethe ends of lines after spaces/tabs,
696*c87b03e5Sespieand so on, fails to meet this fairly straightforward specification.
697*c87b03e5SespieFortunately, Fortran @emph{itself} does not mandate such a failure,
698*c87b03e5Sespiethough most vendor-supplied defaults for their Fortran compilers @emph{do}
699*c87b03e5Sespiefail to meet this specification for readability.)
700*c87b03e5Sespie
701*c87b03e5SespieFurther, this model provides a clean interface
702*c87b03e5Sespieto whatever preprocessors or code-generators are used
703*c87b03e5Sespieto produce input to this phase of @code{g77}.
704*c87b03e5SespieMainly, they need not worry about long lines.
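
As a rough illustration of the ``pure visual'' model,
the following Python sketch reduces source text to what a reader sees
in a typical editor.
The names @code{visual_form} and @code{visual_lines} are hypothetical,
not part of @code{g77},
and the sketch covers only tab conversion, newline-related carriage returns,
trailing blanks, and the optional final newline;
it assumes the every-eighth-column tab convention
and says nothing about line length.

```python
def visual_form(raw, tab_stop=8):
    """Reduce one source line to what a reader sees on screen."""
    line = raw.rstrip("\n")
    # A carriage return that is part of the line ending is ignored.
    if line.endswith("\r"):
        line = line[:-1]
    # Tabs advance to the next multiple of tab_stop columns.
    line = line.expandtabs(tab_stop)
    # Trailing blanks are invisible, so they carry no meaning here.
    return line.rstrip(" ")


def visual_lines(text, tab_stop=8):
    """Split text into lines; a missing final newline changes nothing."""
    pieces = text.split("\n")
    if pieces and pieces[-1] == "":
        pieces.pop()
    return [visual_form(piece, tab_stop) for piece in pieces]
```

Two files that look the same on the screen should yield the same list
from @code{visual_lines}, whatever mix of tabs, trailing blanks,
and line endings they contain.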
705*c87b03e5Sespie
706*c87b03e5Sespie@node sta.c
707*c87b03e5Sespie@subsection sta.c
708*c87b03e5Sespie
709*c87b03e5Sespie@node sti.c
710*c87b03e5Sespie@subsection sti.c
711*c87b03e5Sespie
712*c87b03e5Sespie@node stq.c
713*c87b03e5Sespie@subsection stq.c
714*c87b03e5Sespie
715*c87b03e5Sespie@node stb.c
716*c87b03e5Sespie@subsection stb.c
717*c87b03e5Sespie
718*c87b03e5Sespie@node expr.c
719*c87b03e5Sespie@subsection expr.c
720*c87b03e5Sespie
721*c87b03e5Sespie@node stc.c
722*c87b03e5Sespie@subsection stc.c
723*c87b03e5Sespie
724*c87b03e5Sespie@node std.c
725*c87b03e5Sespie@subsection std.c
726*c87b03e5Sespie
727*c87b03e5Sespie@node ste.c
728*c87b03e5Sespie@subsection ste.c
729*c87b03e5Sespie
730*c87b03e5Sespie@node Gotchas (Transforming)
731*c87b03e5Sespie@subsection Gotchas (Transforming)
732*c87b03e5Sespie
733*c87b03e5SespieThis section is not about transforming ``gotchas'' into something else.
734*c87b03e5SespieIt is about the weirder aspects of transforming Fortran,
735*c87b03e5Sespiehowever that's defined,
736*c87b03e5Sespieinto a more modern, canonical form.
737*c87b03e5Sespie
738*c87b03e5Sespie@subsubsection Multi-character Lexemes
739*c87b03e5Sespie
740*c87b03e5SespieEach lexeme carries with it a pointer to where it appears in the source.
741*c87b03e5Sespie
742*c87b03e5SespieTo provide the ability for diagnostics to point to column numbers,
743*c87b03e5Sespiein addition to line numbers and names,
744*c87b03e5Sespielexemes that represent more than one (significant) character
745*c87b03e5Sespiein the source code need, generally,
746*c87b03e5Sespieto provide pointers to where each @emph{character} appears in the source.
747*c87b03e5Sespie
748*c87b03e5SespieThis provides the ability to properly identify the precise location
749*c87b03e5Sespieof the problem in code like
750*c87b03e5Sespie
751*c87b03e5Sespie@smallexample
752*c87b03e5SespieSUBROUTINE X
753*c87b03e5SespieEND
754*c87b03e5SespieBLOCK DATA X
755*c87b03e5SespieEND
756*c87b03e5Sespie@end smallexample
757*c87b03e5Sespie
758*c87b03e5Sespiewhich, in fixed-form source, would result in single lexemes
759*c87b03e5Sespieconsisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
(The problem is that @samp{X} is defined twice,
so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding @samp{X} in the first,
would be preferable to pointing to the beginnings of the statements.)
764*c87b03e5Sespie
765*c87b03e5SespieThis need also arises when parsing (and diagnosing) @code{FORMAT}
766*c87b03e5Sespiestatements.
767*c87b03e5Sespie
768*c87b03e5SespieFurther, it arises when diagnosing
769*c87b03e5Sespie@code{FMT=} specifiers that contain constants
770*c87b03e5Sespie(or partial constants, or even propagated constants!)
771*c87b03e5Sespiein I/O statements, as in:
772*c87b03e5Sespie
773*c87b03e5Sespie@smallexample
774*c87b03e5SespiePRINT '(I2, 3HAB)', J
775*c87b03e5Sespie@end smallexample
776*c87b03e5Sespie
(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)
780*c87b03e5Sespie
781*c87b03e5SespieMulti-character lexemes, which would seem to naturally include
782*c87b03e5Sespieat least digit strings, alphanumeric strings, @code{CHARACTER}
783*c87b03e5Sespieconstants, and Hollerith constants, therefore need to provide
784*c87b03e5Sespielocation information on each character.
785*c87b03e5Sespie(Maybe Hollerith constants don't, but it's unnecessary to except them.)
786*c87b03e5Sespie
787*c87b03e5SespieThe question then arises, what about @emph{other} multi-character lexemes,
788*c87b03e5Sespiesuch as @samp{**} and @samp{//},
789*c87b03e5Sespieand Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
790*c87b03e5Sespie
791*c87b03e5SespieTurns out there's a need to identify the location of the second character
792*c87b03e5Sespieof these two-character lexemes.
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
795*c87b03e5SespieSimilarly, it is preferable to diagnose the second slash in
796*c87b03e5Sespie@samp{I = J // K} rather than the first, given the implicit typing
797*c87b03e5Sespierules, which would result in the compiler disallowing the attempted
798*c87b03e5Sespieconcatenation of two integers.
799*c87b03e5Sespie(Though, since that's more of a semantic issue,
800*c87b03e5Sespieit's not @emph{that} much preferable.)
801*c87b03e5Sespie
802*c87b03e5SespieEven sequences that could be parsed as digit strings could use location info,
803*c87b03e5Sespiefor example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
804*c87b03e5Sespie(This probably will be parsed as a character string,
805*c87b03e5Sespieto be consistent with the parsing of @samp{Z'129A'}.)
806*c87b03e5Sespie
807*c87b03e5SespieTo avoid the hassle of recording the location of the second character,
808*c87b03e5Sespiewhile also preserving the general rule that each significant character
809*c87b03e5Sespieis distinctly pointed to by the lexeme that contains it,
810*c87b03e5Sespieit's best to simply not have any fixed-size lexemes
811*c87b03e5Sespielarger than one character.
812*c87b03e5Sespie
813*c87b03e5SespieThis new design is expected to make checking for two
814*c87b03e5Sespie@samp{*} lexemes in a row much easier than the old design,
815*c87b03e5Sespieso this is not much of a sacrifice.
816*c87b03e5SespieIt probably makes the lexer much easier to implement
817*c87b03e5Sespiethan it makes the parser harder.
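
To make the idea concrete, here is a small Python sketch
of a fixed-form lexer that records a location for every significant character.
Everything in it is hypothetical (the @code{Lexeme} record,
the @code{lex_fixed_form_statement} function, and the two lexeme kinds);
in particular, lumping letters and digits into one kind of run
is a simplification made for the example,
not a statement about how the FFE classifies lexemes.

```python
from collections import namedtuple

# where holds one (line, column) pair per significant character.
Lexeme = namedtuple("Lexeme", "kind text where")


def lex_fixed_form_statement(line_no, text):
    """Split one fixed-form statement line into lexemes."""
    lexemes = []
    run_text = []
    run_where = []

    def flush():
        # Close the alphanumeric run collected so far, if any.
        if run_text:
            lexemes.append(Lexeme("alnum", "".join(run_text), tuple(run_where)))
            run_text.clear()
            run_where.clear()

    for col, ch in enumerate(text, start=1):
        if ch == " ":
            continue  # blanks are not significant in fixed form
        if ch.isalnum():
            run_text.append(ch)
            run_where.append((line_no, col))
        else:
            # Every other character is a lexeme of exactly one character,
            # so two asterisks in a row come out as two lexemes.
            flush()
            lexemes.append(Lexeme("char", ch, ((line_no, col),)))
    flush()
    return lexemes
```

With this arrangement, the last entry of @code{where}
for the single lexeme produced from a @samp{SUBROUTINE X} line
points directly at the @samp{X},
which is the location a duplicate-definition diagnostic would want.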
818*c87b03e5Sespie
819*c87b03e5Sespie@subsubsection Space-padding Lexemes
820*c87b03e5Sespie
821*c87b03e5SespieCertain lexemes need to be padded with virtual spaces when the
822*c87b03e5Sespieend of the line (or file) is encountered.
823*c87b03e5Sespie
824*c87b03e5SespieThis is necessary in fixed form, to handle lines that don't
825*c87b03e5Sespieextend to column 72, assuming that's the line length in effect.
826*c87b03e5Sespie
827*c87b03e5Sespie@subsubsection Bizarre Free-form Hollerith Constants
828*c87b03e5Sespie
829*c87b03e5SespieLast I checked, the Fortran 90 standard actually required the compiler
830*c87b03e5Sespieto silently accept something like
831*c87b03e5Sespie
832*c87b03e5Sespie@smallexample
833*c87b03e5SespieFORMAT ( 1 2   Htwelve chars )
834*c87b03e5Sespie@end smallexample
835*c87b03e5Sespie
836*c87b03e5Sespieas a valid @code{FORMAT} statement specifying a twelve-character
837*c87b03e5SespieHollerith constant.
838*c87b03e5Sespie
839*c87b03e5SespieThe implication here is that, since the new lexer is a zero-feedback one,
840*c87b03e5Sespieit won't know that the special case of a @code{FORMAT} statement being parsed
841*c87b03e5Sespierequires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
842*c87b03e5Sespiea single lexeme.
843*c87b03e5Sespie
844*c87b03e5Sespie(This is a horrible misfeature of the Fortran 90 language.
845*c87b03e5SespieIt's one of many such misfeatures that almost make me want
846*c87b03e5Sespieto not support them, and forge ahead with designing a new
847*c87b03e5Sespie``GNU Fortran'' language that has the features,
848*c87b03e5Sespiebut not the misfeatures, of Fortran 90,
849*c87b03e5Sespieand provide utility programs to do the conversion automatically.)
850*c87b03e5Sespie
851*c87b03e5SespieSo, the lexer must gather distinct chunks of decimal strings into
852*c87b03e5Sespiea single lexeme in contexts where a single decimal lexeme might
853*c87b03e5Sespiestart a Hollerith constant.
854*c87b03e5Sespie
855*c87b03e5Sespie(Which probably means it might as well do that all the time
856*c87b03e5Sespiefor all multi-character lexemes, even in free-form mode,
857*c87b03e5Sespieleaving it to subsequent phases to pull them apart as they see fit.)
858*c87b03e5Sespie
859*c87b03e5SespieCompare the treatment of this to how
860*c87b03e5Sespie
861*c87b03e5Sespie@smallexample
862*c87b03e5SespieCHARACTER * 4 5 HEY
863*c87b03e5Sespie@end smallexample
864*c87b03e5Sespie
865*c87b03e5Sespieand
866*c87b03e5Sespie
867*c87b03e5Sespie@smallexample
868*c87b03e5SespieCHARACTER * 12 HEY
869*c87b03e5Sespie@end smallexample
870*c87b03e5Sespie
must be treated: the former must be diagnosed, due to the separation
between lexemes, whereas the latter must be accepted as a proper declaration.
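
One way to picture this is a lexer that gathers blank-separated digit chunks
into a single lexeme but remembers how many chunks it saw,
so a later phase can either use the combined value
(the @code{FORMAT} case)
or complain about the separation
(the @code{CHARACTER} case).
The Python sketch below assumes that approach;
the names @code{gather_decimal} and @code{hollerith_text} are hypothetical.

```python
def gather_decimal(text, start):
    """Gather digit chunks separated only by blanks.

    Returns (digits, chunks, index just past the last digit).
    """
    digits = []
    chunks = 0
    i = start
    n = len(text)
    while i < n and text[i].isdigit():
        chunks += 1
        while i < n and text[i].isdigit():
            digits.append(text[i])
            i += 1
        # Look past blanks; continue only if more digits follow them.
        j = i
        while j < n and text[j] == " ":
            j += 1
        if j < n and text[j].isdigit():
            i = j
        else:
            break
    return "".join(digits), chunks, i


def hollerith_text(text, start):
    """Return the Hollerith text if one begins at text[start], else None."""
    digits, chunks, i = gather_decimal(text, start)
    while i < len(text) and text[i] == " ":
        i += 1
    if not digits or i >= len(text) or text[i] not in "Hh":
        return None
    return text[i + 1 : i + 1 + int(digits)]
```

A phase handling @samp{CHARACTER * 4 5 HEY} would see a chunk count of two
and could issue its diagnostic,
while the same count is harmless when a Hollerith constant is being read.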
873*c87b03e5Sespie
874*c87b03e5Sespie@subsubsection Hollerith Constants
875*c87b03e5Sespie
876*c87b03e5SespieRecognizing a Hollerith constant---specifically,
877*c87b03e5Sespiethat an @samp{H} or @samp{h} after a digit string begins
878*c87b03e5Sespiesuch a constant---requires some knowledge of context.
879*c87b03e5Sespie
880*c87b03e5SespieHollerith constants (such as @samp{2HAB}) can appear after:
881*c87b03e5Sespie
882*c87b03e5Sespie@itemize @bullet
883*c87b03e5Sespie@item
884*c87b03e5Sespie@samp{(}
885*c87b03e5Sespie
886*c87b03e5Sespie@item
887*c87b03e5Sespie@samp{,}
888*c87b03e5Sespie
889*c87b03e5Sespie@item
890*c87b03e5Sespie@samp{=}
891*c87b03e5Sespie
892*c87b03e5Sespie@item
893*c87b03e5Sespie@samp{+}, @samp{-}, @samp{/}
894*c87b03e5Sespie
895*c87b03e5Sespie@item
896*c87b03e5Sespie@samp{*}, except as noted below
897*c87b03e5Sespie@end itemize
898*c87b03e5Sespie
899*c87b03e5SespieHollerith constants don't appear after:
900*c87b03e5Sespie
901*c87b03e5Sespie@itemize @bullet
902*c87b03e5Sespie@item
903*c87b03e5Sespie@samp{CHARACTER*},
904*c87b03e5Sespiewhich can be treated generally as
905*c87b03e5Sespieany @samp{*} that is the second lexeme of a statement
906*c87b03e5Sespie@end itemize
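
The two lists above amount to a small predicate on the preceding lexeme.
The following Python sketch shows one possible encoding;
the function name and its arguments are hypothetical,
and it assumes the caller knows the position (counting from one)
of the preceding lexeme within the statement.

```python
# Lexemes after which a digit string followed by H may start a Hollerith.
HOLLERITH_PREDECESSORS = ("(", ",", "=", "+", "-", "/", "*")


def may_begin_hollerith(previous, position_of_previous):
    """Decide whether a Hollerith constant may follow the lexeme previous."""
    # An asterisk that is the second lexeme of a statement is treated
    # like the one in CHARACTER*, where a length follows, not a Hollerith.
    if previous == "*" and position_of_previous == 2:
        return False
    return previous in HOLLERITH_PREDECESSORS
```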
907*c87b03e5Sespie
908*c87b03e5Sespie@subsubsection Confusing Function Keyword
909*c87b03e5Sespie
910*c87b03e5SespieWhile
911*c87b03e5Sespie
912*c87b03e5Sespie@smallexample
913*c87b03e5SespieREAL FUNCTION FOO ()
914*c87b03e5Sespie@end smallexample
915*c87b03e5Sespie
916*c87b03e5Sespiemust be a @code{FUNCTION} statement and
917*c87b03e5Sespie
918*c87b03e5Sespie@smallexample
919*c87b03e5SespieREAL FUNCTION FOO (5)
920*c87b03e5Sespie@end smallexample
921*c87b03e5Sespie
must be a type-declaration statement,
923*c87b03e5Sespie
924*c87b03e5Sespie@smallexample
925*c87b03e5SespieREAL FUNCTION FOO (@var{names})
926*c87b03e5Sespie@end smallexample
927*c87b03e5Sespie
928*c87b03e5Sespiewhere @var{names} is a comma-separated list of names,
929*c87b03e5Sespiecan be one or the other.
930*c87b03e5Sespie
The only way to disambiguate that statement
(short of mandating free-form source or a short maximum
length for names of external procedures)
is based on the context of the statement.
935*c87b03e5Sespie
In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.
940*c87b03e5Sespie
941*c87b03e5SespieOtherwise, the statement is a @code{FUNCTION} statement,
942*c87b03e5Sespiein that it begins a function program unit
943*c87b03e5Sespie(external, or, within @code{CONTAINS}, nested).
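
The decision just described can be written down as a short function.
This Python sketch is only an illustration:
@code{classify_real_function} and its arguments are hypothetical,
and the test for what counts as a name is deliberately crude.

```python
def is_name(item):
    """Crude test: starts with a letter, otherwise letters and digits."""
    return item[:1].isalpha() and item.isalnum()


def classify_real_function(items, in_program_unit, at_contains_outer_level):
    """Classify REAL FUNCTION FOO (items) using the statement's context."""
    if not items:
        return "FUNCTION statement"          # empty parentheses
    if not all(is_name(item) for item in items):
        return "type-declaration statement"  # for example a constant bound
    # Only names inside the parentheses: context has to decide.
    if in_program_unit and not at_contains_outer_level:
        return "type-declaration statement"
    return "FUNCTION statement"
```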
944*c87b03e5Sespie
945*c87b03e5Sespie@subsubsection Weird READ
946*c87b03e5Sespie
947*c87b03e5SespieThe statement
948*c87b03e5Sespie
949*c87b03e5Sespie@smallexample
950*c87b03e5SespieREAD (N)
951*c87b03e5Sespie@end smallexample
952*c87b03e5Sespie
953*c87b03e5Sespieis equivalent to either
954*c87b03e5Sespie
955*c87b03e5Sespie@smallexample
956*c87b03e5SespieREAD (UNIT=(N))
957*c87b03e5Sespie@end smallexample
958*c87b03e5Sespie
959*c87b03e5Sespieor
960*c87b03e5Sespie
961*c87b03e5Sespie@smallexample
962*c87b03e5SespieREAD (FMT=(N))
963*c87b03e5Sespie@end smallexample
964*c87b03e5Sespie
965*c87b03e5Sespiedepending on which would be valid in context.
966*c87b03e5Sespie
967*c87b03e5SespieSpecifically, if @samp{N} is type @code{INTEGER},
968*c87b03e5Sespie@samp{READ (FMT=(N))} would not be valid,
969*c87b03e5Sespiebecause parentheses may not be used around @samp{N},
970*c87b03e5Sespiewhereas they may around it in @samp{READ (UNIT=(N))}.
971*c87b03e5Sespie
972*c87b03e5SespieFurther, if @samp{N} is type @code{CHARACTER},
973*c87b03e5Sespiethe opposite is true---@samp{READ (UNIT=(N))} is not valid,
974*c87b03e5Sespiebut @samp{READ (FMT=(N))} is.
975*c87b03e5Sespie
976*c87b03e5SespieStrictly speaking, if anything follows
977*c87b03e5Sespie
978*c87b03e5Sespie@smallexample
979*c87b03e5SespieREAD (N)
980*c87b03e5Sespie@end smallexample
981*c87b03e5Sespie
in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
984*c87b03e5Sespiewithout looking at the type of @samp{N},
985*c87b03e5Sespiebecause the comma is required for the @samp{READ (FMT=(N))}
986*c87b03e5Sespieinterpretation and disallowed for the @samp{READ (UNIT=(N))}
987*c87b03e5Sespieinterpretation.
988*c87b03e5Sespie
989*c87b03e5SespieHowever, in practice, many Fortran compilers allow
990*c87b03e5Sespiethe comma for the @samp{READ (UNIT=(N))}
991*c87b03e5Sespieinterpretation anyway
992*c87b03e5Sespie(in that they generally allow a leading comma before
993*c87b03e5Sespiean I/O list in an I/O statement),
994*c87b03e5Sespieand much code takes advantage of this allowance.
995*c87b03e5Sespie
996*c87b03e5Sespie(This is quite a reasonable allowance, since the
997*c87b03e5Sespiejuxtaposition of a comma-separated list immediately
998*c87b03e5Sespieafter an I/O control-specification list, which is also comma-separated,
999*c87b03e5Sespiewithout an intervening comma,
1000*c87b03e5Sespielooks sufficiently ``wrong'' to programmers
1001*c87b03e5Sespiethat they can't resist the itch to insert the comma.
1002*c87b03e5Sespie@samp{READ (I, J), K, L} simply looks cleaner than
1003*c87b03e5Sespie@samp{READ (I, J) K, L}.)
1004*c87b03e5Sespie
1005*c87b03e5SespieSo, type-based disambiguation is needed unless strict adherence
1006*c87b03e5Sespieto the standard is always assumed, and we're not going to assume that.
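
A minimal sketch of such type-based disambiguation, in Python,
might look like the following.
The function name is hypothetical,
the type is passed in as a plain string for simplicity,
and types other than the two discussed above are simply rejected.

```python
def canonical_read(name, type_of_name):
    """Rewrite READ (name) into its explicit form, based on the type."""
    if type_of_name == "INTEGER":
        # Parentheses are allowed around a unit, not around this format.
        return "READ (UNIT=(" + name + "))"
    if type_of_name == "CHARACTER":
        # The opposite case: only the format interpretation is valid.
        return "READ (FMT=(" + name + "))"
    raise ValueError("cannot disambiguate READ (" + name + ")")
```

Note that the sketch never looks at whether a comma follows
the close parenthesis, for the reasons given above.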
1007*c87b03e5Sespie
1008*c87b03e5Sespie@node TBD (Transforming)
1009*c87b03e5Sespie@subsection TBD (Transforming)
1010*c87b03e5Sespie
1011*c87b03e5SespieContinue researching gotchas, designing the transformational process,
1012*c87b03e5Sespieand implementing it.
1013*c87b03e5Sespie
1014*c87b03e5SespieSpecific issues to resolve:
1015*c87b03e5Sespie
1016*c87b03e5Sespie@itemize @bullet
1017*c87b03e5Sespie@item
Just where should (if it were implemented) @code{USE} processing take place?
1019*c87b03e5Sespie
1020*c87b03e5SespieThis gets into the whole issue of how @code{g77} should handle the concept
1021*c87b03e5Sespieof modules.
1022*c87b03e5SespieI think GNAT already takes on this issue, but don't know more than that.
1023*c87b03e5SespieJim Giles has written extensively on @code{comp.lang.fortran}
1024*c87b03e5Sespieabout his opinions on module handling, as have others.
1025*c87b03e5SespieJim's views should be taken into account.
1026*c87b03e5Sespie
1027*c87b03e5SespieActually, Richard M. Stallman (RMS) also has written up
1028*c87b03e5Sespiesome guidelines for implementing such things,
1029*c87b03e5Sespiebut I'm not sure where I read them.
1030*c87b03e5SespiePerhaps the old @email{gcc2@@cygnus.com} list.
1031*c87b03e5Sespie
1032*c87b03e5SespieIf someone could dig references to these up and get them to me,
1033*c87b03e5Sespiethat would be much appreciated!
Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder to
implement @emph{later}.
1037*c87b03e5Sespie
1038*c87b03e5Sespie@item
1039*c87b03e5SespieShould the @code{g77} command become just a script that invokes
1040*c87b03e5Sespieall the various preprocessing that might be needed,
1041*c87b03e5Sespiethus making it seem slower than necessary for legacy code
1042*c87b03e5Sespiethat people are unwilling to convert,
1043*c87b03e5Sespieor should we provide a separate script for that,
1044*c87b03e5Sespiethus encouraging people to convert their code once and for all?
1045*c87b03e5Sespie
At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding one, perhaps named @code{g77oldnew},
that converts source code.
1050*c87b03e5Sespie
1051*c87b03e5SespieThese scripts would take all the pertinent options @code{g77} used
1052*c87b03e5Sespieto take and run the appropriate filters,
1053*c87b03e5Sespiepassing the results to @code{g77} or just making new sources out of them
1054*c87b03e5Sespie(in a subdirectory, leaving the user to do the dirty deed of
1055*c87b03e5Sespiemoving or copying them over the old sources).
1056*c87b03e5Sespie
1057*c87b03e5Sespie@item
1058*c87b03e5SespieDo other Fortran compilers provide a prefix syntax
1059*c87b03e5Sespieto govern the treatment of backslashes in @code{CHARACTER}
1060*c87b03e5Sespie(or Hollerith) constants?
1061*c87b03e5Sespie
1062*c87b03e5SespieKnowing what other compilers provide would help.
1063*c87b03e5Sespie
1064*c87b03e5Sespie@item
1065*c87b03e5SespieIs it okay to drop support for the @samp{-fintrin-case-initcap},
1066*c87b03e5Sespie@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
1067*c87b03e5Sespieand @samp{-fcase-initcap} options?
1068*c87b03e5Sespie
1069*c87b03e5SespieI've asked @email{info-gnu-fortran@@gnu.org} for input on this.
Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.
1072*c87b03e5Sespie
1073*c87b03e5SespieThe consensus to date (1999-11-17) has been to drop this support.
1074*c87b03e5SespieCan't recall anybody saying they're using it, in fact.
1075*c87b03e5Sespie@end itemize
1076*c87b03e5Sespie
1077*c87b03e5Sespie@node Philosophy of Code Generation
1078*c87b03e5Sespie@section Philosophy of Code Generation
1079*c87b03e5Sespie
1080*c87b03e5SespieDon't poke the bear.
1081*c87b03e5Sespie
1082*c87b03e5SespieThe @code{g77} front end generates code
1083*c87b03e5Sespievia the @code{gcc} back end.
1084*c87b03e5Sespie
1085*c87b03e5Sespie@cindex GNU Back End (GBE)
1086*c87b03e5Sespie@cindex GBE
1087*c87b03e5Sespie@cindex @code{gcc}, back end
1088*c87b03e5Sespie@cindex back end, gcc
1089*c87b03e5Sespie@cindex code generator
1090*c87b03e5SespieThe @code{gcc} back end (GBE) is a large, complex
1091*c87b03e5Sespielabyrinth of intricate code
1092*c87b03e5Sespiewritten in a combination of the C language
1093*c87b03e5Sespieand specialized languages internal to @code{gcc}.
1094*c87b03e5Sespie
1095*c87b03e5SespieWhile the @emph{code} that implements the GBE
1096*c87b03e5Sespieis written in a combination of languages,
1097*c87b03e5Sespiethe GBE itself is,
1098*c87b03e5Sespieto the front end for a language like Fortran,
1099*c87b03e5Sespiebest viewed as a @emph{compiler}
1100*c87b03e5Sespiethat compiles its own, unique, language.
1101*c87b03e5Sespie
1102*c87b03e5SespieThe GBE's ``source'', then, is written in this language,
1103*c87b03e5Sespiewhich consists primarily of
1104*c87b03e5Sespiea combination of calls to GBE functions
1105*c87b03e5Sespieand @dfn{tree} nodes
1106*c87b03e5Sespie(which are, themselves, created
1107*c87b03e5Sespieby calling GBE functions).
1108*c87b03e5Sespie
So, @code{g77} generates code by, in effect,
1110*c87b03e5Sespietranslating the Fortran code it reads
1111*c87b03e5Sespieinto a form ``written'' in the ``language''
1112*c87b03e5Sespieof the @code{gcc} back end.
1113*c87b03e5Sespie
1114*c87b03e5Sespie@cindex GBEL
1115*c87b03e5Sespie@cindex GNU Back End Language (GBEL)
This language will henceforth be referred to as @dfn{GBEL},
1117*c87b03e5Sespiefor GNU Back End Language.
1118*c87b03e5Sespie
1119*c87b03e5SespieGBEL is an evolving language,
1120*c87b03e5Sespienot fully specified in any published form
1121*c87b03e5Sespieas of this writing.
1122*c87b03e5SespieIt offers many facilities,
but its ``core'' facilities
are those that correspond most directly
1125*c87b03e5Sespieto those needed to support @code{gcc}
1126*c87b03e5Sespie(compiling code written in GNU C).
1127*c87b03e5Sespie
1128*c87b03e5SespieThe @code{g77} Fortran Front End (FFE)
1129*c87b03e5Sespieis designed and implemented
1130*c87b03e5Sespieto navigate the currents and eddies
1131*c87b03e5Sespieof ongoing GBEL and @code{gcc} development
1132*c87b03e5Sespiewhile also delivering on the potential
1133*c87b03e5Sespieof an integrated FFE
1134*c87b03e5Sespie(as compared to using a converter like @code{f2c}
1135*c87b03e5Sespieand feeding the output into @code{gcc}).
1136*c87b03e5Sespie
1137*c87b03e5SespieGoals of the FFE's code-generation strategy include:
1138*c87b03e5Sespie
1139*c87b03e5Sespie@itemize @bullet
1140*c87b03e5Sespie@item
1141*c87b03e5SespieHigh likelihood of generation of correct code,
1142*c87b03e5Sespieor, failing that, producing a fatal diagnostic or crashing.
1143*c87b03e5Sespie
1144*c87b03e5Sespie@item
1145*c87b03e5SespieGeneration of highly optimized code,
1146*c87b03e5Sespieas directed by the user
1147*c87b03e5Sespievia GBE-specific (versus @code{g77}-specific) constructs,
1148*c87b03e5Sespiesuch as command-line options.
1149*c87b03e5Sespie
1150*c87b03e5Sespie@item
1151*c87b03e5SespieFast overall (FFE plus GBE) compilation.
1152*c87b03e5Sespie
1153*c87b03e5Sespie@item
1154*c87b03e5SespiePreservation of source-level debugging information.
1155*c87b03e5Sespie@end itemize
1156*c87b03e5Sespie
1157*c87b03e5SespieThe strategies historically, and currently, used by the FFE
1158*c87b03e5Sespieto achieve these goals include:
1159*c87b03e5Sespie
1160*c87b03e5Sespie@itemize @bullet
1161*c87b03e5Sespie@item
1162*c87b03e5SespieUse of GBEL constructs that most faithfully encapsulate
1163*c87b03e5Sespiethe semantics of Fortran.
1164*c87b03e5Sespie
1165*c87b03e5Sespie@item
1166*c87b03e5SespieAvoidance of GBEL constructs that are so rarely used,
1167*c87b03e5Sespieor limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
1169*c87b03e5Sespieas sufficient for use by the FFE.
1170*c87b03e5Sespie
1171*c87b03e5Sespie@item
1172*c87b03e5SespieFlexible design, to readily accommodate changes to specific
1173*c87b03e5Sespiecode-generation strategies, perhaps governed by command-line options.
1174*c87b03e5Sespie@end itemize
1175*c87b03e5Sespie
1176*c87b03e5Sespie@cindex Bear-poking
1177*c87b03e5Sespie@cindex Poking the bear
1178*c87b03e5Sespie``Don't poke the bear'' somewhat summarizes the above strategies.
1179*c87b03e5SespieThe GBE is the bear.
1180*c87b03e5SespieThe FFE is designed and implemented to avoid poking it
1181*c87b03e5Sespiein ways that are likely to just annoy it.
1182*c87b03e5SespieThe FFE usually either tackles it head-on,
1183*c87b03e5Sespieor avoids treating it in ways dissimilar to how
1184*c87b03e5Sespiethe @code{gcc} front end treats it.
1185*c87b03e5Sespie
For example, the FFE uses the native array facility in the back end
(instead of the lower-level pointer-arithmetic facility
used by @code{gcc} when compiling @code{f2c} output).
1189*c87b03e5SespieTheoretically, this presents more opportunities for optimization,
1190*c87b03e5Sespiefaster compile times,
1191*c87b03e5Sespieand the production of more faithful debugging information.
1192*c87b03e5SespieThese benefits were not, however, immediately realized,
1193*c87b03e5Sespiemainly because @code{gcc} itself makes little or no use
1194*c87b03e5Sespieof the native array facility.
1195*c87b03e5Sespie
1196*c87b03e5SespieComplex arithmetic is a case study of the evolution of this strategy.
1197*c87b03e5SespieWhen originally implemented,
1198*c87b03e5Sespiethe GBEL had just evolved its own native complex-arithmetic facility,
1199*c87b03e5Sespieso the FFE took advantage of that.
1200*c87b03e5Sespie
1201*c87b03e5SespieWhen porting @code{g77} to 64-bit systems,
1202*c87b03e5Sespieit was discovered that the GBE didn't really
1203*c87b03e5Sespieimplement its native complex-arithmetic facility properly.
1204*c87b03e5Sespie
1205*c87b03e5SespieThe short-term solution was to rewrite the FFE
1206*c87b03e5Sespieto instead use the lower-level facilities
1207*c87b03e5Sespiethat'd be used by @code{gcc}-compiled code
1208*c87b03e5Sespie(assuming that code, itself, didn't use the native complex type
1209*c87b03e5Sespieprovided, as an extension, by @code{gcc}),
1210*c87b03e5Sespiesince these were known to work,
1211*c87b03e5Sespieand, in any case, if shown to not work,
1212*c87b03e5Sespiewould likely be rapidly fixed
1213*c87b03e5Sespie(since they'd likely not work for vanilla C code in similar circumstances).
1214*c87b03e5Sespie
1215*c87b03e5SespieHowever, the rewrite accommodated the original, native approach as well
1216*c87b03e5Sespieby offering a command-line option to select it over the emulated approach.
1217*c87b03e5SespieThis allowed users, and especially GBE maintainers, to try out
1218*c87b03e5Sespiefixes to complex-arithmetic support in the GBE
1219*c87b03e5Sespiewhile @code{g77} continued to default to compiling more code correctly,
1220*c87b03e5Sespiealbeit producing (typically) slower executables.
1221*c87b03e5Sespie
1222*c87b03e5SespieAs of April 1999, it appeared that the last few bugs
1223*c87b03e5Sespiein the GBE's support of its native complex-arithmetic facility
1224*c87b03e5Sespiewere worked out.
1225*c87b03e5SespieThe FFE was changed back to default to using that native facility,
1226*c87b03e5Sespieleaving emulation as an option.
1227*c87b03e5Sespie
1228*c87b03e5SespieLater during the release cycle
1229*c87b03e5Sespie(which was called EGCS 1.2, but soon became GCC 2.95),
1230*c87b03e5Sespiebugs in the native facility were found.
1231*c87b03e5SespieReactions among various people included
1232*c87b03e5Sespie``the last thing we should do is change the default back'',
1233*c87b03e5Sespie``we must change the default back'',
1234*c87b03e5Sespieand ``let's figure out whether we can narrow down the bugs to
1235*c87b03e5Sespiefew enough cases to allow the now-months-long-tested default
1236*c87b03e5Sespieto remain the same''.
The last viewpoint won that particular time.
1238*c87b03e5SespieThe bugs exposed other concerns regarding ABI compliance
1239*c87b03e5Sespiewhen the ABI specified treatment of complex data as different
1240*c87b03e5Sespiefrom treatment of what Fortran and GNU C consider the equivalent
1241*c87b03e5Sespieaggregation (structure) of real (or float) pairs.
1242*c87b03e5Sespie
1243*c87b03e5SespieOther Fortran constructs---arrays, character strings,
1244*c87b03e5Sespiecomplex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
1245*c87b03e5Sespieand so on---involve issues similar to those pertaining to complex arithmetic.
1246*c87b03e5Sespie
1247*c87b03e5SespieSo, it is possible that the history
1248*c87b03e5Sespieof how the FFE handled complex arithmetic
1249*c87b03e5Sespiewill be repeated, probably in modified form
1250*c87b03e5Sespie(and hopefully over shorter timeframes),
1251*c87b03e5Sespiefor some of these other facilities.
1252*c87b03e5Sespie
1253*c87b03e5Sespie@node Two-pass Design
1254*c87b03e5Sespie@section Two-pass Design
1255*c87b03e5Sespie
1256*c87b03e5SespieThe FFE does not tell the GBE anything about a program unit
1257*c87b03e5Sespieuntil after the last statement in that unit has been parsed.
(A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
1260*c87b03e5SespieThat is, a program unit in Fortran is like a top-level function in C.
1261*c87b03e5SespieNested functions, found among the extensions offered by GNU C,
1262*c87b03e5Sespiecorrespond roughly to Fortran's statement functions.)
1263*c87b03e5Sespie
1264*c87b03e5SespieSo, while parsing the code in a program unit,
1265*c87b03e5Sespiethe FFE saves up all the information
1266*c87b03e5Sespieon statements, expressions, names, and so on,
1267*c87b03e5Sespieuntil it has seen the last statement.
1268*c87b03e5Sespie
1269*c87b03e5SespieAt that point, the FFE revisits the saved information
1270*c87b03e5Sespie(in what amounts to a second @dfn{pass} over the program unit)
1271*c87b03e5Sespieto perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.
1273*c87b03e5Sespie
1274*c87b03e5SespieSome lookahead is performed during this second pass,
1275*c87b03e5Sespieso the FFE could be viewed as a ``two-plus-pass'' design.
1276*c87b03e5Sespie
1277*c87b03e5Sespie@menu
1278*c87b03e5Sespie* Two-pass Code::
1279*c87b03e5Sespie* Why Two Passes::
1280*c87b03e5Sespie@end menu
1281*c87b03e5Sespie
1282*c87b03e5Sespie@node Two-pass Code
1283*c87b03e5Sespie@subsection Two-pass Code
1284*c87b03e5Sespie
1285*c87b03e5SespieMost of the code that turns the first pass (parsing)
1286*c87b03e5Sespieinto a second pass for code generation
1287*c87b03e5Sespieis in @file{@value{path-g77}/std.c}.
1288*c87b03e5Sespie
It has external functions,
called mainly by siblings in @file{@value{path-g77}/stc.c},
that record and save the information on statements and expressions
in the order they are seen in the source code.
1294*c87b03e5Sespie
1295*c87b03e5SespieIt also has an external function that revisits that information,
1296*c87b03e5Sespiecalling the siblings in @file{@value{path-g77}/ste.c},
1297*c87b03e5Sespiewhich handles the actual code generation
1298*c87b03e5Sespie(by generating GBEL code,
1299*c87b03e5Sespiethat is, by calling GBE routines
1300*c87b03e5Sespieto represent and specify expressions, statements, and so on).
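
The record-then-revisit arrangement described above can be sketched in C.
This is only an illustration of the idea, not the code in @file{std.c}:
the names @code{record_stmt}, @code{replay_stmts}, @code{emit_stmt},
and @code{struct saved_stmt} are hypothetical,
and the real FFE saves far more than a statement kind and a label.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one parsed statement.  */
enum stmt_kind { STMT_ASSIGN, STMT_CALL, STMT_END };

struct saved_stmt
{
  enum stmt_kind kind;
  int label;                    /* 0 if the statement has no label.  */
};

#define MAX_STMTS 64

static struct saved_stmt saved[MAX_STMTS];
static size_t n_saved;
static size_t n_emitted;

/* First pass: called once per statement, in source order.
   Nothing is emitted yet; the statement is only remembered.  */
static void
record_stmt (enum stmt_kind kind, int label)
{
  assert (n_saved < MAX_STMTS);
  saved[n_saved].kind = kind;
  saved[n_saved].label = label;
  n_saved++;
}

/* Stand-in for the code-generation routines (hypothetical).  */
static void
emit_stmt (const struct saved_stmt *stmt)
{
  (void) stmt;
  n_emitted++;
}

/* Second pass: called after the last statement has been seen.
   Revisits the saved statements in source order and returns
   the number of statements replayed.  */
static size_t
replay_stmts (void)
{
  size_t i, n = n_saved;

  for (i = 0; i < n; i++)
    emit_stmt (&saved[i]);
  n_saved = 0;                  /* Ready for the next program unit.  */
  return n;
}
```

The point of the split is that @code{emit_stmt} is never reached
until every statement of the program unit has been recorded.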
1301*c87b03e5Sespie
1302*c87b03e5Sespie@node Why Two Passes
1303*c87b03e5Sespie@subsection Why Two Passes
1304*c87b03e5Sespie
1305*c87b03e5SespieThe need for two passes was not immediately evident
1306*c87b03e5Sespieduring the design and implementation of the code in the FFE
1307*c87b03e5Sespiethat was to produce GBEL.
1308*c87b03e5SespieOnly after a few kludges,
1309*c87b03e5Sespieto handle things like incorrectly-guessed @code{ASSIGN} label nature,
1310*c87b03e5Sespiehad been implemented,
1311*c87b03e5Sespiedid enough evidence pile up to make it clear
1312*c87b03e5Sespiethat @file{std.c} had to be introduced to intercept,
1313*c87b03e5Sespiesave, then revisit as part of a second pass,
1314*c87b03e5Sespiethe digested contents of a program unit.
1315*c87b03e5Sespie
1316*c87b03e5SespieOther such missteps have occurred during the evolution of the FFE,
1317*c87b03e5Sespiebecause of the different goals of the FFE and the GBE.
1318*c87b03e5Sespie
Because the GBE's original, and still primary, goal
was to directly support the GNU C language,
the GBEL, and the GBE itself,
require more complexity
on the part of most front ends
than they require of @code{gcc}'s.
1325*c87b03e5Sespie
1326*c87b03e5SespieFor example,
1327*c87b03e5Sespiethe GBEL offers an interface that permits the @code{gcc} front end
1328*c87b03e5Sespieto implement most, or all, of the language features it supports,
1329*c87b03e5Sespiewithout the front end having to
1330*c87b03e5Sespiemake use of non-user-defined variables.
1331*c87b03e5Sespie(It's almost certainly the case that all of K&R C,
1332*c87b03e5Sespieand probably ANSI C as well,
1333*c87b03e5Sespieis handled by the @code{gcc} front end
1334*c87b03e5Sespiewithout declaring such variables.)
1335*c87b03e5Sespie
1336*c87b03e5SespieThe FFE, on the other hand, must resort to a variety of ``tricks''
1337*c87b03e5Sespieto achieve its goals.
1338*c87b03e5Sespie
1339*c87b03e5SespieConsider the following C code:
1340*c87b03e5Sespie
1341*c87b03e5Sespie@smallexample
1342*c87b03e5Sespieint
1343*c87b03e5Sespiefoo (int a, int b)
1344*c87b03e5Sespie@{
1345*c87b03e5Sespie  int c = 0;
1346*c87b03e5Sespie
1347*c87b03e5Sespie  if ((c = bar (c)) == 0)
1348*c87b03e5Sespie    goto done;
1349*c87b03e5Sespie
1350*c87b03e5Sespie  quux (c << 1);
1351*c87b03e5Sespie
1352*c87b03e5Sespiedone:
1353*c87b03e5Sespie  return c;
1354*c87b03e5Sespie@}
1355*c87b03e5Sespie@end smallexample
1356*c87b03e5Sespie
1357*c87b03e5SespieNote what kinds of objects are declared, or defined, before their use,
1358*c87b03e5Sespieand before any actual code generation involving them
1359*c87b03e5Sespiewould normally take place:
1360*c87b03e5Sespie
1361*c87b03e5Sespie@itemize @bullet
1362*c87b03e5Sespie@item
1363*c87b03e5SespieReturn type of function
1364*c87b03e5Sespie
1365*c87b03e5Sespie@item
1366*c87b03e5SespieEntry point(s) of function
1367*c87b03e5Sespie
1368*c87b03e5Sespie@item
1369*c87b03e5SespieDummy arguments
1370*c87b03e5Sespie
1371*c87b03e5Sespie@item
1372*c87b03e5SespieVariables
1373*c87b03e5Sespie
1374*c87b03e5Sespie@item
1375*c87b03e5SespieInitial values for variables
1376*c87b03e5Sespie@end itemize
1377*c87b03e5Sespie
1378*c87b03e5SespieWhereas, the following items can, and do,
1379*c87b03e5Sespiesuddenly appear ``out of the blue'' in C:
1380*c87b03e5Sespie
1381*c87b03e5Sespie@itemize @bullet
1382*c87b03e5Sespie@item
1383*c87b03e5SespieLabel references
1384*c87b03e5Sespie
1385*c87b03e5Sespie@item
1386*c87b03e5SespieFunction references
1387*c87b03e5Sespie@end itemize
1388*c87b03e5Sespie
1389*c87b03e5SespieNot surprisingly, the GBE faithfully permits the latter set of items
1390*c87b03e5Sespieto be ``discovered'' partway through GBEL ``programs'',
1391*c87b03e5Sespiejust as they are permitted to in C.
1392*c87b03e5Sespie
Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
of items in the former set.
1396*c87b03e5Sespie
1397*c87b03e5SespieThis makes Fortran a poor fit for the ``safe'' subset of GBEL.
1398*c87b03e5SespieConsider:
1399*c87b03e5Sespie
1400*c87b03e5Sespie@smallexample
1401*c87b03e5Sespie      FUNCTION X (A, ARRAY, ID1)
1402*c87b03e5Sespie      CHARACTER*(*) A
1403*c87b03e5Sespie      DOUBLE PRECISION X, Y, Z, TMP, EE, PI
1404*c87b03e5Sespie      REAL ARRAY(ID1*ID2)
1405*c87b03e5Sespie      COMMON ID2
1406*c87b03e5Sespie      EXTERNAL FRED
1407*c87b03e5Sespie
1408*c87b03e5Sespie      ASSIGN 100 TO J
1409*c87b03e5Sespie      CALL FOO (I)
1410*c87b03e5Sespie      IF (I .EQ. 0) PRINT *, A(0)
1411*c87b03e5Sespie      GOTO 200
1412*c87b03e5Sespie
1413*c87b03e5Sespie      ENTRY Y (Z)
1414*c87b03e5Sespie      ASSIGN 101 TO J
1415*c87b03e5Sespie200   PRINT *, A(1)
1416*c87b03e5Sespie      READ *, TMP
1417*c87b03e5Sespie      GOTO J
1418*c87b03e5Sespie100   X = TMP * EE
1419*c87b03e5Sespie      RETURN
1420*c87b03e5Sespie101   Y = TMP * PI
1421*c87b03e5Sespie      CALL FRED
1422*c87b03e5Sespie      DATA EE, PI /2.71D0, 3.14D0/
1423*c87b03e5Sespie      END
1424*c87b03e5Sespie@end smallexample
1425*c87b03e5Sespie
1426*c87b03e5SespieHere are some observations about the above code,
1427*c87b03e5Sespiewhich, while somewhat contrived,
1428*c87b03e5Sespieconforms to the FORTRAN 77 and Fortran 90 standards:
1429*c87b03e5Sespie
1430*c87b03e5Sespie@itemize @bullet
1431*c87b03e5Sespie@item
1432*c87b03e5SespieThe return type of function @samp{X} is not known
1433*c87b03e5Sespieuntil the @samp{DOUBLE PRECISION} line has been parsed.
1434*c87b03e5Sespie
1435*c87b03e5Sespie@item
1436*c87b03e5SespieWhether @samp{A} is a function or a variable
1437*c87b03e5Sespieis not known until the @samp{PRINT *, A(0)} statement
1438*c87b03e5Sespiehas been parsed.
1439*c87b03e5Sespie
1440*c87b03e5Sespie@item
1441*c87b03e5SespieThe bounds of the array of argument @samp{ARRAY}
1442*c87b03e5Sespiedepend on a computation involving
1443*c87b03e5Sespiethe subsequent argument @samp{ID1}
1444*c87b03e5Sespieand the blank-common member @samp{ID2}.
1445*c87b03e5Sespie
1446*c87b03e5Sespie@item
1447*c87b03e5SespieWhether @samp{Y} and @samp{Z} are local variables,
1448*c87b03e5Sespieadditional function entry points,
1449*c87b03e5Sespieor dummy arguments to additional entry points
1450*c87b03e5Sespieis not known
1451*c87b03e5Sespieuntil the @code{ENTRY} statement is parsed.
1452*c87b03e5Sespie
1453*c87b03e5Sespie@item
1454*c87b03e5SespieSimilarly, whether @samp{TMP} is a local variable is not known
1455*c87b03e5Sespieuntil the @samp{READ *, TMP} statement is parsed.
1456*c87b03e5Sespie
1457*c87b03e5Sespie@item
1458*c87b03e5SespieThe initial values for @samp{EE} and @samp{PI}
1459*c87b03e5Sespieare not known until after the @code{DATA} statement is parsed.
1460*c87b03e5Sespie
1461*c87b03e5Sespie@item
1462*c87b03e5SespieWhether @samp{FRED} is a function returning type @code{REAL}
1463*c87b03e5Sespieor a subroutine
1464*c87b03e5Sespie(which can be thought of as returning type @code{void}
1465*c87b03e5Sespie@emph{or}, to support alternate returns in a simple way,
1466*c87b03e5Sespietype @code{int})
1467*c87b03e5Sespieis not known
1468*c87b03e5Sespieuntil the @samp{CALL FRED} statement is parsed.
1469*c87b03e5Sespie
1470*c87b03e5Sespie@item
1471*c87b03e5SespieWhether @samp{100} is a @code{FORMAT} label
1472*c87b03e5Sespieor the label of an executable statement
1473*c87b03e5Sespieis not known
1474*c87b03e5Sespieuntil the @samp{X =} statement is parsed.
1475*c87b03e5Sespie(These two types of labels get @emph{very} different treatment,
1476*c87b03e5Sespieespecially when @code{ASSIGN}'ed.)
1477*c87b03e5Sespie
1478*c87b03e5Sespie@item
1479*c87b03e5SespieThat @samp{J} is a local variable is not known
1480*c87b03e5Sespieuntil the first @code{ASSIGN} statement is parsed.
1481*c87b03e5Sespie(This happens @emph{after} executable code has been seen.)
1482*c87b03e5Sespie@end itemize
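
The late ``discoveries'' listed above suggest keeping, for each name,
a record that starts out undecided and is refined as statements are parsed.
The following C sketch shows that idea for a name like @samp{FRED};
the type and function names are hypothetical
and are not taken from the FFE sources.

```c
#include <assert.h>

enum sym_kind
{
  SYM_UNKNOWN,          /* Name seen, nothing decided yet.  */
  SYM_EXTERNAL,         /* EXTERNAL seen: function or subroutine?  */
  SYM_SUBROUTINE,       /* Settled by a CALL statement.  */
  SYM_FUNCTION          /* Settled by a reference in an expression.  */
};

struct sym
{
  const char *name;
  enum sym_kind kind;
};

/* Refine what is known about SYM.  Returns 1 on success,
   0 if FACT contradicts something already settled.  */
static int
refine (struct sym *sym, enum sym_kind fact)
{
  if (fact == SYM_UNKNOWN || sym->kind == fact)
    return 1;                   /* No new information.  */
  if (sym->kind == SYM_UNKNOWN || sym->kind == SYM_EXTERNAL)
    {
      sym->kind = fact;
      return 1;
    }
  return 0;
}

/* Nonzero once enough is known to declare SYM to a back end.  */
static int
is_settled (const struct sym *sym)
{
  return sym->kind == SYM_SUBROUTINE || sym->kind == SYM_FUNCTION;
}
```

In the example program unit, @samp{FRED} would become settled
only when @samp{CALL FRED} is parsed, near the end of the unit.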
1483*c87b03e5Sespie
1484*c87b03e5SespieVery few of these ``discoveries''
1485*c87b03e5Sespiecan be accommodated by the GBE as it has evolved over the years.
1486*c87b03e5SespieThe GBEL doesn't support several of them,
1487*c87b03e5Sespieand those it might appear to support
1488*c87b03e5Sespiedon't always work properly,
1489*c87b03e5Sespieespecially in combination with other GBEL and GBE features,
1490*c87b03e5Sespieas implemented in the GBE.
1491*c87b03e5Sespie
1492*c87b03e5Sespie(Had the GBE and its GBEL originally evolved to support @code{g77},
1493*c87b03e5Sespiethe shoe would be on the other foot, so to speak---most, if not all,
1494*c87b03e5Sespieof the above would be directly supported by the GBEL,
1495*c87b03e5Sespieand a few C constructs would probably not, as they are in reality,
1496*c87b03e5Sespiebe supported.
Both this mythical GBE and today's real one cater to their GBEL
by, sometimes, scrambling around, cleaning up after themselves---after
discovering that assumptions they made earlier during code generation
are incorrect.
1501*c87b03e5SespieThat's not a great design, since it indicates significant code
1502*c87b03e5Sespiepaths that might be rarely tested but used in some key production
1503*c87b03e5Sespieenvironments.)
1504*c87b03e5Sespie
1505*c87b03e5SespieSo, the FFE handles these discrepancies---between the order in which
1506*c87b03e5Sespieit discovers facts about the code it is compiling,
1507*c87b03e5Sespieand the order in which the GBEL and GBE support such discoveries---by
1508*c87b03e5Sespieperforming what amounts to two
1509*c87b03e5Sespiepasses over each program unit.
1510*c87b03e5Sespie
1511*c87b03e5Sespie(A few ambiguities can remain at that point,
1512*c87b03e5Sespiesuch as whether, given @samp{EXTERNAL BAZ}
1513*c87b03e5Sespieand no other reference to @samp{BAZ} in the program unit,
1514*c87b03e5Sespieit is a subroutine, a function, or a block-data---which, in C-speak,
1515*c87b03e5Sespiegoverns its declared return type.
1516*c87b03e5SespieFortunately, these distinctions are easily finessed
1517*c87b03e5Sespiefor the procedure, library, and object-file interfaces
1518*c87b03e5Sespiesupported by @code{g77}.)
1519*c87b03e5Sespie
1520*c87b03e5Sespie@node Challenges Posed
1521*c87b03e5Sespie@section Challenges Posed
1522*c87b03e5Sespie
1523*c87b03e5SespieConsider the following Fortran code, which uses various extensions
1524*c87b03e5Sespie(including some to Fortran 90):
1525*c87b03e5Sespie
1526*c87b03e5Sespie@smallexample
1527*c87b03e5SespieSUBROUTINE X(A)
1528*c87b03e5SespieCHARACTER*(*) A
1529*c87b03e5SespieCOMPLEX CFUNC
1530*c87b03e5SespieINTEGER*2 CLOCKS(200)
1531*c87b03e5SespieINTEGER IFUNC
1532*c87b03e5Sespie
1533*c87b03e5SespieCALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
1534*c87b03e5Sespie@end smallexample
1535*c87b03e5Sespie
1536*c87b03e5SespieThe above poses the following challenges to any Fortran compiler
1537*c87b03e5Sespiethat uses run-time interfaces, and a run-time library, roughly similar
1538*c87b03e5Sespieto those used by @code{g77}:
1539*c87b03e5Sespie
1540*c87b03e5Sespie@itemize @bullet
1541*c87b03e5Sespie@item
1542*c87b03e5SespieAssuming the library routine that supports @code{SYSTEM_CLOCK}
1543*c87b03e5Sespieexpects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
1544*c87b03e5Sespiethe compiler must make available to it a temporary variable of that type.
1545*c87b03e5Sespie
1546*c87b03e5Sespie@item
1547*c87b03e5SespieFurther, after the @code{SYSTEM_CLOCK} library routine returns,
1548*c87b03e5Sespiethe compiler must ensure that the temporary variable it wrote
1549*c87b03e5Sespieis copied into the appropriate element of the @samp{CLOCKS} array.
1550*c87b03e5Sespie(This assumes the compiler doesn't just reject the code,
1551*c87b03e5Sespiewhich it should if it is compiling under some kind of a ``strict'' option.)
1552*c87b03e5Sespie
1553*c87b03e5Sespie@item
1554*c87b03e5SespieTo determine the correct index into the @samp{CLOCKS} array,
1555*c87b03e5Sespie(putting aside the fact that the index, in this particular case,
1556*c87b03e5Sespieneed not be computed until after
1557*c87b03e5Sespiethe @code{SYSTEM_CLOCK} library routine returns),
1558*c87b03e5Sespiethe compiler must ensure that the @code{IFUNC} function is called.
1559*c87b03e5Sespie
1560*c87b03e5SespieThat requires evaluating its argument,
1561*c87b03e5Sespiewhich requires, for @code{g77}
1562*c87b03e5Sespie(assuming @code{-ff2c} is in force),
1563*c87b03e5Sespiereserving a temporary variable of type @code{COMPLEX}
1564*c87b03e5Sespiefor use as a repository for the return value
1565*c87b03e5Sespiebeing computed by @samp{CFUNC}.
1566*c87b03e5Sespie
1567*c87b03e5Sespie@item
1568*c87b03e5SespieBefore invoking @samp{CFUNC},
its argument must be evaluated,
1570*c87b03e5Sespiewhich requires allocating, at run time,
1571*c87b03e5Sespiea temporary large enough to hold the result of the concatenation,
1572*c87b03e5Sespieas well as actually performing the concatenation.
1573*c87b03e5Sespie
1574*c87b03e5Sespie@item
1575*c87b03e5SespieThe large temporary needed during invocation of @code{CFUNC}
1576*c87b03e5Sespieshould, ideally, be deallocated
1577*c87b03e5Sespie(or, at least, left to the GBE to dispose of, as it sees fit)
1578*c87b03e5Sespieas soon as @code{CFUNC} returns,
1579*c87b03e5Sespiewhich means before @code{IFUNC} is called
1580*c87b03e5Sespie(as it might need a lot of dynamically allocated memory).
1581*c87b03e5Sespie@end itemize
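
The first two requirements above
(a temporary of the type the library expects, then a copy back)
can be sketched in C as follows.
The routines @code{lib_system_clock} and @code{index_of_clock}
are hypothetical stand-ins for the library routine
and for the @samp{IFUNC (CFUNC (@dots{}))} computation;
they are not @code{libg2c} interfaces.
The sketch computes the index before the call,
which is one of the orderings the text allows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the library routine,
   which expects to set an INTEGER*4 variable.  */
static void
lib_system_clock (int32_t *count)
{
  *count = 1234;
}

/* Hypothetical stand-in for IFUNC (CFUNC (...));
   returns a 1-based Fortran index.  */
static int32_t
index_of_clock (void)
{
  return 3;
}

/* Rough equivalent of CALL SYSTEM_CLOCK (CLOCKS (...))
   where CLOCKS is INTEGER*2.  */
static void
store_clock (int16_t clocks[])
{
  int32_t temp;                         /* INTEGER*4 temporary.  */
  int32_t i = index_of_clock ();

  lib_system_clock (&temp);             /* Library writes the temporary.  */
  clocks[i - 1] = (int16_t) temp;       /* Copy back, narrowing.  */
}
```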
1582*c87b03e5Sespie
1583*c87b03e5Sespie@code{g77} currently doesn't support all of the above,
1584*c87b03e5Sespiebut, so that it might someday, it has evolved to handle
1585*c87b03e5Sespieat least some of the above requirements.
1586*c87b03e5Sespie
1587*c87b03e5SespieMeeting the above requirements is made more challenging
1588*c87b03e5Sespieby conforming to the requirements of the GBEL/GBE combination.
1589*c87b03e5Sespie
1590*c87b03e5Sespie@node Transforming Statements
1591*c87b03e5Sespie@section Transforming Statements
1592*c87b03e5Sespie
1593*c87b03e5SespieMost Fortran statements are given their own block,
1594*c87b03e5Sespieand, for temporary variables they might need, their own scope.
1595*c87b03e5Sespie(A block is what distinguishes @samp{@{ foo (); @}}
1596*c87b03e5Sespiefrom just @samp{foo ();} in C.
1597*c87b03e5SespieA scope is included with every such block,
1598*c87b03e5Sespieproviding a distinct name space for local variables.)
1599*c87b03e5Sespie
1600*c87b03e5SespieLabel definitions for the statement precede this block,
1601*c87b03e5Sespieso @samp{10 PRINT *, I} is handled more like
1602*c87b03e5Sespie@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
1603*c87b03e5Sespie(where @samp{fl10} is just a notation meaning ``Fortran Label 10''
1604*c87b03e5Sespiefor the purposes of this document).
1605*c87b03e5Sespie
1606*c87b03e5Sespie@menu
1607*c87b03e5Sespie* Statements Needing Temporaries::
1608*c87b03e5Sespie* Transforming DO WHILE::
1609*c87b03e5Sespie* Transforming Iterative DO::
1610*c87b03e5Sespie* Transforming Block IF::
1611*c87b03e5Sespie* Transforming SELECT CASE::
1612*c87b03e5Sespie@end menu
1613*c87b03e5Sespie
1614*c87b03e5Sespie@node Statements Needing Temporaries
1615*c87b03e5Sespie@subsection Statements Needing Temporaries
1616*c87b03e5Sespie
1617*c87b03e5SespieAny temporaries needed during, but not beyond,
1618*c87b03e5Sespieexecution of a Fortran statement,
1619*c87b03e5Sespieare made local to the scope of that statement's block.
1620*c87b03e5Sespie
1621*c87b03e5SespieThis allows the GBE to share storage for these temporaries
1622*c87b03e5Sespieamong the various statements without the FFE
1623*c87b03e5Sespiehaving to manage that itself.
1624*c87b03e5Sespie
1625*c87b03e5Sespie(The GBE could, of course, decide to optimize
1626*c87b03e5Sespiemanagement of these temporaries.
1627*c87b03e5SespieFor example, it could, theoretically,
1628*c87b03e5Sespieschedule some of the computations involving these temporaries
1629*c87b03e5Sespieto occur in parallel.
1630*c87b03e5SespieMore practically, it might leave the storage for some temporaries
1631*c87b03e5Sespie``live'' beyond their scopes, to reduce the number of
1632*c87b03e5Sespiemanipulations of the stack pointer at run time.)
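
As a rough C analogy for the per-statement blocks described above,
each statement below gets its own block and its own @code{temp},
and the label precedes the block it belongs to.
The function name @code{two_statements} is made up for this sketch.

```c
#include <assert.h>

static int
two_statements (int i)
{
  int result = 0;

  goto fl10;                    /* Jump to the labeled statement.  */

fl10:                           /* The label precedes the block ...  */
  {
    int temp = i * 2;           /* ... and TEMP is local to it.  */
    result += temp;
  }

  {
    int temp = i + 1;           /* A distinct TEMP; a back end is
                                   free to reuse the same storage.  */
    result += temp;
  }

  return result;
}
```

Because neither @code{temp} outlives its block,
nothing in the front end has to track when the storage can be reused.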
1633*c87b03e5Sespie
1634*c87b03e5SespieTemporaries needed across distinct statement boundaries usually
1635*c87b03e5Sespieare associated with Fortran blocks (such as @code{DO}/@code{END DO}).
1636*c87b03e5Sespie(Also, there might be temporaries not associated with blocks at all---these
1637*c87b03e5Sespiewould be in the scope of the entire program unit.)
1638*c87b03e5Sespie
1639*c87b03e5SespieEach Fortran block @emph{should} get its own block/scope in the GBE.
1640*c87b03e5SespieThis is best, because it allows temporaries to be more naturally handled.
However, it might pose problems when handling labels
(in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and it generally entails the hassle of replicating
parts of the @code{gcc} front end
(because the FFE needs to support
an arbitrary number of nested back-end blocks
if each Fortran block gets one).
1648*c87b03e5Sespie
1649*c87b03e5SespieSo, there might still be a need for top-level temporaries, whose
1650*c87b03e5Sespie``owning'' scope is that of the containing procedure.
1651*c87b03e5Sespie
Also, there seem to be problems declaring new variables after
1653*c87b03e5Sespiegenerating code (within a block) in the back end, leading to, e.g.,
1654*c87b03e5Sespie@samp{label not defined before binding contour} or similar messages,
1655*c87b03e5Sespiewhen compiling with @samp{-fstack-check} or
1656*c87b03e5Sespiewhen compiling for certain targets.
1657*c87b03e5Sespie
1658*c87b03e5SespieBecause of that, and because sometimes these temporaries are not
discovered until the middle of generating code for an expression
1660*c87b03e5Sespiestatement (as in the case of the optimization for @samp{X**I}),
1661*c87b03e5Sespieit seems best to always
1662*c87b03e5Sespiepre-scan all the expressions that'll be expanded for a block
1663*c87b03e5Sespiebefore generating any of the code for that block.
1664*c87b03e5Sespie
1665*c87b03e5SespieThis pre-scan then handles discovering and declaring, to the back end,
1666*c87b03e5Sespiethe temporaries needed for that block.
1667*c87b03e5Sespie
1668*c87b03e5SespieIt's also important to treat distinct items in an I/O list as distinct
1669*c87b03e5Sespiestatements deserving their own blocks.
1670*c87b03e5SespieThat's because there's a requirement
1671*c87b03e5Sespiethat each I/O item be fully processed before the next one,
1672*c87b03e5Sespiewhich matters in cases like @samp{READ (*,*), I, A(I)}---the
1673*c87b03e5Sespieelement of @samp{A} read in the second item
1674*c87b03e5Sespie@emph{must} be determined from the value
1675*c87b03e5Sespieof @samp{I} read in the first item.
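
The ordering requirement for I/O items can be illustrated in C.
In this sketch, @code{read_int} and the @code{input} array
are hypothetical stand-ins for the run-time library and the input stream;
each I/O item is handled in its own block, in order.

```c
#include <assert.h>

static const int input[] = { 2, 42 };   /* Hypothetical input stream.  */
static int next_input;

/* Hypothetical stand-in for the library's list-directed read.  */
static int
read_int (void)
{
  return input[next_input++];
}

/* Rough equivalent of reading the items I and A(I), in that order.  */
static void
read_i_then_a_i (int *i, int a[])
{
  {                             /* First item: I.  */
    *i = read_int ();
  }

  {                             /* Second item: A(I), using the new I.  */
    int index = *i - 1;         /* Fortran arrays are 1-based by default.  */

    a[index] = read_int ();
  }
}
```

Had the index been computed before the first item was read,
the second value would have been stored in the wrong element.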
1676*c87b03e5Sespie
1677*c87b03e5Sespie@node Transforming DO WHILE
1678*c87b03e5Sespie@subsection Transforming DO WHILE
1679*c87b03e5Sespie
1680*c87b03e5Sespie@samp{DO WHILE(expr)} @emph{must} be implemented
1681*c87b03e5Sespieso that temporaries needed to evaluate @samp{expr}
1682*c87b03e5Sespieare generated just for the test, each time.
1683*c87b03e5Sespie
1684*c87b03e5SespieConsider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:
1685*c87b03e5Sespie
1686*c87b03e5Sespie@smallexample
1687*c87b03e5Sespiefor (;;)
1688*c87b03e5Sespie  @{
1689*c87b03e5Sespie    int temp0;
1690*c87b03e5Sespie
1691*c87b03e5Sespie    @{
1692*c87b03e5Sespie      char temp1[large];
1693*c87b03e5Sespie
1694*c87b03e5Sespie      libg77_catenate (temp1, a, b);
1695*c87b03e5Sespie      temp0 = libg77_ne (temp1, 'END');
1696*c87b03e5Sespie    @}
1697*c87b03e5Sespie
1698*c87b03e5Sespie    if (! temp0)
1699*c87b03e5Sespie      break;
1700*c87b03e5Sespie
1701*c87b03e5Sespie    @dots{}
1702*c87b03e5Sespie  @}
1703*c87b03e5Sespie@end smallexample
1704*c87b03e5Sespie
1705*c87b03e5SespieIn this case, it seems like a time/space tradeoff
1706*c87b03e5Sespiebetween allocating and deallocating @samp{temp1} for each iteration
1707*c87b03e5Sespieand allocating it just once for the entire loop.
1708*c87b03e5Sespie
1709*c87b03e5SespieHowever, if @samp{temp1} is allocated just once for the entire loop,
1710*c87b03e5Sespieit could be the wrong size for subsequent iterations of that loop
1711*c87b03e5Sespiein cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
1712*c87b03e5Sespiebecause the body of the loop might modify @samp{I} or @samp{J}.
1713*c87b03e5Sespie
1714*c87b03e5SespieSo, the above implementation is used,
1715*c87b03e5Sespiethough a more optimal one can be used
1716*c87b03e5Sespiein specific circumstances.
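
A runnable C approximation of the @samp{A(I:J)//B} case
shows why the temporary is sized inside the loop.
It is a sketch under simplifying assumptions:
the substring always starts at the first character,
the loop body does nothing but increment @samp{J},
@code{strcmp} is used instead of Fortran's blank-padded comparison,
and the loop is assumed to end before @samp{J} exceeds the length of @samp{A}.
The function name @code{count_iterations} is made up.

```c
#include <assert.h>
#include <string.h>

/* Rough equivalent of:
     J = 1
     DO WHILE (A(1:J)//B .NE. 'END')
       J = J + 1
     END DO
   Returns the number of times the loop body ran.  */
static int
count_iterations (const char *a, const char *b)
{
  size_t j = 1;
  int iterations = 0;

  for (;;)
    {
      int temp0;

      assert (j <= strlen (a));

      {
        /* Sized afresh on every iteration, since J may have changed.  */
        char temp1[j + strlen (b) + 1];

        memcpy (temp1, a, j);
        strcpy (temp1 + j, b);
        temp0 = strcmp (temp1, "END") != 0;
      }

      if (! temp0)
        break;

      j++;                      /* The loop body modifies J.  */
      iterations++;
    }

  return iterations;
}
```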
1717*c87b03e5Sespie
1718*c87b03e5Sespie@node Transforming Iterative DO
1719*c87b03e5Sespie@subsection Transforming Iterative DO
1720*c87b03e5Sespie
1721*c87b03e5SespieAn iterative @code{DO} loop
1722*c87b03e5Sespie(one that specifies an iteration variable)
1723*c87b03e5Sespieis required by the Fortran standards
1724*c87b03e5Sespieto be implemented as though an iteration count
1725*c87b03e5Sespieis computed before entering the loop body,
1726*c87b03e5Sespieand that iteration count used to determine
1727*c87b03e5Sespiethe number of times the loop body is to be performed
1728*c87b03e5Sespie(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).
1729*c87b03e5Sespie
1730*c87b03e5SespieThe FFE handles this by allocating a temporary variable
1731*c87b03e5Sespieto contain the computed number of iterations.
1732*c87b03e5SespieSince this variable must be in a scope that includes the entire loop,
1733*c87b03e5Sespiea GBEL block is created for that loop,
1734*c87b03e5Sespieand the variable declared as belonging to the scope of that block.
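
For integer loop parameters, the arrangement can be sketched in C as follows.
The count is computed as FORTRAN 77 describes it,
@samp{MAX (INT ((m2 - m1 + m3) / m3), 0)},
and lives in a block that encloses the whole loop.
The function names are hypothetical, and @code{m3} is assumed to be nonzero.

```c
#include <assert.h>

/* Iteration count for DO I = M1, M2, M3 (M3 must be nonzero).
   C integer division truncates toward zero, as Fortran's INT does.  */
static int
iteration_count (int m1, int m2, int m3)
{
  int count = (m2 - m1 + m3) / m3;

  return count > 0 ? count : 0;
}

/* Rough equivalent of:
     SUM = 0
     DO I = M1, M2, M3
       SUM = SUM + I
     END DO  */
static int
do_loop_sum (int m1, int m2, int m3)
{
  int sum = 0;

  {                             /* Block for the whole loop.  */
    int i = m1;
    int count = iteration_count (m1, m2, m3);   /* The temporary.  */

    for (; count > 0; count--)
      {
        sum += i;               /* Loop body.  */
        i += m3;
      }
  }

  return sum;
}
```

Note that the loop is controlled by @code{count} alone,
so changes to the limits inside the body would not alter the trip count.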
1735*c87b03e5Sespie
1736*c87b03e5Sespie@node Transforming Block IF
1737*c87b03e5Sespie@subsection Transforming Block IF
1738*c87b03e5Sespie
1739*c87b03e5SespieConsider:
1740*c87b03e5Sespie
1741*c87b03e5Sespie@smallexample
1742*c87b03e5SespieSUBROUTINE X(A,B,C)
1743*c87b03e5SespieCHARACTER*(*) A, B, C
1744*c87b03e5SespieLOGICAL LFUNC
1745*c87b03e5Sespie
1746*c87b03e5SespieIF (LFUNC (A//B)) THEN
1747*c87b03e5Sespie  CALL SUBR1
1748*c87b03e5SespieELSE IF (LFUNC (A//C)) THEN
1749*c87b03e5Sespie  CALL SUBR2
1750*c87b03e5SespieELSE
1751*c87b03e5Sespie  CALL SUBR3
END IF
END
1753*c87b03e5Sespie@end smallexample
1754*c87b03e5Sespie
1755*c87b03e5SespieThe arguments to the two calls to @samp{LFUNC}
1756*c87b03e5Sespierequire dynamic allocation (at run time),
1757*c87b03e5Sespiebut are not required during execution of the @code{CALL} statements.
1758*c87b03e5Sespie
1759*c87b03e5SespieSo, the scopes of those temporaries must be within blocks inside
1760*c87b03e5Sespiethe block corresponding to the Fortran @code{IF} block.
1761*c87b03e5Sespie
1762*c87b03e5SespieThis cannot be represented ``naturally''
1763*c87b03e5Sespiein vanilla C, nor in GBEL.
1764*c87b03e5SespieThe @code{if}, @code{elseif}, @code{else},
1765*c87b03e5Sespieand @code{endif} constructs
1766*c87b03e5Sespieprovided by both languages must,
1767*c87b03e5Sespiefor a given @code{if} block,
1768*c87b03e5Sespieshare the same C/GBE block.
1769*c87b03e5Sespie
1770*c87b03e5SespieTherefore, any temporaries needed during evaluation of @samp{expr}
1771*c87b03e5Sespiewhile executing @samp{ELSE IF(expr)}
1772*c87b03e5Sespiemust either have been predeclared
1773*c87b03e5Sespieat the top of the corresponding @code{IF} block,
1774*c87b03e5Sespieor declared within a new block for that @code{ELSE IF}---a block that,
1775*c87b03e5Sespiesince it cannot contain the @code{else} or @code{else if} itself
1776*c87b03e5Sespie(due to the above requirement),
1777*c87b03e5Sespieactually implements the rest of the @code{IF} block's
1778*c87b03e5Sespie@code{ELSE IF} and @code{ELSE} statements
1779*c87b03e5Sespiewithin an inner block.
1780*c87b03e5Sespie
1781*c87b03e5SespieThe FFE takes the latter approach.
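
One plausible C shape for that approach is sketched below.
It is an illustration, not output of the FFE:
@code{lfunc} is a hypothetical stand-in for @samp{LFUNC},
and the function returns 1, 2, or 3
in place of calling @samp{SUBR1}, @samp{SUBR2}, or @samp{SUBR3}.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for LFUNC.  */
static int
lfunc (const char *s)
{
  return strcmp (s, "OK") == 0;
}

static int
block_if (const char *a, const char *b, const char *c)
{
  int branch;
  int test;

  {
    char temp[strlen (a) + strlen (b) + 1];     /* For A//B only.  */

    strcpy (temp, a);
    strcat (temp, b);
    test = lfunc (temp);
  }

  if (test)
    branch = 1;                 /* CALL SUBR1 */
  else
    {
      /* New block implementing the rest of the IF construct.  */
      {
        char temp[strlen (a) + strlen (c) + 1]; /* For A//C only.  */

        strcpy (temp, a);
        strcat (temp, c);
        test = lfunc (temp);
      }

      if (test)
        branch = 2;             /* CALL SUBR2 */
      else
        branch = 3;             /* CALL SUBR3 */
    }

  return branch;
}
```

Each concatenation temporary goes out of scope
before the corresponding branch is taken.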
1782*c87b03e5Sespie
1783*c87b03e5Sespie@node Transforming SELECT CASE
1784*c87b03e5Sespie@subsection Transforming SELECT CASE
1785*c87b03e5Sespie
1786*c87b03e5Sespie@code{SELECT CASE} poses a few interesting problems for code generation,
1787*c87b03e5Sespieif efficiency and frugal stack management are important.
1788*c87b03e5Sespie
1789*c87b03e5SespieConsider @samp{SELECT CASE (I('PREFIX'//A))},
1790*c87b03e5Sespiewhere @samp{A} is @code{CHARACTER*(*)}.
1791*c87b03e5SespieIn a case like this---basically,
1792*c87b03e5Sespiein any case where largish temporaries are needed
1793*c87b03e5Sespieto evaluate the expression---those temporaries should
1794*c87b03e5Sespienot be ``live'' during execution of any of the @code{CASE} blocks.
1795*c87b03e5Sespie
1796*c87b03e5SespieSo, evaluation of the expression is best done within its own block,
1797*c87b03e5Sespiewhich in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).
1800*c87b03e5Sespie
1801*c87b03e5SespieOtherwise, we'd have the rough equivalent of this pseudo-code:
1802*c87b03e5Sespie
1803*c87b03e5Sespie@smallexample
1804*c87b03e5Sespie@{
1805*c87b03e5Sespie  char temp[large];
1806*c87b03e5Sespie
1807*c87b03e5Sespie  libg77_catenate (temp, 'prefix', a);
1808*c87b03e5Sespie
1809*c87b03e5Sespie  switch (i (temp))
1810*c87b03e5Sespie    @{
1811*c87b03e5Sespie    case 0:
1812*c87b03e5Sespie      @dots{}
1813*c87b03e5Sespie    @}
1814*c87b03e5Sespie@}
1815*c87b03e5Sespie@end smallexample
1816*c87b03e5Sespie
And that would leave @samp{temp} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
1819*c87b03e5Sespiein them, and thus free that temp before executing the blocks).
1820*c87b03e5Sespie
1821*c87b03e5SespieSo this approach is used instead:
1822*c87b03e5Sespie
1823*c87b03e5Sespie@smallexample
1824*c87b03e5Sespie@{
1825*c87b03e5Sespie  int temp0;
1826*c87b03e5Sespie
1827*c87b03e5Sespie  @{
1828*c87b03e5Sespie    char temp1[large];
1829*c87b03e5Sespie
1830*c87b03e5Sespie    libg77_catenate (temp1, 'prefix', a);
1831*c87b03e5Sespie    temp0 = i (temp1);
1832*c87b03e5Sespie  @}
1833*c87b03e5Sespie
1834*c87b03e5Sespie  switch (temp0)
1835*c87b03e5Sespie    @{
1836*c87b03e5Sespie    case 0:
1837*c87b03e5Sespie      @dots{}
1838*c87b03e5Sespie    @}
1839*c87b03e5Sespie@}
1840*c87b03e5Sespie@end smallexample
1841*c87b03e5Sespie
1842*c87b03e5SespieNote how @samp{temp1} goes out of scope before starting the switch,
1843*c87b03e5Sespiethus making it easy for a back end to free it.
1844*c87b03e5Sespie
1845*c87b03e5SespieThe problem @emph{that} solution has, however,
1846*c87b03e5Sespieis with @samp{SELECT CASE('prefix'//A)}
1847*c87b03e5Sespie(which is currently not supported).
1848*c87b03e5Sespie
1849*c87b03e5SespieUnless the GBEL is extended to support arbitrarily long character strings
1850*c87b03e5Sespiein its @code{case} facility,
1851*c87b03e5Sespiethe FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
1852*c87b03e5Sespie(probably excepting @code{CHARACTER*1})
1853*c87b03e5Sespieusing a cascade of
1854*c87b03e5Sespie@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
1855*c87b03e5Sespiein GBEL.
1856*c87b03e5Sespie
1857*c87b03e5SespieTo prevent the (potentially large) temporary,
1858*c87b03e5Sespieneeded to hold the selected expression itself (@samp{'prefix'//A}),
1859*c87b03e5Sespiefrom being in scope during execution of the @code{CASE} blocks,
1860*c87b03e5Sespietwo approaches are available:
1861*c87b03e5Sespie
1862*c87b03e5Sespie@itemize @bullet
1863*c87b03e5Sespie@item
1864*c87b03e5SespiePre-evaluate all the @code{CASE} tests,
1865*c87b03e5Sespieproducing an integer ordinal that is used,
1866*c87b03e5Sespiea la @samp{temp0} in the earlier example,
1867*c87b03e5Sespieas if @samp{SELECT CASE(temp0)} had been written.
1868*c87b03e5Sespie
1869*c87b03e5SespieEach corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
1870*c87b03e5Sespiewhere @var{i} is the ordinal for that case,
1871*c87b03e5Sespiedetermined while, or before,
1872*c87b03e5Sespiegenerating the cascade of @code{if}-related constructs
1873*c87b03e5Sespieto cope with @code{CHARACTER} selection.

@item
Make @samp{temp0} above just
large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).

Since that length must be constant
(because @code{CASE} expressions are all constant),
the temporary won't be so large,
and, further, @samp{temp1} need not be dynamically allocated,
since normal @code{CHARACTER} assignment can be used
to store the value into the fixed-length @samp{temp0}.
@end itemize

Both of these solutions require the @code{SELECT CASE} implementation
to be changed so that all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.
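As a rough illustration of the first approach,
the following sketch (in Python, purely for brevity;
nothing here is FFE or GBEL code)
pre-evaluates the @code{CASE} tests inside a helper,
so that the temporary holding @samp{'prefix'//A}
is gone before any @code{CASE} block runs.
The names @code{fortran_eq}, @code{case_ordinal}, and @code{run_select},
and the two @code{CASE} strings, are hypothetical.

```python
def fortran_eq(x, y):
    """Compare two strings as Fortran CHARACTER values (shorter one blank-padded)."""
    width = max(len(x), len(y))
    return x.ljust(width) == y.ljust(width)


def case_ordinal(a, case_strings):
    """Return the 1-based ordinal of the matching CASE string, or 0 for the default."""
    temp1 = "prefix" + a              # the potentially large temporary
    for i, s in enumerate(case_strings, start=1):
        if fortran_eq(temp1, s):      # one arm of the if/elseif cascade
            return i
    return 0                          # temp1 goes out of scope on every return path


def run_select(a):
    """Dispatch on the integer ordinal, as if SELECT CASE(temp0) had been written."""
    temp0 = case_ordinal(a, ["prefixA", "prefixB"])   # hypothetical CASE strings
    if temp0 == 1:
        return "first CASE block"
    if temp0 == 2:
        return "second CASE block"
    return "CASE DEFAULT block"
```

Only the small integer @samp{temp0} is live while a @code{CASE} block executes,
which is the property both approaches are after.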

@node Transforming Expressions
@section Transforming Expressions

The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:

@smallexample
@var{action}(@var{expr})
@end smallexample

Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.

Expanding the above shows a typical order of events at run time:

@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample

So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must be freed no sooner than
after @var{action} has been performed.

The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.

That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.

The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.

However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.

Yet, from the above, it's clear that multiple streams of code
must effectively be simultaneously generated
during the recursive-descent analysis of statements.

The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.

Requirements imposed by expressions include:

@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.

@item
Other requirements (TBD).
@end itemize
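The evaluate, act, clean-up ordering described in this section,
applied recursively, can be sketched as follows.
This is a toy model in Python, not FFE code:
expressions are nested tuples,
the generated ``code'' is a list of strings,
and the names @code{evaluate} and @code{transform_statement},
along with the @samp{allocate} and @samp{free} pseudo-operations,
are assumptions made for illustration.
Clean-up code is collected separately from evaluation code
and is emitted only once the value it belongs to is no longer needed.

```python
def evaluate(expr, code, counter):
    """Append code that evaluates expr.

    Returns (name, cleanup): the name holding the value, and the
    clean-up code to emit once that value is no longer needed.
    """
    if isinstance(expr, str):
        return expr, []                       # a plain variable needs no temporary
    op, left, right = expr
    lname, lclean = evaluate(left, code, counter)
    rname, rclean = evaluate(right, code, counter)
    counter[0] += 1
    temp = "temp%d" % counter[0]
    code.append("allocate %s" % temp)
    code.append("%s = %s %s %s" % (temp, lname, op, rname))
    code.extend(rclean + lclean)              # subexpression temporaries die here
    return temp, ["free %s" % temp]


def transform_statement(action, expr):
    """Emit: evaluate expr, perform action on the result, then clean up."""
    code = []
    name, cleanup = evaluate(expr, code, [0])
    code.append("%s(%s)" % (action, name))
    code.extend(cleanup)                      # the result outlives the action
    return code
```

For @samp{PRINT(A // (B // C))}, expressed as
@samp{transform_statement("PRINT", ("//", "A", ("//", "B", "C")))},
the sketch frees the inner temporary as soon as the outer concatenation
has been computed, but keeps the outer one until after the action.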

@node Internal Naming Conventions
@section Internal Naming Conventions

Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)

Identifiers that don't fit the following forms
are not considered exported,
even if they are exported as far as the C language is concerned.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)

@table @code
@item ffe@var{mod}
The single typedef exported by the module.

@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.

@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
A typedef exported by the module.

The portion of the identifier after @code{ffe@var{mod}} is
referred to as @code{ctype}, a capitalized (mixed-case) form
of @code{type}.

@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.

@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).

@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table

Below are names used for @var{value} and @var{input},
along with their definitions.

@table @code
@item col
A column number within a line (first column is number 1).

@item file
An encapsulation of a file's name.

@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).

@item initialize
Initializes, usually a module.  No type.

@item int
A generic integer of type @code{int}.

@item is
A generic integer that contains a true (nonzero) or false (zero) value.

@item len
A generic integer that contains the length of something.

@item line
A line number within a source file,
or a global line number.

@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.

@item name
A @code{text} that points to a name of something.

@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate---if so,
similar to @code{find} without crashing.

@item pt
Pointer to a particular character (line, column pairs)
in the input file (source code being compiled).

@item run
Performs some herculean task.  No type.

@item terminate
Terminates, usually a module.  No type.

@item text
A @code{char *} that points to generic text.
@end table
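The forms that do not involve a @var{type} can be checked mechanically.
The sketch below (Python, for illustration only)
builds regular expressions from those forms for a given @var{mod},
assuming that @var{value} and @var{input} are drawn only
from the names in the preceding table.
The two forms that depend on a @var{type} are left out.
The function name @code{classify}, the module name @samp{abc},
and every identifier passed to it are made up;
they are not real FFE names.

```python
import re

# Names usable as VALUE or INPUT, taken from the table above.
WORDS = (
    "col", "file", "find", "initialize", "int", "is", "len", "line",
    "lookup", "name", "new", "pt", "run", "terminate", "text",
)


def classify(name, mod):
    """Return 'typedef', 'constant', or 'function' for an exported name, else None."""
    lower = re.escape(mod)
    upper = re.escape(mod.upper())
    words = "|".join(WORDS)
    forms = [
        ("typedef", "ffe%s" % lower),                                    # ffeMOD
        ("constant", "FFE%s_[A-Z][A-Z0-9_]*" % upper),                   # FFEUMOD_...
        ("function", "ffe%s_(%s)(_(%s))?" % (lower, words, words)),      # ffeMOD_VALUE[_INPUT]
    ]
    for kind, pattern in forms:
        if re.fullmatch(pattern, name):
            return kind
    return None
```

With @var{mod} set to the hypothetical @samp{abc},
@code{classify} reports @samp{ffeabc} as a typedef,
@samp{FFEABC_MAX} as a constant,
and @samp{ffeabc_lookup_name} as a function,
while @samp{ffeabc_frob} fits none of the forms.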