1\input texinfo  @c -*-texinfo-*-
2
3@c NOTE THIS IS NOT A GOOD EXAMPLE OF HOW TO DO A MANUAL. FIXME!!!
5
6
7@c %**start of header
8@setfilename treelang.info
9
10@include gcc-common.texi
11
12@set copyrights-treelang 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005
13
14@set email-general gcc@@gcc.gnu.org
15@set email-bugs gcc-bugs@@gcc.gnu.org or bug-gcc@@gnu.org
16@set email-patches gcc-patches@@gcc.gnu.org
17@set path-treelang gcc/gcc/treelang
18
19@set which-treelang GCC-@value{version-GCC}
20@set which-GCC GCC
21
22@set email-josling tej@@melbpc.org.au
23@set www-josling http://www.geocities.com/timjosling
24
25@c This tells @include'd files that they're part of the overall TREELANG doc
26@c set.  (They might be part of a higher-level doc set too.)
27@set DOC-TREELANG
28
29@c @setfilename usetreelang.info
30@c @setfilename maintaintreelang.info
31@c To produce the full manual, use the "treelang.info" setfilename, and
32@c make sure the following do NOT begin with '@c' (and the @clear lines DO)
33@set INTERNALS
34@set USING
35@c To produce a user-only manual, use the "usetreelang.info" setfilename, and
36@c make sure the following does NOT begin with '@c':
37@c @clear INTERNALS
38@c To produce a maintainer-only manual, use the "maintaintreelang.info" setfilename,
39@c and make sure the following does NOT begin with '@c':
40@c @clear USING
41
42@ifset INTERNALS
43@ifset USING
44@settitle Using and Maintaining GNU Treelang
45@end ifset
46@end ifset
47@c seems reasonable to assume at least one of INTERNALS or USING is set...
48@ifclear INTERNALS
49@settitle Using GNU Treelang
50@end ifclear
51@ifclear USING
52@settitle Maintaining GNU Treelang
53@end ifclear
54@c then again, have some fun
55@ifclear INTERNALS
56@ifclear USING
57@settitle Doing Very Little at all with GNU Treelang
58@end ifclear
59@end ifclear
60
61@syncodeindex fn cp
62@syncodeindex vr cp
63@c %**end of header
64
65@c Cause even numbered pages to be printed on the left hand side of
66@c the page and odd numbered pages to be printed on the right hand
67@c side of the page.  Using this, you can print on both sides of a
68@c sheet of paper and have the text on the same part of the sheet.
69
70@c The text on right hand pages is pushed towards the right hand
71@c margin and the text on left hand pages is pushed toward the left
72@c hand margin.
73@c (To provide the reverse effect, set bindingoffset to -0.75in.)
74
75@c @tex
76@c \global\bindingoffset=0.75in
77@c \global\normaloffset =0.75in
78@c @end tex
79
80@copying
81Copyright @copyright{} @value{copyrights-treelang} Free Software Foundation, Inc.
82
83Permission is granted to copy, distribute and/or modify this document
84under the terms of the GNU Free Documentation License, Version 1.2 or
85any later version published by the Free Software Foundation; with the
86Invariant Sections being ``GNU General Public License'', the Front-Cover
87texts being (a) (see below), and with the Back-Cover Texts being (b)
88(see below).  A copy of the license is included in the section entitled
89``GNU Free Documentation License''.
90
91(a) The FSF's Front-Cover Text is:
92
93     A GNU Manual
94
95(b) The FSF's Back-Cover Text is:
96
97     You have freedom to copy and modify this GNU Manual, like GNU
98     software.  Copies published by the Free Software Foundation raise
99     funds for GNU development.
100@end copying
101
102@ifnottex
103@dircategory Software development
104@direntry
105* treelang: (treelang).                  The GNU Treelang compiler.
106@end direntry
107@ifset INTERNALS
108@ifset USING
109This file documents the use and the internals of the GNU Treelang
110(@code{treelang}) compiler.  At the moment this manual is not
111incorporated into the main GCC manual as it is incomplete.  It
112corresponds to the @value{which-treelang} version of @code{treelang}.
113@end ifset
114@end ifset
115@ifclear USING
116This file documents the internals of the GNU Treelang (@code{treelang}) compiler.
117It corresponds to the @value{which-treelang} version of @code{treelang}.
118@end ifclear
119@ifclear INTERNALS
120This file documents the use of the GNU Treelang (@code{treelang}) compiler.
121It corresponds to the @value{which-treelang} version of @code{treelang}.
122@end ifclear
123
124Published by the Free Software Foundation
12551 Franklin Street, Fifth Floor
126Boston, MA 02110-1301 USA
127
128@insertcopying
129@end ifnottex
130
131@setchapternewpage odd
132@c @finalout
133@titlepage
134@ifset INTERNALS
135@ifset USING
136@title Using and Maintaining GNU Treelang
137@end ifset
138@end ifset
139@ifclear INTERNALS
140@title Using GNU Treelang
141@end ifclear
142@ifclear USING
143@title Maintaining GNU Treelang
144@end ifclear
145@versionsubtitle
146@author Tim Josling
147@page
148@vskip 0pt plus 1filll
149Published by the Free Software Foundation @*
15051 Franklin Street, Fifth Floor@*
151Boston, MA 02110-1301, USA@*
152@c Last printed ??ber, 19??.@*
153@c Printed copies are available for $? each.@*
154@c ISBN ???
155@sp 1
156@insertcopying
157@end titlepage
158@page
159
160@ifnottex
161
162@node Top, Copying,, (dir)
163@top Introduction
164@cindex Introduction
165
166@ifset INTERNALS
167@ifset USING
168This manual documents how to run, install and maintain @code{treelang}.
169It also documents the features and incompatibilities in the @value{which-treelang}
170version of @code{treelang}.
171@end ifset
172@end ifset
173
174@ifclear INTERNALS
175This manual documents how to run and install @code{treelang}.
176It also documents the features and incompatibilities in the @value{which-treelang}
177version of @code{treelang}.
178@end ifclear
179@ifclear USING
180This manual documents how to maintain @code{treelang}.
181It also documents the features and incompatibilities in the @value{which-treelang}
182version of @code{treelang}.
183@end ifclear
184
185@end ifnottex
186
187@menu
188* Copying::
189* Contributors::
190* GNU Free Documentation License::
191* Funding::
192* Getting Started::
193* What is GNU Treelang?::
194* Lexical Syntax::
195* Parsing Syntax::
196* Compiler Overview::
197* TREELANG and GCC::
198* Compiler::
199* Other Languages::
200* treelang internals::
201* Open Questions::
202* Bugs::
203* Service::
204* Projects::
205* Index::
206
207@detailmenu
208 --- The Detailed Node Listing ---
209
210Other Languages
211
212* Interoperating with C and C++::
213
214treelang internals
215
216* treelang files::
217* treelang compiler interfaces::
218* Hints and tips::
219
220treelang compiler interfaces
221
222* treelang driver::
223* treelang main compiler::
224
225treelang main compiler
226
227* Interfacing to toplev.c::
228* Interfacing to the garbage collection::
229* Interfacing to the code generation code. ::
230
231Reporting Bugs
232
233* Sending Patches::
234
235@end detailmenu
236@end menu
237
238@include gpl.texi
239
240@include fdl.texi
241
242@node Contributors
243
244@unnumbered Contributors to GNU Treelang
245@cindex contributors
246@cindex credits
247
248Treelang was based on 'toy' by Richard Kenner, and also uses code from
249the GCC core code tree.  Tim Josling first created the language and
250documentation, based on the GCC Fortran compiler's documentation
251framework.  Treelang was updated to use the TreeSSA infrastructure by
252James A. Morrison.
253
254@itemize @bullet
255@item
256The packaging and compiler portions of GNU Treelang are based largely
257on the GCC compiler.
258@xref{Contributors,,Contributors to GCC,GCC,Using and Maintaining GCC},
259for more information.
260
261@item
262There is no specific run-time library for treelang, other than the
263standard C runtime.
264
265@item
266It would have been difficult to build treelang without access to Joachim
267Nadler's guide to writing a front end to GCC (written in German).  A
268translation of this document into English is available via the
269CobolForGCC project or via the documentation links from the GCC home
270page @uref{http://gcc.gnu.org}.
271@end itemize
272
273@include funding.texi
274
275@node Getting Started
276@chapter Getting Started
277@cindex getting started
278@cindex new users
279@cindex newbies
280@cindex beginners
281
282Treelang is a sample language, useful only to help people understand how
283to implement a new language front end to GCC.  It is not a useful
284language in itself other than as an example or basis for building a new
285language.  Therefore only language developers are likely to have an
286interest in it.
287
288This manual assumes familiarity with GCC, which you can obtain by using
289it and by reading the manuals @samp{Using the GNU Compiler Collection (GCC)}
290and @samp{GNU Compiler Collection (GCC) Internals}.
291
292To install treelang, follow the GCC installation instructions,
293taking care to ensure you specify treelang in the configure step by adding
294treelang to the list of languages specified by @option{--enable-languages},
295e.g.@: @samp{--enable-languages=all,treelang}.
296
297If you're generally curious about the future of
298@code{treelang}, see @ref{Projects}.
299If you're curious about its past,
300see @ref{Contributors}.
301
302To see a few of the questions maintainers of @code{treelang} have,
303and that you might be able to answer,
304see @ref{Open Questions}.
305
306@ifset USING
307@node What is GNU Treelang?, Lexical Syntax, Getting Started, Top
308@chapter What is GNU Treelang?
309@cindex concepts, basic
310@cindex basic concepts
311
GNU Treelang, or @code{treelang}, was designed initially as a free
replacement for, or alternative to, the 'toy' language, in a form that is
amenable to inclusion within the GCC source tree.
315
@code{treelang} is largely a cut-down version of C, designed to showcase
the features of the GCC code generation back end.  Only those features
that are directly supported by the GCC code generation back end are
implemented, and each is implemented in whichever way is easiest and
clearest.  Not all or even most code generation back end
features are implemented.  The intention is to add features incrementally
until most features of the GCC back end are implemented in treelang.
323
324The main features missing are structures, arrays and pointers.
325
326A sample program follows:
327
328@smallexample
329// @r{function prototypes}
330// @r{function 'add' taking two ints and returning an int}
331external_definition int add(int arg1, int arg2);
332external_definition int subtract(int arg3, int arg4);
333external_definition int first_nonzero(int arg5, int arg6);
334external_definition int double_plus_one(int arg7);
335
336// @r{function definition}
337add
338@{
339  // @r{return the sum of arg1 and arg2}
340  return arg1 + arg2;
341@}
342
343
344subtract
345@{
346  return arg3 - arg4;
347@}
348
349double_plus_one
350@{
351  // @r{aaa is a variable, of type integer and allocated at the start of}
352  // @r{the function}
353  automatic int aaa;
354  // @r{set aaa to the value returned from add, when passed arg7 and arg7 as}
355  // @r{the two parameters}
356  aaa=add(arg7, arg7);
357  aaa=add(aaa, aaa);
358  aaa=subtract(subtract(aaa, arg7), arg7) + 1;
359  return aaa;
360@}
361
362first_nonzero
363@{
364  // @r{C-like if statement}
365  if (arg5)
366    @{
367      return arg5;
368    @}
369  else
370    @{
371    @}
372  return arg6;
373@}
374@end smallexample
375
376@node Lexical Syntax, Parsing Syntax, What is GNU Treelang?, Top
377@chapter Lexical Syntax
378@cindex Lexical Syntax
379
380Treelang programs consist of whitespace, comments, keywords and names.
381@itemize @bullet
382
383@item
384Whitespace consists of the space character, a tab, and the end of line
385character.  Line terminations are as defined by the
386standard C library.  Whitespace is ignored except within comments,
387and where it separates parts of the program.  In the example below, A and
388B are two separate names separated by whitespace.
389
390@smallexample
391A B
392@end smallexample
393
394@item
Comments consist of @samp{//} followed by any characters up to the end
of the line.  C-style comments (@samp{/* */}) are not supported.  For example,
the assignment below is followed by a not very helpful comment.
398
399@smallexample
400x = 1; // @r{Set X to 1}
401@end smallexample
402
403@item
404Keywords consist of any of the following reserved words or symbols:
405
406@itemize @bullet
407@item @{
408used to start the statements in a function
409@item @}
410used to end the statements in a function
411@item (
412start list of function arguments, or to change the precedence of operators in
413an expression
414@item )
415end list or prioritized operators in expression
416@item ,
417used to separate parameters in a function prototype or in a function call
418@item ;
419used to end a statement
420@item +
421addition, or unary plus for signed literals
422@item -
423subtraction, or unary minus for signed literals
424@item =
425assignment
426@item ==
427equality test
428@item if
429begin IF statement
430@item else
431begin 'else' portion of IF statement
432@item static
433indicate variable is permanent, or function has file scope only
434@item automatic
435indicate that variable is allocated for the life of the current scope
436@item external_reference
437indicate that variable or function is defined in another file
438@item external_definition
439indicate that variable or function is to be accessible from other files
440@item int
441variable is an integer (same as C int)
442@item char
443variable is a character (same as C char)
444@item unsigned
445variable is unsigned. If this is not present, the variable is signed
446@item return
447start function return statement
448@item void
449used as function type to indicate function returns nothing
450@end itemize
451
452
453@item
Names consist of any letter or @samp{_} followed by any number of letters,
digits, or @samp{_}.  @samp{$} is not allowed in a name.  All names must be globally
unique, i.e.@: a name may not be used twice in any context, and a name must
not be a keyword.  Names and keywords are case sensitive.  For example:
458
459@smallexample
460a A _a a_ IF_X
461@end smallexample
462
463are all different names.
464
465@end itemize
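
The lexical rules above can be seen together in a short fragment.  This is
an illustrative sketch only, not taken from the treelang sources; the names
@code{counter}, @code{Counter} and @code{_tmp_1} are invented for the
example.

@smallexample
// @r{a comment runs to the end of the line}
static int counter = 0;   // @r{`static', `int', `=' and `;' are keywords}
static int Counter = 0;   // @r{a different name: names are case sensitive}

// @r{whitespace, including line ends, only separates tokens}
static
  int    _tmp_1
  = 0 ;
@end smallexample

Each of the three declarations introduces a distinct, globally unique name,
and the last one is read exactly as if it had been written on one line.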
466
467@node Parsing Syntax, Compiler Overview, Lexical Syntax, Top
468@chapter Parsing Syntax
469@cindex Parsing Syntax
470
471Declarations are built up from the lexical elements described above.  A
file may contain one or more declarations.
473
474@itemize @bullet
475
476@item
declaration: variable_declaration OR function_prototype OR function_declaration

@item
function_prototype: storage type NAME ( optional_parameter_list ) ;

@smallexample
static int add (int a, int b);
@end smallexample
485
486@item
variable_declaration: storage type NAME initial;

The storage class is required; the initial value may be omitted.  Example:

@smallexample
static int temp1 = 1;
@end smallexample
494
495A variable declaration can be outside a function, or at the start of a
496function.
497
498@item
499storage: automatic OR static OR external_reference OR external_definition
500
This defines the scope, duration and visibility of a function or variable.
502
503@enumerate 1
504
505@item
506automatic: This means a variable is allocated at start of the current scope and
507released when the current scope is exited.  This can only be used for variables
508within functions.  It cannot be used for functions.
509
510@item
511static: This means a variable is allocated at start of program and
512remains allocated until the program as a whole ends.  For a function, it
513means that the function is only visible within the current file.
514
515@item
516external_definition: For a variable, which must be defined outside a
517function, it means that the variable is visible from other files.  For a
518function, it means that the function is visible from another file.
519
520@item
521external_reference: For a variable, which must be defined outside a
522function, it means that the variable is defined in another file.  For a
523function, it means that the function is defined in another file.
524
525@end enumerate
526
527@item
528type: int OR unsigned int OR char OR unsigned char OR void
529
530This defines the data type of a variable or the return type of a function.
531
532@enumerate a
533
534@item
535int: The variable is a signed integer.  The function returns a signed integer.
536
537@item
538unsigned int: The variable is an unsigned integer.  The function returns an unsigned integer.
539
540@item
541char: The variable is a signed character.  The function returns a signed character.
542
543@item
544unsigned char: The variable is an unsigned character.  The function returns an unsigned character.
545
546@end enumerate
547
548@item
parameter_list: parameter [, parameter]...
550
551@item
552parameter: variable_declaration ,
553
554The variable declarations must not have initializations.
555
556@item
557initial: = value
558
559@item
560value: integer_constant
561
562Values without a unary plus or minus are considered to be unsigned.
563@smallexample
564e.g.@: 1 +2 -3
565@end smallexample
566
567@item
568function_declaration: name @{ variable_declarations statements @}
569
570A function consists of the function name then the declarations (if any)
571and statements (if any) within one pair of braces.
572
573The details of the function arguments come from the function
574prototype.  The function prototype must precede the function declaration
575in the file.
576
577@item
578statement: if_statement OR expression_statement OR return_statement
579
580@item
581if_statement: if ( expression ) @{ variable_declarations statements @}
582else @{ variable_declarations statements @}
583
The first group of statements is executed if the expression is
nonzero.  Otherwise the second group of statements is executed.  Either
group may be empty, but both sets of braces and the @code{else} must be present.
587
588@smallexample
589if (a==b)
590@{
591// @r{nothing}
592@}
593else
594@{
595a=b;
596@}
597@end smallexample
598
599@item
600expression_statement: expression;
601
The expression is evaluated, including any side effects.
603
604@item
605return_statement: return expression_opt;
606
607Returns from the function. If the function is void, the expression must
608be absent, and if the function is not void the expression must be
609present.
610
611@item
612expression: variable OR integer_constant OR expression + expression
613OR expression - expression OR expression == expression OR ( expression )
614OR variable = expression OR function_call
615
616An expression can be a constant or a variable reference or a
617function_call.  Expressions can be combined as a sum of two expressions
618or the difference of two expressions, or an equality test of two
619expressions.  An assignment is also an expression.  Expressions and operator
620precedence work as in C.
621
622@item
623function_call: function_name ( optional_comma_separated_expressions )
624
625This invokes the function, passing to it the values of the expressions
626as actual parameters.
627
628@end itemize
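
Putting the grammar rules above together, a complete file might look like
the following.  It is a sketch assembled from the rules in this chapter
rather than a tested program.  All names are invented for the example, and
@code{other_total} is assumed to be defined in some other file.

@smallexample
// @r{variable declarations outside any function}
static int hits = 0;
external_definition int limit = 10;
external_reference int other_total;

// @r{function prototypes, which must precede the function declarations}
external_definition int clamp (int value_in);
external_definition void bump (int amount);

clamp
@{
  // @r{declarations come before statements}
  automatic int result;
  result = value_in;
  if (value_in == limit)
    @{
      result = limit - 1;
    @}
  else
    @{
    @}
  return result;
@}

bump
@{
  hits = hits + amount + other_total;
  // @r{a void function returns without an expression}
  return;
@}
@end smallexample

Note that the parameter names @code{value_in} and @code{amount} differ from
every other name in the file, in keeping with the rule that names are
globally unique, and that the empty @code{else} branch is still written out.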
629
630@cindex compilers
631@node Compiler Overview, TREELANG and GCC, Parsing Syntax, Top
632@chapter Compiler Overview
633treelang is run as part of the GCC compiler.
634
635@itemize @bullet
636@cindex source code
637@cindex file, source
638@cindex code, source
639@cindex source file
640@item
641It reads a user's program, stored in a file and containing instructions
642written in the appropriate language (Treelang, C, and so on).  This file
643contains @dfn{source code}.
644
645@cindex translation of user programs
646@cindex machine code
647@cindex code, machine
648@cindex mistakes
649@item
650It translates the user's program into instructions a computer can carry
651out more quickly than it takes to translate the instructions in the
652first place.  These instructions are called @dfn{machine code}---code
653designed to be efficiently translated and processed by a machine such as
a computer.  Humans usually aren't as good at writing machine code as they
655are at writing Treelang or C, because it is easy to make tiny mistakes
656writing machine code.  When writing Treelang or C, it is easy to make
657big mistakes. But you can only make one mistake, because the compiler
658stops after it finds any problem.
659
660@cindex debugger
661@cindex bugs, finding
662@cindex @code{gdb}, command
663@cindex commands, @code{gdb}
664@item
665It provides information in the generated machine code
666that can make it easier to find bugs in the program
667(using a debugging tool, called a @dfn{debugger},
668such as @code{gdb}).
669
670@cindex libraries
671@cindex linking
672@cindex @code{ld} command
673@cindex commands, @code{ld}
674@item
675It locates and gathers machine code already generated to perform actions
676requested by statements in the user's program.  This machine code is
677organized into @dfn{libraries} and is located and gathered during the
678@dfn{link} phase of the compilation process.  (Linking often is thought
679of as a separate step, because it can be directly invoked via the
680@code{ld} command.  However, the @code{gcc} command, as with most
681compiler commands, automatically performs the linking step by calling on
682@code{ld} directly, unless asked to not do so by the user.)
683
684@cindex language, incorrect use of
685@cindex incorrect use of language
686@item
687It attempts to diagnose cases where the user's program contains
688incorrect usages of the language.  The @dfn{diagnostics} produced by the
689compiler indicate the problem and the location in the user's source file
690where the problem was first noticed.  The user can use this information
691to locate and fix the problem.
692
693The compiler stops after the first error.  There are no plans to fix
694this, ever, as it would vastly complicate the implementation of treelang
695to little or no benefit.
696
697@cindex diagnostics, incorrect
698@cindex incorrect diagnostics
699@cindex error messages, incorrect
700@cindex incorrect error messages
701(Sometimes an incorrect usage of the language leads to a situation where
the compiler cannot make any sense of what it reads---while a human
703might be able to---and thus ends up complaining about an incorrect
704``problem'' it encounters that, in fact, reflects a misunderstanding of
705the programmer's intention.)
706
707@cindex warnings
708@cindex questionable instructions
709@item
There are a few warnings in treelang.  For example, an unused static function
generates a warning when @option{-Wunused-function} is specified; similarly, an unused
static variable generates a warning when @option{-Wunused-variable} is specified.
The only treelang-specific warning is issued when an expression appears in a
return statement of a function that returns void.
715@end itemize
716
717@cindex components of treelang
718@cindex @code{treelang}, components of
719@code{treelang} consists of several components:
720
721@cindex @code{gcc}, command
722@cindex commands, @code{gcc}
723@itemize @bullet
724@item
725A modified version of the @code{gcc} command, which also might be
726installed as the system's @code{cc} command.
727(In many cases, @code{cc} refers to the
728system's ``native'' C compiler, which
729might be a non-GNU compiler, or an older version
730of @code{GCC} considered more stable or that is
731used to build the operating system kernel.)
732
733@cindex @code{treelang}, command
734@cindex commands, @code{treelang}
735@item
736The @code{treelang} command itself.
737
738@item
739The @code{libc} run-time library.  This library contains the machine
740code needed to support capabilities of the Treelang language that are
741not directly provided by the machine code generated by the
742@code{treelang} compilation phase.  This is the same library that the
743main C compiler uses (libc).
744
745@cindex @code{tree1}, program
746@cindex programs, @code{tree1}
747@cindex assembler
748@cindex @code{as} command
749@cindex commands, @code{as}
750@cindex assembly code
751@cindex code, assembly
752@item
The compiler itself, which is internally named @code{tree1}.
754
755Note that @code{tree1} does not generate machine code directly---it
756generates @dfn{assembly code} that is a more readable form
757of machine code, leaving the conversion to actual machine code
758to an @dfn{assembler}, usually named @code{as}.
759@end itemize
760
761@code{GCC} is often thought of as ``the C compiler'' only,
762but it does more than that.
763Based on command-line options and the names given for files
764on the command line, @code{gcc} determines which actions to perform, including
765preprocessing, compiling (in a variety of possible languages), assembling,
766and linking.
767
768@cindex driver, gcc command as
769@cindex @code{gcc}, command as driver
770@cindex executable file
771@cindex files, executable
772@cindex cc1 program
773@cindex programs, cc1
774@cindex preprocessor
775@cindex cpp program
776@cindex programs, cpp
777For example, the command @samp{gcc foo.c} @dfn{drives} the file
778@file{foo.c} through the preprocessor @code{cpp}, then
779the C compiler (internally named
780@code{cc1}), then the assembler (usually @code{as}), then the linker
781(@code{ld}), producing an executable program named @file{a.out} (on
782UNIX systems).
783
784@cindex treelang program
785@cindex programs, treelang
786As another example, the command @samp{gcc foo.tree} would do much the
787same as @samp{gcc foo.c}, but instead of using the C compiler named
788@code{cc1}, @code{gcc} would use the treelang compiler (named
@code{tree1}).  However, there is no preprocessor for treelang.
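
As a hedged illustration, a @file{foo.tree} suitable for the command above
could be as small as the following.  It assumes that a function named
@code{main} with an empty parameter list can serve as the program's entry
point, on the grounds that the linker treats treelang output like C code;
this manual does not state that explicitly.

@smallexample
// @r{foo.tree: a minimal program (sketch)}
external_definition int main ();

main
@{
  return 0;
@}
@end smallexample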
790
791@cindex @code{tree1}, program
792@cindex programs, @code{tree1}
793In a GNU Treelang installation, @code{gcc} recognizes Treelang source
794files by name just like it does C and C++ source files.  It knows to use
795the Treelang compiler named @code{tree1}, instead of @code{cc1} or
796@code{cc1plus}, to compile Treelang files.  If a file's name ends in
797@code{.tree} then GCC knows that the program is written in treelang.  You
798can also manually override the language.
799
800@cindex @code{gcc}, not recognizing Treelang source
801@cindex unrecognized file format
802@cindex file format not recognized
803Non-Treelang-related operation of @code{gcc} is generally
804unaffected by installing the GNU Treelang version of @code{gcc}.
805However, without the installed version of @code{gcc} being the
806GNU Treelang version, @code{gcc} will not be able to compile
807and link Treelang programs.
808
809@cindex printing version information
810@cindex version information, printing
811The command @samp{gcc -v x.tree} where @samp{x.tree} is a file which
812must exist but whose contents are ignored, is a quick way to display
813version information for the various programs used to compile a typical
814Treelang source file.
815
816The @code{tree1} program represents most of what is unique to GNU
817Treelang; @code{tree1} is a combination of two rather large chunks of
818code.
819
820@cindex GCC Back End (GBE)
821@cindex GBE
822@cindex @code{GCC}, back end
823@cindex back end, GCC
824@cindex code generator
825One chunk is the so-called @dfn{GNU Back End}, or GBE,
826which knows how to generate fast code for a wide variety of processors.
827The same GBE is used by the C, C++, and Treelang compiler programs @code{cc1},
828@code{cc1plus}, and @code{tree1}, plus others.
829Often the GBE is referred to as the ``GCC back end'' or
830even just ``GCC''---in this manual, the term GBE is used
831whenever the distinction is important.
832
833@cindex GNU Treelang Front End (TFE)
834@cindex tree1
835@cindex @code{treelang}, front end
836@cindex front end, @code{treelang}
837The other chunk of @code{tree1} is the majority of what is unique about
838GNU Treelang---the code that knows how to interpret Treelang programs to
839determine what they are intending to do, and then communicate that
840knowledge to the GBE for actual compilation of those programs.  This
841chunk is called the @dfn{Treelang Front End} (TFE).  The @code{cc1} and
842@code{cc1plus} programs have their own front ends, for the C and C++
languages, respectively.  These front ends are responsible for
diagnosing incorrect usage of their respective languages by the programs
they process, and are responsible for most of the warnings about
questionable constructs as well.  (The GBE in principle handles
producing some warnings, like those concerning possible references to
undefined variables, but these warnings should not occur in treelang
programs as the front end is meant to pick them up first.)
850
851Because so much is shared among the compilers for various languages,
852much of the behavior and many of the user-selectable options for these
853compilers are similar.
854For example, diagnostics (error messages and
855warnings) are similar in appearance; command-line
856options like @samp{-Wall} have generally similar effects; and the quality
857of generated code (in terms of speed and size) is roughly similar
858(since that work is done by the shared GBE).
859
860@node TREELANG and GCC, Compiler, Compiler Overview, Top
861@chapter Compile Treelang, C, or Other Programs
862@cindex compiling programs
863@cindex programs, compiling
864
865@cindex @code{gcc}, command
866@cindex commands, @code{gcc}
867A GNU Treelang installation includes a modified version of the @code{gcc}
868command.
869
870In a non-Treelang installation, @code{gcc} recognizes C, C++,
871and Objective-C source files.
872
873In a GNU Treelang installation, @code{gcc} also recognizes Treelang source
874files and accepts Treelang-specific command-line options, plus some
875command-line options that are designed to cater to Treelang users
876but apply to other languages as well.
877
878@xref{G++ and GCC,,Programming Languages Supported by GCC,GCC,Using
879the GNU Compiler Collection (GCC)},
880for information on the way different languages are handled
881by the GCC compiler (@code{gcc}).
882
You can use this, combined with the output of the @samp{gcc -v x.tree}
command, to get the options applicable to treelang.  The names of Treelang
source files must end with the suffix @samp{.tree}.
886
887@cindex preprocessor
888
Treelang programs are not by default run through the C
preprocessor by @code{gcc}.  There is no reason why they cannot be run through the
preprocessor manually, but you would need to prevent the preprocessor
from generating @samp{#line} directives, using the @option{-P} option; otherwise
@code{tree1} will not accept the input.
894
895@node Compiler, Other Languages, TREELANG and GCC, Top
896@chapter The GNU Treelang Compiler
897
898The GNU Treelang compiler, @code{treelang}, supports programs written
899in the GNU Treelang language.
900
901@node Other Languages, treelang internals, Compiler, Top
902@chapter Other Languages
903
904@menu
905* Interoperating with C and C++::
906@end menu
907
908@node Interoperating with C and C++,  , Other Languages, Other Languages
909@section Tools and advice for interoperating with C and C++
910
911The output of treelang programs looks like C program code to the linker
912and everybody else, so you should be able to freely mix treelang and C
913(and C++) code, with one proviso.
914
915C promotes small integer types to 'int' when used as function parameters and
916return values in non-prototyped functions.  Since treelang has no
917non-prototyped functions, the treelang compiler does not do this.
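
A sketch of the treelang side of such a mix follows.  It assumes a C file
that defines a function @code{int c_scale (int)} and declares
@code{int treelang_twice (int)} with a full prototype; both names are
hypothetical.  Using only @code{int} parameters and return values sidesteps
the promotion issue described above.

@smallexample
// @r{defined in a C file as: int c_scale (int c_arg)}
external_reference int c_scale (int c_arg);
// @r{callable from C as: int treelang_twice (int)}
external_definition int treelang_twice (int t_arg);

treelang_twice
@{
  return c_scale (t_arg) + c_scale (t_arg);
@}
@end smallexample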
918
919@ifset INTERNALS
920@node treelang internals, Open Questions, Other Languages, Top
921@chapter treelang internals
922
923@menu
924* treelang files::
925* treelang compiler interfaces::
926* Hints and tips::
927@end menu
928
929@node treelang files, treelang compiler interfaces, treelang internals, treelang internals
930@section treelang files
931
To create a compiler that integrates into GCC, you need to create many
files.  Some of the files are integrated into the main GCC makefile, to
build the various parts of the compiler and to run the test
suite.  Others are incorporated into various GCC programs such as
@file{gcc.c}.  Finally you must provide the actual programs comprising your
compiler.
938
939@cindex files
940
941The files are:
942
943@enumerate 1
944
945@item
946COPYING.  This is the copyright file, assuming you are going to use the
947GNU General Public License.  You probably need to use the GPL because if
948you use the GCC back end your program and the back end are one program,
949and the back end is GPLed.
950
951This need not be present if the language is incorporated into the main
952GCC tree, as the main GCC directory has this file.
953
954@item
955COPYING.LIB.  This is the copyright file for those parts of your program
956that are not to be covered by the GPL, but are instead to be covered by
957the LGPL (Library or Lesser GPL).  This license may be appropriate for
958the library routines associated with your compiler. These are the
959routines that are linked with the @emph{output} of the compiler.  Using
960the LGPL for these programs allows programs written using your compiler
961to be closed source. For example LIBC is under the LGPL.
962
963This need not be present if the language is incorporated into the main
964GCC tree, as the main GCC directory has this file.
965
966@item
ChangeLog.  Record all the changes to your compiler.  Use the same format
as used in treelang as it is supported by an Emacs editing mode and is
part of the FSF coding standard.  Normally each directory has its own
changelog.  The FSF standard allows but does not require a meaningful
comment on @emph{why} the changes were made, above and beyond @emph{what}
was changed.  In the author's opinion it is useful to provide this
information.
974
975@item
treelang.texi.  The manual, written in texinfo.  Your manual would have a
different file name.  You need not write it in texinfo if you don't want
to, but a lot of GNU software does use texinfo.
979
980@cindex Make-lang.in
981@item
Make-lang.in.  This file is part of the make file which is incorporated
with the GCC make file skeleton (Makefile.in in the GCC directory) to
make Makefile, as part of the configuration process.
985
986Makefile in turn is the main instruction to actually build
987everything.  The build instructions are held in the main GCC manual and
988web site so they are not repeated here.
989
990There are some comments at the top which will help you understand what
991you need to do.
992
993There are make commands to build things, remove generated files with
994various degrees of thoroughness, count the lines of code (so you know
995how much progress you are making), build info and html files from the
996texinfo source, run the tests etc.
997
998@item
999README.  Just a brief informative text file saying what is in this
1000directory.
1001
1002@cindex config-lang.in
1003@item
config-lang.in.  This file is read by the configuration process and must
be present.  You specify the name of your language, the name(s) of the
compiler(s), including preprocessors, you are going to build, and whether
any files (usually generated ones) should be excluded from diffs (i.e.,
when making diff files to send in patches).  Whether the equate
'stagestuff' is used is unknown (???).
1010
1011@cindex lang.opt
1012@item
1013lang.opt.  This file is included into @file{gcc.c}, the main GCC driver, and
1014tells it what options your language supports.  This is also used to
1015display help.
1016
1017@cindex lang-specs.h
1018@item
1019lang-specs.h.  This file is also included in @file{gcc.c}. It tells
1020@file{gcc.c} when to call your programs and what options to send them.  The
1021mini-language 'specs' is documented in the source of @file{gcc.c}.  Do not
1022attempt to write a specs file from scratch - use an existing one as the base
1023and enhance it.
1024
1025@item
1026Your texi files.  Texinfo can be used to build documentation in HTML,
1027info, dvi and postscript formats. It is a tagged language, is documented
1028in its own manual, and has its own emacs mode.
1029
1030@item
1031Your programs.  The relationships between all the programs are explained
1032in the next section.  You need to write or use the following programs:
1033
1034@itemize @bullet
1035
1036@item
1037lexer.  This breaks the input into words and passes these to the
1038parser.  This is @file{lex.l} in treelang, which is passed through flex, a lex
1039variant, to produce C code @file{lex.c}.  Note there is a school of thought
1040that says real men hand code their own lexers.  However, you may prefer to
1041write far less code and use flex, as was done with treelang.
1042
1043@item
1044parser.  This breaks the program into recognizable constructs such as
1045expressions, statements etc.  This is @file{parse.y} in treelang, which is
1046passed through bison, which is a yacc variant, to produce C code
1047@file{parse.c}.
1048
1049@item
1050back end interface.  This interfaces to the code generation back end.  In
1051treelang, this is @file{tree1.c} which mainly interfaces to @file{toplev.c} and
1052@file{treetree.c} which mainly interfaces to everything else. Many languages
1053mix up the back end interface with the parser, as in the C compiler for
1054example.  It is a matter of taste which way to do it, but with treelang
1055it is separated out to make the back end interface cleaner and easier to
1056understand.
1057
1058@item
header files.  For function prototypes and common data items.  One point
to note here is that bison can generate a header file with all the
numbers it has assigned to the keywords and symbols, and you can include
the same header in your lexer.  This technique is demonstrated in
treelang.
1064
1065@item
1066compiler main file.  GCC comes with a file @file{toplev.c} which is a
1067perfectly serviceable main program for your compiler.  GNU Treelang uses
1068@file{toplev.c} but other languages have been known to replace it with their
1069own main program.  Again this is a matter of taste and how much code you
1070want to write.
1071
1072@end itemize
1073
1074@end enumerate
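
The lexer and shared-header items above can be sketched as follows.
This is a hand-coded toy, not the flex-generated @file{lex.c} from
treelang.  The token names and numbers are hypothetical stand-ins for
what a bison-generated header would define, and @code{toy_next_token} is
an invented name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token numbers.  In a real front end these would come
   from the header that bison generates, included by both the parser
   and the lexer so that the two always agree.  */
enum toy_token {
    TOK_EOF = 0,
    TOK_NAME = 258,
    TOK_INTEGER = 259,
    TOK_PLUS = 260
};

/* Return the next token from *CURSOR and advance the cursor past it.
   Unrecognized characters are returned as their own character code.  */
int toy_next_token(const char **cursor)
{
    const char *p;
    int token;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

    while (isspace((unsigned char) *p))
        p++;

    if (*p == '\0') {
        token = TOK_EOF;
    } else if (isdigit((unsigned char) *p)) {
        while (isdigit((unsigned char) *p))
            p++;
        token = TOK_INTEGER;
    } else if (isalpha((unsigned char) *p)) {
        while (isalnum((unsigned char) *p) || *p == '_')
            p++;
        token = TOK_NAME;
    } else if (*p == '+') {
        p++;
        token = TOK_PLUS;
    } else {
        token = (unsigned char) *p++;
    }

    *cursor = p;
    return token;
}
```

A parser would call such a routine repeatedly and switch on the returned
token number; keeping the numbers in one shared header is what prevents
the lexer and parser from drifting apart.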
1075
1076@node treelang compiler interfaces, Hints and tips, treelang files, treelang internals
1077@section treelang compiler interfaces
1078
1079@cindex driver
1080@cindex toplev.c
1081
1082@menu
1083* treelang driver::
1084* treelang main compiler::
1085@end menu
1086
1087@node treelang driver, treelang main compiler, treelang compiler interfaces, treelang compiler interfaces
1088@subsection treelang driver
1089
1090The GCC compiler consists of a driver, which then executes the various
1091compiler phases based on the instructions in the specs files.
1092
Typically a program's language will be identified from its suffix
(e.g., @file{.tree} for treelang programs).
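
The idea of choosing a language from the file suffix can be illustrated
with a small table lookup.  This is only a sketch: the real driver
derives the mapping from the specs, and the table and the function name
@code{toy_language_for_file} are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix table; the real driver works from the specs,
   not from a table like this one.  */
struct toy_suffix {
    const char *suffix;
    const char *language;
};

static const struct toy_suffix toy_suffixes[] = {
    { ".tree", "treelang" },
    { ".c",    "c" },
};

/* Return the language name for FILE_NAME, or NULL if the suffix is
   not recognized.  */
const char *toy_language_for_file(const char *file_name)
{
    const char *dot;
    size_t i;

    assert(file_name != NULL);
    dot = strrchr(file_name, '.');
    if (dot == NULL)
        return NULL;

    for (i = 0; i < sizeof toy_suffixes / sizeof toy_suffixes[0]; i++)
        if (strcmp(dot, toy_suffixes[i].suffix) == 0)
            return toy_suffixes[i].language;
    return NULL;
}
```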
1095
The driver (@file{gcc.c}) will then drive (exec) in turn a preprocessor,
the main compiler, the assembler and the link editor.  Options to GCC allow
you to override all of this.  In the case of treelang programs there is no
preprocessor, and these days the C preprocessor is mostly run within the
main C compiler rather than as a separate process, apparently for reasons
of speed.
1101
1102You will be using the standard assembler and linkage editor so these are
1103ignored from now on.
1104
1105You have to write your own preprocessor if you want one.  This is usually
1106totally language specific.  The main point to be aware of is to ensure
1107that you find some way to pass file name and line number information
1108through to the main compiler so that it can tell the back end this
1109information and so the debugger can find the right source line for each
1110piece of code.  That is all there is to say about the preprocessor except
1111that the preprocessor will probably not be the slowest part of the
1112compiler and will probably not use the most memory so don't waste too
1113much time tuning it until you know you need to do so.
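
One possible way to carry file name and line number information from a
preprocessor to the main compiler is a marker line of the form
@code{# 12 "foo.tree"}.  The sketch below shows how a hypothetical front
end might read such a marker.  It is an assumption for illustration
only: treelang's own @code{tree1} does not accept such lines, and
@code{toy_read_line_marker} is an invented name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Current source position as the hypothetical front end sees it.  */
struct toy_location {
    char file[256];
    int line;
};

/* If TEXT is a marker of the form  # LINE "FILE"  update *LOC and
   return 1; otherwise leave *LOC alone and return 0.  */
int toy_read_line_marker(const char *text, struct toy_location *loc)
{
    char file[256];
    int line;

    assert(text != NULL && loc != NULL);

    /* The scanset is limited to 255 characters so that FILE, with its
       terminating null, always fits in the buffer.  */
    if (sscanf(text, "# %d \"%255[^\"]\"", &line, file) != 2)
        return 0;

    strcpy(loc->file, file);
    loc->line = line;
    return 1;
}
```

The front end would then attach the recorded position to whatever it
builds next, so that the back end and ultimately the debugger see the
original source line.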
1114
1115@node treelang main compiler,  , treelang driver, treelang compiler interfaces
1116@subsection treelang main compiler
1117
1118The main compiler for treelang consists of @file{toplev.c} from the main GCC
1119compiler, the parser, lexer and back end interface routines, and the
1120back end routines themselves, of which there are many.
1121
1122@file{toplev.c} does a lot of work for you and you should almost certainly
1123use it.
1124
1125Writing this code is the hard part of creating a compiler using GCC.  The
1126back end interface documentation is incomplete and the interface is
1127complex.
1128
1129There are three main aspects to interfacing to the other GCC code.
1130
1131@menu
1132* Interfacing to toplev.c::
1133* Interfacing to the garbage collection::
1134* Interfacing to the code generation code. ::
1135@end menu
1136
1137@node Interfacing to toplev.c, Interfacing to the garbage collection, treelang main compiler, treelang main compiler
1138@subsubsection Interfacing to toplev.c
1139
In treelang this is handled mainly in @file{tree1.c}
and partly in @file{treetree.c}.  Peruse @file{toplev.c} for details of
what you need to do.
1143
1144@node Interfacing to the garbage collection, Interfacing to the code generation code. , Interfacing to toplev.c, treelang main compiler
1145@subsubsection Interfacing to the garbage collection
1146
In treelang this is mainly in @file{tree1.c}.
1149
1150Memory allocation in the compiler should be done using the ggc_alloc and
1151kindred routines in ggc*.*. At the end of every 'function' in your language, toplev.c calls
1152the garbage collection several times. The garbage collection calls mark
1153routines which go through the memory which is still used, telling the
1154garbage collection not to free it. Then all the memory not used is
1155freed.
1156
What this means is that you need a way to hook into this marking
process.  This is done by calling ggc_add_root.  This provides the address
of a callback routine which will be called during garbage collection and
which can call ggc_mark to save the storage.  If storage is only
used within the parsing of a function, you do not need to provide a way
to mark it.
1163
1164Note that you can also call ggc_mark_tree to mark any of the back end
1165internal 'tree' nodes. This routine will follow the branches of the
1166trees and mark all the subordinate structures. This is useful for
1167example when you have created a variable declaration that will be used
1168across multiple functions, or for a function declaration (from a
1169prototype) that may be used later on. See the next item for more on the
1170tree nodes.
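
The mark-and-free cycle described above can be sketched with a toy
collector.  This is not the GGC implementation or its API: every
@code{toy_} name is hypothetical, and the code only mirrors the roles
the text gives to the allocation routine, the root callback, and the
mark routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy collector, NOT the GGC API: every object sits on one list and
   carries a mark bit.  */
struct toy_object {
    struct toy_object *next;
    int marked;
    int value;
};

typedef void (*toy_mark_fn)(void);

static struct toy_object *toy_all_objects;
static toy_mark_fn toy_roots[8];
static size_t toy_root_count;

/* Allocate a collectable object.  */
struct toy_object *toy_alloc(int value)
{
    struct toy_object *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    obj->next = toy_all_objects;
    obj->marked = 0;
    obj->value = value;
    toy_all_objects = obj;
    return obj;
}

/* Register a callback that marks long-lived data.  */
void toy_add_root(toy_mark_fn fn)
{
    assert(toy_root_count < sizeof toy_roots / sizeof toy_roots[0]);
    toy_roots[toy_root_count++] = fn;
}

/* Called from a root callback to keep OBJ alive.  */
void toy_mark(struct toy_object *obj)
{
    if (obj != NULL)
        obj->marked = 1;
}

/* Run every root callback, then free whatever stayed unmarked.
   Returns the number of objects freed.  */
size_t toy_collect(void)
{
    struct toy_object **link = &toy_all_objects;
    size_t freed = 0, i;

    for (i = 0; i < toy_root_count; i++)
        toy_roots[i]();

    while (*link != NULL) {
        struct toy_object *obj = *link;
        if (obj->marked) {
            obj->marked = 0;      /* reset for the next collection */
            link = &obj->next;
        } else {
            *link = obj->next;
            free(obj);
            freed++;
        }
    }
    return freed;
}

/* Example use: one global survives, one temporary does not.  */
static struct toy_object *toy_global_decl;

static void toy_mark_globals(void)
{
    toy_mark(toy_global_decl);
}

size_t toy_demo(void)
{
    toy_global_decl = toy_alloc(1);   /* used across functions */
    toy_alloc(2);                     /* only used while parsing */
    toy_add_root(toy_mark_globals);
    return toy_collect();
}
```

In the demonstration, the object reachable from the registered root
survives the collection, while the temporary object that nothing marks
is freed.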
1171
1172@node Interfacing to the code generation code. ,  , Interfacing to the garbage collection, treelang main compiler
1173@subsubsection Interfacing to the code generation code.
1174
In treelang this is done in @file{treetree.c}.  A typedef called
@code{tree}, which is defined in @file{tree.h} and @file{tree.def} in the
GCC directory and largely implemented in @file{tree.c} and @file{stmt.c},
forms the basic interface to the compiler back end.
1179
1180In general you call various tree routines to generate code, either
1181directly or through toplev.c. You build up data structures and
1182expressions in similar ways.
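
To give a feel for building expressions out of nodes, here is a toy
expression tree.  It only illustrates the general idea: the node codes,
the structure layout, and the @code{toy_} routines are all hypothetical
and are not the definitions found in @file{tree.h} or @file{tree.def}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy node codes, loosely modelled on the idea of tree codes; these
   are NOT the definitions from tree.def.  */
enum toy_code { TOY_INTEGER_CST, TOY_PLUS_EXPR, TOY_MULT_EXPR };

struct toy_node {
    enum toy_code code;
    long value;                    /* used by TOY_INTEGER_CST only */
    struct toy_node *operand[2];   /* used by binary expressions   */
};

typedef struct toy_node *toy_tree;

/* Build a constant node.  */
toy_tree toy_build_int(long value)
{
    toy_tree t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->code = TOY_INTEGER_CST;
    t->value = value;
    return t;
}

/* Build a binary expression node from two existing nodes.  */
toy_tree toy_build_binary(enum toy_code code, toy_tree lhs, toy_tree rhs)
{
    toy_tree t = calloc(1, sizeof *t);
    assert(t != NULL && lhs != NULL && rhs != NULL);
    t->code = code;
    t->operand[0] = lhs;
    t->operand[1] = rhs;
    return t;
}

/* Walk the tree and compute its value, standing in for the work a
   back end would do when it turns the tree into code.  */
long toy_evaluate(toy_tree t)
{
    switch (t->code) {
    case TOY_INTEGER_CST:
        return t->value;
    case TOY_PLUS_EXPR:
        return toy_evaluate(t->operand[0]) + toy_evaluate(t->operand[1]);
    case TOY_MULT_EXPR:
        return toy_evaluate(t->operand[0]) * toy_evaluate(t->operand[1]);
    }
    assert(!"unknown node code");
    return 0;
}
```

The front end builds nodes bottom-up as the parser recognizes
constructs, then hands the finished tree on.  The sketch never frees its
nodes; in GCC that job belongs to the garbage collector.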
1183
You can read some documentation on this which can be found via the GCC
main web page.  In particular, the documentation produced by Joachim
Nadler and translated by Tim Josling can be quite useful.  The C compiler
also has documentation in the main GCC manual (particularly the current
CVS version) which is useful on a lot of the details.
1189
1190In time it is hoped to enhance this document to provide a more
1191comprehensive overview of this topic. The main gap is in explaining how
1192it all works together.
1193
1194@node Hints and tips,  , treelang compiler interfaces, treelang internals
1195@section Hints and tips
1196
1197@itemize @bullet
1198
1199@item
1200TAGS: Use the make ETAGS commands to create TAGS files which can be used in
1201emacs to jump to any symbol quickly.
1202
1203@item
1204GREP: grep is also a useful way to find all uses of a symbol.
1205
1206@item
TREE: The main files to look at are @file{tree.h} and @file{tree.def}.  You
will probably want a hardcopy of these.
1209
1210@item
1211SAMPLE: look at the sample interfacing code in treetree.c. You can use
1212gdb to trace through the code and learn about how it all works.
1213
1214@item
1215GDB: the GCC back end works well with gdb. It traps abort() and allows
1216you to trace back what went wrong.
1217
1218@item
1219Error Checking: The compiler back end does some error and consistency
1220checking. Often the result of an error is just no code being
1221generated. You will then need to trace through and find out what is
1222going wrong. The rtl dump files can help here also.
1223
1224@item
rtl dump files: The main compiler documents these files which are dumps
of the rtl (intermediate code) which is manipulated during the code
generation process.  This can provide useful clues about what is going
wrong.  The rtl 'language' is documented in the main GCC manual.
1229
1230@end itemize
1231
1232@end ifset
1233
1234@node Open Questions, Bugs, treelang internals, Top
1235@chapter Open Questions
1236
1237If you know GCC well, please consider looking at the file treetree.c and
1238resolving any questions marked "???".
1239
1240@node Bugs, Service, Open Questions, Top
1241@chapter Reporting Bugs
1242@cindex bugs
1243@cindex reporting bugs
1244
1245You can report bugs to @email{@value{email-bugs}}. Please make
1246sure bugs are real before reporting them. Follow the guidelines in the
1247main GCC manual for submitting bug reports.
1248
1249@menu
1250* Sending Patches::
1251@end menu
1252
1253@node Sending Patches,  , Bugs, Bugs
1254@section Sending Patches for GNU Treelang
1255
1256If you would like to write bug fixes or improvements for the GNU
1257Treelang compiler, that is very helpful.  Send suggested fixes to
1258@email{@value{email-patches}}.
1259
1260@node Service, Projects, Bugs, Top
1261@chapter How To Get Help with GNU Treelang
1262
1263If you need help installing, using or changing GNU Treelang, there are two
1264ways to find it:
1265
1266@itemize @bullet
1267
1268@item
1269Look in the service directory for someone who might help you for a fee.
1270The service directory is found in the file named @file{SERVICE} in the
1271GCC distribution.
1272
1273@item
1274Send a message to @email{@value{email-general}}.
1275
1276@end itemize
1277
1278@end ifset
1279@ifset INTERNALS
1280
1281@node Projects, Index, Service, Top
1282@chapter Projects
1283@cindex projects
1284
1285If you want to contribute to @code{treelang} by doing research,
1286design, specification, documentation, coding, or testing,
1287the following information should give you some ideas.
1288
1289Send a message to @email{@value{email-general}} if you plan to add a
1290feature.
1291
1292The main requirement for treelang is to add features and to add
1293documentation. Features are things that the GCC back end can do but
1294which are not reflected in treelang. Examples include structures,
1295unions, pointers, arrays.
1296
1297@end ifset
1298
1299@node Index,  , Projects, Top
1300@unnumbered Index
1301
1302@printindex cp
1303@summarycontents
1304@contents
1305@bye
1306