\input texinfo  @c -*-texinfo-*-

@c NOTE THIS IS NOT A GOOD EXAMPLE OF HOW TO DO A MANUAL. FIXME!!!


@c %**start of header
@setfilename treelang.info

@include gcc-common.texi

@set copyrights-treelang 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005

@set email-general gcc@@gcc.gnu.org
@set email-bugs gcc-bugs@@gcc.gnu.org or bug-gcc@@gnu.org
@set email-patches gcc-patches@@gcc.gnu.org
@set path-treelang gcc/gcc/treelang

@set which-treelang GCC-@value{version-GCC}
@set which-GCC GCC

@set email-josling tej@@melbpc.org.au
@set www-josling http://www.geocities.com/timjosling

@c This tells @include'd files that they're part of the overall TREELANG doc
@c set. (They might be part of a higher-level doc set too.)
@set DOC-TREELANG

@c @setfilename usetreelang.info
@c @setfilename maintaintreelang.info
@c To produce the full manual, use the "treelang.info" setfilename, and
@c make sure the following do NOT begin with '@c' (and the @clear lines DO)
@set INTERNALS
@set USING
@c To produce a user-only manual, use the "usetreelang.info" setfilename, and
@c make sure the following does NOT begin with '@c':
@c @clear INTERNALS
@c To produce a maintainer-only manual, use the "maintaintreelang.info" setfilename,
@c and make sure the following does NOT begin with '@c':
@c @clear USING

@ifset INTERNALS
@ifset USING
@settitle Using and Maintaining GNU Treelang
@end ifset
@end ifset
@c seems reasonable to assume at least one of INTERNALS or USING is set...
@ifclear INTERNALS
@settitle Using GNU Treelang
@end ifclear
@ifclear USING
@settitle Maintaining GNU Treelang
@end ifclear
@c then again, have some fun
@ifclear INTERNALS
@ifclear USING
@settitle Doing Very Little at all with GNU Treelang
@end ifclear
@end ifclear

@syncodeindex fn cp
@syncodeindex vr cp
@c %**end of header

@c Cause even numbered pages to be printed on the left hand side of
@c the page and odd numbered pages to be printed on the right hand
@c side of the page. Using this, you can print on both sides of a
@c sheet of paper and have the text on the same part of the sheet.

@c The text on right hand pages is pushed towards the right hand
@c margin and the text on left hand pages is pushed toward the left
@c hand margin.
@c (To provide the reverse effect, set bindingoffset to -0.75in.)

@c @tex
@c \global\bindingoffset=0.75in
@c \global\normaloffset =0.75in
@c @end tex

@copying
Copyright @copyright{} @value{copyrights-treelang} Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software. Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifnottex
@dircategory Software development
@direntry
* treelang: (treelang).                  The GNU Treelang compiler.
@end direntry
@ifset INTERNALS
@ifset USING
This file documents the use and the internals of the GNU Treelang
(@code{treelang}) compiler. At the moment this manual is not
incorporated into the main GCC manual as it is incomplete. It
corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifset
@end ifset
@ifclear USING
This file documents the internals of the GNU Treelang (@code{treelang}) compiler.
It corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifclear
@ifclear INTERNALS
This file documents the use of the GNU Treelang (@code{treelang}) compiler.
It corresponds to the @value{which-treelang} version of @code{treelang}.
@end ifclear

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifnottex

@setchapternewpage odd
@c @finalout
@titlepage
@ifset INTERNALS
@ifset USING
@title Using and Maintaining GNU Treelang
@end ifset
@end ifset
@ifclear INTERNALS
@title Using GNU Treelang
@end ifclear
@ifclear USING
@title Maintaining GNU Treelang
@end ifclear
@versionsubtitle
@author Tim Josling
@page
@vskip 0pt plus 1filll
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@c Last printed ??ber, 19??.@*
@c Printed copies are available for $? each.@*
@c ISBN ???
@sp 1
@insertcopying
@end titlepage
@page

@ifnottex

@node Top, Copying,, (dir)
@top Introduction
@cindex Introduction

@ifset INTERNALS
@ifset USING
This manual documents how to run, install and maintain @code{treelang}.
It also documents the features and incompatibilities in the @value{which-treelang}
version of @code{treelang}.
@end ifset
@end ifset

@ifclear INTERNALS
This manual documents how to run and install @code{treelang}.
It also documents the features and incompatibilities in the @value{which-treelang}
version of @code{treelang}.
@end ifclear
@ifclear USING
This manual documents how to maintain @code{treelang}.
It also documents the features and incompatibilities in the @value{which-treelang}
version of @code{treelang}.
@end ifclear

@end ifnottex

@menu
* Copying::
* Contributors::
* GNU Free Documentation License::
* Funding::
* Getting Started::
* What is GNU Treelang?::
* Lexical Syntax::
* Parsing Syntax::
* Compiler Overview::
* TREELANG and GCC::
* Compiler::
* Other Languages::
* treelang internals::
* Open Questions::
* Bugs::
* Service::
* Projects::
* Index::

@detailmenu
 --- The Detailed Node Listing ---

Other Languages

* Interoperating with C and C++::

treelang internals

* treelang files::
* treelang compiler interfaces::
* Hints and tips::

treelang compiler interfaces

* treelang driver::
* treelang main compiler::

treelang main compiler

* Interfacing to toplev.c::
* Interfacing to the garbage collection::
* Interfacing to the code generation code. ::

Reporting Bugs

* Sending Patches::

@end detailmenu
@end menu

@include gpl.texi

@include fdl.texi

@node Contributors

@unnumbered Contributors to GNU Treelang
@cindex contributors
@cindex credits

Treelang was based on 'toy' by Richard Kenner, and also uses code from
the GCC core code tree. Tim Josling first created the language and
documentation, based on the GCC Fortran compiler's documentation
framework. Treelang was updated to use the TreeSSA infrastructure by
James A. Morrison.

@itemize @bullet
@item
The packaging and compiler portions of GNU Treelang are based largely
on the GCC compiler.
@xref{Contributors,,Contributors to GCC,GCC,Using and Maintaining GCC},
for more information.

@item
There is no specific run-time library for treelang, other than the
standard C runtime.

@item
It would have been difficult to build treelang without access to Joachim
Nadler's guide to writing a front end to GCC (written in German). A
translation of this document into English is available via the
CobolForGCC project or via the documentation links from the GCC home
page @uref{http://gcc.gnu.org}.
@end itemize

@include funding.texi

@node Getting Started
@chapter Getting Started
@cindex getting started
@cindex new users
@cindex newbies
@cindex beginners

Treelang is a sample language, useful only to help people understand how
to implement a new language front end to GCC. It is not a useful
language in itself other than as an example or basis for building a new
language. Therefore only language developers are likely to have an
interest in it.

This manual assumes familiarity with GCC, which you can obtain by using
it and by reading the manuals @samp{Using the GNU Compiler Collection (GCC)}
and @samp{GNU Compiler Collection (GCC) Internals}.

To install treelang, follow the GCC installation instructions,
taking care to ensure you specify treelang in the configure step by adding
treelang to the list of languages specified by @option{--enable-languages},
e.g.@: @samp{--enable-languages=all,treelang}.

If you're generally curious about the future of
@code{treelang}, see @ref{Projects}.
If you're curious about its past,
see @ref{Contributors}.

To see a few of the questions maintainers of @code{treelang} have,
and that you might be able to answer,
see @ref{Open Questions}.

@ifset USING
@node What is GNU Treelang?, Lexical Syntax, Getting Started, Top
@chapter What is GNU Treelang?
@cindex concepts, basic
@cindex basic concepts

GNU Treelang, or @code{treelang}, is designed initially as a free
replacement for, or alternative to, the 'toy' language, but which is
amenable to inclusion within the GCC source tree.

@code{treelang} is largely a cut down version of C, designed to showcase
the features of the GCC code generation back end. Only those features
that are directly supported by the GCC code generation back end are
implemented. Features are implemented in a manner which is easiest and
clearest to implement. Not all or even most code generation back end
features are implemented. The intention is to add features incrementally
until most features of the GCC back end are implemented in treelang.

The main features missing are structures, arrays and pointers.

A sample program follows:

@smallexample
// @r{function prototypes}
// @r{function 'add' taking two ints and returning an int}
external_definition int add(int arg1, int arg2);
external_definition int subtract(int arg3, int arg4);
external_definition int first_nonzero(int arg5, int arg6);
external_definition int double_plus_one(int arg7);

// @r{function definition}
add
@{
  // @r{return the sum of arg1 and arg2}
  return arg1 + arg2;
@}


subtract
@{
  return arg3 - arg4;
@}

double_plus_one
@{
  // @r{aaa is a variable, of type integer and allocated at the start of}
  // @r{the function}
  automatic int aaa;
  // @r{set aaa to the value returned from add, when passed arg7 and arg7 as}
  // @r{the two parameters}
  aaa=add(arg7, arg7);
  aaa=add(aaa, aaa);
  aaa=subtract(subtract(aaa, arg7), arg7) + 1;
  return aaa;
@}

first_nonzero
@{
  // @r{C-like if statement}
  if (arg5)
    @{
      return arg5;
    @}
  else
    @{
    @}
  return arg6;
@}
@end smallexample

@node Lexical Syntax, Parsing Syntax, What is GNU Treelang?, Top
@chapter Lexical Syntax
@cindex Lexical Syntax

Treelang programs consist of whitespace, comments, keywords and names.
@itemize @bullet

@item
Whitespace consists of the space character, a tab, and the end of line
character. Line terminations are as defined by the
standard C library. Whitespace is ignored except within comments,
and where it separates parts of the program. In the example below, A and
B are two separate names separated by whitespace.

@smallexample
A B
@end smallexample

@item
Comments consist of @samp{//} followed by any characters up to the end
of the line. C style comments (/* */) are not supported. For example,
the assignment below is followed by a not very helpful comment.

@smallexample
x = 1; // @r{Set X to 1}
@end smallexample

@item
Keywords consist of any of the following reserved words or symbols:

@itemize @bullet
@item @{
used to start the statements in a function
@item @}
used to end the statements in a function
@item (
start list of function arguments, or to change the precedence of operators in
an expression
@item )
end list or prioritized operators in expression
@item ,
used to separate parameters in a function prototype or in a function call
@item ;
used to end a statement
@item +
addition, or unary plus for signed literals
@item -
subtraction, or unary minus for signed literals
@item =
assignment
@item ==
equality test
@item if
begin IF statement
@item else
begin 'else' portion of IF statement
@item static
indicate variable is permanent, or function has file scope only
@item automatic
indicate that variable is allocated for the life of the current scope
@item external_reference
indicate that variable or function is defined in another file
@item external_definition
indicate that variable or function is to be accessible from other
files
@item int
variable is an integer (same as C int)
@item char
variable is a character (same as C char)
@item unsigned
variable is unsigned. If this is not present, the variable is signed
@item return
start function return statement
@item void
used as function type to indicate function returns nothing
@end itemize


@item
Names consist of any letter or "_" followed by any number of letters,
numbers, or "_". "$" is not allowed in a name. All names must be globally
unique, i.e.@: may not be used twice in any context, and must
not be a keyword. Names and keywords are case sensitive. For example:

@smallexample
a A _a a_ IF_X
@end smallexample

are all different names.

@end itemize

@node Parsing Syntax, Compiler Overview, Lexical Syntax, Top
@chapter Parsing Syntax
@cindex Parsing Syntax

Declarations are built up from the lexical elements described above. A
file may contain one or more declarations.

@itemize @bullet

@item
declaration: variable declaration OR function prototype OR function declaration

@item
Function Prototype: storage type NAME ( optional_parameter_list ) ;

@smallexample
static int add (int a, int b);
@end smallexample

@item
variable_declaration: storage type NAME initial;

Example:

@smallexample
static int temp1 = 1;
@end smallexample

A variable declaration can be outside a function, or at the start of a
function.

@item
storage: automatic OR static OR external_reference OR external_definition

This defines the scope, duration and visibility of a function or variable.

@enumerate 1

@item
automatic: This means a variable is allocated at start of the current scope and
released when the current scope is exited. This can only be used for variables
within functions. It cannot be used for functions.

@item
static: This means a variable is allocated at start of program and
remains allocated until the program as a whole ends. For a function, it
means that the function is only visible within the current file.

@item
external_definition: For a variable, which must be defined outside a
function, it means that the variable is visible from other files. For a
function, it means that the function is visible from another file.

@item
external_reference: For a variable, which must be defined outside a
function, it means that the variable is defined in another file. For a
function, it means that the function is defined in another file.

@end enumerate

@item
type: int OR unsigned int OR char OR unsigned char OR void

This defines the data type of a variable or the return type of a function.

@enumerate a

@item
int: The variable is a signed integer. The function returns a signed integer.

@item
unsigned int: The variable is an unsigned integer. The function returns an unsigned integer.

@item
char: The variable is a signed character. The function returns a signed character.

@item
unsigned char: The variable is an unsigned character. The function returns an unsigned character.

@end enumerate

@item
parameter_list: parameter [, parameter]...

@item
parameter: variable_declaration ,

The variable declarations must not have initializations.

@item
initial: = value

@item
value: integer_constant

Values without a unary plus or minus are considered to be unsigned.
@smallexample
e.g.@: 1 +2 -3
@end smallexample

@item
function_declaration: name @{ variable_declarations statements @}

A function consists of the function name then the declarations (if any)
and statements (if any) within one pair of braces.

The details of the function arguments come from the function
prototype.
The function prototype must precede the function declaration
in the file.

@item
statement: if_statement OR expression_statement OR return_statement

@item
if_statement: if ( expression ) @{ variable_declarations statements @}
else @{ variable_declarations statements @}

The first lot of statements is executed if the expression is
nonzero. Otherwise the second lot of statements is executed. Either
list of statements may be empty, but both sets of braces and the else must be present.

@smallexample
if (a==b)
@{
  // @r{nothing}
@}
else
@{
  a=b;
@}
@end smallexample

@item
expression_statement: expression;

The expression is executed, including any side effects.

@item
return_statement: return expression_opt;

Returns from the function. If the function is void, the expression must
be absent, and if the function is not void the expression must be
present.

@item
expression: variable OR integer_constant OR expression + expression
OR expression - expression OR expression == expression OR ( expression )
OR variable = expression OR function_call

An expression can be a constant or a variable reference or a
function_call. Expressions can be combined as a sum of two expressions
or the difference of two expressions, or an equality test of two
expressions. An assignment is also an expression. Expressions and operator
precedence work as in C.

@item
function_call: function_name ( optional_comma_separated_expressions )

This invokes the function, passing to it the values of the expressions
as actual parameters.

@end itemize

@cindex compilers
@node Compiler Overview, TREELANG and GCC, Parsing Syntax, Top
@chapter Compiler Overview
treelang is run as part of the GCC compiler.

@itemize @bullet
@cindex source code
@cindex file, source
@cindex code, source
@cindex source file
@item
It reads a user's program, stored in a file and containing instructions
written in the appropriate language (Treelang, C, and so on). This file
contains @dfn{source code}.

@cindex translation of user programs
@cindex machine code
@cindex code, machine
@cindex mistakes
@item
It translates the user's program into instructions a computer can carry
out more quickly than it takes to translate the instructions in the
first place. These instructions are called @dfn{machine code}---code
designed to be efficiently translated and processed by a machine such as
a computer. Humans usually aren't as good at writing machine code as they
are at writing Treelang or C, because it is easy to make tiny mistakes
writing machine code. When writing Treelang or C, it is easy to make
big mistakes. But you can only make one mistake, because the compiler
stops after it finds any problem.

@cindex debugger
@cindex bugs, finding
@cindex @code{gdb}, command
@cindex commands, @code{gdb}
@item
It provides information in the generated machine code
that can make it easier to find bugs in the program
(using a debugging tool, called a @dfn{debugger},
such as @code{gdb}).

@cindex libraries
@cindex linking
@cindex @code{ld} command
@cindex commands, @code{ld}
@item
It locates and gathers machine code already generated to perform actions
requested by statements in the user's program. This machine code is
organized into @dfn{libraries} and is located and gathered during the
@dfn{link} phase of the compilation process. (Linking often is thought
of as a separate step, because it can be directly invoked via the
@code{ld} command.
However, the @code{gcc} command, as with most
compiler commands, automatically performs the linking step by calling on
@code{ld} directly, unless asked not to do so by the user.)

@cindex language, incorrect use of
@cindex incorrect use of language
@item
It attempts to diagnose cases where the user's program contains
incorrect usages of the language. The @dfn{diagnostics} produced by the
compiler indicate the problem and the location in the user's source file
where the problem was first noticed. The user can use this information
to locate and fix the problem.

The compiler stops after the first error. There are no plans to fix
this, ever, as it would vastly complicate the implementation of treelang
to little or no benefit.

@cindex diagnostics, incorrect
@cindex incorrect diagnostics
@cindex error messages, incorrect
@cindex incorrect error messages
(Sometimes an incorrect usage of the language leads to a situation where
the compiler cannot make any sense of what it reads---while a human
might be able to---and thus ends up complaining about an incorrect
``problem'' it encounters that, in fact, reflects a misunderstanding of
the programmer's intention.)

@cindex warnings
@cindex questionable instructions
@item
There are a few warnings in treelang. For example, an unused static function
generates a warning when @option{-Wunused-function} is specified; similarly, an
unused static variable generates a warning when @option{-Wunused-variable} is
specified. The only treelang-specific warning is a warning when an expression
is in a return statement for functions that return void.
@end itemize

@cindex components of treelang
@cindex @code{treelang}, components of
@code{treelang} consists of several components:

@cindex @code{gcc}, command
@cindex commands, @code{gcc}
@itemize @bullet
@item
A modified version of the @code{gcc} command, which also might be
installed as the system's @code{cc} command.
(In many cases, @code{cc} refers to the
system's ``native'' C compiler, which
might be a non-GNU compiler, or an older version
of @code{GCC} considered more stable or that is
used to build the operating system kernel.)

@cindex @code{treelang}, command
@cindex commands, @code{treelang}
@item
The @code{treelang} command itself.

@item
The @code{libc} run-time library. This library contains the machine
code needed to support capabilities of the Treelang language that are
not directly provided by the machine code generated by the
@code{treelang} compilation phase. This is the same library that the
main C compiler uses (libc).

@cindex @code{tree1}, program
@cindex programs, @code{tree1}
@cindex assembler
@cindex @code{as} command
@cindex commands, @code{as}
@cindex assembly code
@cindex code, assembly
@item
The compiler itself, which is internally named @code{tree1}.

Note that @code{tree1} does not generate machine code directly---it
generates @dfn{assembly code} that is a more readable form
of machine code, leaving the conversion to actual machine code
to an @dfn{assembler}, usually named @code{as}.
@end itemize

@code{GCC} is often thought of as ``the C compiler'' only,
but it does more than that.
Based on command-line options and the names given for files
on the command line, @code{gcc} determines which actions to perform, including
preprocessing, compiling (in a variety of possible languages), assembling,
and linking.
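
As a rough sketch of how command-line options select these actions, the
standard @code{gcc} options @option{-S} and @option{-c} stop the driver
after compilation and after assembly respectively. The file names
@file{foo.tree} and @file{main.c} below are hypothetical, and the
commands assume a @code{gcc} that was built with treelang enabled:

@smallexample
gcc -S foo.tree             # @r{compile only: runs tree1, writes foo.s}
gcc -c foo.tree             # @r{compile and assemble: writes foo.o}
gcc foo.tree main.c -o foo  # @r{compile both files, then link them}
@end smallexample

The last command assumes that @file{main.c} supplies the @code{main}
function that calls into the treelang code.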

@cindex driver, gcc command as
@cindex @code{gcc}, command as driver
@cindex executable file
@cindex files, executable
@cindex cc1 program
@cindex programs, cc1
@cindex preprocessor
@cindex cpp program
@cindex programs, cpp
For example, the command @samp{gcc foo.c} @dfn{drives} the file
@file{foo.c} through the preprocessor @code{cpp}, then
the C compiler (internally named
@code{cc1}), then the assembler (usually @code{as}), then the linker
(@code{ld}), producing an executable program named @file{a.out} (on
UNIX systems).

@cindex treelang program
@cindex programs, treelang
As another example, the command @samp{gcc foo.tree} would do much the
same as @samp{gcc foo.c}, but instead of using the C compiler named
@code{cc1}, @code{gcc} would use the treelang compiler (named
@code{tree1}). However there is no preprocessor for treelang.

@cindex @code{tree1}, program
@cindex programs, @code{tree1}
In a GNU Treelang installation, @code{gcc} recognizes Treelang source
files by name just like it does C and C++ source files. It knows to use
the Treelang compiler named @code{tree1}, instead of @code{cc1} or
@code{cc1plus}, to compile Treelang files. If a file's name ends in
@code{.tree} then GCC knows that the program is written in treelang. You
can also manually override the language.

@cindex @code{gcc}, not recognizing Treelang source
@cindex unrecognized file format
@cindex file format not recognized
Non-Treelang-related operation of @code{gcc} is generally
unaffected by installing the GNU Treelang version of @code{gcc}.
However, without the installed version of @code{gcc} being the
GNU Treelang version, @code{gcc} will not be able to compile
and link Treelang programs.
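
As a hedged illustration of overriding the language manually, the
standard @code{gcc} option @option{-x} names the language of the input
files that follow it. The sketch below assumes that the front end is
registered with the driver under the name @samp{treelang}, and it uses a
hypothetical file @file{prog.src} whose suffix @code{gcc} would not
otherwise recognize:

@smallexample
gcc -c prog.tree             # @r{language inferred from the .tree suffix}
gcc -x treelang -c prog.src  # @r{language given explicitly}
@end smallexample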

@cindex printing version information
@cindex version information, printing
The command @samp{gcc -v x.tree}, where @samp{x.tree} is a file which
must exist but whose contents are ignored, is a quick way to display
version information for the various programs used to compile a typical
Treelang source file.

The @code{tree1} program represents most of what is unique to GNU
Treelang; @code{tree1} is a combination of two rather large chunks of
code.

@cindex GCC Back End (GBE)
@cindex GBE
@cindex @code{GCC}, back end
@cindex back end, GCC
@cindex code generator
One chunk is the so-called @dfn{GNU Back End}, or GBE,
which knows how to generate fast code for a wide variety of processors.
The same GBE is used by the C, C++, and Treelang compiler programs @code{cc1},
@code{cc1plus}, and @code{tree1}, plus others.
Often the GBE is referred to as the ``GCC back end'' or
even just ``GCC''---in this manual, the term GBE is used
whenever the distinction is important.

@cindex GNU Treelang Front End (TFE)
@cindex tree1
@cindex @code{treelang}, front end
@cindex front end, @code{treelang}
The other chunk of @code{tree1} is the majority of what is unique about
GNU Treelang---the code that knows how to interpret Treelang programs to
determine what they are intending to do, and then communicate that
knowledge to the GBE for actual compilation of those programs. This
chunk is called the @dfn{Treelang Front End} (TFE). The @code{cc1} and
@code{cc1plus} programs have their own front ends, for the C and C++
languages, respectively. These front ends are responsible for
diagnosing incorrect usage of their respective languages by the programs
they process, and are responsible for most of the warnings about
questionable constructs as well.
(The GBE in principle handles
producing some warnings, like those concerning possible references to
undefined variables, but these warnings should not occur in treelang
programs as the front end is meant to pick them up first).

Because so much is shared among the compilers for various languages,
much of the behavior and many of the user-selectable options for these
compilers are similar.
For example, diagnostics (error messages and
warnings) are similar in appearance; command-line
options like @samp{-Wall} have generally similar effects; and the quality
of generated code (in terms of speed and size) is roughly similar
(since that work is done by the shared GBE).

@node TREELANG and GCC, Compiler, Compiler Overview, Top
@chapter Compile Treelang, C, or Other Programs
@cindex compiling programs
@cindex programs, compiling

@cindex @code{gcc}, command
@cindex commands, @code{gcc}
A GNU Treelang installation includes a modified version of the @code{gcc}
command.

In a non-Treelang installation, @code{gcc} recognizes C, C++,
and Objective-C source files.

In a GNU Treelang installation, @code{gcc} also recognizes Treelang source
files and accepts Treelang-specific command-line options, plus some
command-line options that are designed to cater to Treelang users
but apply to other languages as well.

@xref{G++ and GCC,,Programming Languages Supported by GCC,GCC,Using
the GNU Compiler Collection (GCC)},
for information on the way different languages are handled
by the GCC compiler (@code{gcc}).

You can use this, combined with the output of the @samp{gcc -v x.tree}
command to get the options applicable to treelang. Treelang programs
must end with the suffix @samp{.tree}.

@cindex preprocessor

Treelang programs are not by default run through the C
preprocessor by @code{gcc}.
There is no reason why they cannot be run through the
preprocessor manually, but you would need to prevent the preprocessor
from generating #line directives, using the @samp{-P} option, otherwise
tree1 will not accept the input.

@node Compiler, Other Languages, TREELANG and GCC, Top
@chapter The GNU Treelang Compiler

The GNU Treelang compiler, @code{treelang}, supports programs written
in the GNU Treelang language.

@node Other Languages, treelang internals, Compiler, Top
@chapter Other Languages

@menu
* Interoperating with C and C++::
@end menu

@node Interoperating with C and C++, , Other Languages, Other Languages
@section Tools and advice for interoperating with C and C++

The output of treelang programs looks like C program code to the linker
and everybody else, so you should be able to freely mix treelang and C
(and C++) code, with one proviso.

C promotes small integer types to 'int' when used as function parameters and
return values in non-prototyped functions. Since treelang has no
non-prototyped functions, the treelang compiler does not do this.

@ifset INTERNALS
@node treelang internals, Open Questions, Other Languages, Top
@chapter treelang internals

@menu
* treelang files::
* treelang compiler interfaces::
* Hints and tips::
@end menu

@node treelang files, treelang compiler interfaces, treelang internals, treelang internals
@section treelang files

To create a compiler that integrates into GCC, you need to create many
files. Some of the files are integrated into the main GCC makefile, to
build the various parts of the compiler and to run the test
suite. Others are incorporated into various GCC programs such as
@file{gcc.c}. Finally you must provide the actual programs comprising your
compiler.

@cindex files

The files are:

@enumerate 1

@item
COPYING.
This is the license file, assuming you are going to use the
GNU General Public License.  You probably need to use the GPL because,
if you use the GCC back end, your program and the back end are one
program, and the back end is GPLed.

This need not be present if the language is incorporated into the main
GCC tree, as the main GCC directory has this file.

@item
COPYING.LIB.  This is the license file for those parts of your program
that are not to be covered by the GPL, but are instead to be covered by
the LGPL (Library or Lesser GPL).  This license may be appropriate for
the library routines associated with your compiler.  These are the
routines that are linked with the @emph{output} of the compiler.  Using
the LGPL for these routines allows programs written using your compiler
to be closed source.  For example, LIBC is under the LGPL.

This need not be present if the language is incorporated into the main
GCC tree, as the main GCC directory has this file.

@item
ChangeLog.  Record all the changes to your compiler.  Use the same format
as treelang does, since it is supported by an emacs editing mode and is
part of the FSF coding standard.  Normally each directory has its own
changelog.  The FSF standard allows but does not require a meaningful
comment on @emph{why} the changes were made, above and beyond
@emph{what} was changed.  In the author's opinion it is useful to
provide this information.

@item
treelang.texi.  The manual, written in texinfo.  Your manual would have a
different file name.  You need not write it in texinfo if you don't want
to, but a lot of GNU software does use texinfo.

@cindex Make-lang.in
@item
Make-lang.in.  This file is a makefile fragment which is incorporated
into the GCC makefile skeleton (@file{Makefile.in} in the GCC directory)
to make @file{Makefile}, as part of the configuration process.

@file{Makefile} in turn holds the main instructions for actually
building everything.  The build instructions are held in the main GCC
manual and web site so they are not repeated here.

There are some comments at the top of @file{Make-lang.in} which will
help you understand what you need to do.

There are make commands to build things, remove generated files with
various degrees of thoroughness, count the lines of code (so you know
how much progress you are making), build info and html files from the
texinfo source, run the tests, etc.

@item
README.  Just a brief informative text file saying what is in this
directory.

@cindex config-lang.in
@item
config-lang.in.  This file is read by the configuration process and must
be present.  You specify the name of your language, the name(s) of the
compiler(s), including preprocessors, that you are going to build, and
whether any files (usually generated ones) should be excluded from
diffs (i.e., when making diff files to send in patches).  Whether the
equate 'stagestuff' is used is unknown (???).

@cindex lang.opt
@item
lang.opt.  This file is included into @file{gcc.c}, the main GCC driver, and
tells it what options your language supports.  This is also used to
display help.

@cindex lang-specs.h
@item
lang-specs.h.  This file is also included in @file{gcc.c}.  It tells
@file{gcc.c} when to call your programs and what options to send them.  The
mini-language 'specs' is documented in the source of @file{gcc.c}.  Do not
attempt to write a specs file from scratch; use an existing one as the
base and enhance it.

@item
Your texi files.  Texinfo can be used to build documentation in HTML,
info, dvi and postscript formats.  It is a tagged language, is documented
in its own manual, and has its own emacs mode.

@item
Your programs.  The relationships between all the programs are explained
in the next section.
You need to write or use the following programs:

@itemize @bullet

@item
lexer.  This breaks the input into words and passes these to the
parser.  This is @file{lex.l} in treelang, which is passed through flex, a lex
variant, to produce C code @file{lex.c}.  Note there is a school of thought
that says lexers should be coded by hand.  However, you may prefer to
write far less code and use flex, as was done with treelang.

@item
parser.  This breaks the program into recognizable constructs such as
expressions, statements, etc.  This is @file{parse.y} in treelang, which is
passed through bison, which is a yacc variant, to produce C code
@file{parse.c}.

@item
back end interface.  This interfaces to the code generation back end.  In
treelang, this is @file{tree1.c}, which mainly interfaces to
@file{toplev.c}, and @file{treetree.c}, which mainly interfaces to
everything else.  Many languages combine the back end interface with
the parser, as in the C compiler for example.  It is a matter of taste
which way to do it, but with treelang it is separated out to make the
back end interface cleaner and easier to understand.

@item
header files.  For function prototypes and common data items.  One point
to note here is that bison can generate a header file with all the
numbers it has assigned to the keywords and symbols, and you can include
the same header in your lexer.  This technique is demonstrated in
treelang.

@item
compiler main file.  GCC comes with a file @file{toplev.c} which is a
perfectly serviceable main program for your compiler.  GNU Treelang uses
@file{toplev.c} but other languages have been known to replace it with their
own main program.  Again this is a matter of taste and how much code you
want to write.

@end itemize

@end enumerate

@node treelang compiler interfaces, Hints and tips, treelang files, treelang internals
@section treelang compiler interfaces

@cindex driver
@cindex toplev.c

@menu
* treelang driver::
* treelang main compiler::
@end menu

@node treelang driver, treelang main compiler, treelang compiler interfaces, treelang compiler interfaces
@subsection treelang driver

GCC is controlled by a driver, which executes the various
compiler phases based on the instructions in the specs files.

Typically a program's language will be identified from its suffix
(e.g., @file{.tree} for treelang programs).

The driver (@file{gcc.c}) will then drive (exec) in turn a preprocessor,
the main compiler, the assembler and the link editor.  Options to GCC allow you
to override all of this.  In the case of treelang programs there is no
preprocessor, and mostly these days the C preprocessor is run within the
main C compiler rather than as a separate process, apparently for reasons of speed.

You will be using the standard assembler and linkage editor so these are
ignored from now on.

You have to write your own preprocessor if you want one.  This is usually
totally language specific.  The main point to be aware of is that you
must find some way to pass file name and line number information
through to the main compiler, so that it can pass this information on
to the back end and the debugger can find the right source line for
each piece of code.  That is all there is to say about the
preprocessor, except that it will probably not be the slowest part of
the compiler and will probably not use the most memory, so don't waste
too much time tuning it until you know you need to do so.
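
As a rough illustration of the driver phases described above, the
following commands show how standard @code{gcc} options stop the driver
after a given phase.  This is only a sketch: it assumes a GCC
installation built with treelang enabled, and @file{prog.tree} is a
hypothetical source file name.

@example
# Print each command the driver executes while compiling and
# assembling (but not linking) the hypothetical file prog.tree.
gcc -v -c prog.tree

# Stop after the main compiler; the assembler source goes to prog.s.
gcc -S prog.tree

# Stop after the assembler; the object file goes to prog.o.
gcc -c prog.tree
@end example

The @samp{-v} output is a convenient way to check which programs the
specs for your language cause the driver to run, and with what options.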

@node treelang main compiler,  , treelang driver, treelang compiler interfaces
@subsection treelang main compiler

The main compiler for treelang consists of @file{toplev.c} from the main GCC
compiler, the parser, lexer and back end interface routines, and the
back end routines themselves, of which there are many.

@file{toplev.c} does a lot of work for you and you should almost certainly
use it.

Writing the back end interface routines is the hard part of creating a
compiler using GCC.  The back end interface documentation is incomplete
and the interface is complex.

There are three main aspects to interfacing to the other GCC code.

@menu
* Interfacing to toplev.c::
* Interfacing to the garbage collection::
* Interfacing to the code generation code. ::
@end menu

@node Interfacing to toplev.c, Interfacing to the garbage collection, treelang main compiler, treelang main compiler
@subsubsection Interfacing to toplev.c

In treelang this is handled mainly in @file{tree1.c}
and partly in @file{treetree.c}.  Peruse @file{toplev.c} for details of
what you need to do.

@node Interfacing to the garbage collection, Interfacing to the code generation code. , Interfacing to toplev.c, treelang main compiler
@subsubsection Interfacing to the garbage collection

In treelang this is done mainly in @file{tree1.c}.

Memory allocation in the compiler should be done using the
@code{ggc_alloc} and kindred routines in @file{ggc*.*}.  At the end of
every 'function' in your language, @file{toplev.c} calls the garbage
collector several times.  The garbage collector calls mark
routines which go through the memory which is still used, telling the
garbage collector not to free it.  Then all the memory not used is
freed.

What this means is that you need a way to hook into this marking
process.  This is done by calling @code{ggc_add_root}.
This provides the address
of a callback routine which will be called during garbage collection and
which can call @code{ggc_mark} to keep the storage from being freed.  If
storage is only used within the parsing of a function, you do not need
to provide a way to mark it.

Note that you can also call @code{ggc_mark_tree} to mark any of the back end
internal 'tree' nodes.  This routine will follow the branches of the
trees and mark all the subordinate structures.  This is useful, for
example, when you have created a variable declaration that will be used
across multiple functions, or for a function declaration (from a
prototype) that may be used later on.  See the next section for more on
the tree nodes.

@node Interfacing to the code generation code. ,  , Interfacing to the garbage collection, treelang main compiler
@subsubsection Interfacing to the code generation code.

In treelang this is done in @file{treetree.c}.  A typedef called 'tree',
which is defined in @file{tree.h} and @file{tree.def} in the GCC
directory and largely implemented in @file{tree.c} and @file{stmt.c},
forms the basic interface to the compiler back end.

In general you call various tree routines to generate code, either
directly or through @file{toplev.c}.  You build up data structures and
expressions in similar ways.

You can read some documentation on this which can be found via the GCC
main web page.  In particular, the documentation produced by Joachim
Nadler and translated by Tim Josling can be quite useful.  The C compiler
also has documentation in the main GCC manual (particularly the current
CVS version) which is useful on a lot of the details.

In time it is hoped to enhance this document to provide a more
comprehensive overview of this topic.  The main gap is in explaining how
it all works together.

@node Hints and tips,  , treelang compiler interfaces, treelang internals
@section Hints and tips

@itemize @bullet

@item
TAGS: Use the make ETAGS commands to create TAGS files which can be used in
emacs to jump to any symbol quickly.

@item
GREP: grep is also a useful way to find all uses of a symbol.

@item
TREE: The main files to look at are @file{tree.h} and @file{tree.def}.
You will probably want a hardcopy of these.

@item
SAMPLE: look at the sample interfacing code in @file{treetree.c}.  You
can use gdb to trace through the code and learn about how it all works.

@item
GDB: the GCC back end works well with gdb.  It traps @code{abort()} and
allows you to trace back what went wrong.

@item
Error Checking: The compiler back end does some error and consistency
checking.  Often the result of an error is just no code being
generated.  You will then need to trace through and find out what is
going wrong.  The rtl dump files can help here also.

@item
rtl dump files: The main GCC manual documents these files, which are
dumps of the rtl (intermediate code) that is manipulated during the code
generation process.  This can provide useful clues about what is going
wrong.  The rtl 'language' is documented in the main GCC manual.

@end itemize

@end ifset

@node Open Questions, Bugs, treelang internals, Top
@chapter Open Questions

If you know GCC well, please consider looking at the file
@file{treetree.c} and resolving any questions marked "???".

@node Bugs, Service, Open Questions, Top
@chapter Reporting Bugs
@cindex bugs
@cindex reporting bugs

You can report bugs to @email{@value{email-bugs}}.  Please make
sure bugs are real before reporting them.  Follow the guidelines in the
main GCC manual for submitting bug reports.

@menu
* Sending Patches::
@end menu

@node Sending Patches,  , Bugs, Bugs
@section Sending Patches for GNU Treelang

If you would like to write bug fixes or improvements for the GNU
Treelang compiler, that is very helpful.  Send suggested fixes to
@email{@value{email-patches}}.

@node Service, Projects, Bugs, Top
@chapter How To Get Help with GNU Treelang

If you need help installing, using or changing GNU Treelang, there are two
ways to find it:

@itemize @bullet

@item
Look in the service directory for someone who might help you for a fee.
The service directory is found in the file named @file{SERVICE} in the
GCC distribution.

@item
Send a message to @email{@value{email-general}}.

@end itemize

@end ifset
@ifset INTERNALS

@node Projects, Index, Service, Top
@chapter Projects
@cindex projects

If you want to contribute to @code{treelang} by doing research,
design, specification, documentation, coding, or testing,
the following information should give you some ideas.

Send a message to @email{@value{email-general}} if you plan to add a
feature.

The main requirement for treelang is to add features and to add
documentation.  Features are things that the GCC back end can do but
which are not reflected in treelang.  Examples include structures,
unions, pointers, arrays.

@end ifset

@node Index,  , Projects, Top
@unnumbered Index

@printindex cp
@summarycontents
@contents
@bye