<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into four phases, which we will briefly describe alongside
the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the second phase.

Note that the `go/*` family of packages, such as `go/parser` and `go/types`,
have no relation to the compiler. Since the compiler was initially written in C,
the `go/*` packages were developed to enable writing tools working with Go code,
such as `gofmt` and `vet`.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.
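
The `cmd/compile/internal/syntax` package is internal to the toolchain and
cannot be imported by ordinary programs. As a rough illustration of what a
syntax tree with position information looks like, the following sketch uses the
standard library's `go/parser` instead, which is a separate parser that the
compiler does not use. The source string and the `declPositions` helper are
made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a made-up source file used only for this illustration.
const src = `package demo

var answer = 42

func add(a, b int) int {
	return a + b
}
`

// declPositions parses src with go/parser (a stand-in for the compiler's own
// syntax package) and returns one "line:kind" entry per top-level declaration.
func declPositions() ([]string, error) {
	fset := token.NewFileSet() // records position information for every node
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range file.Decls {
		line := fset.Position(decl.Pos()).Line
		switch decl.(type) {
		case *ast.GenDecl:
			out = append(out, fmt.Sprintf("%d:GenDecl", line))
		case *ast.FuncDecl:
			out = append(out, fmt.Sprintf("%d:FuncDecl", line))
		}
	}
	return out, nil
}

func main() {
	decls, err := declPositions()
	if err != nil {
		panic(err)
	}
	fmt.Println(decls)
}
```

Each entry pairs a declaration node with the source line it came from, which is
the kind of information later used for error messages and debugging data.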

### 2. Type-checking and AST transformations

* `cmd/compile/internal/gc` (create compiler AST, type checking, AST transformations)

The gc package includes an AST definition carried over from when it was written
in C. All of its code is written in terms of it, so the first thing that the gc
package must do is convert the syntax package's syntax tree to the compiler's
AST representation. This extra step may be refactored away in the future.

The AST is then type-checked. The first steps are name resolution and type
inference, which determine which object belongs to which identifier, and what
type each expression has. Type-checking includes certain extra checks, such as
"declared and not used" as well as determining whether or not a function
terminates.
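
The two extra checks mentioned above can be observed with the standard
library's `go/types`, used here only as a stand-in because the gc package's own
type checker is not importable. This is a minimal sketch: the source string and
the helpers `typeErrors` and `hasError` are invented for the example, and the
exact error wording differs between Go versions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// src is syntactically valid but has an unused variable and a missing return.
const src = `package demo

func f() int {
	x := 1
}
`

// typeErrors type-checks src with go/types (not the compiler's own checker)
// and returns every reported error message.
func typeErrors() []string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return []string{err.Error()}
	}
	var errs []string
	conf := types.Config{
		// Collect all errors instead of stopping at the first one.
		Error: func(err error) { errs = append(errs, err.Error()) },
	}
	conf.Check("demo", fset, []*ast.File{file}, nil)
	return errs
}

// hasError reports whether any message contains substr.
func hasError(errs []string, substr string) bool {
	for _, e := range errs {
		if strings.Contains(e, substr) {
			return true
		}
	}
	return false
}

func main() {
	for _, e := range typeErrors() {
		fmt.Println(e)
	}
}
```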

Certain transformations are also done on the AST. Some nodes are refined based
on type information, such as string additions being split from the arithmetic
addition node type. Some other examples are dead code elimination, function call
inlining, and escape analysis.
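
Inlining and escape analysis decisions can be inspected by passing the `-m`
compiler flag, for example `go build -gcflags=-m`. The program below is a small
sketch with invented functions; which calls are inlined and which variables
escape depends on the Go version, so no particular diagnostic output is assumed
here.

```go
package main

import "fmt"

// add is small enough that the compiler will typically inline its calls.
func add(a, b int) int {
	return a + b
}

// newCounter returns a pointer to a local variable, so escape analysis
// typically reports that n is moved to the heap.
func newCounter() *int {
	n := 0
	return &n
}

func main() {
	c := newCounter()
	*c = add(*c, 2)
	fmt.Println(*c)
}
```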

### 3. Generic SSA

* `cmd/compile/internal/gc` (converting to SSA)
* `cmd/compile/internal/ssa` (SSA passes and rules)

In this phase, the AST is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.
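
As a conceptual illustration of the "single assignment" property, the comment in
the sketch below shows a hand-written SSA-like form of a small function. The
notation is simplified and is not actual compiler output; to see the real thing,
the `GOSSAFUNC` environment variable (for example `GOSSAFUNC=abs go build`) asks
the compiler to dump the SSA of the named function to an HTML file.

```go
package main

import "fmt"

// abs reassigns x on one path. In SSA form every value is assigned exactly
// once, so the two possible values of x get distinct names and are merged
// with a phi at the join point. Hand-written sketch, not compiler output:
//
//	b1: x1 = arg x
//	    if x1 < 0 goto b2 else b3
//	b2: x2 = neg x1
//	    goto b3
//	b3: x3 = phi(x2 from b2, x1 from b1)
//	    return x3
func abs(x int) int {
	if x < 0 {
		x = -x
	}
	return x
}

func main() {
	fmt.Println(abs(-3), abs(4))
}
```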

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.
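
The range loop rewrite can be pictured at the source level. The second function
below is only an approximation of the lowered form, written by hand for this
sketch; the compiler performs the rewrite on its internal representation, not
on source text.

```go
package main

import "fmt"

// sumRange uses a range loop over a slice.
func sumRange(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumFor is roughly what the range loop above is rewritten into.
func sumFor(xs []int) int {
	total := 0
	s := xs     // the range expression is evaluated once
	n := len(s) // and so is its length
	for i := 0; i < n; i++ {
		x := s[i]
		total += x
	}
	return total
}

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(sumRange(xs), sumFor(xs))
}
```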

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.

Some examples of these generic passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.
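
To give a feel for what an expression rewrite rule does, here is a toy rewriter
over a tiny expression tree. Everything in it (the `Expr` type, the operation
names, the two rules) is invented for this sketch and is much simpler than the
rule machinery in `cmd/compile/internal/ssa`.

```go
package main

import "fmt"

// Expr is a tiny expression tree invented for this sketch.
type Expr struct {
	Op   string // "const", "var", "add", "mul", or "shl"
	Val  int64  // constant value when Op == "const"
	Name string // variable name when Op == "var"
	Args []*Expr
}

func konst(v int64) *Expr                { return &Expr{Op: "const", Val: v} }
func variable(n string) *Expr            { return &Expr{Op: "var", Name: n} }
func binary(op string, a, b *Expr) *Expr { return &Expr{Op: op, Args: []*Expr{a, b}} }

// log2 returns k if v == 1<<k for some k in [0, 62], and -1 otherwise.
func log2(v int64) int64 {
	for k := int64(0); k < 63; k++ {
		if v == int64(1)<<uint(k) {
			return k
		}
	}
	return -1
}

// rewrite applies two rules bottom-up:
//
//	(add (const a) (const b)) -> (const a+b), and likewise for mul
//	(mul x (const 2^k))       -> (shl x (const k))
func rewrite(e *Expr) *Expr {
	if len(e.Args) != 2 {
		return e // leaves are already in final form
	}
	a, b := rewrite(e.Args[0]), rewrite(e.Args[1])
	if a.Op == "const" && b.Op == "const" {
		switch e.Op {
		case "add":
			return konst(a.Val + b.Val)
		case "mul":
			return konst(a.Val * b.Val)
		}
	}
	if e.Op == "mul" && b.Op == "const" {
		if k := log2(b.Val); k >= 0 {
			return binary("shl", a, konst(k))
		}
	}
	return binary(e.Op, a, b)
}

// String prints the tree in a Lisp-like prefix form.
func (e *Expr) String() string {
	switch e.Op {
	case "const":
		return fmt.Sprint(e.Val)
	case "var":
		return e.Name
	}
	return fmt.Sprintf("(%s %s %s)", e.Op, e.Args[0], e.Args[1])
}

func main() {
	// x * (3 + 5): the addition folds to 8, then the multiply becomes a shift.
	e := binary("mul", variable("x"), binary("add", konst(3), konst(5)))
	fmt.Println(rewrite(e)) // prints "(shl x 3)"
}
```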

### 4. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64, memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of `obj.Prog` instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.
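
One way to look at the result of this phase is the `-S` compiler flag, which
prints an assembly listing for the package being compiled, for example with
`go build -gcflags=-S`. The program below is just a small input to try it on;
the function `double` is invented for this sketch, and the listing differs per
architecture and Go version, so none is reproduced here.

```go
package main

import "fmt"

// double is kept out of line so that it appears as its own function in the
// assembly listing instead of being inlined into main.
//
//go:noinline
func double(x int) int {
	return x * 2
}

func main() {
	fmt.Println(double(21))
}
```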

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).
