1# Why Include What You Use Is Difficult #
2
3This section is informational, for folks who are wondering why include-what-you-use requires so much code and yet still has so many errors.
4
5Include-what-you-use has the most problems with templates and macros. If your code doesn't use either, IWYU will probably do great. And, you're probably not actually programming in C++...
6
7## Use Versus Forward Declare ##
8
9Include-what-you-use has to be able to tell when a symbol is being used in a way that you can forward-declare it. Otherwise, if you wrote
10
11    vector<MyClass*> foo;
12
13IWYU would tell you to `#include "myclass.h"`, when perhaps the whole reason you're using a pointer here is to avoid the need for that `#include`.
14
15In the above case, it's pretty easy for IWYU to tell that we can safely forward-declare `MyClass`. But now consider
16
17    vector<MyClass> foo;        // requires full definition of MyClass
18    scoped_ptr<MyClass> foo;    // forward-declaring MyClass is often ok
19
20To distinguish these, clang has to instantiate the vector and scoped_ptr template classes, including analyzing all member variables and the bodies of the constructor and destructor (and recursively for superclasses).
21
22But that's not enough: when instantiating the templates, we need to keep track of which symbols come from template arguments and which don't. For instance, suppose you call `MyFunc<MyClass>()`, where `MyFunc` looks like this:
23
24    template<typename T> void MyFunc() {
25      T* t;
26      MyClass myclass;
27      ...
28    }
29
30In this case, the caller of `MyFunc` is not using the full type of `MyClass`, because the template parameter is only used as a pointer. On the other hand, the file that defines `MyFunc` is using the full type information for `MyClass`. The end result is that the caller can forward-declare `MyClass`, but the file defining `MyFunc` has to `#include "myclass.h"`.
31
32## Handling Template Arguments ##
33
34Even figuring out what types are 'used' with a template can be difficult. Consider the following two declarations:
35
36    vector<MyClass> v;
37    hash_set<MyClass> h;
38
39These both have default template arguments, so are parsed like
40
41    vector<MyClass, alloc<MyClass> > v;
42    hash_set<MyClass, hash<MyClass>, equal_to<MyClass>, alloc<MyClass> > h;
43
44What symbols should we say are used? If we say `alloc<MyClass>` is used when you declare a vector, then every file that `#includes` `<vector>` will also need to `#include <memory>`.
45
46So it's tempting to just ignore default template arguments. But that's not right either. What if `hash<MyClass>` is defined in some local `myhash.h` file (as `hash<string>` often is)? Then we want to make sure IWYU says to `#include "myhash.h"` when you create the hash_set (otherwise the code won't compile). That requires paying attention to the default template argument. Figuring out how to handle default template arguments can get very complex.
47
48Even normal template arguments can be confusing. Consider this templated function:
49
50    template<typename A, typename B, typename C> void MyFunc(A (*fn)(B,C))
51    { ... }
52
53and you call `MyFunc(FunctionReturningAFunctionPointer())`. What types are being used where, in this case?
54
55## Who is Responsible for Dependent Template Types? ##
56
57If you say `vector<MyClass> v;`, it's clear that you, and not `vector.h` are responsible for the use of `MyClass`, even though all the functions that use `MyClass` are defined in `vector.h`. (OK, technically, these functions are not "defined" in a particular location, they're instantiated from template methods written in `vector.h`, but for us it works out the same.)
58
59When you say `hash_map<MyClass, int> h;`, you are likewise responsible for `MyClass` (and `int`), but are you responsible for `pair<MyClass, int>`? That is the type that hash_map uses to store your entries internally, and it depends on one of your template arguments, but even so  it shouldn't be your responsibility -- it's an implementation detail of hash_map. Of course, if you say `hash_map<pair<int, int>, int>`, then you are responsible for the use of `pair`. Distinguishing these two cases from each other, and from the vector case, can be difficult.
60
61Now suppose there's a template function like this:
62
63    template<typename T> void MyFunc(T t) {
64      strcat(t, 'a');
65      strchr(t, 'a');
66      cerr << t;
67    }
68
69If you call `MyFunc(some_char_star)`, which of these symbols are you responsible for, and which is the author of `MyFunc` responsible for: `strcat`, `strchr`, `operator<<(ostream&, T)`?
70
71`strcat` is a normal function, and the author of `MyFunc` is responsible for its use. This is an easy case.
72
73In C++, `strchr` is a templatized function (different impls for `char*` and `const char*`). Which version is called depends on the template argument. So, naively, we'd conclude that the caller is responsible for the use of `strchr`. However, that's ridiculous; we don't want caller of `MyFunc` to have to `#include <string.h>` just to call `MyFunc`. We have special code that (usually) handles this kind of case.
74
75`operator<<` is also a templated function, but it's one that may be defined in lots of different files. It would be ridiculous in its own way if `MyFunc` was responsible for including every file that defines `operator<<(ostream&, T)` for all `T`. So, unlike the two cases above, the caller is the one responsible for the use of `operator<<`, and will have to `#include` the file that defines it. It's counter-intuitive, perhaps, but the alternatives are all worse.
76
77As you can imagine, distinguishing all these cases is extremely difficult. To get it exactly right would require re-implementing C++'s (byzantine) lookup rules, which we have not yet tackled.
78
79## Template Template Types ##
80
81Let's say you have a function
82
83    template<template<typename U> T> void MyFunc() {
84      T<string> t;
85    }
86
87And you call `MyFunc<hash_set>`. Who is responsible for the 'use' of `hash<string>`, and thus needs to `#include "myhash.h"`? I think it has to be the caller, even if the  caller never uses the `string` type in its file at all. This is rather counter-intuitive. Luckily, it's also rather rare.
88
89## Typedefs ##
90
91Suppose you `#include` a file `"foo.h"` that has typedef `hash_map<Foo, Bar> MyMap;`. And you have this code:
92
93    for (MyMap::iterator it = ...)
94
95Who, if anyone, is using the symbol `hash_map<Foo, Bar>::iterator`? If we say you, as the author of the for-loop, are the user, then you must `#include <hash_map>`, which undoubtedly goes against the goal of the typedef (you shouldn't even have to know you're using a hash_map). So we want to say the author of the typedef is responsible for the use. But how could the author of the typedef know that you were going to use `MyMap::iterator`? It can't predict that. That means it has to be responsible for every possible use of the typedef type. This can be complicated to figure out. It requires instantiating all methods of the underlying type, some of which might not even be legal C++ (if, say, the class uses SFINAE).
96
97Worse, when the language auto-derives template types, it loses typedef information. Suppose you wrote this:
98
99    MyMap m;
100    find(m.begin(), m.end(), some_foo);
101
102The compiler sees this as syntactic sugar for `find<hash_map<Foo, Bar, hash<Foo>, equal_to<Foo>, alloc<Foo> >(m.begin(), m.end(), some_foo);`
103
104Not only is the template argument `hash_map` instead of `MyMap`, it includes all the default template arguments, with no indication they're default arguments. All the tricks we used above to intelligently ignore default template arguments are worthless here. We have to jump through lots of hoops so this code doesn't require you to `#include` not only `<hash_map>`, but `<alloc>` and `<utility>` as well.
105
106## Macros ##
107
108It's no surprise macros cause a huge problem for include-what-you-use. Basically, all the problems of templates also apply to macros, but worse: with templates you can analyze the uninstantiated template, but with macros, you can't analyze the uninstantiated macro -- it likely doesn't even parse cleanly in isolation. As a result, we have very few tools to distinguish when the author of a macro is responsible for a symbol used in a macro, and when the caller of the macro is responsible.
109
110## Includes with Side Effects ##
111
112While not a major problem, this indicates the myriad "gotchas" that exist around include-what-you-use: removing an `#include` and replacing it with a forward-declare may be dangerous even if no symbols are fully used from the `#include`.  Consider the following code:
113
114    foo.h:
115      namespace ns { class Foo {}; }
116      using ns::Foo;
117
118    foo.cc:
119      #include "foo.h"
120      Foo* foo;`
121
122If IWYU just blindly replaces the `#include` with a forward declare such as `namespace ns { class Foo; }`, the code will break because of the lost using declaration. Include-what-you-use has to watch out for this case.
123
124Another case is a header file like this:
125
126    foo.h:
127      #define MODULE_NAME MyModule
128      #include "module_writer.h"
129
130We might think we can remove an `#include` of `foo.h` and replace it by `#include "module_writer.h"`, but that is likely to break the build if `module_writer.h` requires `MODULE_NAME` be defined.  Since my file doesn't participate in this dependency at all, it won't even notice it.  IWYU needs to keep track of dependencies between files it's not even trying to analyze!
131
132## Private Includes ##
133
134Suppose you write `vector<int> v;`. You are using vector, and thus have to `#include <vector>`. Even this seemingly easy case is difficult, because vector isn't actually defined in `<vector>`; it's defined in `<bits/stl_vector.h>`. The C++ standard library has hundreds of private files that users are not supposed to `#include` directly. Third party libraries have hundreds more.  There's no general way to distinguish private from public headers; we have to manually construct the proper mapping.
135
136In the future, we hope to provide a way for users to annotate if a file is public or private, either a comment or a `#pragma`. For now, we hard-code it in the IWYU tool.
137
138The mappings themselves can be ambiguous. For instance, `NULL` is provided by many files, including `stddef.h`, `stdlib.h`, and more. If you use `NULL`, what header file should IWYU suggest? We have rules to try to minimize the number of `#includes` you have to add; it can get rather involved.
139
140## Unparsed Code ##
141
142Conditional `#includes` are a problem for IWYU when the condition is false:
143
144    #if defined(LOG_VERBOSE)
145    #include <verbose_logger.h>
146    #endif
147
148    ...
149
150    void StartProcess() {
151        #if defined(LOG_VERBOSE)
152        LogVerbose("Starting process");
153        #endif
154        ...
155    }
156
157If you're running IWYU without that preprocessor definition set, it has no way of telling if `verbose_logger.h` is a necessary `#include` or not.
158
159## Placing New Includes and Forward-Declares ##
160
161Figuring out where to insert new `#includes` and forward-declares is a complex problem of its own (one that is the responsibility of `fix_includes.py`). In general, we want to put new `#includes` with existing `#includes`. But the existing `#includes` may be broken up into sections, either because of conditional `#includes` (with `#ifdefs`), or macros (such as `#define __GNU_SOURCE`), or for other reasons. Some forward-declares may need to come early in the file, and some may prefer to come later (after we're in an appropriate namespace, for instance).
162
163`fix_includes.py` tries its best to give pleasant-looking output, while being conservative about putting code in a place where it might not compile. It uses heuristics to do this, which are not yet perfect.
164