1=================================
2MergeFunctions pass, how it works
3=================================
4
5.. contents::
6   :local:
7
8Introduction
9============
10Sometimes code contains equal functions, or functions that does exactly the same
11thing even though they are non-equal on the IR level (e.g.: multiplication on 2
12and 'shl 1'). It could happen due to several reasons: mainly, the usage of
13templates and automatic code generators. Though, sometimes the user itself could
14write the same thing twice :-)
15
16The main purpose of this pass is to recognize such functions and merge them.
17
18This document is the extension to pass comments and describes the pass logic. It
19describes the algorithm that is used in order to compare functions and
20explains how we could combine equal functions correctly to keep the module
21valid.
22
23Material is brought in a top-down form, so the reader could start to learn pass
24from high level ideas and end with low-level algorithm details, thus preparing
25him or her for reading the sources.
26
27The main goal is to describe the algorithm and logic here and the concept. If
28you *don't want* to read the source code, but want to understand pass
29algorithms, this document is good for you. The author tries not to repeat the
30source-code and covers only common cases to avoid the cases of needing to
31update this document after any minor code changes.
32
33
34What should I know to be able to follow along with this document?
35-----------------------------------------------------------------
36
37The reader should be familiar with common compile-engineering principles and
38LLVM code fundamentals. In this article, we assume the reader is familiar with
39`Single Static Assignment
40<http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
41concept and has an understanding of
42`IR structure <https://llvm.org/docs/LangRef.html#high-level-structure>`_.
43
44We will use terms such as
45"`module <https://llvm.org/docs/LangRef.html#high-level-structure>`_",
46"`function <https://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
47"`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
48"`user <https://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
49"`value <https://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
50"`instruction
51<https://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
52
53As a good starting point, the Kaleidoscope tutorial can be used:
54
55:doc:`tutorial/index`
56
57It's especially important to understand chapter 3 of tutorial:
58
59:doc:`tutorial/LangImpl03`
60
61The reader should also know how passes work in LLVM. They could use this
62article as a reference and start point here:
63
64:doc:`WritingAnLLVMPass`
65
66What else? Well perhaps the reader should also have some experience in LLVM pass
67debugging and bug-fixing.
68
69Narrative structure
70-------------------
71The article consists of three parts. The first part explains pass functionality
72on the top-level. The second part describes the comparison procedure itself.
73The third part describes the merging process.
74
75In every part, the author tries to put the contents in the top-down form.
76The top-level methods will first be described followed by the terminal ones at
77the end, in the tail of each part. If the reader sees the reference to the
78method that wasn't described yet, they will find its description a bit below.
79
80Basics
81======
82
83How to do it?
84-------------
85Do we need to merge functions? The obvious answer is: Yes, that is quite a
86possible case. We usually *do* have duplicates and it would be good to get rid
87of them. But how do we detect duplicates? This is the idea: we split functions
88into smaller bricks or parts and compare the "bricks" amount. If equal,
89we compare the "bricks" themselves, and then do our conclusions about functions
90themselves.
91
92What could the difference be? For example, on a machine with 64-bit pointers
93(let's assume we have only one address space), one function stores a 64-bit
94integer, while another one stores a pointer. If the target is the machine
95mentioned above, and if functions are identical, except the parameter type (we
96could consider it as a part of function type), then we can treat a ``uint64_t``
97and a ``void*`` as equal.
98
99This is just an example; more possible details are described a bit below.
100
101As another example, the reader may imagine two more functions. The first
102function performs a multiplication by 2, while the second one performs an
103logical left shift by 1.
104
105Possible solutions
106^^^^^^^^^^^^^^^^^^
107Let's briefly consider possible options about how and what we have to implement
108in order to create full-featured functions merging, and also what it would
109mean for us.
110
111Equal function detection obviously supposes that a "detector" method to be
112implemented and latter should answer the question "whether functions are equal".
113This "detector" method consists of tiny "sub-detectors", which each answers
114exactly the same question, but for function parts.
115
116As the second step, we should merge equal functions. So it should be a "merger"
117method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2*
118function, the result of merging.
119
120Having such routines in our hands, we can process a whole module, and merge all
121equal functions.
122
123In this case, we have to compare every function with every another function. As
124the reader may notice, this way seems to be quite expensive. Of course we could
125introduce hashing and other helpers, but it is still just an optimization, and
126thus the level of O(N*N) complexity.
127
128Can we reach another level? Could we introduce logarithmical search, or random
129access lookup? The answer is: "yes".
130
131Random-access
132"""""""""""""
133How it could this be done? Just convert each function to a number, and gather
134all of them in a special hash-table. Functions with equal hashes are equal.
135Good hashing means, that every function part must be taken into account. That
136means we have to convert every function part into some number, and then add it
137into the hash. The lookup-up time would be small, but such an approach adds some
138delay due to the hashing routine.
139
140Logarithmical search
141""""""""""""""""""""
142We could introduce total ordering among the functions set, once ordered we
143could then implement a logarithmical search. Lookup time still depends on N,
144but adds a little of delay (*log(N)*).
145
146Present state
147"""""""""""""
148Both of the approaches (random-access and logarithmical) have been implemented
149and tested and both give a very good improvement. What was most
150surprising is that logarithmical search was faster; sometimes by up to 15%. The
151hashing method needs some extra CPU time, which is the main reason why it works
152slower; in most cases, total "hashing" time is greater than total
153"logarithmical-search" time.
154
155So, preference has been granted to the "logarithmical search".
156
157Though in the case of need, *logarithmical-search* (read "total-ordering") could
158be used as a milestone on our way to the *random-access* implementation.
159
160Every comparison is based either on the numbers or on the flags comparison. In
161the *random-access* approach, we could use the same comparison algorithm.
162During comparison, we exit once we find the difference, but here we might have
163to scan the whole function body every time (note, it could be slower). Like in
164"total-ordering", we will track every number and flag, but instead of
165comparison, we should get the numbers sequence and then create the hash number.
166So, once again, *total-ordering* could be considered as a milestone for even
167faster (in theory) random-access approach.
168
169MergeFunctions, main fields and runOnModule
170^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
171There are two main important fields in the class:
172
173``FnTree``  – the set of all unique functions. It keeps items that couldn't be
174merged with each other. It is defined as:
175
176``std::set<FunctionNode> FnTree;``
177
178Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with
179implemented “<” operator among the functions set (below we explain how it works
180exactly; this is a key point in fast functions comparison).
181
182``Deferred`` – merging process can affect bodies of functions that are in
183``FnTree`` already. Obviously, such functions should be rechecked again. In this
184case, we remove them from ``FnTree``, and mark them to be rescanned, namely
185put them into ``Deferred`` list.
186
187runOnModule
188"""""""""""
189The algorithm is pretty simple:
190
1911. Put all module's functions into the *worklist*.
192
1932. Scan *worklist*'s functions twice: first enumerate only strong functions and
194then only weak ones:
195
196   2.1. Loop body: take a function from *worklist*  (call it *FCur*) and try to
197   insert it into *FnTree*: check whether *FCur* is equal to one of functions
198   in *FnTree*. If there *is* an equal function in *FnTree*
199   (call it *FExists*): merge function *FCur* with *FExists*. Otherwise add
200   the function from the *worklist* to *FnTree*.
201
2023. Once the *worklist* scanning and merging operations are complete, check the
203*Deferred* list. If it is not empty: refill the *worklist* contents with
204*Deferred* list and redo step 2, if the *Deferred* list is empty, then exit
205from method.
206
207Comparison and logarithmical search
208"""""""""""""""""""""""""""""""""""
209Let's recall our task: for every function *F* from module *M*, we have to find
210equal functions *F`* in the shortest time possible , and merge them into a
211single function.
212
213Defining total ordering among the functions set allows us to organize
214functions into a binary tree. The lookup procedure complexity would be
215estimated as O(log(N)) in this case. But how do we define *total-ordering*?
216
217We have to introduce a single rule applicable to every pair of functions, and
218following this rule, then evaluate which of them is greater. What kind of rule
219could it be? Let's declare it as the "compare" method that returns one of 3
220possible values:
221
222-1, left is *less* than right,
223
2240, left and right are *equal*,
225
2261, left is *greater* than right.
227
228Of course it means, that we have to maintain
229*strict and non-strict order relation properties*:
230
231* reflexivity (``a <= a``, ``a == a``, ``a >= a``),
232* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
233* transitivity (``a <= b`` and ``b <= c``, then ``a <= c``)
234* asymmetry (if ``a < b``, then ``a > b`` or ``a == b``).
235
236As mentioned before, the comparison routine consists of
237"sub-comparison-routines", with each of them also consisting of
238"sub-comparison-routines", and so on. Finally, it ends up with primitive
239comparison.
240
241Below, we will use the following operations:
242
243#. ``cmpNumbers(number1, number2)`` is a method that returns -1 if left is less
244   than right; 0, if left and right are equal; and 1 otherwise.
245
246#. ``cmpFlags(flag1, flag2)`` is a hypothetical method that compares two flags.
247   The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and
248   ``false`` is 0.
249
250The rest of the article is based on *MergeFunctions.cpp* source code
251(found in *<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like
252to ask reader to keep this file open, so we could use it as a reference
253for further explanations.
254
255Now, we're ready to proceed to the next chapter and see how it works.
256
257Functions comparison
258====================
259At first, let's define how exactly we compare complex objects.
260
261Complex object comparison (function, basic-block, etc) is mostly based on its
262sub-object comparison results. It is similar to the next "tree" objects
263comparison:
264
265#. For two trees *T1* and *T2* we perform *depth-first-traversal* and have
266   two sequences as a product: "*T1Items*" and "*T2Items*".
267
268#. We then compare chains "*T1Items*" and "*T2Items*" in
269   the most-significant-item-first order. The result of items comparison
270   would be the result of *T1* and *T2* comparison itself.
271
272FunctionComparator::compare(void)
273---------------------------------
274A brief look at the source code tells us that the comparison starts in the
275“``int FunctionComparator::compare(void)``” method.
276
2771. The first parts to be compared are the function's attributes and some
278properties that is outside the “attributes” term, but still could make the
279function different without changing its body. This part of the comparison is
280usually done within simple *cmpNumbers* or *cmpFlags* operations (e.g.
281``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is a full list of function's
282properties to be compared on this stage:
283
284  * *Attributes* (those are returned by ``Function::getAttributes()``
285    method).
286
287  * *GC*, for equivalence, *RHS* and *LHS* should be both either without
288    *GC* or with the same one.
289
290  * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the
291    same section.
292
293  * *Variable arguments*. *LHS* and *RHS* should be both either with or
294    without *var-args*.
295
296  * *Calling convention* should be the same.
297
2982. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)``
299method. It checks return type and parameters type; the method itself will be
300described later.
301
3023. Associate function formal parameters with each other. Then comparing function
303bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then,
304we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s
305body, otherwise functions are different. On this stage we grant the preference
306to those we met later in function body (value we met first would be *less*).
307This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``”
308method (will be described a bit later).
309
3104. Function body comparison. As it written in method comments:
311
312“We do a CFG-ordered walk since the actual ordering of the blocks in the linked
313list is immaterial. Our walk starts at the entry block for both functions, then
314takes each block from each terminator in order. As an artifact, this also means
315that unreachable blocks are ignored.”
316
317So, using this walk we get BBs from *left* and *right* in the same order, and
318compare them by “``FunctionComparator::compare(const BasicBlock*, const
319BasicBlock*)``” method.
320
321We also associate BBs with each other, like we did it with function formal
322arguments (see ``cmpValues`` method below).
323
324FunctionComparator::cmpType
325---------------------------
326Consider how type comparison works.
327
3281. Coerce pointer to integer. If left type is a pointer, try to coerce it to the
329integer type. It could be done if its address space is 0, or if address spaces
330are ignored at all. Do the same thing for the right type.
331
3322. If left and right types are equal, return 0. Otherwise we need to give
333preference to one of them. So proceed to the next step.
334
3353. If types are of different kind (different type IDs). Return result of type
336IDs comparison, treating them as numbers (use ``cmpNumbers`` operation).
337
3384. If types are vectors or integers, return result of their pointers comparison,
339comparing them as numbers.
340
3415. Check whether type ID belongs to the next group (call it equivalent-group):
342
343   * Void
344
345   * Float
346
347   * Double
348
349   * X86_FP80
350
351   * FP128
352
353   * PPC_FP128
354
355   * Label
356
357   * Metadata.
358
359   If ID belongs to group above, return 0. Since it's enough to see that
360   types has the same ``TypeID``. No additional information is required.
361
3626. Left and right are pointers. Return result of address space comparison
363(numbers comparison).
364
3657. Complex types (structures, arrays, etc.). Follow complex objects comparison
366technique (see the very first paragraph of this chapter). Both *left* and
367*right* are to be expanded and their element types will be checked the same
368way. If we get -1 or 1 on some stage, return it. Otherwise return 0.
369
3708. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't
371get any conclusions, then invoke ``llvm_unreachable``, since it's quite an
372unexpectable case.
373
374cmpValues(const Value*, const Value*)
375-------------------------------------
376Method that compares local values.
377
378This method gives us an answer to a very curious question: whether we could
379treat local values as equal, and which value is greater otherwise. It's
380better to start from example:
381
382Consider the situation when we're looking at the same place in left
383function "*FL*" and in right function "*FR*". Every part of *left* place is
384equal to the corresponding part of *right* place, and (!) both parts use
385*Value* instances, for example:
386
387.. code-block:: text
388
389   instr0 i32 %LV   ; left side, function FL
390   instr0 i32 %RV   ; right side, function FR
391
392So, now our conclusion depends on *Value* instances comparison.
393
394The main purpose of this method is to determine relation between such values.
395
396What can we expect from equal functions? At the same place, in functions
397"*FL*" and "*FR*" we expect to see *equal* values, or values *defined* at
398the same place in "*FL*" and "*FR*".
399
400Consider a small example here:
401
402.. code-block:: text
403
404  define void %f(i32 %pf0, i32 %pf1) {
405    instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123
406  }
407
408.. code-block:: text
409
410  define void %g(i32 %pg0, i32 %pg1) {
411    instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123
412  }
413
414In this example, *pf0* is associated with *pg0*, *pf1* is associated with
415*pg1*, and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*.
416
417Instructions with opcode "*instr0*" would be *equal*, since their types and
418opcodes are equal, and values are *associated*.
419
420Instructions with opcode "*instr1*" from *f* is *greater* than instructions
421with opcode "*instr1*" from *g*; here we have equal types and opcodes, but
422"*pf1* is greater than "*pg0*".
423
424Instructions with opcode "*instr2*" are equal, because their opcodes and
425types are equal, and the same constant is used as a value.
426
427What we associate in cmpValues?
428^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
429* Function arguments. *i*-th argument from left function associated with
430  *i*-th argument from right function.
431* BasicBlock instances. In basic-block enumeration loop we associate *i*-th
432  BasicBlock from the left function with *i*-th BasicBlock from the right
433  function.
434* Instructions.
435* Instruction operands. Note, we can meet *Value* here we have never seen
436  before. In this case it is not a function argument, nor *BasicBlock*, nor
437  *Instruction*. It is a global value. It is a constant, since it's the only
438  supposed global here. The method also compares: Constants that are of the
439  same type and if right constant can be losslessly bit-casted to the left
440  one, then we also compare them.
441
442How to implement cmpValues?
443^^^^^^^^^^^^^^^^^^^^^^^^^^^
444*Association* is a case of equality for us. We just treat such values as equal,
445but, in general, we need to implement antisymmetric relation. As mentioned
446above, to understand what is *less*, we can use order in which we
447meet values. If both values have the same order in a function (met at the same
448time), we then treat values as *associated*. Otherwise – it depends on who was
449first.
450
451Every time we run the top-level compare method, we initialize two identical
452maps (one for the left side, another one for the right side):
453
454``map<Value, int> sn_mapL, sn_mapR;``
455
456The key of the map is the *Value* itself, the *value* – is its order (call it
457*serial number*).
458
459To add value *V* we need to perform the next procedure:
460
461``sn_map.insert(std::make_pair(V, sn_map.size()));``
462
463For the first *Value*, map will return *0*, for the second *Value* map will
464return *1*, and so on.
465
466We can then check whether left and right values met at the same time with
467a simple comparison:
468
469``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
470
471Of course, we can combine insertion and comparison:
472
473.. code-block:: c++
474
475  std::pair<iterator, bool>
476    LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes
477    = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
478  return cmpNumbers(LeftRes.first->second, RightRes.first->second);
479
480Let's look, how whole method could be implemented.
481
4821. We have to start with the bad news. Consider function self and
483cross-referencing cases:
484
485.. code-block:: c++
486
487  // self-reference unsigned fact0(unsigned n) { return n > 1 ? n
488  * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n *
489  fact1(n-1) : 1; }
490
491  // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0;
492  } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
493
494..
495
496  This comparison has been implemented in initial *MergeFunctions* pass
497  version. But, unfortunately, it is not transitive. And this is the only case
498  we can't convert to less-equal-greater comparison. It is a seldom case, 4-5
499  functions of 10000 (checked in test-suite), and, we hope, the reader would
500  forgive us for such a sacrifice in order to get the O(log(N)) pass time.
501
5022. If left/right *Value* is a constant, we have to compare them. Return 0 if it
503is the same constant, or use ``cmpConstants`` method otherwise.
504
5053. If left/right is *InlineAsm* instance. Return result of *Value* pointers
506comparison.
507
5084. Explicit association of *L* (left value) and *R*  (right value). We need to
509find out whether values met at the same time, and thus are *associated*. Or we
510need to put the rule: when we treat *L* < *R*. Now it is easy: we just return
511the result of numbers comparison:
512
513.. code-block:: c++
514
515   std::pair<iterator, bool>
516     LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
517     RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
518   if (LeftRes.first->second == RightRes.first->second) return 0;
519   if (LeftRes.first->second < RightRes.first->second) return -1;
520   return 1;
521
522Now when *cmpValues* returns 0, we can proceed the comparison procedure.
523Otherwise, if we get (-1 or 1), we need to pass this result to the top level,
524and finish comparison procedure.
525
526cmpConstants
527------------
528Performs constants comparison as follows:
529
5301. Compare constant types using ``cmpType`` method. If the result is -1 or 1,
531goto step 2, otherwise proceed to step 3.
532
5332. If types are different, we still can check whether constants could be
534losslessly bitcasted to each other. The further explanation is modification of
535``canLosslesslyBitCastTo`` method.
536
537   2.1 Check whether constants are of the first class types
538   (``isFirstClassType`` check):
539
540   2.1.1. If both constants are *not* of the first class type: return result
541   of ``cmpType``.
542
543   2.1.2. Otherwise, if left type is not of the first class, return -1. If
544   right type is not of the first class, return 1.
545
546   2.1.3. If both types are of the first class type, proceed to the next step
547   (2.1.3.1).
548
549   2.1.3.1. If types are vectors, compare their bitwidth using the
550   *cmpNumbers*. If result is not 0, return it.
551
552   2.1.3.2. Different types, but not a vectors:
553
554   * if both of them are pointers, good for us, we can proceed to step 3.
555   * if one of types is pointer, return result of *isPointer* flags
556     comparison (*cmpFlags* operation).
557   * otherwise we have no methods to prove bitcastability, and thus return
558     result of types comparison (-1 or 1).
559
560Steps below are for the case when types are equal, or case when constants are
561bitcastable:
562
5633. One of constants is a "*null*" value. Return the result of
564``cmpFlags(L->isNullValue, R->isNullValue)`` comparison.
565
5664. Compare value IDs, and return result if it is not 0:
567
568.. code-block:: c++
569
570  if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
571    return Res;
572
5735. Compare the contents of constants. The comparison depends on the kind of
574constants, but on this stage it is just a lexicographical comparison. Just see
575how it was described in the beginning of "*Functions comparison*" paragraph.
576Mathematically, it is equal to the next case: we encode left constant and right
577constant (with similar way *bitcode-writer* does). Then compare left code
578sequence and right code sequence.
579
580compare(const BasicBlock*, const BasicBlock*)
581---------------------------------------------
582Compares two *BasicBlock* instances.
583
584It enumerates instructions from left *BB* and right *BB*.
585
5861. It assigns serial numbers to the left and right instructions, using
587``cmpValues`` method.
588
5892. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as
590greater than other instructions. If both instructions are *GEPs* use ``cmpGEP``
591method for comparison. If result is -1 or 1, pass it to the top-level
592comparison (return it).
593
594   3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or
595   1, return it.
596
597   3.2. Compare number of operands, if result is -1 or 1, return it.
598
599   3.3. Compare operands themselves, use ``cmpValues`` method. Return result
600   if it is -1 or 1.
601
602   3.4. Compare type of operands, using ``cmpType`` method. Return result if
603   it is -1 or 1.
604
605   3.5. Proceed to the next instruction.
606
6074. We can finish instruction enumeration in 3 cases:
608
609   4.1. We reached the end of both left and right basic-blocks. We didn't
610   exit on steps 1-3, so contents are equal, return 0.
611
612   4.2. We have reached the end of the left basic-block. Return -1.
613
614   4.3. Return 1 (we reached the end of the right basic block).
615
616cmpGEP
617------
618Compares two GEPs (``getelementptr`` instructions).
619
620It differs from regular operations comparison with the only thing: possibility
621to use ``accumulateConstantOffset`` method.
622
623So, if we get constant offset for both left and right *GEPs*, then compare it as
624numbers, and return comparison result.
625
626Otherwise treat it like a regular operation (see previous paragraph).
627
628cmpOperation
629------------
630Compares instruction opcodes and some important operation properties.
631
6321. Compare opcodes, if it differs return the result.
633
6342. Compare number of operands. If it differs – return the result.
635
6363. Compare operation types, use *cmpType*. All the same – if types are
637different, return result.
638
6394. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData``
640method, and compare it like a numbers.
641
6425. Compare operand types.
643
6446. For some particular instructions, check equivalence (relation in our case) of
645some significant attributes. For example, we have to compare alignment for
646``load`` instructions.
647
648O(log(N))
649---------
650Methods described above implement order relationship. And latter, could be used
651for nodes comparison in a binary tree. So we can organize functions set into
652the binary tree and reduce the cost of lookup procedure from
653O(N*N) to O(log(N)).
654
655Merging process, mergeTwoFunctions
656==================================
657Once *MergeFunctions* detected that current function (*G*) is equal to one that
658were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*,
659Function*)``.
660
661Operation affects ``FnTree`` contents with next way: *F* will stay in
662``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of
663*G* would be replaced with something else. It changes bodies of callers. So,
664functions that calls *G* would be put into ``Deferred`` set and removed from
665``FnTree``, and analyzed again.
666
667The approach is next:
668
6691. Most wished case: when we can use alias and both of *F* and *G* are weak. We
670make both of them with aliases to the third strong function *H*. Actually *H*
671is *F*. See below how it's made (but it's better to look straight into the
672source code). Well, this is a case when we can just replace *G* with *F*
673everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*).
674
6752. *F* could not be overridden, while *G* could. It would be good to do the
676next: after merging the places where overridable function were used, still use
677overridable stub. So try to make *G* alias to *F*, or create overridable tail
678call wrapper around *F* and replace *G* with that call.
679
6803. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just
681change the callers: call *F* instead of *G*.  That's what
682``replaceDirectCallers`` does.
683
684Below is a detailed body description.
685
686If “F” may be overridden
687------------------------
688As follows from ``mayBeOverridden`` comments: “whether the definition of this
689global may be replaced by something non-equivalent at link time”. If so, that's
690ok: we can use alias to *F* instead of *G* or change call instructions itself.
691
692HasGlobalAliases, removeUsers
693^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
694First consider the case when we have global aliases of one function name to
695another. Our purpose is  make both of them with aliases to the third strong
696function. Though if we keep *F* alive and without major changes we can leave it
697in ``FnTree``. Try to combine these two goals.
698
699Do stub replacement of *F* itself with an alias to *F*.
700
7011. Create stub function *H*, with the same name and attributes like function
702*F*. It takes maximum alignment of *F* and *G*.
703
7042. Replace all uses of function *F* with uses of function *H*. It is the two
705steps procedure instead. First of all, we must take into account, all functions
706from whom *F* is called would be changed: since we change the call argument
707(from *F* to *H*). If so we must to review these caller functions again after
708this procedure. We remove callers from ``FnTree``, method with name
709``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``):
710
711   2.1. ``Inside removeUsers(Value*
712   V)`` we go through the all values that use value *V* (or *F* in our context).
713   If value is instruction, we go to function that holds this instruction and
714   mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove
715   caller from ``FnTree``.
716
717   2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
718
7193. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*.
720Do the same with *G*: replace it with alias to *F*. So finally everywhere *F*
721was used, we use *H* and it is alias to *F*, and everywhere *G* was used we
722also have alias to *F*.
723
7244. Set *F* linkage to private. Make it strong :-)
725
726No global aliases, replaceDirectCallers
727^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
728If global aliases are not supported. We call ``replaceDirectCallers``. Just
729go through all calls of *G* and replace it with calls of *F*. If you look into
730the method you will see that it scans all uses of *G* too, and if use is callee
731(if user is call instruction and *G* is used as what to be called), we replace
732it with use of *F*.
733
734If “F” could not be overridden, fix it!
735"""""""""""""""""""""""""""""""""""""""
736
737We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace
738*G* with alias to *F* first. The next conditions are essential:
739
740* target should support global aliases,
741* the address itself of  *G* should be not significant, not named and not
742  referenced anywhere,
743* function should come with external, local or weak linkage.
744
745Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*,
746so *G* could be replaced with this wrapper.
747
748*writeAlias*
749
750As follows from *llvm* reference:
751
752“Aliases act as *second name* for the aliasee value”. So we just want to create
753a second name for *F* and use it instead of *G*:
754
7551. create global alias itself (*GA*),
756
7572. adjust alignment of *F* so it must be maximum of current and *G's* alignment;
758
7593. replace uses of *G*:
760
761   3.1. first mark all callers of *G* as to-be-analyzed-again, using
762   ``removeUsers`` method (see chapter above),
763
764   3.2. call ``G->replaceAllUsesWith(GA)``.
765
7664. Get rid of *G*.
767
768*writeThunk*
769
770As it written in method comments:
771
772“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G
773with bitcast(F). Deletes G.”
774
775In general it does the same as usual when we want to replace callee, except the
776first point:
777
7781. We generate tail call wrapper around *F*, but with interface that allows use
779it instead of *G*.
780
7812. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then.
782
7833. Get rid of *G*.
784
785
786