/* Copyright (c) 2010 Jan Waclawek
   Copyright (c) 2010 Joerg Wunsch
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
   * Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in
     the documentation and/or other materials provided with the
     distribution.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  POSSIBILITY OF SUCH DAMAGE. */

/* $Id: optimize.dox 2160 2010-06-10 20:45:42Z joerg_wunsch $ */

/** \page optimization Compiler optimization

\section optim_code_reorder Problems with reordering code
\author Jan Waclawek

Programs contain sequences of statements, and a naive compiler would
execute them exactly in the order in which they are written. But an
optimizing compiler is free to \e reorder the statements - or even
parts of them - if the resulting "net effect" is the same. The
"measure" of the "net effect" is what the standard calls "side
effects", which are accomplished exclusively through accesses (reads
and writes) to variables qualified as \c volatile. So, as long as all
volatile reads and writes occur at the same addresses, in the same
order, and write the same values, the program is correct, regardless
of what other operations are performed around them. (One important
point to note here is that the time duration between consecutive
volatile accesses is not considered at all.)

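As an illustration, consider the following sketch (the variable and
function names are made up for this example; only the two volatile
writes constitute side effects):

\code
volatile unsigned char flag;   /* accesses to this variable are side effects */

unsigned char f(unsigned char x)
{
    unsigned char y = x * 3;   /* no volatile access: the compiler may
                                  move this computation across either
                                  write to flag below                  */
    flag = 1;                  /* volatile write: must happen, and must
                                  precede the write of 0 below         */
    y += x;                    /* may also legally migrate across the
                                  volatile writes                      */
    flag = 0;                  /* volatile write: must follow the
                                  write of 1 above                     */
    return y;
}
\endcode
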
Unfortunately, there are also operations which are not covered by
volatile accesses. An example of this in avr-gcc/avr-libc are the
cli() and sei() macros defined in <avr/interrupt.h>, which translate
directly to the respective assembler mnemonics through the __asm__()
statement. These don't constitute a variable access at all, not even a
volatile one, so the compiler is free to move them around. Although
there is a "volatile" qualifier which can be attached to the __asm__()
statement, its effect on (re)ordering is not clear from the
documentation (it is more likely intended only to prevent complete
removal by the optimiser), which states (among other things):

58<em>Note that even a volatile asm instruction can be moved
59relative to other code, including across jump instructions. [...]
60Similarly, you can't expect a sequence of volatile asm instructions to
61remain perfectly consecutive.</em>
62
63\sa http://gcc.gnu.org/onlinedocs/gcc-4.3.4/gcc/Extended-Asm.html
64
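For illustration, macros of this barrier-less kind could be written as
below. The names are invented here to distinguish them from the real
avr-libc macros, whose exact definitions may differ:

\code
/* Hypothetical macros, named here only for illustration.  They emit
   the CLI/SEI instructions, but declare no clobbers, so from the
   compiler's point of view they access no variable at all and may be
   moved around freely; the "volatile" mainly prevents their complete
   removal by the optimiser. */
#define cli_nobarrier() __asm__ __volatile__ ("cli")
#define sei_nobarrier() __asm__ __volatile__ ("sei")
\endcode
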
There is another mechanism which can be used to achieve something
similar: <em>memory barriers</em>. A barrier is created by adding a
special "memory" clobber to an inline \c asm statement; it ensures that
all variables are flushed from registers to memory before the
statement, and then re-read after it. However, the original purpose of
memory barriers is slightly different from enforcing code ordering:
they are supposed to ensure that no variables are "cached" in
registers, so that it is safe to change the content of registers, e.g.
when switching context in a multitasking OS. (On "big" processors with
out-of-order execution, they also imply the use of special instructions
which force the processor into an "in-order" state; this is not the
case for AVRs.)

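A minimal sketch of a pure memory barrier, using the usual GCC
inline-asm syntax:

\code
/* An empty asm statement with a "memory" clobber: it emits no code,
   but the compiler must assume it reads and writes arbitrary memory,
   so all variables held in registers are written back before it and
   re-read after it. */
#define barrier() __asm__ __volatile__ ("" ::: "memory")
\endcode
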
A memory barrier works well in ensuring that all volatile accesses
before and after the barrier occur in the given order with respect to
the barrier. However, it does not prevent the compiler from moving
non-volatile-related statements across the barrier. Peter
Dannegger provided a nice example of this effect:

\code
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )

unsigned int ivar;

void test2( unsigned int val )
{
  val = 65535U / val;

  cli();

  ivar = val;

  sei();
}
\endcode

compiles with optimisations switched on (-Os) to

\verbatim
00000112 <test2>:
 112:	bc 01       	movw	r22, r24
 114:	f8 94       	cli
 116:	8f ef       	ldi	r24, 0xFF	; 255
 118:	9f ef       	ldi	r25, 0xFF	; 255
 11a:	0e 94 96 00 	call	0x12c	; 0x12c <__udivmodhi4>
 11e:	70 93 01 02 	sts	0x0201, r23
 122:	60 93 00 02 	sts	0x0200, r22
 126:	78 94       	sei
 128:	08 95       	ret
\endverbatim

where the potentially slow division is moved across cli(), resulting
in interrupts being disabled longer than intended. Note that the
access to ivar occurs in order with respect to cli() and sei(), so
the "net effect" required by the standard is achieved as intended; it
is "only" the timing which is off. However, for most embedded
applications, timing is an important, sometimes critical factor.

\sa https://www.mikrocontroller.net/topic/65923

Unfortunately, at the moment, neither avr-gcc nor the C standard
provides a mechanism to enforce a complete match between the written
and the executed code ordering - except perhaps switching the
optimization completely off (-O0), or writing all the critical code
in assembly.

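As a sketch of the latter escape, the critical part of the example
above can be packed into a single asm statement. The division then has
to complete before the asm executes, because the asm consumes its
result. The function name is invented here, and a 16-bit
unsigned int is assumed:

\code
unsigned int ivar;             /* same global as in the example above */

void test2_asm( unsigned int val )
{
  val = 65535U / val;          /* must be finished before the asm below,
                                  since the asm consumes its result    */

  /* The avr-gcc operand modifiers %A0/%B0 select the low/high byte of
     operand 0.  The whole critical sequence is one asm statement, so
     the compiler cannot reorder its parts. */
  __asm__ __volatile__ (
    "cli"             "\n\t"
    "sts ivar,   %A0" "\n\t"
    "sts ivar+1, %B0" "\n\t"
    "sei"
    :
    : "r" (val)
    : "memory" );
}
\endcode
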
To sum it up:

\li memory barriers ensure proper ordering of volatile accesses
\li memory barriers don't prevent statements with no volatile accesses from being reordered across the barrier

*/