README.md
# mcelog

mcelog is the user space backend for logging machine check errors reported
by the hardware to the kernel. The kernel takes the immediate actions
(such as killing processes), while mcelog decodes the errors and manages
more advanced error responses, such as offlining memory or CPUs, or triggering
events. In addition, mcelog handles corrected errors by logging and
accounting them.
It primarily handles machine checks and thermal events, which are reported
for errors detected by the CPU.

For more details on what mcelog can do and the underlying theory,
see [mcelog.org](https://www.mcelog.org).

It is recommended to run mcelog on all x86 machines, both 64-bit
(since early 2.6 kernels) and 32-bit (since 2.6.32).

mcelog can run in several modes:

- cronjob
- trigger
- daemon

**cronjob** is the old method. mcelog runs every 5 minutes from cron and checks
for errors. The disadvantage is that this can delay error reporting
significantly (up to 10 minutes) and does not allow mcelog to keep extended state.

**trigger** is a newer method where the kernel runs mcelog whenever an error occurs.

This is configured with:
```sh
echo /usr/sbin/mcelog > /sys/devices/system/machinecheck/machinecheck0/trigger
```
This is faster, but it still doesn't allow mcelog to keep state,
and it has relatively high overhead for each error because the program has
to be initialized from scratch every time.
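
The hypothetical helper below is a more defensive way to register the trigger. `set_mce_trigger` is not part of mcelog. It refuses to write if the sysfs file is missing or not writable. That happens, for example, when the kernel does not provide the interface or when the script is not running as root. The paths default to the ones shown above. Both can be overridden, which also makes the helper easy to try against a scratch file.

```sh
# Hypothetical helper (not shipped with mcelog): register a program as the
# machine check trigger, but only if the sysfs trigger file is writable.
set_mce_trigger() {
    trigger=${1:-/sys/devices/system/machinecheck/machinecheck0/trigger}
    prog=${2:-/usr/sbin/mcelog}
    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (missing, or not running as root?)" >&2
        return 1
    fi
    # Same operation as the echo command above.
    echo "$prog" > "$trigger"
}
```

Run as root, `set_mce_trigger` with no arguments is equivalent to the `echo` command above, apart from the extra check.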

In **daemon** mode mcelog runs continuously in the background and
waits for errors. It is enabled by running `mcelog --daemon &`
from an init script. This is the fastest and most full-featured mode.

The recommended mode is **daemon**, because several newer features (such as page
error predictive failure analysis) require a continuously running daemon.

## Documentation

- The man pages are the primary reference documentation.
- [lk10-mcelog.pdf](lk10-mcelog.pdf)
 gives an overview of the errors mcelog handles (originally from Linux Kongress 2010).
- [mce.pdf](mce.pdf)
 is a very old paper describing the first releases of mcelog (some parts are obsolete).

## For distributors

You can run mcelog from systemd or a similar service manager. An example systemd unit
file is provided in `mcelog.service`.

By default mcelog reports the git tag as its version. This can be overridden
by creating a `.os_version` file in the source directory. A build system
can write the OS version to this file to mark the binary.
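
The build-system step could look like the sketch below. `write_os_version` is a hypothetical helper, not part of mcelog, and the version string is only an example. It writes the file without a trailing newline. That is a precaution based on the assumption that the build embeds the file contents verbatim, not a documented requirement.

```sh
# Hypothetical build-system step (not part of mcelog): record a
# distribution-specific version string so mcelog reports it instead of
# the git tag.
write_os_version() {
    srcdir=$1     # mcelog source directory
    version=$2    # e.g. "<upstream tag>-<package release>"
    # printf '%s' writes the string without a trailing newline (a precaution,
    # assuming the build copies the file contents verbatim).
    printf '%s' "$version" > "$srcdir/.os_version"
}
```

For example, `write_os_version . "175-1.example" && make` run in the source directory would mark the resulting binary with that string.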

### For older distributions using init scripts

Please install, by default, an init script that runs mcelog in daemon mode.
The `mcelog.init` script is a good starting point. When mcelog runs in
daemon mode, also install a logrotate configuration (`mcelog.logrotate`)
or equivalent.
Neither of these two files is installed by `make install`.
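
`make install` does not install these two files, so a packaging script has to copy them itself. The sketch below assumes a staging root and the conventional `/etc/init.d` and `/etc/logrotate.d` locations. Distributions may use other paths or names. The helper name `stage_init_files` is hypothetical.

```sh
# Hypothetical packaging helper: copy the files that "make install" skips
# into a staging root. Destination paths are common conventions, not
# taken from mcelog itself.
stage_init_files() {
    srcdir=$1    # mcelog source directory
    destdir=$2   # staging root (e.g. the package build root)
    # -D creates any missing leading directories of the destination.
    install -D -m 755 "$srcdir/mcelog.init" "$destdir/etc/init.d/mcelog"
    install -D -m 644 "$srcdir/mcelog.logrotate" "$destdir/etc/logrotate.d/mcelog"
}
```

A package build would call it as `stage_init_files . "$pkgroot"` after `make install DESTDIR="$pkgroot"` (or whatever staging mechanism the build already uses).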

The installation also requires a config file, `/etc/mcelog.conf`, and the default
triggers. These are all installed by `make install`.

mcelog needs `/dev/mcelog` to operate. If it does not exist, it can be
created with:
```sh
mknod /dev/mcelog c 10 227
```

Normally it is created automatically by udev.
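
An init script can combine the existence check and the `mknod` fallback. The helper below is a hypothetical sketch; `ensure_mcelog_dev` is not part of mcelog. It leaves an existing device alone, such as one created by udev. It refuses to touch a path that exists but is not a character device. Otherwise it runs the `mknod` command shown above, which requires root.

```sh
# Hypothetical helper: make sure the mcelog character device exists,
# creating it with major 10, minor 227 (as above) only when it is missing.
ensure_mcelog_dev() {
    dev=${1:-/dev/mcelog}
    if [ -c "$dev" ]; then
        return 0        # already present, e.g. created by udev
    fi
    if [ -e "$dev" ]; then
        echo "$dev exists but is not a character device" >&2
        return 1
    fi
    mknod "$dev" c 10 227
}
```

An init script might run `ensure_mcelog_dev || exit 1` before starting `mcelog --daemon`.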

## Security

mcelog needs to run as root because it might trigger actions like
page offlining, which require `CAP_SYS_ADMIN`. In addition, it opens `/dev/mcelog`
and a UNIX socket for client support.

It also opens `/dev/mem` to parse the BIOS DMI tables. It is careful to close
the file descriptor and unmap any mappings after using them.

In daemon mode, mcelog supports switching to a different user after opening the device
and the sockets, but that would stop triggers from taking corrective actions
that require `root`.

In principle it would be possible to keep only `CAP_SYS_ADMIN` for page offlining,
but that would prevent triggers from performing root-only actions not covered by
it (and `CAP_SYS_ADMIN` is not that different from full root).

In daemon mode mcelog listens on a UNIX socket and processes requests from
`mcelog --client`. This can be disabled in the configuration file.
The uid/gid of the requester is checked on access and is configurable
(default: 0/0 only). The command parsing code (`server.c`) is very
straightforward. Client request parsing and replies are currently handled
with the full privileges of the daemon.

## Testing

There is a simple test suite in `tests/`. The test suite must be run as root
and requires mce-inject and a kernel with MCE injection support
(`CONFIG_X86_MCE_INJECT`). It will kill any running mcelog daemon.

Run it with `make test`.

The test suite requires the
[mce-inject](git://git.kernel.org/pub/utils/cpu/mce/mce-inject.git) tool.
The `mce-inject` executable must be either in `$PATH` or in the
`../mce-inject` directory.
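
The lookup order described above can be written as a small shell function, for example when wrapping the test suite in a CI job. This is a hypothetical illustration; `find_mce_inject` is not taken from the mcelog test suite. It also assumes that "in the `../mce-inject` directory" means the executable `../mce-inject/mce-inject`, relative to the current directory.

```sh
# Hypothetical helper mirroring the lookup rule above: prefer mce-inject
# from $PATH, otherwise fall back to ../mce-inject/mce-inject.
find_mce_inject() {
    if command -v mce-inject >/dev/null 2>&1; then
        command -v mce-inject          # print the path found in $PATH
    elif [ -x ../mce-inject/mce-inject ]; then
        echo ../mce-inject/mce-inject
    else
        echo "mce-inject not found in \$PATH or ../mce-inject" >&2
        return 1
    fi
}
```

A CI job could run `find_mce_inject >/dev/null || exit 1` before `make test` so that a missing tool fails early with a clear message.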

You can also run the tests under **valgrind** with `make valgrind-test`. This
requires valgrind to be installed. Additional valgrind options can be
specified with:
```sh
make VALGRIND="valgrind --option" valgrind-test
```

### Other checks

`make iccverify` and `make clangverify` run the static verifiers in *icc*
and *clang* respectively.

## License

This program is licensed under the terms of the GNU General Public
License, version 2.

README.releases
mcelog used to make releases, but has now switched to a rolling release
scheme. That means the git tree is always kept stable and can
be used directly in production.

To simplify package management, which prefers
increasing version numbers, commits are regularly tagged
with a number. The numbering starts (arbitrarily) at 100.

The tags are named vXXX (e.g. v100).