| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 03-May-2022 | - | | |
| input/ | 20-Jul-2021 | - | 354 | 260 |
| tests/ | 03-May-2022 | - | 940 | 630 |
| triggers/ | 20-Jul-2021 | - | 308 | 68 |
| .gitignore | 20-Jul-2021 | 87 | 13 | 12 |
| .os_version | 03-May-2022 | 5 | 2 | 1 |
| CHANGES | 20-Jul-2021 | 4.9 KiB | 111 | 108 |
| LICENSE | 20-Jul-2021 | 17.7 KiB | 340 | 281 |
| Makefile | 03-May-2022 | 4 KiB | 144 | 102 |
| README.md | 20-Jul-2021 | 4.9 KiB | 134 | 98 |
| README.releases | 20-Jul-2021 | 365 | 10 | 7 |
| THIRD-PARTY | 20-Jul-2021 | 128 | 5 | 3 |
| bitfield.c | 20-Jul-2021 | 1.2 KiB | 64 | 58 |
| bitfield.h | 20-Jul-2021 | 1,012 | 38 | 28 |
| broadwell_de.c | 20-Jul-2021 | 3.1 KiB | 105 | 76 |
| broadwell_de.h | 20-Jul-2021 | 141 | 3 | 2 |
| broadwell_epex.c | 20-Jul-2021 | 5.4 KiB | 175 | 131 |
| broadwell_epex.h | 20-Jul-2021 | 127 | 3 | 2 |
| bus.c | 20-Jul-2021 | 3.3 KiB | 128 | 96 |
| bus.h | 20-Jul-2021 | 207 | 5 | 4 |
| cache.c | 03-May-2022 | 4.7 KiB | 204 | 169 |
| cache.h | 20-Jul-2021 | 96 | 3 | 2 |
| client.c | 03-May-2022 | 2.1 KiB | 84 | 55 |
| client.h | 20-Jul-2021 | 59 | 3 | 2 |
| config-intro.man | 20-Jul-2021 | 242 | 11 | 9 |
| config.c | 03-May-2022 | 8.4 KiB | 403 | 339 |
| config.h | 20-Jul-2021 | 746 | 25 | 20 |
| core2.c | 20-Jul-2021 | 2.9 KiB | 106 | 89 |
| core2.h | 20-Jul-2021 | 74 | 3 | 2 |
| denverton.c | 20-Jul-2021 | 1.3 KiB | 46 | 21 |
| denverton.h | 20-Jul-2021 | 74 | 2 | 1 |
| dimm.c | 20-Jul-2021 | 10.2 KiB | 453 | 383 |
| dimm.h | 20-Jul-2021 | 247 | 9 | 7 |
| dmi.c | 20-Jul-2021 | 16.5 KiB | 709 | 596 |
| dmi.h | 20-Jul-2021 | 2 KiB | 83 | 71 |
| dunnington.c | 20-Jul-2021 | 3.3 KiB | 113 | 78 |
| dunnington.h | 20-Jul-2021 | 43 | 3 | 1 |
| eventloop.c | 03-May-2022 | 3.5 KiB | 159 | 120 |
| eventloop.h | 20-Jul-2021 | 239 | 9 | 6 |
| genconfig.py | 20-Jul-2021 | 1.7 KiB | 81 | 60 |
| haswell.c | 20-Jul-2021 | 5.6 KiB | 193 | 144 |
| haswell.h | 20-Jul-2021 | 203 | 4 | 3 |
| i10nm.c | 20-Jul-2021 | 13.4 KiB | 500 | 426 |
| i10nm.h | 20-Jul-2021 | 185 | 4 | 3 |
| intel.c | 03-May-2022 | 6.1 KiB | 217 | 183 |
| intel.h | 20-Jul-2021 | 1,013 | 43 | 39 |
| ivy-bridge.c | 20-Jul-2021 | 4.2 KiB | 156 | 111 |
| ivy-bridge.h | 20-Jul-2021 | 140 | 3 | 2 |
| k8.c | 20-Jul-2021 | 7 KiB | 282 | 238 |
| k8.h | 20-Jul-2021 | 466 | 12 | 9 |
| leaky-bucket.c | 20-Jul-2021 | 5.4 KiB | 224 | 173 |
| leaky-bucket.h | 20-Jul-2021 | 854 | 35 | 27 |
| list.h | 20-Jul-2021 | 6.8 KiB | 240 | 91 |
| mcelog.8 | 20-Jul-2021 | 10.1 KiB | 337 | 258 |
| mcelog.c | 03-May-2022 | 46.2 KiB | 2,054 | 1,838 |
| mcelog.conf | 20-Jul-2021 | 7 KiB | 197 | 31 |
| mcelog.conf.5 | 20-Jul-2021 | 7.4 KiB | 295 | 284 |
| mcelog.cron | 20-Jul-2021 | 71 | 3 | 1 |
| mcelog.h | 03-May-2022 | 5.4 KiB | 187 | 158 |
| mcelog.init | 20-Jul-2021 | 2 KiB | 98 | 63 |
| mcelog.logrotate | 20-Jul-2021 | 267 | 16 | 14 |
| mcelog.service | 20-Jul-2021 | 287 | 13 | 10 |
| mcelog.triggers.5 | 20-Jul-2021 | 6.7 KiB | 232 | 224 |
| memdb.c | 03-May-2022 | 11.2 KiB | 444 | 358 |
| memdb.h | 20-Jul-2021 | 666 | 26 | 20 |
| memstream.c | 03-May-2022 | 2.3 KiB | 134 | 117 |
| memutil.c | 20-Jul-2021 | 1.6 KiB | 83 | 58 |
| memutil.h | 20-Jul-2021 | 298 | 11 | 9 |
| msg.c | 20-Jul-2021 | 3.3 KiB | 172 | 152 |
| msg.h | 20-Jul-2021 | 95 | 5 | 3 |
| msr.c | 20-Jul-2021 | 1.5 KiB | 65 | 58 |
| nehalem.c | 20-Jul-2021 | 5.7 KiB | 184 | 140 |
| nehalem.h | 20-Jul-2021 | 226 | 5 | 4 |
| p4.c | 03-May-2022 | 12.3 KiB | 494 | 413 |
| p4.h | 20-Jul-2021 | 111 | 5 | 2 |
| page.c | 20-Jul-2021 | 12 KiB | 446 | 343 |
| page.h | 20-Jul-2021 | 173 | 10 | 6 |
| paths.h | 20-Jul-2021 | 294 | 12 | 7 |
| rbtree.c | 20-Jul-2021 | 8.3 KiB | 386 | 299 |
| rbtree.h | 20-Jul-2021 | 4.9 KiB | 165 | 53 |
| sandy-bridge.c | 20-Jul-2021 | 4.5 KiB | 160 | 109 |
| sandy-bridge.h | 20-Jul-2021 | 142 | 3 | 2 |
| server.c | 03-May-2022 | 9.1 KiB | 427 | 343 |
| server.h | 20-Jul-2021 | 25 | 2 | 1 |
| skylake_xeon.c | 20-Jul-2021 | 8.1 KiB | 283 | 223 |
| skylake_xeon.h | 20-Jul-2021 | 195 | 4 | 3 |
| sysfs.c | 20-Jul-2021 | 2.5 KiB | 121 | 95 |
| sysfs.h | 20-Jul-2021 | 349 | 14 | 10 |
| trigger.c | 20-Jul-2021 | 4 KiB | 183 | 135 |
| trigger.h | 20-Jul-2021 | 288 | 12 | 9 |
| tsc.c | 03-May-2022 | 5.4 KiB | 232 | 181 |
| tsc.h | 20-Jul-2021 | 190 | 7 | 4 |
| tulsa.c | 20-Jul-2021 | 4.2 KiB | 134 | 96 |
| tulsa.h | 20-Jul-2021 | 47 | 2 | 1 |
| unknown.c | 20-Jul-2021 | 2.1 KiB | 82 | 56 |
| unknown.h | 20-Jul-2021 | 90 | 3 | 2 |
| version.h | 20-Jul-2021 | 55 | 4 | 2 |
| yellow.c | 20-Jul-2021 | 3.1 KiB | 121 | 91 |
| yellow.h | 20-Jul-2021 | 111 | 3 | 2 |
README.md

# mcelog

mcelog is the user space backend for logging machine check errors reported
by the hardware to the kernel. The kernel takes the immediate actions
(such as killing processes), while mcelog decodes the errors and manages
various other advanced error responses such as offlining memory or CPUs, or
triggering events. In addition, mcelog handles corrected errors by logging and
accounting for them.
It primarily handles machine checks and thermal events, which are reported
for errors detected by the CPU.

For more details on what mcelog can do and the underlying theory,
see [mcelog.org](https://www.mcelog.org).

It is recommended to run mcelog on all x86 machines, both 64-bit
(since early 2.6 kernels) and 32-bit (since 2.6.32).

mcelog can run in several modes:

- cronjob
- trigger
- daemon

**cronjob** is the old method. mcelog runs every 5 minutes from cron and checks
for errors. The disadvantage is that this can delay error reporting
significantly (up to 10 minutes) and does not allow mcelog to keep extended state.

**trigger** is a newer method where the kernel runs mcelog on an error.

This is configured with:
```sh
echo /usr/sbin/mcelog > /sys/devices/system/machinecheck/machinecheck0/trigger
```
This is faster, but still doesn't allow mcelog to keep state,
and it has relatively high overhead for each error because a program has
to be initialized from scratch.
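
To see which program the kernel is currently set to run, the same sysfs file can be read back. The following is a minimal sketch, assuming the trigger file is readable on your kernel. The names `MCE_TRIGGER` and `show_mce_trigger` are illustrative and not part of mcelog.

```sh
# Default sysfs path, taken from the configuration command above.
MCE_TRIGGER=/sys/devices/system/machinecheck/machinecheck0/trigger

# Print the program configured as the machine check trigger.
# The optional argument overrides the path (hypothetical helper, useful for testing).
show_mce_trigger() {
    trigger_file=${1:-$MCE_TRIGGER}
    if [ ! -r "$trigger_file" ]; then
        echo "no readable trigger file at $trigger_file" >&2
        return 1
    fi
    current=$(cat "$trigger_file")
    if [ -n "$current" ]; then
        echo "trigger: $current"
    else
        echo "trigger: (not set)"
    fi
}
```

Calling `show_mce_trigger` with no argument checks the default path. A non-zero return means the file is missing or unreadable, which may indicate a kernel without this sysfs interface.
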

In **daemon** mode mcelog runs continuously as a daemon in the background and
waits for errors. It is enabled by running `mcelog --daemon &`
from an init script. This is the fastest and most featureful mode.

The recommended mode is **daemon**, because several new functions (like page
error predictive failure analysis) require a continuously running daemon.

## Documentation

- The primary reference documentation is the man pages.
- [lk10-mcelog.pdf](lk10-mcelog.pdf)
  has an overview of the errors mcelog handles (originally from Linux Kongress 2010).
- [mce.pdf](mce.pdf)
  is a very old paper describing the first releases of mcelog (some parts are obsolete).

## For distributors

You can run mcelog from systemd or similar daemons. An example systemd unit
file is in `mcelog.service`.

By default mcelog reports its version as the git tag. This can be overridden
by setting up a `.os_version` file in the source directory. A build system
could write the OS version to this file to mark the binary.
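
As a rough sketch of how a build system might do this, the helper below writes a version string into `.os_version` before the build runs. The function name `write_os_version` and the example version string are hypothetical, and this README does not specify a required format for the file contents.

```sh
# Write a version string to .os_version in the given source directory.
# Assumption: a single line of text is sufficient; the README defines no format.
write_os_version() {
    src_dir=$1
    version=$2
    printf '%s\n' "$version" > "$src_dir/.os_version"
}
```

A packaging script could then run `write_os_version . "mydistro-1.2"` followed by `make`, where `mydistro-1.2` stands in for whatever identifier the distribution uses.
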

### For older distributions using init scripts

Please install an init script by default that runs mcelog in daemon mode.
The `mcelog.init` script is a good starting point. Also install a
logrotate configuration file (`mcelog.logrotate`) or equivalent when mcelog is running
in daemon mode.
These two files are not installed by `make install`.

The installation also requires a config file `/etc/mcelog.conf` and the default
triggers. These are all installed by `make install`.

`/dev/mcelog` is needed for mcelog operation. If it's not there, it can be
created with:
```sh
mknod /dev/mcelog c 10 227
```

Normally it should be created automatically by udev.
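
Before creating the node by hand, it can help to check whether it already exists and has the right type. This is a small sketch under that assumption. The function name `mcelog_dev_status` is illustrative and not something mcelog provides.

```sh
# Report the state of the mcelog device node.
# Prints "present" for a character device, "wrong-type" for anything else
# that exists at the path, and "missing" otherwise.
# The optional argument overrides the path (hypothetical, for testing).
mcelog_dev_status() {
    dev=${1:-/dev/mcelog}
    if [ -c "$dev" ]; then
        echo present
    elif [ -e "$dev" ]; then
        echo wrong-type
    else
        echo missing
    fi
}
```

An init script running as root could then guard the command above with `[ "$(mcelog_dev_status)" = missing ] && mknod /dev/mcelog c 10 227`.
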

## Security

mcelog needs to run as root because it might trigger actions like
page-offlining, which require `CAP_SYS_ADMIN`. It also opens `/dev/mcelog`
and a UNIX socket for client support.

It also opens `/dev/mem` to parse the BIOS DMI tables. It is careful to close
the file descriptor and unmap any mappings after using them.

There is support for changing the user in daemon mode after opening the device
and the sockets, but that would stop triggers from taking corrective actions
that require `root`.

In principle it would be possible to keep only `CAP_SYS_ADMIN` for page-offlining,
but that would prevent triggers from doing root-only actions not covered by
it (and `CAP_SYS_ADMIN` is not that different from full root).

In `daemon` mode mcelog listens on a UNIX socket and processes requests from
`mcelog --client`. This can be disabled in the configuration file.
The uid/gid of the requestor is checked on access and is configurable
(default 0/0 only). The command parsing code is very straightforward
(`server.c`). The client parsing/reply is currently done with the full privileges
of the daemon.

## Testing

There is a simple test suite in `tests/`. The test suite requires root to
run, access to mce-inject, and a kernel with MCE injection support
(`CONFIG_X86_MCE_INJECT`). It will kill any running mcelog daemon.

Run it with `make test`.

The test suite requires the
[mce-inject](git://git.kernel.org/pub/utils/cpu/mce/mce-inject.git) tool.
The `mce-inject` executable must be either in `$PATH` or in the
`../mce-inject` directory.
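
A quick way to confirm this prerequisite before running the tests is sketched below. It assumes the executable inside `../mce-inject` is itself named `mce-inject`, and the function name `find_mce_inject` is hypothetical. The test suite's own lookup may differ.

```sh
# Locate mce-inject: first in $PATH, then in ../mce-inject
# (relative to the current directory, assumed to be the mcelog source tree).
find_mce_inject() {
    if command -v mce-inject >/dev/null 2>&1; then
        command -v mce-inject
    elif [ -x ../mce-inject/mce-inject ]; then
        echo ../mce-inject/mce-inject
    else
        echo "mce-inject not found in \$PATH or ../mce-inject" >&2
        return 1
    fi
}
```

Running `find_mce_inject` from the source directory prints the path that was found, or returns non-zero if neither location has the tool.
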

You can also test under **valgrind** with `make valgrind-test`. For this,
valgrind needs to be installed. Advanced valgrind options can be
specified with:
```sh
make VALGRIND="valgrind --option" valgrind-test
```

### Other checks

`make iccverify` and `make clangverify` run the static verifiers in *icc*
and *clang* respectively.

## License

This program is licensed under the terms of the GNU General Public
License, v.2.

README.releases

mcelog used to do releases, but has now switched to a rolling release
scheme. That means the git tree is always kept stable and can
be used directly in production.

To simplify package management, which likes to have
increasing version numbers, the commits are regularly tagged
with a number. The numbering starts (arbitrarily) at 100.

The tags are named vXXX (e.g. v100).