# mcelog

mcelog is the user-space backend for logging machine check errors that the
hardware reports to the kernel. The kernel takes the immediate actions
(such as killing affected processes), while mcelog decodes the errors and manages
more advanced error responses, such as offlining memory or CPUs, or triggering
events. mcelog also handles corrected errors by logging and accounting for them.
It primarily handles machine checks and thermal events, which the CPU reports
for the errors it detects.

For more details on what mcelog can do and the underlying theory,
see [mcelog.org](https://www.mcelog.org).

mcelog should run on all x86 machines, both 64-bit
(supported since early 2.6 kernels) and 32-bit (since 2.6.32).

mcelog can run in several modes:

- cronjob
- trigger
- daemon

**cronjob** is the old method: cron runs mcelog every 5 minutes to check
for errors. The disadvantage is that this can delay error reporting
significantly (up to 10 minutes) and does not allow mcelog to keep extended state.

**trigger** is a newer method where the kernel runs mcelog whenever an error occurs.

This is configured with:
```sh
echo /usr/sbin/mcelog > /sys/devices/system/machinecheck/machinecheck0/trigger
```
This reports errors faster, but it still doesn't allow mcelog to keep state,
and it has relatively high overhead per error because a new mcelog process
has to be started from scratch each time.
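The `echo` above fails with a terse shell error if the sysfs file is absent or not writable by the caller. A slightly more defensive variant can be sketched as a shell function; `set_mce_trigger` is a hypothetical name, not part of mcelog, and the default paths are the ones used in the command above.

```sh
# Hypothetical helper (not part of mcelog): point the kernel's machine check
# trigger at a program, with explicit error messages.
# Usage: set_mce_trigger [program] [trigger_file]
set_mce_trigger() {
    prog=${1:-/usr/sbin/mcelog}
    trigger=${2:-/sys/devices/system/machinecheck/machinecheck0/trigger}

    # The sysfs file only exists when the kernel exposes machine check devices.
    if [ ! -e "$trigger" ]; then
        echo "no trigger file at $trigger" >&2
        return 1
    fi
    # The trigger file is normally writable only by root.
    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (are you root?)" >&2
        return 1
    fi
    printf '%s\n' "$prog" > "$trigger" || return 1
    # Print the current value so the caller can confirm it was accepted.
    cat "$trigger"
}

# Example (as root): set_mce_trigger /usr/sbin/mcelog
```

The second argument exists mainly so the function can be exercised against an ordinary file instead of the real sysfs entry.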

In **daemon** mode mcelog runs continuously in the background and
waits for errors. It is enabled by running `mcelog --daemon &`
from an init script. This is the fastest and most featureful mode.

The recommended mode is **daemon**, because several new functions (like page
error predictive failure analysis) require a continuously running daemon.

## Documentation

- The man pages are the primary reference documentation.
- [lk10-mcelog.pdf](lk10-mcelog.pdf)
  gives an overview of the errors mcelog handles (originally from Linux Kongress 2010).
- [mce.pdf](mce.pdf)
  is a very old paper describing the first releases of mcelog (some parts are obsolete).

## For distributors

You can run mcelog from systemd or similar daemons. An example systemd unit
file is in `mcelog.service`.

By default mcelog reports its version as the git tag. This can be overridden
by creating a `.os_version` file in the source directory. A build system
could write the OS version to this file to mark the binary.
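A packaging script could stamp the source tree this way before building. The sketch below assumes it runs before `make` and that the build reads `.os_version` from the top of the source directory, as described above; `write_os_version` is a hypothetical helper and `178-1.example` is a placeholder version string, not a real release.

```sh
# Hypothetical packaging helper: record a distribution version string in
# .os_version so the built binary reports it instead of the git tag.
# Usage: write_os_version <source_dir> <version>
write_os_version() {
    srcdir=$1
    version=$2
    if [ -z "$srcdir" ] || [ -z "$version" ]; then
        echo "usage: write_os_version <source_dir> <version>" >&2
        return 2
    fi
    # Write the version as a single line of text.
    printf '%s\n' "$version" > "$srcdir/.os_version"
}

# Example: stamp the tree, then build as usual.
# write_os_version . "178-1.example" && make
```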

### For older distributions using init scripts

Please install, by default, an init script that runs mcelog in daemon mode.
The `mcelog.init` script is a good starting point. Also install a
logrotate configuration (`mcelog.logrotate`) or equivalent when mcelog runs
in daemon mode.
Neither of these two files is installed by `make install`.

The installation also requires a config file, `/etc/mcelog.conf`, and the default
triggers. Both are installed by `make install`.

`/dev/mcelog` is needed for mcelog operation. If it's not there it can be
created with:
```sh
mknod /dev/mcelog c 10 227
```

Normally udev creates it automatically.
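When the node might be missing or wrong (for example in a minimal chroot), a check before starting mcelog can be sketched as follows. It assumes GNU coreutils `stat`, whose `%t` and `%T` format sequences print the major and minor numbers in hexadecimal (10 = `a`, 227 = `e3`); `ensure_mcelog_dev` is a hypothetical helper, not part of mcelog.

```sh
# Hypothetical helper: make sure a character device with major 10, minor 227
# exists at the given path (default /dev/mcelog), creating it if missing.
# Assumes GNU coreutils stat; creating the node requires root.
ensure_mcelog_dev() {
    dev=${1:-/dev/mcelog}
    if [ -e "$dev" ]; then
        if [ ! -c "$dev" ]; then
            echo "$dev exists but is not a character device" >&2
            return 1
        fi
        # %t and %T print the major and minor numbers in hex.
        numbers=$(stat -c '%t:%T' "$dev") || return 1
        if [ "$numbers" != "a:e3" ]; then
            echo "$dev has device numbers $numbers (hex), expected a:e3" >&2
            return 1
        fi
        return 0
    fi
    # Same command as above: character device, major 10, minor 227.
    mknod "$dev" c 10 227
}
```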

## Security

mcelog needs to run as root because it might trigger actions like
page offlining, which require `CAP_SYS_ADMIN`. It also opens `/dev/mcelog`
and a UNIX socket for client support.

It also opens `/dev/mem` to parse the BIOS DMI tables. It is careful to close
the file descriptor and unmap any mappings after using them.

There is support for changing the user in daemon mode after opening the device
and the sockets, but that would stop triggers from taking corrective actions
that require `root`.

In principle it would be possible to keep only `CAP_SYS_ADMIN` for page offlining,
but that would prevent triggers from performing root-only actions not covered by
it (and `CAP_SYS_ADMIN` is not that different from full root).

In `daemon` mode mcelog listens on a UNIX socket and processes requests from
`mcelog --client`. This can be disabled in the configuration file.
The uid/gid of the requester is checked on access and is configurable
(default: 0/0 only). The command parsing code (`server.c`) is very
straightforward. Client requests are currently parsed and answered with the
full privileges of the daemon.

## Testing

There is a simple test suite in `tests/`. It must run as root and needs the
mce-inject tool and a kernel with MCE injection support
(`CONFIG_X86_MCE_INJECT`). It will kill any running mcelog daemon.

Run it with `make test`.
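The root and kernel-configuration requirements can be checked up front. This is a minimal sketch, assuming the running kernel's configuration is available at `/boot/config-$(uname -r)` (a common but distribution-specific location) and that the option may be built in (`=y`) or a module (`=m`); `kernel_has_mce_inject` and `check_test_prereqs` are hypothetical helpers, not part of the mcelog test suite.

```sh
# Hypothetical check: succeed if the given kernel config enables
# CONFIG_X86_MCE_INJECT, either built in (=y) or as a module (=m).
kernel_has_mce_inject() {
    config=${1:-/boot/config-$(uname -r)}
    if [ ! -r "$config" ]; then
        echo "cannot read kernel config $config" >&2
        return 1
    fi
    grep -Eq '^CONFIG_X86_MCE_INJECT=(y|m)$' "$config"
}

# Hypothetical pre-flight check to run before `make test`.
check_test_prereqs() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "the mcelog test suite must run as root" >&2
        return 1
    fi
    kernel_has_mce_inject "$1"
}
```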

The test suite requires the
[mce-inject](git://git.kernel.org/pub/utils/cpu/mce/mce-inject.git) tool.
The `mce-inject` executable must be either in `$PATH` or in the
`../mce-inject` directory.
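The lookup rule for `mce-inject` can be expressed as a small shell function. This is a sketch, not the test suite's own code: `find_mce_inject` is a hypothetical name, and it assumes the executable inside `../mce-inject` is named `mce-inject` and that the current directory is the mcelog source tree.

```sh
# Hypothetical helper: print the path of the mce-inject executable,
# looking first in $PATH and then in ../mce-inject.
find_mce_inject() {
    if found=$(command -v mce-inject 2>/dev/null); then
        printf '%s\n' "$found"
        return 0
    fi
    # Fallback: a copy built next to the mcelog source tree.
    if [ -x ../mce-inject/mce-inject ]; then
        printf '%s\n' "../mce-inject/mce-inject"
        return 0
    fi
    echo "mce-inject not found in \$PATH or ../mce-inject" >&2
    return 1
}
```

Checking `$PATH` first means an installed copy wins over a locally built one.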

You can also test under **valgrind** with `make valgrind-test`. This
requires valgrind to be installed. Advanced valgrind options can be
specified with:
```sh
make VALGRIND="valgrind --option" valgrind-test
```

### Other checks

`make iccverify` and `make clangverify` run the static verifiers in *icc*
and *clang*, respectively.

## License

This program is licensed under the GNU General Public License,
version 2.