Name                     Date         Size      #Lines  LOC
..                       13-Oct-2021  -
Makefile                 13-Oct-2021  4.2 KiB   116     93
README.md                13-Oct-2021  956       5       2
cp2k-collocate.cc        13-Oct-2021  18.6 KiB  509     411
cp2k-collocate.vcxproj   13-Oct-2021  23.1 KiB  406     406
cp2k-dbcsr.cpp           13-Oct-2021  14.8 KiB  367     322
cp2k-dbcsr.sh            03-May-2022  2.5 KiB   69      50
cp2k-dbcsr.vcxproj       13-Oct-2021  23.6 KiB  418     418
cp2k-perf-jit.sh         03-May-2022  3.6 KiB   70      51
cp2k-perf.plt            13-Oct-2021  12.1 KiB  235     214
cp2k-plot.sh             03-May-2022  3.3 KiB   111     83
mdarray.hpp              13-Oct-2021  32.7 KiB  1,127   903
rt_graph.cc              13-Oct-2021  18 KiB    546     420
rt_graph.hpp             13-Oct-2021  7.8 KiB   213     128

README.md

# CP2K Artificial Benchmark

The first code sample provided for LIBXSMM was a performance reproducer exercising the same set of kernels usually generated for CP2K's SMM library. The code sample attempted to model the way "matrix stacks" are processed in CP2K. However, there are two different code paths in CP2K: (1) the "main" code path used when processing stacks on the host side, and (2) a code path targeting offload devices. Besides the host-side parallelization via MPI (and perhaps OpenMP), the second code path relies on an additional level of parallelization, which is necessary to drive a potentially highly parallel offload device. This additional level of parallelism is not exactly "nested" in the sense of sharing the same resources as the host side. In fact, this "artificial benchmark" (the CP2K code sample) models the second code path (offload device).
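
To make the notion of a "matrix stack" with an extra level of parallelism more concrete, here is a minimal sketch. It is not taken from CP2K, LIBXSMM, or the files in this directory: `StackEntry`, `Block`, `small_gemm`, and `process_stack` are hypothetical names, and the naive triple loop merely stands in for a generated small-matrix kernel. The sketch assumes a stack is a list of index triples, each requesting C[ic] += A[ia] * B[ib], and it spreads the entries over workers by target block so that no two workers update the same C block. The OpenMP pragma is only one possible way to express this; without OpenMP support the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only (not CP2K or LIBXSMM definitions).
struct StackEntry { std::size_t ia, ib, ic; };  // indices of the A, B, and C blocks
using Block = std::vector<double>;              // dense column-major block

// C(m x n) += A(m x k) * B(k x n); stand-in for a generated small-matrix kernel.
void small_gemm(const Block& a, const Block& b, Block& c, int m, int n, int k) {
  assert(a.size() == static_cast<std::size_t>(m) * k);
  assert(b.size() == static_cast<std::size_t>(k) * n);
  assert(c.size() == static_cast<std::size_t>(m) * n);
  for (int j = 0; j < n; ++j)
    for (int p = 0; p < k; ++p)
      for (int i = 0; i < m; ++i)
        c[j * m + i] += a[p * m + i] * b[j * k + p];
}

// Process one stack with an additional level of parallelism.
// Worker w handles only entries whose target block satisfies ic % nworkers == w,
// so updates to any given C block stay within a single worker (no data race).
void process_stack(const std::vector<StackEntry>& stack,
                   const std::vector<Block>& a, const std::vector<Block>& b,
                   std::vector<Block>& c, int m, int n, int k, int nworkers) {
  #pragma omp parallel for
  for (int w = 0; w < nworkers; ++w) {
    for (const StackEntry& e : stack) {
      if (e.ic % static_cast<std::size_t>(nworkers) == static_cast<std::size_t>(w)) {
        small_gemm(a[e.ia], b[e.ib], c[e.ic], m, n, k);
      }
    }
  }
}
```

Partitioning by target block keeps the sketch free of locks and atomics at the cost of every worker scanning the whole stack. A real implementation would more likely presort or bin the stack, which is beyond what this README describes.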