# STMV benchmark

This benchmark tests the performance of CP2K for electronic structure calculations on relatively complex systems containing a million atoms. The input is based on [earlier work](https://pubs.acs.org/doi/full/10.1021/acs.jctc.6b00398), where the electronic structure of the STMV virus was simulated with DFT and subsystem DFT. Here, the xTB tight-binding method is employed instead. The input settings are realistic and might be useful as a starting point for setting up similar systems.

## Properties of the benchmark

The benchmark exercises in particular the sparse matrix handling and the linear scaling algorithms in CP2K. It performs one step of geometry optimization, which requires SCF, energy, and force calculations. Some properties are computed as well. Because of the xTB method, relatively small block sizes dominate the sparse matrix multiplications.

## Typical timings and setup

A typical parallel run requires on the order of 256 nodes to complete the benchmark in a reasonable time and to keep the memory consumption per node manageable. An invocation with Slurm on a system with 32 threads per node (dual socket, Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz, Piz Daint multi-core) could look like:
```
export OMP_NUM_THREADS=8
srun --cpu-bind=none --nodes=256 --ntasks=1024 --ntasks-per-node=4 --cpus-per-task=8 ./cp2k.psmp -i stmv_xtb.inp -o stmv_xtb.out
```
This needs roughly 7 GB per rank (28 GB per node) and runs in less than 4 h. The timing report for this run (based on CP2K 7.0, git:bf104a630):

```
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    6.012    6.487 8256.073 8256.198
 cp_geo_opt                           1  2.0    0.041    0.092 8236.532 8236.652
 geoopt_lbfgs                         1  3.0    0.010    0.074 8236.490 8236.639
 cp_opt_gopt_step                     1  4.0    0.087    0.285 8236.314 8236.439
 cp_eval_at                           2  5.0    0.097    0.148 8223.558 8223.698
 qs_forces                            2  6.0    0.831    1.322 8223.245 8223.380
 qs_energies                          2  7.0    0.034    0.117 8095.812 8099.994
 ls_scf                               2  8.0    0.029    0.093 7976.523 7980.780
 ls_scf_main                          2  9.0    0.016    0.055 7240.555 7240.672
 dm_ls_curvy_optimization            47 10.0    0.003    0.076 6583.940 6599.528
 dbcsr_multiply_generic             737 13.0    7.232    7.446 6427.778 6579.351
 multiply_cannon                    737 14.0   41.860   44.498 5883.682 6121.346
 optimization_step                   47 11.0    0.001    0.015 4866.856 4889.982
 multiply_cannon_multrec          23584 15.0 4119.121 4434.655 4133.269 4447.990
 compute_direction_newton            16 12.0    0.812    0.870 2530.534 2536.757
 update_p_exp                        47 12.0    0.009    0.015 2336.108 2359.676
 mp_waitall_1                    200904 16.1 1041.558 1716.869 1041.558 1716.869
 transform_matrix_orth               65 11.0    0.002    0.003 1664.426 1698.501
 commutator_symm                    170 13.0    0.003    0.004 1413.495 1456.582
 purify_mcweeny_orth                 50 12.9    0.003    0.065 1312.274 1334.495
 multiply_cannon_metrocomm3       23584 15.0    0.125    0.167  294.026 1106.474
 multiply_cannon_metrocomm1       23584 15.0    0.172    0.196  596.117 1034.438
 dbcsr_new_transposed               417 13.2   32.609   35.757  703.789  824.199
 dbcsr_redistribute                 417 14.2  379.063  436.415  659.380  776.357
 multiply_cannon_metrocomm4       22847 15.0    0.156    0.232  200.626  526.612
 mp_irecv_dv                      58794 16.3  191.731  503.107  191.731  503.107
 calculate_norms                  47168 15.0  437.842  473.288  437.842  473.288
 ls_scf_init_scf                      2  9.0    0.014    0.088  461.141  461.330
 ls_scf_init_matrix_S                 2 10.0    0.001    0.030  413.455  414.477
 ls_scf_dm_to_ks                     49 10.0    0.001    0.023  381.400  401.911
 matrix_sqrt_Newton_Schulz            2 11.0    0.023    0.426  389.123  389.457
 make_m2s                          1474 14.0    7.055    7.820  283.134  324.396
 mp_alltoall_i22                    569 14.7  236.860  323.813  236.860  323.813
 make_images                       1474 15.0   36.912   39.536  268.811  310.928
 ls_scf_post                          2  9.0    0.009    0.087  274.798  279.184
 mp_sum_l                          2322 13.8  207.665  270.645  207.665  270.645
 density_matrix_trs4                  1 10.0    0.015    0.042  248.757  249.218
 qs_ks_update_qs_env                 50 11.0    0.000    0.000  208.761  218.124
 make_images_data                  1474 16.0    0.081    0.122  152.463  205.226
 hybrid_alltoall_any               1525 16.9    0.731   22.546  142.014  199.699
 matrix_ls_to_qs                     51 11.0    0.001    0.001  183.276  198.338
 mp_sum_d                          2810 12.6  148.925  197.472  148.925  197.472
 rebuild_ks_matrix                   52 11.9    0.042    0.061  194.777  195.286
 build_xtb_ks_matrix                 52 12.9    5.507    6.598  194.735  195.237
 dbcsr_complete_redistribute        101 12.5   89.804   93.327  171.256  191.421
 post_scf_homo_lumo                   2 10.0    0.000    0.001  186.075  186.911
 matrix_decluster                    51 12.0    0.095    0.238  147.731  168.361
 dbcsr_finalize                    3365 14.5    2.293    2.770  140.479  155.280
 dbcsr_frobenius_norm               712 13.0   34.195   35.252  114.124  139.443
 mp_sum_dm                          430  9.1  131.588  135.181  131.588  135.181
 mp_allgather_i34                   737 15.0   71.689  130.243   71.689  130.243
 dbcsr_merge_all                   2761 15.5   72.120   75.581  118.675  123.148
 qs_energies_init_hamiltonians        2  8.0    0.007    0.082  119.151  119.333
 dbcsr_add_d                       1749 13.0    0.006    0.008  100.754  111.917
 dbcsr_add_anytype                 1749 14.0   25.357   28.939  100.748  111.910
 build_xtb_matrices                   4  8.0   64.662   80.037  110.824  111.490
 make_images_sizes                 1474 16.0    0.006    0.036   37.330  106.622
 mp_alltoall_i44                   1474 17.0   37.324  106.617   37.324  106.617
 dbcsr_dot_sd                       345 13.0   13.894   14.777   73.443  101.347
 arnoldi_extremal                     9 11.2    0.089    0.533   98.396  100.332
 arnoldi_normal_ev                    9 12.2    0.343    0.839   98.307  100.124
 build_xtb_coulomb                   52 13.9   18.489   19.417   99.082   99.771
 calculate_dispersion_pairpot         2  9.0   57.454   70.439   92.449   92.457
 ao_charges_kp_2                     52 13.9    2.220    2.512   89.914   91.795
 mp_alltoall_d11v                  1707 14.8   89.032   90.724   89.032   90.724
 ao_charges_2                        52 14.9    4.520    4.815   87.694   89.634
 build_subspace                      39 13.3    0.645    1.234   87.475   88.719
 mp_shift_i                        8184  9.0   46.858   66.427   46.858   66.427
 setup_rec_index_2d                1474 15.0   59.164   65.956   59.164   65.956
 dbcsr_matrix_vector_mult           884 14.0    0.068    1.905   61.132   64.708
 dbcsr_data_release               70614 16.3   31.961   59.053   32.162   59.250
 mp_sum_iv                          744 14.9   42.010   51.567   42.010   51.567
 mp_sum_dv                         5097 15.6   44.088   50.582   44.088   50.582
 ls_scf_initial_guess                 2 10.0    0.000    0.015   47.606   48.384
 calculate_w_matrix                   2 10.0    0.000    0.000   44.894   45.647
 dbcsr_copy_into_existing            51 12.0   35.076   44.774   35.077   44.775
 dbcsr_matrix_vector_mult_local     884 15.0   38.363   44.350   38.371   44.359
 dbcsr_multiply_generic_mpsum_f      60 12.7    0.000    0.000   28.968   43.956
 ls_scf_store_result                  2 10.0    0.020    0.206   42.334   42.976
 matrix_qs_to_ls                     50 10.0    0.001    0.021   35.255   37.254
 matrix_cluster                      50 11.0    0.115    0.300   35.254   37.254
```
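
To compare a new run against the timing report, it can help to extract the rows programmatically. The following is a minimal sketch in Python, not part of CP2K or of this benchmark: the names `parse_timings` and `share_of_run` are hypothetical, and the regular expression assumes the seven-column layout shown in the report (routine, calls, ASD, average and maximum self time, average and maximum total time).

```python
import re

# A few rows copied verbatim from the timing report in this README.
REPORT = """\
 CP2K                                 1  1.0    6.012    6.487 8256.073 8256.198
 dbcsr_multiply_generic             737 13.0    7.232    7.446 6427.778 6579.351
 multiply_cannon_multrec          23584 15.0 4119.121 4434.655 4133.269 4447.990
 mp_waitall_1                    200904 16.1 1041.558 1716.869 1041.558 1716.869
"""

# Assumed layout: name, calls, then five decimal columns
# (ASD, self avg, self max, total avg, total max).
ROW = re.compile(r"^\s*(\S+)\s+(\d+)((?:\s+\d+\.\d+){5})\s*$")


def parse_timings(text):
    """Return {routine: columns} for every line that looks like a report row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header line or unrelated output
        asd, self_avg, self_max, total_avg, total_max = map(
            float, match.group(3).split()
        )
        rows[match.group(1)] = {
            "calls": int(match.group(2)),
            "asd": asd,
            "self_avg": self_avg,
            "self_max": self_max,
            "total_avg": total_avg,
            "total_max": total_max,
        }
    return rows


def share_of_run(rows, name, root="CP2K"):
    """Fraction of the root routine's maximum total time spent in `name`."""
    return rows[name]["total_max"] / rows[root]["total_max"]


rows = parse_timings(REPORT)
print(f"wall time: {rows['CP2K']['total_max'] / 3600.0:.2f} h")
for name in ("dbcsr_multiply_generic", "multiply_cannon_multrec", "mp_waitall_1"):
    print(f"{name:28s} {100 * share_of_run(rows, name):5.1f} % of total")
```

Applied to the rows above, this shows that `dbcsr_multiply_generic` accounts for roughly 80% of the total run time, which is consistent with the benchmark's focus on sparse matrix multiplication.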

The DBCSR statistics for the same run:

```

 COUNTER                                    TOTAL       BLAS       SMM       ACC
 flops     2 x     2 x     2       36142856249008       0.0%    100.0%      0.0%
 flops     2 x     2 x     8       39213980112512       0.0%    100.0%      0.0%
 flops     8 x     2 x     2       43804007223424       0.0%    100.0%      0.0%
 flops     2 x     8 x     2       44584775765312       0.0%    100.0%      0.0%
 flops     4 x     2 x     2       90117651537184       0.0%    100.0%      0.0%
 flops     2 x     4 x     2       90294481986400       0.0%    100.0%      0.0%
 flops     2 x     2 x     4       97540850606240       0.0%    100.0%      0.0%
 flops     8 x     8 x     2       97563507289344       0.0%    100.0%      0.0%
 flops     2 x     4 x     8      107600053992192       0.0%    100.0%      0.0%
 flops     4 x     2 x     8      107835358956416       0.0%    100.0%      0.0%
 flops     8 x     4 x     2      117969022215680       0.0%    100.0%      0.0%
 flops     4 x     8 x     2      119092012799616       0.0%    100.0%      0.0%
 flops     8 x     2 x     4      130156579707520       0.0%    100.0%      0.0%
 flops     2 x     8 x     4      130528054439040       0.0%    100.0%      0.0%
 flops     8 x     2 x     8      161502281254912       0.0%    100.0%      0.0%
 flops     2 x     8 x     8      166307689803776       0.0%    100.0%      0.0%
 flops     4 x     4 x     2      229852669562368       0.0%    100.0%      0.0%
 flops     2 x     4 x     4      247224891759168       0.0%    100.0%      0.0%
 flops     4 x     2 x     4      249694810664896       0.0%    100.0%      0.0%
 flops     8 x     8 x     4      305672038397440       0.0%    100.0%      0.0%
 flops     4 x     4 x     8      308761351253760       0.0%    100.0%      0.0%
 flops     8 x     4 x     4      362908283625472       0.0%    100.0%      0.0%
 flops     4 x     8 x     4      365532741502720       0.0%    100.0%      0.0%
 flops     8 x     4 x     8      455052334568448       0.0%    100.0%      0.0%
 flops     4 x     8 x     8      465166240374272       0.0%    100.0%      0.0%
 flops     4 x     4 x     4      648336487473152       0.0%    100.0%      0.0%
 flops     8 x     8 x     8    10568517651483648       0.0%    100.0%      0.0%
 flops inhomo. stacks            2554975071623610     100.0%      0.0%      0.0%
 flops total                        18.341948E+15      13.9%     86.1%      0.0%
 flops max/rank                     18.975286E+12      15.8%     84.2%      0.0%
 matmuls inhomo. stacks             1752383451045     100.0%      0.0%      0.0%
 matmuls total                     55112132774528       3.2%     96.8%      0.0%
 number of processed stacks           61576905441       3.4%     96.6%      0.0%
 average stack size                                   830.2     897.3       0.0
 marketing flops                    36.676667E+21
 -------------------------------------------------------------------------------
 # multiplications                            737
 max memory usage/rank               7.564362E+09
 # max total images/rank                        1
 # max 3D layers                                1
 # MPI messages exchanged                46790656
 MPI messages size (bytes):
  total size                         4.348855E+15
  min size                           0.000000E+00
  max size                         202.967424E+06
  average size                      92.942816E+06
 MPI breakdown and total messages size (bytes):
             size <=      128              153760                        0
       128 < size <=     8192               83948                332481200
      8192 < size <=    32768               54994               1018984136
     32768 < size <=   131072              468286              34129603760
    131072 < size <=  4194304             1278316             958318296392
   4194304 < size <= 16777216              630819            5065558052384
  16777216 < size                        44120533         4342797468343176
 -------------------------------------------------------------------------------
 -                                                                             -
 -                      DBCSR MESSAGE PASSING PERFORMANCE                      -
 -                                                                             -
 -------------------------------------------------------------------------------
 ROUTINE             CALLS      AVE VOLUME [Bytes]
 MP_Group                1
 MP_Bcast               28                     12.
 MP_Allreduce         4230                 325415.
 MP_Alltoall          5355               22325841.
 MP_Wait            200464
 MP_ISend            94336               47211041.
 MP_IRecv            94336               47206344.
 MP_Memory           19172
 -------------------------------------------------------------------------------

 MEMORY| Estimated peak process memory [MiB]                                6402

```
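
The flop counters make it possible to check how the work is distributed over block sizes. The sketch below is illustrative only: `parse_flops` is a hypothetical helper, the three dimensions on each row are read here as m x n x k, and only a handful of rows from the statistics are included to keep it short.

```python
import re

# Selected rows copied verbatim from the DBCSR statistics in this README.
STATS = """\
 flops     2 x     2 x     2       36142856249008       0.0%    100.0%      0.0%
 flops     4 x     4 x     4      648336487473152       0.0%    100.0%      0.0%
 flops     8 x     8 x     8    10568517651483648       0.0%    100.0%      0.0%
 flops inhomo. stacks            2554975071623610     100.0%      0.0%      0.0%
 flops total                        18.341948E+15      13.9%     86.1%      0.0%
"""

BLOCK = re.compile(r"^\s*flops\s+(\d+)\s+x\s+(\d+)\s+x\s+(\d+)\s+(\d+)\s")
INHOMO = re.compile(r"^\s*flops inhomo\. stacks\s+(\d+)\s")
TOTAL = re.compile(r"^\s*flops total\s+(\S+)\s")


def parse_flops(text):
    """Return (flops per block size, flops in inhomogeneous stacks, total flops)."""
    per_block, inhomo, total = {}, None, None
    for line in text.splitlines():
        match = BLOCK.match(line)
        if match:
            m, n, k, flops = map(int, match.groups())
            per_block[(m, n, k)] = flops
            continue
        match = INHOMO.match(line)
        if match:
            inhomo = int(match.group(1))
            continue
        match = TOTAL.match(line)
        if match:
            total = float(match.group(1))  # printed in exponent notation
    return per_block, inhomo, total


per_block, inhomo, total = parse_flops(STATS)
for dims, flops in sorted(per_block.items(), key=lambda item: -item[1]):
    print(f"{dims[0]} x {dims[1]} x {dims[2]}: {100 * flops / total:5.1f} % of all flops")
print(f"inhomogeneous stacks: {100 * inhomo / total:5.1f} % of all flops")
```

For the numbers above, the 8 x 8 x 8 multiplications carry a bit less than 60% of all flops, and the share of the inhomogeneous stacks reproduces the 13.9% reported in the BLAS column of the `flops total` row.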

## Key output elements

The SCF cycle output looks like:
```
 ------------------------------ Linear scaling SCF -----------------------------
 SCF     1  -2019286.740626037  -2019286.740626037  257.119279
 SCF     2  -2022372.558790133     -3085.818164097  283.782252
 SCF     3  -2024495.728609510     -5208.987983474   72.828389
 SCF     4  -2026488.297578288     -7201.556952252  117.651759
 SCF     5  -2029048.217573609     -2559.919995321  229.818745
 SCF     6  -2030970.001095107     -4481.703516819   73.634025
 SCF     7  -2032985.476104558     -6497.178526270  114.296279
 SCF     8  -2033540.457572915      -554.981468357  234.732257
 SCF     9  -2033521.016724448      -535.540619890   74.614194
 SCF    10  -2033612.235695642      -626.759591084  112.382977
 SCF    11  -2033854.276528493      -242.040832851  237.561626
 SCF    12  -2033982.666816269      -370.431120626   66.611201
 SCF    13  -2034003.081946934      -390.846251292   74.652012
 SCF    14  -2034069.478891714       -66.396944780  264.631421
 SCF    15  -2033892.868982235       110.212964699   74.861617
 SCF    16  -2034077.572747632       -74.490800698  108.504719
 SCF    17  -2034120.056681868       -42.483934236  281.836313
 SCF    18  -2034151.889439356       -74.316691724   66.644064
 SCF    19  -2034184.719602434      -107.146854802   74.555730
 SCF    20  -2034205.581420024       -20.861817590  260.906525
 SCF    21  -2034080.981126470       103.738475963   75.491708
 SCF    22  -2034213.969040957       -29.249438523  106.544563
 SCF    23  -2034217.910995587        -3.941954630  318.571562
 SCF    24  -2034219.786293399        -5.817252442   73.672705
 SCF    25  -2034219.952455441        -5.983414485  104.098476
 SCF    26  -2034224.105341386        -4.152885945  295.732344
 SCF    27  -2034225.354788742        -5.402333301   74.348011
 SCF    28  -2034225.361724817        -5.409269375  106.903410
 SCF    29  -2034226.617073632        -1.255348816  312.159937
 SCF    30  -2034226.664080255        -1.302355439   74.832404
 SCF not converged!
 SCF     1  -2034504.114579430      -102.226643770  294.428820
 SCF     2  -2034582.030317720      -180.142382059   59.509095
 SCF     3  -2034669.633673732      -267.745738071   67.661870
 SCF     4  -2034686.756534706       -17.122860974  240.952248
 SCF     5  -2034383.864712474       285.768961258   70.071076
 SCF     6  -2034726.751386791       -57.117713059   97.803430
 SCF     7  -2034732.385823286        -5.634436496  286.235458
 SCF     8  -2034737.205264020       -10.453877229   61.265636
 SCF     9  -2034749.118323179       -22.366936388   69.003241
 SCF    10  -2034748.012737155         1.105586024  305.655801
 SCF    11  -2034751.398910127        -2.280586948   63.232803
 SCF    12  -2034751.430011003        -2.311687824   62.608883
 SCF    13  -2034751.818950539        -0.388939536  333.393301
 SCF    14  -2034752.073886994        -0.643875991   62.769221
 SCF    15  -2034752.205481594        -0.775470591   63.329253
 SCF    16  -2034752.282033039        -0.076551445  351.013058
 SCF    17  -2034752.318892021        -0.113410427   63.425647
 SCF    18  -2034752.322544992        -0.117063398   63.808013
```
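
When validating a run, the final energy of each SCF sequence is the natural quantity to compare. The sketch below splits the output into separate runs wherever the iteration counter restarts, as it does after iteration 30 above. It is only an illustration: `ScfStep` and `split_scf_runs` are hypothetical names, the third column is stored as printed without interpreting its reference point, and the last column is assumed to be the time per iteration in seconds.

```python
import re
from typing import List, NamedTuple


class ScfStep(NamedTuple):
    iteration: int
    energy: float
    delta: float    # energy difference as printed in the third column
    seconds: float  # assumption: last column is the time per iteration in seconds


# A few lines copied verbatim from the SCF output in this README.
SAMPLE = """\
 SCF    29  -2034226.617073632        -1.255348816  312.159937
 SCF    30  -2034226.664080255        -1.302355439   74.832404
 SCF not converged!
 SCF     1  -2034504.114579430      -102.226643770  294.428820
 SCF     2  -2034582.030317720      -180.142382059   59.509095
"""

STEP = re.compile(
    r"^\s*SCF\s+(\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s*$"
)


def split_scf_runs(text: str) -> List[List[ScfStep]]:
    """Group SCF lines into runs; a new run starts when the counter does not increase."""
    runs: List[List[ScfStep]] = []
    for line in text.splitlines():
        match = STEP.match(line)
        if not match:
            continue  # banner, "SCF not converged!", or unrelated output
        step = ScfStep(
            int(match.group(1)),
            float(match.group(2)),
            float(match.group(3)),
            float(match.group(4)),
        )
        if not runs or step.iteration <= runs[-1][-1].iteration:
            runs.append([])
        runs[-1].append(step)
    return runs


for index, run in enumerate(split_scf_runs(SAMPLE), start=1):
    last = run[-1]
    print(f"run {index}: last iteration {last.iteration}, energy {last.energy:.6f}")
```

Comparing the last energy of each run against the reference values above, within a tolerance appropriate for the platform, gives a quick sanity check.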