| Name                 | Date        | Size      | #Lines | LOC    |
| -------------------- | ----------- | --------- | ------ | ------ |
| H2O-dft-ls.NREP2.inp | 18-May-2020 | 6.8 KiB   | 183    | 182    |
| H2O-dft-ls.NREP4.inp | 18-May-2020 | 6.8 KiB   | 183    | 182    |
| H2O-dft-ls.inp       | 18-May-2020 | 6.8 KiB   | 183    | 182    |
| README.md            | 18-May-2020 | 5.9 KiB   | 68     | 51     |
| TiO2.inp             | 18-May-2020 | 1.5 KiB   | 81     | 80     |
| TiO2_larger.xyz      | 18-May-2020 | 621.2 KiB | 9,787  | 9,786  |
| amorph.inp           | 18-May-2020 | 1.7 KiB   | 92     | 91     |
| amorph.xyz           | 18-May-2020 | 703.1 KiB | 13,847 | 13,846 |

README.md

# Quickstep Density Matrix Linear Scaling

## Description

This is a single-point energy calculation using linear-scaling DFT.

For large systems, the linear-scaling approach to solving the self-consistent-field equations is computationally much cheaper than standard DFT and allows scaling up to 1 million atoms for simple systems. The linear-scaling cost results from the fact that the algorithm iterates directly on the density matrix. The cubically scaling orthogonalisation step of standard Quickstep DFT using OT is avoided, and the key operation is sparse matrix-matrix multiplication, where the number of non-zero entries in the matrices scales linearly with system size. These multiplications are implemented efficiently in the DBCSR library.
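
To illustrate what "an iteration on the density matrix" can look like, here is a minimal sketch of McWeeny purification, a textbook scheme that reaches the density matrix using only matrix products. It is not necessarily the scheme CP2K uses. The sketch assumes an orthogonal basis, dense NumPy arrays instead of sparse DBCSR matrices, and a chemical potential `mu` known to lie in the band gap. The function names and the toy Hamiltonian are hypothetical.

```python
import numpy as np


def gershgorin_bounds(h):
    """Cheap lower/upper bounds on the eigenvalues of a symmetric matrix."""
    diag = np.diag(h)
    radii = np.sum(np.abs(h), axis=1) - np.abs(diag)
    return np.min(diag - radii), np.max(diag + radii)


def mcweeny_density(h, mu, tol=1e-10, max_iter=100):
    """Density matrix of an orthogonal-basis Hamiltonian, without diagonalisation.

    mu is assumed to lie in the gap between occupied and empty states.
    """
    identity = np.eye(h.shape[0])
    e_min, e_max = gershgorin_bounds(h)
    # Linear map that places all eigenvalues in [0, 1], occupied states above 0.5.
    lam = min(1.0 / (e_max - mu), 1.0 / (mu - e_min))
    p = 0.5 * lam * (mu * identity - h) + 0.5 * identity
    for _ in range(max_iter):
        p2 = p @ p
        if np.linalg.norm(p2 - p) < tol:  # P is idempotent, i.e. a projector
            break
        p = 3.0 * p2 - 2.0 * (p2 @ p)  # McWeeny step: matrix products only
    return p


# Toy Hamiltonian (hypothetical): dimerised 8-site chain with a gap around 0.
n_sites = 8
hopping = [-1.0 if i % 2 == 0 else -0.5 for i in range(n_sites - 1)]
h = np.diag(hopping, 1) + np.diag(hopping, -1)
p = mcweeny_density(h, mu=0.0)
n_occupied = np.trace(p)  # trace of the density matrix counts occupied states
```

In a linear-scaling code the products would act on sparse matrices, with small entries filtered out after each multiplication. That filtering is what keeps the cost proportional to system size.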

The problem size can be tuned with the `NREP` parameter in the input file; the number of atoms scales cubically with `NREP`.
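
The cubic relation can be written out as a small calculation. The sketch below assumes 96 atoms (32 water molecules) per replicated cell. That value is not stated in this README; it is derived from the NREP=6 figures in the file list (20'736 atoms / 6³). The names `ATOMS_PER_CELL` and `system_size` are illustrative only.

```python
ATOMS_PER_CELL = 96      # assumption: 20'736 atoms / 6**3 replicas
ATOMS_PER_MOLECULE = 3   # H2O


def system_size(nrep):
    """Return (atoms, water molecules) for a given NREP."""
    n_atoms = ATOMS_PER_CELL * nrep**3
    return n_atoms, n_atoms // ATOMS_PER_MOLECULE


for nrep in (1, 2, 4, 6):
    n_atoms, n_molecules = system_size(nrep)
    print(f"NREP={nrep}: {n_atoms} atoms, {n_molecules} water molecules")
```

Doubling `NREP` therefore multiplies the number of atoms by eight.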

## Files Description

- [H2O-dft-ls.inp](H2O-dft-ls.inp) (NREP=6): linear-scaling DFT calculation on H2O, consisting of 20'736 atoms in a cubic box with a 59 Å edge (6'912 water molecules in total). An LDA functional is used with a DZVP MOLOPT basis set and a 300 Ry cut-off.
- [H2O-dft-ls.NREP4.inp](H2O-dft-ls.NREP4.inp) (NREP=4): linear-scaling DFT calculation on H2O, consisting of 6'144 atoms in a cubic box with a 39 Å edge (2'048 water molecules in total). An LDA functional is used with a DZVP MOLOPT basis set and a 300 Ry cut-off.
- [H2O-dft-ls.NREP2.inp](H2O-dft-ls.NREP2.inp) (NREP=2): a smaller version of the H2O-dft-ls benchmark, meant to run on 1 node. Linear-scaling DFT calculation on H2O, consisting of 768 atoms in a cubic box with an edge of roughly 20 Å (256 water molecules in total). An LDA functional is used with a DZVP MOLOPT basis set and a 300 Ry cut-off.
- [TiO2.inp](TiO2.inp)
- [amorph.inp](amorph.inp)

## Results

### NREP=4

The best configurations are shown below. Follow the links under "Detailed Results" for more detail.

| Machine Name | Architecture | Date       | Revision (SVN or git) | Fastest time (s) | Number of Cores | Number of Threads                  | Detailed Results |
| ------------ | ------------ | ---------- | --------------------- | ---------------- | --------------- | ---------------------------------- | ---------------- |
| HECToR       | Cray XE6     | 16/1/2014  | 13196                 | 98.256           | 65536           | 8 OMP threads per MPI task         | [hector-h2o-dft-ls](https://www.cp2k.org/performance:hector-h2o-dft-ls) |
| ARCHER       | Cray XC30    | 8/1/2014   | 13473                 | 28.476           | 49152           | 4 OMP threads per MPI task         | [archer-h2o-dft-ls](https://www.cp2k.org/performance:archer-h2o-dft-ls) |
| Magnus       | Cray XC40    | 3/12/2014  | 14377                 | 30.921           | 24576           | 2 OMP threads per MPI task         | [magnus-h2o-dft-ls](https://www.cp2k.org/performance:magnus-h2o-dft-ls) |
| Piz Daint    | Cray XC30    | 12/05/2015 | 15268                 | 27.900           | 32768           | 2 OMP threads per MPI task, no GPU | [piz-daint-h2o-dft-ls](https://www.cp2k.org/performance:piz-daint-h2o-dft-ls) |
| Cirrus       | SGI ICE XA   | 24/11/2016 | 17566                 | 543.032          | 2016            | 2 OMP threads per MPI task         | [cirrus-h2o-dft-ls](https://www.cp2k.org/performance:cirrus-h2o-dft-ls) |
| Noctua       | Cray CS500   | 25/09/2019 | 9f58d81               | 37.730           | 10240           | 10 OMP threads per MPI task        | [noctua-h2o-dft-ls](https://www.cp2k.org/performance:noctua-h2o-dft-ls) |

### Weak Scaling on Piz Daint, CSCS

The following results were obtained under these conditions:

- Date: 15th November 2019
- CP2K version: version 7.0 (Development Version, git:78cea8eeebb25e459941d8a28d987c9990d92676)
- DBCSR version: v2.0.0-rc9 (git:15fdaba855385f12db7599a6e69b51a7a4ce8a9a)
- CP2K flags: omp libint fftw3 libxc elpa parallel mpi3 scalapack acc pw_cuda xsmm dbcsr_acc max_contr=4
- Machine: Piz Daint (GPU partition), CSCS
- Slurm configuration: 2 MPI ranks per node, 12 OpenMP threads per MPI rank
- Each cell gives the runtime in seconds (`grep 'CP2K    ' output.out`). Cells marked with an `X` crashed with out-of-memory errors, and empty cells were not measured.

|  nodes / NREP | NREP=1 | NREP=2 | NREP=3 | NREP=4 | NREP=6 | NREP=8 | NREP=9 |
| ------------- | -----  | -----  | -----  | -----  | -----  | -----  | -----  |
|  1 node       |   7.4  |  60.3  |   X    |        |        |        |        |
|  2 nodes      |   7.4  |  35.0  | 269.4  |   X    |        |        |        |
|  4 nodes      |   9.9  |  22.7  | 149.8  |   X    |        |        |        |
|  6 nodes      |  12.1  |  19.7  | 113.0  |   X    |        |        |        |
|  8 nodes      |  11.4  |  16.4  |  90.2  | 253.4  |   X    |        |        |
| 12 nodes      |  15.5  |  21.7  |  71.5  | 193.8  |   X    |        |        |
| 16 nodes      |  15.5  |  20.8  |  61.5  | 159.2  |   X    |        |        |
| 24 nodes      |  22.0  |  24.7  |  51.8  | 130.2  |   X    |        |        |
| 32 nodes      |  15.9  |  20.4  |  42.8  | 101.8  | 352.9  |   X    |        |
| 36 nodes      |  21.9  |  25.6  |  44.0  |  99.8  | 333.0  |   X    |        |
| 48 nodes      |  24.5  |  34.1  |  42.0  |  84.1  | 277.9  |   X    |        |
| 64 nodes      |  24.9  |  29.0  |  40.4  |  79.7  | 257.5  |   X    |        |
| 128 nodes     |  26.3  |  32.8  |  36.6  |  62.5  | 181.9  | 400.6  |   X    |
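
Reading down a single column of this table gives a strong-scaling series (fixed problem size, increasing node count). As a worked example, the sketch below computes speedup and parallel efficiency for the NREP=4 column, relative to the smallest node count that did not run out of memory. The runtimes are copied from the table; the helper name `strong_scaling` is illustrative.

```python
# NREP=4 runtimes in seconds, copied from the table above (nodes -> seconds).
runtimes_nrep4 = {
    8: 253.4, 12: 193.8, 16: 159.2, 24: 130.2, 32: 101.8,
    36: 99.8, 48: 84.1, 64: 79.7, 128: 62.5,
}


def strong_scaling(runtimes):
    """Return (nodes, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    rows = []
    for nodes in sorted(runtimes):
        speedup = base_time / runtimes[nodes]
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


for nodes, speedup, efficiency in strong_scaling(runtimes_nrep4):
    print(f"{nodes:4d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

Going from 8 to 128 nodes uses 16 times the resources for a speedup of about 4, so the parallel efficiency at 128 nodes is roughly 25%.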

|  nodes / NREP | NREP=6 | NREP=8 | NREP=9 | NREP=10 | NREP=11 | NREP=12 | NREP=13 | NREP=14 | NREP=16 | NREP=18 | NREP=19 | NREP=20 |
| ------------- | -----  | -----  | -----  | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
|  256 nodes    |  132.6 |  262.3 |  359.2 |   498.8 |   647.1 |    X    |         |         |         |         |         |         |
|  512 nodes    |  106.0 |  212.5 |  290.2 |   409.2 |   534.0 |   732.3 |   875.2 |  1030.1 |    X    |         |         |         |
| 1024 nodes    |   98.1 |  168.9 |        |   284.7 |         |   510.8 |         |   786.5 |  1161.1 |  1607.3 |  1872.8 |    X    |
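
A weak-scaling series keeps the work per node constant. Because the number of atoms grows as NREP³, the entries where the node count equals NREP³ all have the same number of atoms per node. The sketch below picks those four entries from the two tables above and reports the runtime ratio relative to the single-node run. This is one possible reading of the data, not an analysis from the original authors, and the names are illustrative.

```python
# (nodes, NREP, runtime in seconds), copied from the two tables above.
measurements = [(1, 1, 7.4), (8, 2, 16.4), (64, 4, 79.7), (512, 8, 212.5)]


def weak_scaling_efficiency(points):
    """Return (nodes, NREP, efficiency), with efficiency = t(first run) / t(run)."""
    for nodes, nrep, _ in points:
        # Constant atoms per node requires nodes to grow like NREP**3.
        if nodes != nrep**3:
            raise ValueError(f"{nodes} nodes does not match NREP={nrep}")
    base_time = points[0][2]
    return [(nodes, nrep, base_time / seconds) for nodes, nrep, seconds in points]


for nodes, nrep, efficiency in weak_scaling_efficiency(measurements):
    print(f"{nodes:4d} nodes, NREP={nrep}: efficiency {efficiency:.3f}")
```

The NREP=1 system is very small, so the single-node baseline may flatter the reference time. Choosing a larger baseline, such as the 8-node NREP=2 run, would be an equally reasonable convention.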