--------------------------------------------------------------
Wathen: nx 4 ny 4 n 65 nz 752 method 0, time: 0.000 sec

total time to read A matrix: 0.000254 sec

n 65 # edges 376
U=triu(A) time: 0.000028 sec
L=tril(A) time: 0.000007 sec

------------------------------------- dot product method:
# triangles 872
L*U' time (dot): 0.000057 sec
tricount time: 0.000061 sec (dot product method)
tri+prep time: 0.000097 sec (incl time to compute L and U)
compute C time: 0.000057 sec
reduce (C) time: 0.000005 sec
rate 3.89 million edges/sec (incl time for U=triu(A))
rate 6.13 million edges/sec (just tricount itself)
L*U' time (dot): 0.000014 sec (nthreads: 2 speedup 4.15599)
tricount time: 0.000015 sec (dot product method)
tri+prep time: 0.000051 sec (incl time to compute L and U)
compute C time: 0.000014 sec
reduce (C) time: 0.000002 sec
rate 7.42 million edges/sec (incl time for U=triu(A))
rate 24.38 million edges/sec (just tricount itself)
L*U' time (dot): 0.000011 sec (nthreads: 4 speedup 5.05627)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000048 sec (incl time to compute L and U)
compute C time: 0.000011 sec
reduce (C) time: 0.000002 sec
rate 7.84 million edges/sec (incl time for U=triu(A))
rate 29.50 million edges/sec (just tricount itself)
L*U' time (dot): 0.000011 sec (nthreads: 8 speedup 5.0794)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000048 sec (incl time to compute L and U)
compute C time: 0.000011 sec
reduce (C) time: 0.000002 sec
rate 7.85 million edges/sec (incl time for U=triu(A))
rate 29.65 million edges/sec (just tricount itself)
L*U' time (dot): 0.000016 sec
tricount time: 0.000018 sec (dot product method)
tri+prep time: 0.000053 sec (incl time to compute L and U)
compute C time: 0.000016 sec
reduce (C) time: 0.000002 sec
rate 7.07 million edges/sec (incl time for U=triu(A))
rate 20.95 million edges/sec (just tricount itself)
L*U' time (dot): 0.000012 sec (nthreads: 2 speedup 1.36091)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000049 sec (incl time to compute L and U)
compute C time: 0.000012 sec
reduce (C) time: 0.000002 sec
rate 7.72 million edges/sec (incl time for U=triu(A))
rate 27.87 million edges/sec (just tricount itself)
L*U' time (dot): 0.000012 sec (nthreads: 4 speedup 1.38573)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000049 sec (incl time to compute L and U)
compute C time: 0.000012 sec
reduce (C) time: 0.000002 sec
rate 7.75 million edges/sec (incl time for U=triu(A))
rate 28.26 million edges/sec (just tricount itself)
L*U' time (dot): 0.000012 sec (nthreads: 8 speedup 1.39356)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000048 sec (incl time to compute L and U)
compute C time: 0.000012 sec
reduce (C) time: 0.000002 sec
rate 7.77 million edges/sec (incl time for U=triu(A))
rate 28.54 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000051 sec
tricount time: 0.000052 sec (saxpy method)
tri+prep time: 0.000060 sec (incl time to compute L)
compute C time: 0.000051 sec
reduce (C) time: 0.000002 sec
rate 6.31 million edges/sec (incl time for L=tril(A))
rate 7.17 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000025 sec (nthreads: 2 speedup 2.00982)
tricount time: 0.000027 sec (saxpy method)
tri+prep time: 0.000034 sec (incl time to compute L)
compute C time: 0.000025 sec
reduce (C) time: 0.000001 sec
rate 11.11 million edges/sec (incl time for L=tril(A))
rate 14.05 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000022 sec (nthreads: 4 speedup 2.34526)
tricount time: 0.000023 sec (saxpy method)
tri+prep time: 0.000030 sec (incl time to compute L)
compute C time: 0.000022 sec
reduce (C) time: 0.000001 sec
rate 12.45 million edges/sec (incl time for L=tril(A))
rate 16.29 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000022 sec (nthreads: 8 speedup 2.29399)
tricount time: 0.000024 sec (saxpy method)
tri+prep time: 0.000031 sec (incl time to compute L)
compute C time: 0.000022 sec
reduce (C) time: 0.000002 sec
rate 12.23 million edges/sec (incl time for L=tril(A))
rate 15.90 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 5 by 5, nz: 18, method 1 time 0.000 sec

total time to read A matrix: 0.000101 sec

n 5 # edges 9
U=triu(A) time: 0.000024 sec
L=tril(A) time: 0.000003 sec

------------------------------------- dot product method:
# triangles 7
L*U' time (dot): 0.000024 sec
tricount time: 0.000027 sec (dot product method)
tri+prep time: 0.000054 sec (incl time to compute L and U)
compute C time: 0.000024 sec
reduce (C) time: 0.000003 sec
rate 0.17 million edges/sec (incl time for U=triu(A))
rate 0.33 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 4.80951)
tricount time: 0.000006 sec (dot product method)
tri+prep time: 0.000033 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000001 sec
rate 0.27 million edges/sec (incl time for U=triu(A))
rate 1.57 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 6.80389)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.29 million edges/sec (incl time for U=triu(A))
rate 2.26 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 6.99995)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.29 million edges/sec (incl time for U=triu(A))
rate 2.34 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec
tricount time: 0.000005 sec (dot product method)
tri+prep time: 0.000032 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000001 sec
rate 0.28 million edges/sec (incl time for U=triu(A))
rate 1.74 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 2 speedup 1.28269)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.29 million edges/sec (incl time for U=triu(A))
rate 2.27 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 1.23356)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.29 million edges/sec (incl time for U=triu(A))
rate 2.18 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.37847)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.29 million edges/sec (incl time for U=triu(A))
rate 2.42 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000012 sec
tricount time: 0.000013 sec (saxpy method)
tri+prep time: 0.000016 sec (incl time to compute L)
compute C time: 0.000012 sec
reduce (C) time: 0.000001 sec
rate 0.56 million edges/sec (incl time for L=tril(A))
rate 0.69 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000003 sec (nthreads: 2 speedup 4.12281)
tricount time: 0.000003 sec (saxpy method)
tri+prep time: 0.000007 sec (incl time to compute L)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 1.37 million edges/sec (incl time for L=tril(A))
rate 2.68 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000002 sec (nthreads: 4 speedup 5.12891)
tricount time: 0.000003 sec (saxpy method)
tri+prep time: 0.000006 sec (incl time to compute L)
compute C time: 0.000002 sec
reduce (C) time: 0.000000 sec
rate 1.50 million edges/sec (incl time for L=tril(A))
rate 3.25 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000003 sec (nthreads: 8 speedup 3.73818)
tricount time: 0.000004 sec (saxpy method)
tri+prep time: 0.000007 sec (incl time to compute L)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 1.28 million edges/sec (incl time for L=tril(A))
rate 2.37 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 3 by 3, 0 entries, from stdin

total time to read A matrix: 0.000136 sec

n 3 # edges 0
U=triu(A) time: 0.000023 sec
L=tril(A) time: 0.000004 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot): 0.000032 sec
tricount time: 0.000034 sec (dot product method)
tri+prep time: 0.000061 sec (incl time to compute L and U)
compute C time: 0.000032 sec
reduce (C) time: 0.000002 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 6.70786)
tricount time: 0.000005 sec (dot product method)
tri+prep time: 0.000032 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000001 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 8.34564)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 8.56331)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec
tricount time: 0.000003 sec (dot product method)
tri+prep time: 0.000030 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000002 sec (nthreads: 2 speedup 1.32441)
tricount time: 0.000002 sec (dot product method)
tri+prep time: 0.000029 sec (incl time to compute L and U)
compute C time: 0.000002 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000002 sec (nthreads: 4 speedup 1.26397)
tricount time: 0.000002 sec (dot product method)
tri+prep time: 0.000029 sec (incl time to compute L and U)
compute C time: 0.000002 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 0.710361)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000031 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for U=triu(A))
rate 0.00 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000026 sec
tricount time: 0.000026 sec (saxpy method)
tri+prep time: 0.000031 sec (incl time to compute L)
compute C time: 0.000026 sec
reduce (C) time: 0.000001 sec
rate 0.00 million edges/sec (incl time for L=tril(A))
rate 0.00 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000006 sec (nthreads: 2 speedup 3.9542)
tricount time: 0.000007 sec (saxpy method)
tri+prep time: 0.000011 sec (incl time to compute L)
compute C time: 0.000006 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for L=tril(A))
rate 0.00 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 4 speedup 6.0321)
tricount time: 0.000005 sec (saxpy method)
tri+prep time: 0.000009 sec (incl time to compute L)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for L=tril(A))
rate 0.00 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000005 sec (nthreads: 8 speedup 4.8894)
tricount time: 0.000006 sec (saxpy method)
tri+prep time: 0.000010 sec (incl time to compute L)
compute C time: 0.000005 sec
reduce (C) time: 0.000000 sec
rate 0.00 million edges/sec (incl time for L=tril(A))
rate 0.00 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 4 by 4, 4 entries, from stdin

total time to read A matrix: 0.000182 sec

n 4 # edges 2
U=triu(A) time: 0.000042 sec
L=tril(A) time: 0.000005 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot): 0.000035 sec
tricount time: 0.000038 sec (dot product method)
tri+prep time: 0.000085 sec (incl time to compute L and U)
compute C time: 0.000035 sec
reduce (C) time: 0.000003 sec
rate 0.02 million edges/sec (incl time for U=triu(A))
rate 0.05 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 6.83555)
tricount time: 0.000006 sec (dot product method)
tri+prep time: 0.000053 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000001 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.35 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 11.2006)
tricount time: 0.000003 sec (dot product method)
tri+prep time: 0.000050 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.57 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 10.4373)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000051 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.54 million edges/sec (just tricount itself)
L*U' time (dot): 0.000006 sec
tricount time: 0.000007 sec (dot product method)
tri+prep time: 0.000054 sec (incl time to compute L and U)
compute C time: 0.000006 sec
reduce (C) time: 0.000001 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.30 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 1.09848)
tricount time: 0.000006 sec (dot product method)
tri+prep time: 0.000053 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000000 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.34 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 1.68923)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000051 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.51 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.97691)
tricount time: 0.000003 sec (dot product method)
tri+prep time: 0.000050 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.04 million edges/sec (incl time for U=triu(A))
rate 0.60 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000022 sec
tricount time: 0.000023 sec (saxpy method)
tri+prep time: 0.000028 sec (incl time to compute L)
compute C time: 0.000022 sec
reduce (C) time: 0.000001 sec
rate 0.07 million edges/sec (incl time for L=tril(A))
rate 0.09 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000008 sec (nthreads: 2 speedup 2.89783)
tricount time: 0.000008 sec (saxpy method)
tri+prep time: 0.000013 sec (incl time to compute L)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 0.15 million edges/sec (incl time for L=tril(A))
rate 0.24 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 4 speedup 4.92595)
tricount time: 0.000005 sec (saxpy method)
tri+prep time: 0.000010 sec (incl time to compute L)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.20 million edges/sec (incl time for L=tril(A))
rate 0.42 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000008 sec (nthreads: 8 speedup 2.76205)
tricount time: 0.000008 sec (saxpy method)
tri+prep time: 0.000014 sec (incl time to compute L)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 0.15 million edges/sec (incl time for L=tril(A))
rate 0.24 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 4 by 4, 10 entries, from stdin

total time to read A matrix: 0.000188 sec

n 4 # edges 5
U=triu(A) time: 0.000030 sec
L=tril(A) time: 0.000004 sec

------------------------------------- dot product method:
# triangles 2
L*U' time (dot): 0.000028 sec
tricount time: 0.000031 sec (dot product method)
tri+prep time: 0.000066 sec (incl time to compute L and U)
compute C time: 0.000028 sec
reduce (C) time: 0.000003 sec
rate 0.08 million edges/sec (incl time for U=triu(A))
rate 0.16 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 5.73686)
tricount time: 0.000006 sec (dot product method)
tri+prep time: 0.000040 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000001 sec
rate 0.13 million edges/sec (incl time for U=triu(A))
rate 0.90 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 8.37966)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000038 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.13 million edges/sec (incl time for U=triu(A))
rate 1.32 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 8.68117)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000038 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.13 million edges/sec (incl time for U=triu(A))
rate 1.35 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000038 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000001 sec
rate 0.13 million edges/sec (incl time for U=triu(A))
rate 1.17 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.20088)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000038 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.13 million edges/sec (incl time for U=triu(A))
rate 1.40 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.08018)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000038 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.13 million edges/sec (incl time for U=triu(A))
rate 1.27 million edges/sec (just tricount itself)
L*U' time (dot): 0.000002 sec (nthreads: 8 speedup 1.68545)
tricount time: 0.000003 sec (dot product method)
tri+prep time: 0.000037 sec (incl time to compute L and U)
compute C time: 0.000002 sec
reduce (C) time: 0.000001 sec
rate 0.14 million edges/sec (incl time for U=triu(A))
rate 1.79 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000008 sec
tricount time: 0.000009 sec (saxpy method)
tri+prep time: 0.000013 sec (incl time to compute L)
compute C time: 0.000008 sec
reduce (C) time: 0.000000 sec
rate 0.40 million edges/sec (incl time for L=tril(A))
rate 0.57 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000002 sec (nthreads: 2 speedup 3.33955)
tricount time: 0.000003 sec (saxpy method)
tri+prep time: 0.000007 sec (incl time to compute L)
compute C time: 0.000002 sec
reduce (C) time: 0.000000 sec
rate 0.75 million edges/sec (incl time for L=tril(A))
rate 1.75 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000002 sec (nthreads: 4 speedup 4.10873)
tricount time: 0.000002 sec (saxpy method)
tri+prep time: 0.000006 sec (incl time to compute L)
compute C time: 0.000002 sec
reduce (C) time: 0.000000 sec
rate 0.81 million edges/sec (incl time for L=tril(A))
rate 2.11 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 8 speedup 1.98804)
tricount time: 0.000005 sec (saxpy method)
tri+prep time: 0.000008 sec (incl time to compute L)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.59 million edges/sec (incl time for L=tril(A))
rate 1.07 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 7 by 7, 16 entries, from stdin

total time to read A matrix: 0.000242 sec

n 7 # edges 8
U=triu(A) time: 0.000018 sec
L=tril(A) time: 0.000003 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot): 0.000033 sec
tricount time: 0.000035 sec (dot product method)
tri+prep time: 0.000057 sec (incl time to compute L and U)
compute C time: 0.000033 sec
reduce (C) time: 0.000003 sec
rate 0.14 million edges/sec (incl time for U=triu(A))
rate 0.23 million edges/sec (just tricount itself)
L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 6.33494)
tricount time: 0.000006 sec (dot product method)
tri+prep time: 0.000027 sec (incl time to compute L and U)
compute C time: 0.000005 sec
reduce (C) time: 0.000001 sec
rate 0.29 million edges/sec (incl time for U=triu(A))
rate 1.40 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 8.49876)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000026 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.31 million edges/sec (incl time for U=triu(A))
rate 1.90 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 9.15498)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000025 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.31 million edges/sec (incl time for U=triu(A))
rate 2.04 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec
tricount time: 0.000005 sec (dot product method)
tri+prep time: 0.000026 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.31 million edges/sec (incl time for U=triu(A))
rate 1.71 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 2 speedup 1.09382)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000026 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.31 million edges/sec (incl time for U=triu(A))
rate 1.89 million edges/sec (just tricount itself)
L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 1.21016)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000025 sec (incl time to compute L and U)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 0.32 million edges/sec (incl time for U=triu(A))
rate 2.07 million edges/sec (just tricount itself)
L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.26045)
tricount time: 0.000004 sec (dot product method)
tri+prep time: 0.000025 sec (incl time to compute L and U)
compute C time: 0.000003 sec
reduce (C) time: 0.000000 sec
rate 0.32 million edges/sec (incl time for U=triu(A))
rate 2.16 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000020 sec
tricount time: 0.000020 sec (saxpy method)
tri+prep time: 0.000024 sec (incl time to compute L)
compute C time: 0.000020 sec
reduce (C) time: 0.000000 sec
rate 0.34 million edges/sec (incl time for L=tril(A))
rate 0.40 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000005 sec (nthreads: 2 speedup 4.04016)
tricount time: 0.000005 sec (saxpy method)
tri+prep time: 0.000009 sec (incl time to compute L)
compute C time: 0.000005 sec
reduce (C) time: 0.000000 sec
rate 0.93 million edges/sec (incl time for L=tril(A))
rate 1.53 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 4 speedup 4.86751)
tricount time: 0.000004 sec (saxpy method)
tri+prep time: 0.000008 sec (incl time to compute L)
compute C time: 0.000004 sec
reduce (C) time: 0.000000 sec
rate 1.03 million edges/sec (incl time for L=tril(A))
rate 1.83 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000005 sec (nthreads: 8 speedup 3.92911)
tricount time: 0.000005 sec (saxpy method)
tri+prep time: 0.000009 sec (incl time to compute L)
compute C time: 0.000005 sec
reduce (C) time: 0.000000 sec
rate 0.91 million edges/sec (incl time for L=tril(A))
rate 1.47 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 304 by 304, 876 entries, from stdin

total time to read A matrix: 0.000394 sec

n 304 # edges 438
U=triu(A) time: 0.000025 sec
L=tril(A) time: 0.000008 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot): 0.000036 sec
tricount time: 0.000039 sec (dot product method)
tri+prep time: 0.000071 sec (incl time to compute L and U)
compute C time: 0.000036 sec
reduce (C) time: 0.000002 sec
rate 6.16 million edges/sec (incl time for U=triu(A))
rate 11.25 million edges/sec (just tricount itself)
L*U' time (dot): 0.000010 sec (nthreads: 2 speedup 3.71621)
tricount time: 0.000011 sec (dot product method)
tri+prep time: 0.000043 sec (incl time to compute L and U)
compute C time: 0.000010 sec
reduce (C) time: 0.000001 sec
rate 10.23 million edges/sec (incl time for U=triu(A))
rate 41.16 million edges/sec (just tricount itself)
L*U' time (dot): 0.000008 sec (nthreads: 4 speedup 4.44921)
tricount time: 0.000009 sec (dot product method)
tri+prep time: 0.000041 sec (incl time to compute L and U)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 10.67 million edges/sec (incl time for U=triu(A))
rate 49.47 million edges/sec (just tricount itself)
L*U' time (dot): 0.000008 sec (nthreads: 8 speedup 4.49582)
tricount time: 0.000009 sec (dot product method)
tri+prep time: 0.000041 sec (incl time to compute L and U)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 10.69 million edges/sec (incl time for U=triu(A))
rate 49.93 million edges/sec (just tricount itself)
L*U' time (dot): 0.000008 sec
tricount time: 0.000008 sec (dot product method)
tri+prep time: 0.000041 sec (incl time to compute L and U)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 10.80 million edges/sec (incl time for U=triu(A))
rate 52.26 million edges/sec (just tricount itself)
L*U' time (dot): 0.000007 sec (nthreads: 2 speedup 1.081)
tricount time: 0.000008 sec (dot product method)
tri+prep time: 0.000040 sec (incl time to compute L and U)
compute C time: 0.000007 sec
reduce (C) time: 0.000001 sec
rate 10.99 million edges/sec (incl time for U=triu(A))
rate 57.20 million edges/sec (just tricount itself)
L*U' time (dot): 0.000007 sec (nthreads: 4 speedup 1.09474)
tricount time: 0.000008 sec (dot product method)
tri+prep time: 0.000040 sec (incl time to compute L and U)
compute C time: 0.000007 sec
reduce (C) time: 0.000001 sec
rate 11.03 million edges/sec (incl time for U=triu(A))
rate 58.17 million edges/sec (just tricount itself)
L*U' time (dot): 0.000007 sec (nthreads: 8 speedup 1.109)
tricount time: 0.000007 sec (dot product method)
tri+prep time: 0.000040 sec (incl time to compute L and U)
compute C time: 0.000007 sec
reduce (C) time: 0.000001 sec
rate 11.05 million edges/sec (incl time for U=triu(A))
rate 58.79 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000048 sec
tricount time: 0.000048 sec (saxpy method)
tri+prep time: 0.000056 sec (incl time to compute L)
compute C time: 0.000048 sec
reduce (C) time: 0.000001 sec
rate 7.82 million edges/sec (incl time for L=tril(A))
rate 9.06 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000028 sec (nthreads: 2 speedup 1.71429)
tricount time: 0.000028 sec (saxpy method)
tri+prep time: 0.000036 sec (incl time to compute L)
compute C time: 0.000028 sec
reduce (C) time: 0.000000 sec
rate 12.16 million edges/sec (incl time for L=tril(A))
rate 15.46 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000027 sec (nthreads: 4 speedup 1.75412)
tricount time: 0.000028 sec (saxpy method)
tri+prep time: 0.000035 sec (incl time to compute L)
compute C time: 0.000027 sec
reduce (C) time: 0.000000 sec
rate 12.39 million edges/sec (incl time for L=tril(A))
rate 15.83 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000030 sec (nthreads: 8 speedup 1.59233)
tricount time: 0.000030 sec (saxpy method)
tri+prep time: 0.000038 sec (incl time to compute L)
compute C time: 0.000030 sec
reduce (C) time: 0.000000 sec
rate 11.49 million edges/sec (incl time for L=tril(A))
rate 14.39 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 48 by 48, 352 entries, from stdin

total time to read A matrix: 0.000287 sec

n 48 # edges 176
U=triu(A) time: 0.000028 sec
L=tril(A) time: 0.000009 sec

------------------------------------- dot product method:
# triangles 160
L*U' time (dot): 0.000043 sec
tricount time: 0.000047 sec (dot product method)
tri+prep time: 0.000084 sec (incl time to compute L and U)
compute C time: 0.000043 sec
reduce (C) time: 0.000003 sec
rate 2.10 million edges/sec (incl time for U=triu(A))
rate 3.78 million edges/sec (just tricount itself)
L*U' time (dot): 0.000010 sec (nthreads: 2 speedup 4.16297)
tricount time: 0.000012 sec (dot product method)
tri+prep time: 0.000049 sec (incl time to compute L and U)
compute C time: 0.000010 sec
reduce (C) time: 0.000001 sec
rate 3.60 million edges/sec (incl time for U=triu(A))
rate 14.98 million edges/sec (just tricount itself)
L*U' time (dot): 0.000007 sec (nthreads: 4 speedup 5.94283)
tricount time: 0.000008 sec (dot product method)
tri+prep time: 0.000045 sec (incl time to compute L and U)
compute C time: 0.000007 sec
reduce (C) time: 0.000001 sec
rate 3.88 million edges/sec (incl time for U=triu(A))
rate 21.31 million edges/sec (just tricount itself)
L*U' time (dot): 0.000011 sec (nthreads: 8 speedup 4.00864)
tricount time: 0.000012 sec (dot product method)
tri+prep time: 0.000049 sec (incl time to compute L and U)
compute C time: 0.000011 sec
reduce (C) time: 0.000001 sec
rate 3.57 million edges/sec (incl time for U=triu(A))
rate 14.40 million edges/sec (just tricount itself)
L*U' time (dot): 0.000014 sec
tricount time: 0.000015 sec (dot product method)
tri+prep time: 0.000052 sec (incl time to compute L and U)
compute C time: 0.000014 sec
reduce (C) time: 0.000001 sec
rate 3.37 million edges/sec (incl time for U=triu(A))
rate 11.63 million edges/sec (just tricount itself)
L*U' time (dot): 0.000009 sec (nthreads: 2 speedup 1.56815)
tricount time: 0.000010 sec (dot product method)
tri+prep time: 0.000047 sec (incl time to compute L and U)
compute C time: 0.000009 sec
reduce (C) time: 0.000001 sec
rate 3.76 million edges/sec (incl time for U=triu(A))
rate 18.17 million edges/sec (just tricount itself)
L*U' time (dot): 0.000011 sec (nthreads: 4 speedup 1.21667)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000050 sec (incl time to compute L and U)
compute C time: 0.000011 sec
reduce (C) time: 0.000001 sec
rate 3.54 million edges/sec (incl time for U=triu(A))
rate 13.86 million edges/sec (just tricount itself)
L*U' time (dot): 0.000011 sec (nthreads: 8 speedup 1.20288)
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000050 sec (incl time to compute L and U)
compute C time: 0.000011 sec
reduce (C) time: 0.000001 sec
rate 3.53 million edges/sec (incl time for U=triu(A))
rate 13.81 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000047 sec
tricount time: 0.000048 sec (saxpy method)
tri+prep time: 0.000057 sec (incl time to compute L)
compute C time: 0.000047 sec
reduce (C) time: 0.000001 sec
rate 3.06 million edges/sec (incl time for L=tril(A))
rate 3.66 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000019 sec (nthreads: 2 speedup 2.466)
tricount time: 0.000020 sec (saxpy method)
tri+prep time: 0.000029 sec (incl time to compute L)
compute C time: 0.000019 sec
reduce (C) time: 0.000001 sec
rate 6.04 million edges/sec (incl time for L=tril(A))
rate 8.93 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 4 speedup 3.50546)
tricount time: 0.000014 sec (saxpy method)
tri+prep time: 0.000023 sec (incl time to compute L)
compute C time: 0.000013 sec
reduce (C) time: 0.000001 sec
rate 7.51 million edges/sec (incl time for L=tril(A))
rate 12.58 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000015 sec (nthreads: 8 speedup 3.0676)
tricount time: 0.000016 sec (saxpy method)
tri+prep time: 0.000025 sec (incl time to compute L)
compute C time: 0.000015 sec
reduce (C) time: 0.000001 sec
rate 6.91 million edges/sec (incl time for L=tril(A))
rate 10.97 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 4884 by 4884, 285494 entries, from stdin

total time to read A matrix: 0.073128 sec

n 4884 # edges 142747
U=triu(A) time: 0.000225 sec
L=tril(A) time: 0.000142 sec

------------------------------------- dot product method:
# triangles 1512964
L*U' time (dot): 0.013911 sec
tricount time: 0.014396 sec (dot product method)
tri+prep time: 0.014764 sec (incl time to compute L and U)
compute C time: 0.013911 sec
reduce (C) time: 0.000486 sec
rate 9.67 million edges/sec (incl time for U=triu(A))
rate 9.92 million edges/sec (just tricount itself)
L*U' time (dot): 0.006919 sec (nthreads: 2 speedup 2.01037)
tricount time: 0.007159 sec (dot product method)
tri+prep time: 0.007527 sec (incl time to compute L and U)
compute C time: 0.006919 sec
reduce (C) time: 0.000239 sec
rate 18.97 million edges/sec (incl time for U=triu(A))
rate 19.94 million edges/sec (just tricount itself)
L*U' time (dot): 0.003827 sec (nthreads: 4 speedup 3.63466)
tricount time: 0.004121 sec (dot product method)
tri+prep time: 0.004488 sec (incl time to compute L and U)
compute C time: 0.003827 sec
reduce (C) time: 0.000293 sec
rate 31.80 million edges/sec (incl time for U=triu(A))
rate 34.64 million edges/sec (just tricount itself)
L*U' time (dot): 0.005970 sec (nthreads: 8 speedup 2.33004)
tricount time: 0.006280 sec (dot product method)
tri+prep time: 0.006648 sec (incl time to compute L and U)
compute C time: 0.005970 sec
reduce (C) time: 0.000310 sec
rate 21.47 million edges/sec (incl time for U=triu(A))
rate 22.73 million edges/sec (just tricount itself)
L*U' time (dot): 0.015373 sec
tricount time: 0.015847 sec (dot product method)
tri+prep time: 0.016215 sec (incl time to compute L and U)
compute C time:
0.015373 sec 828reduce (C) time: 0.000475 sec 829rate 8.80 million edges/sec (incl time for U=triu(A)) 830rate 9.01 million edges/sec (just tricount itself) 831L*U' time (dot): 0.007376 sec (nthreads: 2 speedup 2.08416) 832tricount time: 0.007622 sec (dot product method) 833tri+prep time: 0.007989 sec (incl time to compute L and U) 834compute C time: 0.007376 sec 835reduce (C) time: 0.000246 sec 836rate 17.87 million edges/sec (incl time for U=triu(A)) 837rate 18.73 million edges/sec (just tricount itself) 838L*U' time (dot): 0.004246 sec (nthreads: 4 speedup 3.62042) 839tricount time: 0.004506 sec (dot product method) 840tri+prep time: 0.004874 sec (incl time to compute L and U) 841compute C time: 0.004246 sec 842reduce (C) time: 0.000260 sec 843rate 29.29 million edges/sec (incl time for U=triu(A)) 844rate 31.68 million edges/sec (just tricount itself) 845L*U' time (dot): 0.006729 sec (nthreads: 8 speedup 2.28465) 846tricount time: 0.007020 sec (dot product method) 847tri+prep time: 0.007388 sec (incl time to compute L and U) 848compute C time: 0.006729 sec 849reduce (C) time: 0.000292 sec 850rate 19.32 million edges/sec (incl time for U=triu(A)) 851rate 20.33 million edges/sec (just tricount itself) 852 853----------------------------------- saxpy method: 854C<L>=L*L time (saxpy): 0.014019 sec 855tricount time: 0.014413 sec (saxpy method) 856tri+prep time: 0.014556 sec (incl time to compute L) 857compute C time: 0.014019 sec 858reduce (C) time: 0.000394 sec 859rate 9.81 million edges/sec (incl time for L=tril(A)) 860rate 9.90 million edges/sec (just tricount itself) 861C<L>=L*L time (saxpy): 0.007036 sec (nthreads: 2 speedup 1.99254) 862tricount time: 0.007225 sec (saxpy method) 863tri+prep time: 0.007367 sec (incl time to compute L) 864compute C time: 0.007036 sec 865reduce (C) time: 0.000189 sec 866rate 19.38 million edges/sec (incl time for L=tril(A)) 867rate 19.76 million edges/sec (just tricount itself) 868C<L>=L*L time (saxpy): 0.004042 sec (nthreads: 4 
speedup 3.46866) 869tricount time: 0.004236 sec (saxpy method) 870tri+prep time: 0.004378 sec (incl time to compute L) 871compute C time: 0.004042 sec 872reduce (C) time: 0.000194 sec 873rate 32.60 million edges/sec (incl time for L=tril(A)) 874rate 33.70 million edges/sec (just tricount itself) 875C<L>=L*L time (saxpy): 0.003180 sec (nthreads: 8 speedup 4.40848) 876tricount time: 0.003398 sec (saxpy method) 877tri+prep time: 0.003540 sec (incl time to compute L) 878compute C time: 0.003180 sec 879reduce (C) time: 0.000218 sec 880rate 40.32 million edges/sec (incl time for L=tril(A)) 881rate 42.01 million edges/sec (just tricount itself) 882 883-------------------------------------------------------------- 884matrix 183 by 183, 1402 entries, from stdin 885 886total time to read A matrix: 0.000637 sec 887 888n 183 # edges 701 889U=triu(A) time: 0.000030 sec 890L=tril(A) time: 0.000010 sec 891 892------------------------------------- dot product method: 893# triangles 863 894L*U' time (dot): 0.000067 sec 895tricount time: 0.000072 sec (dot product method) 896tri+prep time: 0.000112 sec (incl time to compute L and U) 897compute C time: 0.000067 sec 898reduce (C) time: 0.000005 sec 899rate 6.28 million edges/sec (incl time for U=triu(A)) 900rate 9.71 million edges/sec (just tricount itself) 901L*U' time (dot): 0.000039 sec (nthreads: 2 speedup 1.74055) 902tricount time: 0.000042 sec (dot product method) 903tri+prep time: 0.000081 sec (incl time to compute L and U) 904compute C time: 0.000039 sec 905reduce (C) time: 0.000003 sec 906rate 8.65 million edges/sec (incl time for U=triu(A)) 907rate 16.83 million edges/sec (just tricount itself) 908L*U' time (dot): 0.000035 sec (nthreads: 4 speedup 1.92135) 909tricount time: 0.000038 sec (dot product method) 910tri+prep time: 0.000077 sec (incl time to compute L and U) 911compute C time: 0.000035 sec 912reduce (C) time: 0.000003 sec 913rate 9.09 million edges/sec (incl time for U=triu(A)) 914rate 18.58 million edges/sec (just 
tricount itself) 915L*U' time (dot): 0.000032 sec (nthreads: 8 speedup 2.0853) 916tricount time: 0.000035 sec (dot product method) 917tri+prep time: 0.000074 sec (incl time to compute L and U) 918compute C time: 0.000032 sec 919reduce (C) time: 0.000003 sec 920rate 9.45 million edges/sec (incl time for U=triu(A)) 921rate 20.15 million edges/sec (just tricount itself) 922L*U' time (dot): 0.000044 sec 923tricount time: 0.000047 sec (dot product method) 924tri+prep time: 0.000086 sec (incl time to compute L and U) 925compute C time: 0.000044 sec 926reduce (C) time: 0.000003 sec 927rate 8.15 million edges/sec (incl time for U=triu(A)) 928rate 15.02 million edges/sec (just tricount itself) 929L*U' time (dot): 0.000040 sec (nthreads: 2 speedup 1.10621) 930tricount time: 0.000042 sec (dot product method) 931tri+prep time: 0.000082 sec (incl time to compute L and U) 932compute C time: 0.000040 sec 933reduce (C) time: 0.000003 sec 934rate 8.59 million edges/sec (incl time for U=triu(A)) 935rate 16.61 million edges/sec (just tricount itself) 936L*U' time (dot): 0.000050 sec (nthreads: 4 speedup 0.871133) 937tricount time: 0.000053 sec (dot product method) 938tri+prep time: 0.000092 sec (incl time to compute L and U) 939compute C time: 0.000050 sec 940reduce (C) time: 0.000003 sec 941rate 7.59 million edges/sec (incl time for U=triu(A)) 942rate 13.24 million edges/sec (just tricount itself) 943L*U' time (dot): 0.000035 sec (nthreads: 8 speedup 1.24625) 944tricount time: 0.000038 sec (dot product method) 945tri+prep time: 0.000077 sec (incl time to compute L and U) 946compute C time: 0.000035 sec 947reduce (C) time: 0.000002 sec 948rate 9.10 million edges/sec (incl time for U=triu(A)) 949rate 18.60 million edges/sec (just tricount itself) 950 951----------------------------------- saxpy method: 952C<L>=L*L time (saxpy): 0.000059 sec 953tricount time: 0.000060 sec (saxpy method) 954tri+prep time: 0.000070 sec (incl time to compute L) 955compute C time: 0.000059 sec 956reduce 
(C) time: 0.000002 sec 957rate 10.03 million edges/sec (incl time for L=tril(A)) 958rate 11.64 million edges/sec (just tricount itself) 959C<L>=L*L time (saxpy): 0.000035 sec (nthreads: 2 speedup 1.664) 960tricount time: 0.000037 sec (saxpy method) 961tri+prep time: 0.000046 sec (incl time to compute L) 962compute C time: 0.000035 sec 963reduce (C) time: 0.000001 sec 964rate 15.16 million edges/sec (incl time for L=tril(A)) 965rate 19.14 million edges/sec (just tricount itself) 966C<L>=L*L time (saxpy): 0.000032 sec (nthreads: 4 speedup 1.80297) 967tricount time: 0.000034 sec (saxpy method) 968tri+prep time: 0.000044 sec (incl time to compute L) 969compute C time: 0.000032 sec 970reduce (C) time: 0.000001 sec 971rate 16.10 million edges/sec (incl time for L=tril(A)) 972rate 20.67 million edges/sec (just tricount itself) 973C<L>=L*L time (saxpy): 0.000032 sec (nthreads: 8 speedup 1.81218) 974tricount time: 0.000034 sec (saxpy method) 975tri+prep time: 0.000043 sec (incl time to compute L) 976compute C time: 0.000032 sec 977reduce (C) time: 0.000002 sec 978rate 16.13 million edges/sec (incl time for L=tril(A)) 979rate 20.72 million edges/sec (just tricount itself) 980 981-------------------------------------------------------------- 982matrix 63 by 63, 246 entries, from stdin 983 984total time to read A matrix: 0.000214 sec 985 986n 63 # edges 123 987U=triu(A) time: 0.000016 sec 988L=tril(A) time: 0.000005 sec 989 990------------------------------------- dot product method: 991# triangles 0 992L*U' time (dot): 0.000021 sec 993tricount time: 0.000022 sec (dot product method) 994tri+prep time: 0.000043 sec (incl time to compute L and U) 995compute C time: 0.000021 sec 996reduce (C) time: 0.000001 sec 997rate 2.86 million edges/sec (incl time for U=triu(A)) 998rate 5.53 million edges/sec (just tricount itself) 999L*U' time (dot): 0.000004 sec (nthreads: 2 speedup 4.69135) 1000tricount time: 0.000005 sec (dot product method) 1001tri+prep time: 0.000026 sec (incl time to 
compute L and U) 1002compute C time: 0.000004 sec 1003reduce (C) time: 0.000000 sec 1004rate 4.79 million edges/sec (incl time for U=triu(A)) 1005rate 25.15 million edges/sec (just tricount itself) 1006L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 6.61758) 1007tricount time: 0.000003 sec (dot product method) 1008tri+prep time: 0.000024 sec (incl time to compute L and U) 1009compute C time: 0.000003 sec 1010reduce (C) time: 0.000000 sec 1011rate 5.07 million edges/sec (incl time for U=triu(A)) 1012rate 35.47 million edges/sec (just tricount itself) 1013L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 7.13425) 1014tricount time: 0.000003 sec (dot product method) 1015tri+prep time: 0.000024 sec (incl time to compute L and U) 1016compute C time: 0.000003 sec 1017reduce (C) time: 0.000000 sec 1018rate 5.12 million edges/sec (incl time for U=triu(A)) 1019rate 37.80 million edges/sec (just tricount itself) 1020L*U' time (dot): 0.000003 sec 1021tricount time: 0.000004 sec (dot product method) 1022tri+prep time: 0.000025 sec (incl time to compute L and U) 1023compute C time: 0.000003 sec 1024reduce (C) time: 0.000000 sec 1025rate 4.98 million edges/sec (incl time for U=triu(A)) 1026rate 31.26 million edges/sec (just tricount itself) 1027L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.15055) 1028tricount time: 0.000003 sec (dot product method) 1029tri+prep time: 0.000024 sec (incl time to compute L and U) 1030compute C time: 0.000003 sec 1031reduce (C) time: 0.000000 sec 1032rate 5.09 million edges/sec (incl time for U=triu(A)) 1033rate 36.33 million edges/sec (just tricount itself) 1034L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.26994) 1035tricount time: 0.000003 sec (dot product method) 1036tri+prep time: 0.000024 sec (incl time to compute L and U) 1037compute C time: 0.000003 sec 1038reduce (C) time: 0.000000 sec 1039rate 5.16 million edges/sec (incl time for U=triu(A)) 1040rate 39.87 million edges/sec (just tricount itself) 1041L*U' time (dot): 
0.000003 sec (nthreads: 8 speedup 1.35606) 1042tricount time: 0.000003 sec (dot product method) 1043tri+prep time: 0.000024 sec (incl time to compute L and U) 1044compute C time: 0.000003 sec 1045reduce (C) time: 0.000000 sec 1046rate 5.19 million edges/sec (incl time for U=triu(A)) 1047rate 42.14 million edges/sec (just tricount itself) 1048 1049----------------------------------- saxpy method: 1050C<L>=L*L time (saxpy): 0.000023 sec 1051tricount time: 0.000023 sec (saxpy method) 1052tri+prep time: 0.000028 sec (incl time to compute L) 1053compute C time: 0.000023 sec 1054reduce (C) time: 0.000000 sec 1055rate 4.44 million edges/sec (incl time for L=tril(A)) 1056rate 5.33 million edges/sec (just tricount itself) 1057C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 2 speedup 1.72664) 1058tricount time: 0.000013 sec (saxpy method) 1059tri+prep time: 0.000018 sec (incl time to compute L) 1060compute C time: 0.000013 sec 1061reduce (C) time: 0.000000 sec 1062rate 6.83 million edges/sec (incl time for L=tril(A)) 1063rate 9.19 million edges/sec (just tricount itself) 1064C<L>=L*L time (saxpy): 0.000012 sec (nthreads: 4 speedup 1.85064) 1065tricount time: 0.000012 sec (saxpy method) 1066tri+prep time: 0.000017 sec (incl time to compute L) 1067compute C time: 0.000012 sec 1068reduce (C) time: 0.000000 sec 1069rate 7.20 million edges/sec (incl time for L=tril(A)) 1070rate 9.85 million edges/sec (just tricount itself) 1071C<L>=L*L time (saxpy): 0.000016 sec (nthreads: 8 speedup 1.40767) 1072tricount time: 0.000016 sec (saxpy method) 1073tri+prep time: 0.000021 sec (incl time to compute L) 1074compute C time: 0.000016 sec 1075reduce (C) time: 0.000000 sec 1076rate 5.84 million edges/sec (incl time for L=tril(A)) 1077rate 7.48 million edges/sec (just tricount itself) 1078 1079-------------------------------------------------------------- 1080matrix 63 by 63, 246 entries, from stdin 1081 1082total time to read A matrix: 0.000211 sec 1083 1084n 63 # edges 123 1085U=triu(A) time: 
0.000019 sec 1086L=tril(A) time: 0.000005 sec 1087 1088------------------------------------- dot product method: 1089# triangles 0 1090L*U' time (dot): 0.000026 sec 1091tricount time: 0.000027 sec (dot product method) 1092tri+prep time: 0.000051 sec (incl time to compute L and U) 1093compute C time: 0.000026 sec 1094reduce (C) time: 0.000002 sec 1095rate 2.42 million edges/sec (incl time for U=triu(A)) 1096rate 4.49 million edges/sec (just tricount itself) 1097L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 5.61418) 1098tricount time: 0.000005 sec (dot product method) 1099tri+prep time: 0.000028 sec (incl time to compute L and U) 1100compute C time: 0.000005 sec 1101reduce (C) time: 0.000000 sec 1102rate 4.34 million edges/sec (incl time for U=triu(A)) 1103rate 24.47 million edges/sec (just tricount itself) 1104L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 8.20958) 1105tricount time: 0.000003 sec (dot product method) 1106tri+prep time: 0.000027 sec (incl time to compute L and U) 1107compute C time: 0.000003 sec 1108reduce (C) time: 0.000000 sec 1109rate 4.59 million edges/sec (incl time for U=triu(A)) 1110rate 35.33 million edges/sec (just tricount itself) 1111L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 8.76997) 1112tricount time: 0.000003 sec (dot product method) 1113tri+prep time: 0.000027 sec (incl time to compute L and U) 1114compute C time: 0.000003 sec 1115reduce (C) time: 0.000000 sec 1116rate 4.62 million edges/sec (incl time for U=triu(A)) 1117rate 37.63 million edges/sec (just tricount itself) 1118L*U' time (dot): 0.000003 sec 1119tricount time: 0.000004 sec (dot product method) 1120tri+prep time: 0.000027 sec (incl time to compute L and U) 1121compute C time: 0.000003 sec 1122reduce (C) time: 0.000000 sec 1123rate 4.52 million edges/sec (incl time for U=triu(A)) 1124rate 31.65 million edges/sec (just tricount itself) 1125L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.0719) 1126tricount time: 0.000004 sec (dot product method) 
1127tri+prep time: 0.000027 sec (incl time to compute L and U) 1128compute C time: 0.000003 sec 1129reduce (C) time: 0.000000 sec 1130rate 4.57 million edges/sec (incl time for U=triu(A)) 1131rate 34.42 million edges/sec (just tricount itself) 1132L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.18954) 1133tricount time: 0.000003 sec (dot product method) 1134tri+prep time: 0.000027 sec (incl time to compute L and U) 1135compute C time: 0.000003 sec 1136reduce (C) time: 0.000000 sec 1137rate 4.62 million edges/sec (incl time for U=triu(A)) 1138rate 37.64 million edges/sec (just tricount itself) 1139L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.21036) 1140tricount time: 0.000003 sec (dot product method) 1141tri+prep time: 0.000027 sec (incl time to compute L and U) 1142compute C time: 0.000003 sec 1143reduce (C) time: 0.000000 sec 1144rate 4.64 million edges/sec (incl time for U=triu(A)) 1145rate 38.43 million edges/sec (just tricount itself) 1146 1147----------------------------------- saxpy method: 1148C<L>=L*L time (saxpy): 0.000025 sec 1149tricount time: 0.000025 sec (saxpy method) 1150tri+prep time: 0.000030 sec (incl time to compute L) 1151compute C time: 0.000025 sec 1152reduce (C) time: 0.000000 sec 1153rate 4.13 million edges/sec (incl time for L=tril(A)) 1154rate 4.87 million edges/sec (just tricount itself) 1155C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 2 speedup 1.89479) 1156tricount time: 0.000013 sec (saxpy method) 1157tri+prep time: 0.000018 sec (incl time to compute L) 1158compute C time: 0.000013 sec 1159reduce (C) time: 0.000000 sec 1160rate 6.84 million edges/sec (incl time for L=tril(A)) 1161rate 9.15 million edges/sec (just tricount itself) 1162C<L>=L*L time (saxpy): 0.000012 sec (nthreads: 4 speedup 1.99589) 1163tricount time: 0.000013 sec (saxpy method) 1164tri+prep time: 0.000017 sec (incl time to compute L) 1165compute C time: 0.000012 sec 1166reduce (C) time: 0.000000 sec 1167rate 7.11 million edges/sec (incl time for 
L=tril(A)) 1168rate 9.65 million edges/sec (just tricount itself) 1169C<L>=L*L time (saxpy): 0.000016 sec (nthreads: 8 speedup 1.52101) 1170tricount time: 0.000017 sec (saxpy method) 1171tri+prep time: 0.000021 sec (incl time to compute L) 1172compute C time: 0.000016 sec 1173reduce (C) time: 0.000000 sec 1174rate 5.80 million edges/sec (incl time for L=tril(A)) 1175rate 7.39 million edges/sec (just tricount itself) 1176 1177-------------------------------------------------------------- 1178matrix 78 by 78, 204 entries, from stdin 1179 1180total time to read A matrix: 0.000179 sec 1181 1182n 78 # edges 102 1183U=triu(A) time: 0.000017 sec 1184L=tril(A) time: 0.000004 sec 1185 1186------------------------------------- dot product method: 1187# triangles 0 1188L*U' time (dot): 0.000021 sec 1189tricount time: 0.000023 sec (dot product method) 1190tri+prep time: 0.000044 sec (incl time to compute L and U) 1191compute C time: 0.000021 sec 1192reduce (C) time: 0.000002 sec 1193rate 2.32 million edges/sec (incl time for U=triu(A)) 1194rate 4.46 million edges/sec (just tricount itself) 1195L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 4.33904) 1196tricount time: 0.000005 sec (dot product method) 1197tri+prep time: 0.000026 sec (incl time to compute L and U) 1198compute C time: 0.000005 sec 1199reduce (C) time: 0.000001 sec 1200rate 3.85 million edges/sec (incl time for U=triu(A)) 1201rate 18.71 million edges/sec (just tricount itself) 1202L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 6.50257) 1203tricount time: 0.000004 sec (dot product method) 1204tri+prep time: 0.000025 sec (incl time to compute L and U) 1205compute C time: 0.000003 sec 1206reduce (C) time: 0.000000 sec 1207rate 4.13 million edges/sec (incl time for U=triu(A)) 1208rate 28.13 million edges/sec (just tricount itself) 1209L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 7.15529) 1210tricount time: 0.000003 sec (dot product method) 1211tri+prep time: 0.000024 sec (incl time to compute L and U) 
1212compute C time: 0.000003 sec 1213reduce (C) time: 0.000000 sec 1214rate 4.19 million edges/sec (incl time for U=triu(A)) 1215rate 30.93 million edges/sec (just tricount itself) 1216L*U' time (dot): 0.000003 sec 1217tricount time: 0.000004 sec (dot product method) 1218tri+prep time: 0.000025 sec (incl time to compute L and U) 1219compute C time: 0.000003 sec 1220reduce (C) time: 0.000000 sec 1221rate 4.13 million edges/sec (incl time for U=triu(A)) 1222rate 27.94 million edges/sec (just tricount itself) 1223L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.08112) 1224tricount time: 0.000003 sec (dot product method) 1225tri+prep time: 0.000024 sec (incl time to compute L and U) 1226compute C time: 0.000003 sec 1227reduce (C) time: 0.000000 sec 1228rate 4.18 million edges/sec (incl time for U=triu(A)) 1229rate 30.56 million edges/sec (just tricount itself) 1230L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.16769) 1231tricount time: 0.000003 sec (dot product method) 1232tri+prep time: 0.000024 sec (incl time to compute L and U) 1233compute C time: 0.000003 sec 1234reduce (C) time: 0.000000 sec 1235rate 4.22 million edges/sec (incl time for U=triu(A)) 1236rate 32.77 million edges/sec (just tricount itself) 1237L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 0.821643) 1238tricount time: 0.000004 sec (dot product method) 1239tri+prep time: 0.000025 sec (incl time to compute L and U) 1240compute C time: 0.000004 sec 1241reduce (C) time: 0.000000 sec 1242rate 4.01 million edges/sec (incl time for U=triu(A)) 1243rate 23.28 million edges/sec (just tricount itself) 1244 1245----------------------------------- saxpy method: 1246C<L>=L*L time (saxpy): 0.000022 sec 1247tricount time: 0.000022 sec (saxpy method) 1248tri+prep time: 0.000026 sec (incl time to compute L) 1249compute C time: 0.000022 sec 1250reduce (C) time: 0.000000 sec 1251rate 3.86 million edges/sec (incl time for L=tril(A)) 1252rate 4.59 million edges/sec (just tricount itself) 1253C<L>=L*L time 
(saxpy): 0.000013 sec (nthreads: 2 speedup 1.7295) 1254tricount time: 0.000013 sec (saxpy method) 1255tri+prep time: 0.000017 sec (incl time to compute L) 1256compute C time: 0.000013 sec 1257reduce (C) time: 0.000000 sec 1258rate 5.94 million edges/sec (incl time for L=tril(A)) 1259rate 7.86 million edges/sec (just tricount itself) 1260C<L>=L*L time (saxpy): 0.000012 sec (nthreads: 4 speedup 1.76988) 1261tricount time: 0.000013 sec (saxpy method) 1262tri+prep time: 0.000017 sec (incl time to compute L) 1263compute C time: 0.000012 sec 1264reduce (C) time: 0.000000 sec 1265rate 6.06 million edges/sec (incl time for L=tril(A)) 1266rate 8.07 million edges/sec (just tricount itself) 1267C<L>=L*L time (saxpy): 0.000025 sec (nthreads: 8 speedup 0.885225) 1268tricount time: 0.000025 sec (saxpy method) 1269tri+prep time: 0.000029 sec (incl time to compute L) 1270compute C time: 0.000025 sec 1271reduce (C) time: 0.000000 sec 1272rate 3.49 million edges/sec (incl time for L=tril(A)) 1273rate 4.07 million edges/sec (just tricount itself) 1274 1275-------------------------------------------------------------- 1276matrix 982 by 982, 99840 entries, from stdin 1277 1278total time to read A matrix: 0.029471 sec 1279 1280n 982 # edges 49920 1281U=triu(A) time: 0.000174 sec 1282L=tril(A) time: 0.000148 sec 1283 1284------------------------------------- dot product method: 1285# triangles 0 1286L*U' time (dot): 0.000362 sec 1287tricount time: 0.000395 sec (dot product method) 1288tri+prep time: 0.000716 sec (incl time to compute L and U) 1289compute C time: 0.000362 sec 1290reduce (C) time: 0.000032 sec 1291rate 69.75 million edges/sec (incl time for U=triu(A)) 1292rate 126.53 million edges/sec (just tricount itself) 1293L*U' time (dot): 0.000333 sec (nthreads: 2 speedup 1.08962) 1294tricount time: 0.000363 sec (dot product method) 1295tri+prep time: 0.000684 sec (incl time to compute L and U) 1296compute C time: 0.000333 sec 1297reduce (C) time: 0.000030 sec 1298rate 72.98 million 
edges/sec (incl time for U=triu(A)) 1299rate 137.56 million edges/sec (just tricount itself) 1300L*U' time (dot): 0.000273 sec (nthreads: 4 speedup 1.32884) 1301tricount time: 0.000318 sec (dot product method) 1302tri+prep time: 0.000639 sec (incl time to compute L and U) 1303compute C time: 0.000273 sec 1304reduce (C) time: 0.000045 sec 1305rate 78.14 million edges/sec (incl time for U=triu(A)) 1306rate 157.11 million edges/sec (just tricount itself) 1307L*U' time (dot): 0.002922 sec (nthreads: 8 speedup 0.124043) 1308tricount time: 0.002957 sec (dot product method) 1309tri+prep time: 0.003278 sec (incl time to compute L and U) 1310compute C time: 0.002922 sec 1311reduce (C) time: 0.000035 sec 1312rate 15.23 million edges/sec (incl time for U=triu(A)) 1313rate 16.88 million edges/sec (just tricount itself) 1314L*U' time (dot): 0.000374 sec 1315tricount time: 0.000402 sec (dot product method) 1316tri+prep time: 0.000723 sec (incl time to compute L and U) 1317compute C time: 0.000374 sec 1318reduce (C) time: 0.000028 sec 1319rate 69.08 million edges/sec (incl time for U=triu(A)) 1320rate 124.32 million edges/sec (just tricount itself) 1321L*U' time (dot): 0.000279 sec (nthreads: 2 speedup 1.33844) 1322tricount time: 0.000307 sec (dot product method) 1323tri+prep time: 0.000628 sec (incl time to compute L and U) 1324compute C time: 0.000279 sec 1325reduce (C) time: 0.000027 sec 1326rate 79.52 million edges/sec (incl time for U=triu(A)) 1327rate 162.79 million edges/sec (just tricount itself) 1328L*U' time (dot): 0.000236 sec (nthreads: 4 speedup 1.58021) 1329tricount time: 0.000266 sec (dot product method) 1330tri+prep time: 0.000587 sec (incl time to compute L and U) 1331compute C time: 0.000236 sec 1332reduce (C) time: 0.000030 sec 1333rate 85.01 million edges/sec (incl time for U=triu(A)) 1334rate 187.59 million edges/sec (just tricount itself) 1335L*U' time (dot): 0.001664 sec (nthreads: 8 speedup 0.224596) 1336tricount time: 0.001696 sec (dot product method) 
1337tri+prep time: 0.002017 sec (incl time to compute L and U) 1338compute C time: 0.001664 sec 1339reduce (C) time: 0.000033 sec 1340rate 24.75 million edges/sec (incl time for U=triu(A)) 1341rate 29.43 million edges/sec (just tricount itself) 1342 1343----------------------------------- saxpy method: 1344C<L>=L*L time (saxpy): 0.000412 sec 1345tricount time: 0.000413 sec (saxpy method) 1346tri+prep time: 0.000560 sec (incl time to compute L) 1347compute C time: 0.000412 sec 1348reduce (C) time: 0.000001 sec 1349rate 89.11 million edges/sec (incl time for L=tril(A)) 1350rate 120.99 million edges/sec (just tricount itself) 1351C<L>=L*L time (saxpy): 0.000348 sec (nthreads: 2 speedup 1.1835) 1352tricount time: 0.000349 sec (saxpy method) 1353tri+prep time: 0.000496 sec (incl time to compute L) 1354compute C time: 0.000348 sec 1355reduce (C) time: 0.000001 sec 1356rate 100.62 million edges/sec (incl time for L=tril(A)) 1357rate 143.23 million edges/sec (just tricount itself) 1358C<L>=L*L time (saxpy): 0.000373 sec (nthreads: 4 speedup 1.10476) 1359tricount time: 0.000373 sec (saxpy method) 1360tri+prep time: 0.000521 sec (incl time to compute L) 1361compute C time: 0.000373 sec 1362reduce (C) time: 0.000001 sec 1363rate 95.85 million edges/sec (incl time for L=tril(A)) 1364rate 133.75 million edges/sec (just tricount itself) 1365C<L>=L*L time (saxpy): 0.000377 sec (nthreads: 8 speedup 1.0916) 1366tricount time: 0.000378 sec (saxpy method) 1367tri+prep time: 0.000526 sec (incl time to compute L) 1368compute C time: 0.000377 sec 1369reduce (C) time: 0.000001 sec 1370rate 94.97 million edges/sec (incl time for L=tril(A)) 1371rate 132.06 million edges/sec (just tricount itself) 1372 1373-------------------------------------------------------------- 1374matrix 67 by 67, 574 entries, from stdin 1375 1376total time to read A matrix: 0.000275 sec 1377 1378n 67 # edges 287 1379U=triu(A) time: 0.000024 sec 1380L=tril(A) time: 0.000007 sec 1381 
------------------------------------- dot product method:
# triangles 120
L*U' time (dot): 0.000032 sec
tricount time: 0.000035 sec (dot product method)
tri+prep time: 0.000065 sec (incl time to compute L and U)
compute C time: 0.000032 sec
reduce (C) time: 0.000003 sec
rate 4.41 million edges/sec (incl time for U=triu(A))
rate 8.25 million edges/sec (just tricount itself)
L*U' time (dot): 0.000011 sec (nthreads: 2 speedup 2.86698)
tricount time: 0.000012 sec (dot product method)
tri+prep time: 0.000043 sec (incl time to compute L and U)
compute C time: 0.000011 sec
reduce (C) time: 0.000001 sec
rate 6.75 million edges/sec (incl time for U=triu(A))
rate 23.50 million edges/sec (just tricount itself)
L*U' time (dot): 0.000008 sec (nthreads: 4 speedup 3.78072)
tricount time: 0.000009 sec (dot product method)
tri+prep time: 0.000040 sec (incl time to compute L and U)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 7.26 million edges/sec (incl time for U=triu(A))
rate 31.12 million edges/sec (just tricount itself)
L*U' time (dot): 0.000007 sec (nthreads: 8 speedup 4.41994)
tricount time: 0.000008 sec (dot product method)
tri+prep time: 0.000038 sec (incl time to compute L and U)
compute C time: 0.000007 sec
reduce (C) time: 0.000001 sec
rate 7.52 million edges/sec (incl time for U=triu(A))
rate 36.41 million edges/sec (just tricount itself)
L*U' time (dot): 0.000012 sec
tricount time: 0.000013 sec (dot product method)
tri+prep time: 0.000044 sec (incl time to compute L and U)
compute C time: 0.000012 sec
reduce (C) time: 0.000001 sec
rate 6.56 million edges/sec (incl time for U=triu(A))
rate 21.38 million edges/sec (just tricount itself)
L*U' time (dot): 0.000010 sec (nthreads: 2 speedup 1.20171)
tricount time: 0.000011 sec (dot product method)
tri+prep time: 0.000041 sec (incl time to compute L and U)
compute C time: 0.000010 sec
reduce (C) time: 0.000001 sec
rate 6.94 million edges/sec (incl time for U=triu(A))
rate 25.94 million edges/sec (just tricount itself)
L*U' time (dot): 0.000009 sec (nthreads: 4 speedup 1.38725)
tricount time: 0.000010 sec (dot product method)
tri+prep time: 0.000040 sec (incl time to compute L and U)
compute C time: 0.000009 sec
reduce (C) time: 0.000001 sec
rate 7.19 million edges/sec (incl time for U=triu(A))
rate 29.86 million edges/sec (just tricount itself)
L*U' time (dot): 0.000008 sec (nthreads: 8 speedup 1.50481)
tricount time: 0.000009 sec (dot product method)
tri+prep time: 0.000039 sec (incl time to compute L and U)
compute C time: 0.000008 sec
reduce (C) time: 0.000001 sec
rate 7.33 million edges/sec (incl time for U=triu(A))
rate 32.36 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.000033 sec
tricount time: 0.000034 sec (saxpy method)
tri+prep time: 0.000041 sec (incl time to compute L)
compute C time: 0.000033 sec
reduce (C) time: 0.000001 sec
rate 7.05 million edges/sec (incl time for L=tril(A))
rate 8.43 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000016 sec (nthreads: 2 speedup 2.01304)
tricount time: 0.000017 sec (saxpy method)
tri+prep time: 0.000024 sec (incl time to compute L)
compute C time: 0.000016 sec
reduce (C) time: 0.000001 sec
rate 12.08 million edges/sec (incl time for L=tril(A))
rate 16.79 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000014 sec (nthreads: 4 speedup 2.44745)
tricount time: 0.000014 sec (saxpy method)
tri+prep time: 0.000021 sec (incl time to compute L)
compute C time: 0.000014 sec
reduce (C) time: 0.000001 sec
rate 13.81 million edges/sec (incl time for L=tril(A))
rate 20.31 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.000014 sec (nthreads: 8 speedup 2.43042)
tricount time: 0.000014 sec (saxpy method)
tri+prep time: 0.000021 sec (incl time to compute L)
compute C time: 0.000014 sec
reduce (C) time: 0.000001 sec
rate 13.68 million edges/sec (incl time for L=tril(A))
rate 20.04 million edges/sec (just tricount itself)

--------------------------------------------------------------
Wathen: nx 200 ny 200 n 120801 nz 1762400 method 0, time: 0.166 sec

total time to read A matrix: 0.168617 sec

n 120801 # edges 881200
U=triu(A) time: 0.002978 sec
L=tril(A) time: 0.002865 sec

------------------------------------- dot product method:
# triangles 2160400
L*U' time (dot): 0.029921 sec
tricount time: 0.032427 sec (dot product method)
tri+prep time: 0.038270 sec (incl time to compute L and U)
compute C time: 0.029921 sec
reduce (C) time: 0.002506 sec
rate 23.03 million edges/sec (incl time for U=triu(A))
rate 27.18 million edges/sec (just tricount itself)
L*U' time (dot): 0.011246 sec (nthreads: 2 speedup 2.66055)
tricount time: 0.012483 sec (dot product method)
tri+prep time: 0.018327 sec (incl time to compute L and U)
compute C time: 0.011246 sec
reduce (C) time: 0.001237 sec
rate 48.08 million edges/sec (incl time for U=triu(A))
rate 70.59 million edges/sec (just tricount itself)
L*U' time (dot): 0.008601 sec (nthreads: 4 speedup 3.47878)
tricount time: 0.009216 sec (dot product method)
tri+prep time: 0.015059 sec (incl time to compute L and U)
compute C time: 0.008601 sec
reduce (C) time: 0.000615 sec
rate 58.52 million edges/sec (incl time for U=triu(A))
rate 95.62 million edges/sec (just tricount itself)
L*U' time (dot): 0.006418 sec (nthreads: 8 speedup 4.66232)
tricount time: 0.006907 sec (dot product method)
tri+prep time: 0.012751 sec (incl time to compute L and U)
compute C time: 0.006418 sec
reduce (C) time: 0.000490 sec
rate 69.11 million edges/sec (incl time for U=triu(A))
rate 127.57 million edges/sec (just tricount itself)
L*U' time (dot): 0.023521 sec
tricount time: 0.026194 sec (dot product method)
tri+prep time: 0.032037 sec (incl time to compute L and U)
compute C time: 0.023521 sec
reduce (C) time: 0.002673 sec
rate 27.51 million edges/sec (incl time for U=triu(A))
rate 33.64 million edges/sec (just tricount itself)
L*U' time (dot): 0.011493 sec (nthreads: 2 speedup 2.04665)
tricount time: 0.012796 sec (dot product method)
tri+prep time: 0.018639 sec (incl time to compute L and U)
compute C time: 0.011493 sec
reduce (C) time: 0.001303 sec
rate 47.28 million edges/sec (incl time for U=triu(A))
rate 68.87 million edges/sec (just tricount itself)
L*U' time (dot): 0.006705 sec (nthreads: 4 speedup 3.50821)
tricount time: 0.007384 sec (dot product method)
tri+prep time: 0.013228 sec (incl time to compute L and U)
compute C time: 0.006705 sec
reduce (C) time: 0.000680 sec
rate 66.62 million edges/sec (incl time for U=triu(A))
rate 119.34 million edges/sec (just tricount itself)
L*U' time (dot): 0.009763 sec (nthreads: 8 speedup 2.4093)
tricount time: 0.010669 sec (dot product method)
tri+prep time: 0.016512 sec (incl time to compute L and U)
compute C time: 0.009763 sec
reduce (C) time: 0.000906 sec
rate 53.37 million edges/sec (incl time for U=triu(A))
rate 82.59 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.026566 sec
tricount time: 0.028627 sec (saxpy method)
tri+prep time: 0.031492 sec (incl time to compute L)
compute C time: 0.026566 sec
reduce (C) time: 0.002061 sec
rate 27.98 million edges/sec (incl time for L=tril(A))
rate 30.78 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.022131 sec (nthreads: 2 speedup 1.2004)
tricount time: 0.023573 sec (saxpy method)
tri+prep time: 0.026438 sec (incl time to compute L)
compute C time: 0.022131 sec
reduce (C) time: 0.001442 sec
rate 33.33 million edges/sec (incl time for L=tril(A))
rate 37.38 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.011668 sec (nthreads: 4 speedup 2.27679)
tricount time: 0.012288 sec (saxpy method)
tri+prep time: 0.015153 sec (incl time to compute L)
compute C time: 0.011668 sec
reduce (C) time: 0.000620 sec
rate 58.15 million edges/sec (incl time for L=tril(A))
rate 71.71 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.016841 sec (nthreads: 8 speedup 1.57751)
tricount time: 0.018066 sec (saxpy method)
tri+prep time: 0.020931 sec (incl time to compute L)
compute C time: 0.016841 sec
reduce (C) time: 0.001225 sec
rate 42.10 million edges/sec (incl time for L=tril(A))
rate 48.78 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 10000 by 10000, nz: 199768, method 0 time 0.027 sec

total time to read A matrix: 0.028004 sec

n 10000 # edges 99884
U=triu(A) time: 0.000362 sec
L=tril(A) time: 0.000234 sec

------------------------------------- dot product method:
# triangles 1357
L*U' time (dot): 0.011664 sec
tricount time: 0.011843 sec (dot product method)
tri+prep time: 0.012439 sec (incl time to compute L and U)
compute C time: 0.011664 sec
reduce (C) time: 0.000179 sec
rate 8.03 million edges/sec (incl time for U=triu(A))
rate 8.43 million edges/sec (just tricount itself)
L*U' time (dot): 0.005893 sec (nthreads: 2 speedup 1.97936)
tricount time: 0.006089 sec (dot product method)
tri+prep time: 0.006686 sec (incl time to compute L and U)
compute C time: 0.005893 sec
reduce (C) time: 0.000196 sec
rate 14.94 million edges/sec (incl time for U=triu(A))
rate 16.40 million edges/sec (just tricount itself)
L*U' time (dot): 0.003444 sec (nthreads: 4 speedup 3.387)
tricount time: 0.003609 sec (dot product method)
tri+prep time: 0.004206 sec (incl time to compute L and U)
compute C time: 0.003444 sec
reduce (C) time: 0.000165 sec
rate 23.75 million edges/sec (incl time for U=triu(A))
rate 27.67 million edges/sec (just tricount itself)
L*U' time (dot): 0.002678 sec (nthreads: 8 speedup 4.35594)
tricount time: 0.002885 sec (dot product method)
tri+prep time: 0.003481 sec (incl time to compute L and U)
compute C time: 0.002678 sec
reduce (C) time: 0.000207 sec
rate 28.69 million edges/sec (incl time for U=triu(A))
rate 34.63 million edges/sec (just tricount itself)
L*U' time (dot): 0.012640 sec
tricount time: 0.012779 sec (dot product method)
tri+prep time: 0.013376 sec (incl time to compute L and U)
compute C time: 0.012640 sec
reduce (C) time: 0.000139 sec
rate 7.47 million edges/sec (incl time for U=triu(A))
rate 7.82 million edges/sec (just tricount itself)
L*U' time (dot): 0.004852 sec (nthreads: 2 speedup 2.60499)
tricount time: 0.004964 sec (dot product method)
tri+prep time: 0.005561 sec (incl time to compute L and U)
compute C time: 0.004852 sec
reduce (C) time: 0.000112 sec
rate 17.96 million edges/sec (incl time for U=triu(A))
rate 20.12 million edges/sec (just tricount itself)
L*U' time (dot): 0.002892 sec (nthreads: 4 speedup 4.37131)
tricount time: 0.002976 sec (dot product method)
tri+prep time: 0.003572 sec (incl time to compute L and U)
compute C time: 0.002892 sec
reduce (C) time: 0.000085 sec
rate 27.96 million edges/sec (incl time for U=triu(A))
rate 33.56 million edges/sec (just tricount itself)
L*U' time (dot): 0.004180 sec (nthreads: 8 speedup 3.02402)
tricount time: 0.004349 sec (dot product method)
tri+prep time: 0.004946 sec (incl time to compute L and U)
compute C time: 0.004180 sec
reduce (C) time: 0.000169 sec
rate 20.20 million edges/sec (incl time for U=triu(A))
rate 22.97 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.003369 sec
tricount time: 0.003378 sec (saxpy method)
tri+prep time: 0.003612 sec (incl time to compute L)
compute C time: 0.003369 sec
reduce (C) time: 0.000009 sec
rate 27.65 million edges/sec (incl time for L=tril(A))
rate 29.57 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.002108 sec (nthreads: 2 speedup 1.59874)
tricount time: 0.002115 sec (saxpy method)
tri+prep time: 0.002349 sec (incl time to compute L)
compute C time: 0.002108 sec
reduce (C) time: 0.000007 sec
rate 42.53 million edges/sec (incl time for L=tril(A))
rate 47.24 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.001484 sec (nthreads: 4 speedup 2.27006)
tricount time: 0.001490 sec (saxpy method)
tri+prep time: 0.001724 sec (incl time to compute L)
compute C time: 0.001484 sec
reduce (C) time: 0.000006 sec
rate 57.92 million edges/sec (incl time for L=tril(A))
rate 67.02 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.005230 sec (nthreads: 8 speedup 0.644297)
tricount time: 0.005238 sec (saxpy method)
tri+prep time: 0.005472 sec (incl time to compute L)
compute C time: 0.005230 sec
reduce (C) time: 0.000008 sec
rate 18.25 million edges/sec (incl time for L=tril(A))
rate 19.07 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 10000 by 10000, nz: 199768, method 1 time 0.017 sec

total time to read A matrix: 0.017593 sec

n 10000 # edges 99884
U=triu(A) time: 0.000807 sec
L=tril(A) time: 0.000660 sec

------------------------------------- dot product method:
# triangles 1357
L*U' time (dot): 0.014539 sec
tricount time: 0.014694 sec (dot product method)
tri+prep time: 0.016162 sec (incl time to compute L and U)
compute C time: 0.014539 sec
reduce (C) time: 0.000156 sec
rate 6.18 million edges/sec (incl time for U=triu(A))
rate 6.80 million edges/sec (just tricount itself)
L*U' time (dot): 0.005467 sec (nthreads: 2 speedup 2.65947)
tricount time: 0.005544 sec (dot product method)
tri+prep time: 0.007011 sec (incl time to compute L and U)
compute C time: 0.005467 sec
reduce (C) time: 0.000077 sec
rate 14.25 million edges/sec (incl time for U=triu(A))
rate 18.02 million edges/sec (just tricount itself)
L*U' time (dot): 0.003181 sec (nthreads: 4 speedup 4.57045)
tricount time: 0.003257 sec (dot product method)
tri+prep time: 0.004724 sec (incl time to compute L and U)
compute C time: 0.003181 sec
reduce (C) time: 0.000076 sec
rate 21.14 million edges/sec (incl time for U=triu(A))
rate 30.67 million edges/sec (just tricount itself)
L*U' time (dot): 0.002482 sec (nthreads: 8 speedup 5.85712)
tricount time: 0.002570 sec (dot product method)
tri+prep time: 0.004037 sec (incl time to compute L and U)
compute C time: 0.002482 sec
reduce (C) time: 0.000088 sec
rate 24.74 million edges/sec (incl time for U=triu(A))
rate 38.87 million edges/sec (just tricount itself)
L*U' time (dot): 0.013548 sec
tricount time: 0.013735 sec (dot product method)
tri+prep time: 0.015202 sec (incl time to compute L and U)
compute C time: 0.013548 sec
reduce (C) time: 0.000187 sec
rate 6.57 million edges/sec (incl time for U=triu(A))
rate 7.27 million edges/sec (just tricount itself)
L*U' time (dot): 0.005883 sec (nthreads: 2 speedup 2.30282)
tricount time: 0.006074 sec (dot product method)
tri+prep time: 0.007542 sec (incl time to compute L and U)
compute C time: 0.005883 sec
reduce (C) time: 0.000191 sec
rate 13.24 million edges/sec (incl time for U=triu(A))
rate 16.44 million edges/sec (just tricount itself)
L*U' time (dot): 0.003481 sec (nthreads: 4 speedup 3.89243)
tricount time: 0.003664 sec (dot product method)
tri+prep time: 0.005131 sec (incl time to compute L and U)
compute C time: 0.003481 sec
reduce (C) time: 0.000183 sec
rate 19.47 million edges/sec (incl time for U=triu(A))
rate 27.26 million edges/sec (just tricount itself)
L*U' time (dot): 0.002990 sec (nthreads: 8 speedup 4.53042)
tricount time: 0.003239 sec (dot product method)
tri+prep time: 0.004706 sec (incl time to compute L and U)
compute C time: 0.002990 sec
reduce (C) time: 0.000249 sec
rate 21.22 million edges/sec (incl time for U=triu(A))
rate 30.84 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 0.004303 sec
tricount time: 0.004314 sec (saxpy method)
tri+prep time: 0.004974 sec (incl time to compute L)
compute C time: 0.004303 sec
reduce (C) time: 0.000011 sec
rate 20.08 million edges/sec (incl time for L=tril(A))
rate 23.15 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.002223 sec (nthreads: 2 speedup 1.93561)
tricount time: 0.002230 sec (saxpy method)
tri+prep time: 0.002890 sec (incl time to compute L)
compute C time: 0.002223 sec
reduce (C) time: 0.000008 sec
rate 34.56 million edges/sec (incl time for L=tril(A))
rate 44.78 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.001506 sec (nthreads: 4 speedup 2.8577)
tricount time: 0.001511 sec (saxpy method)
tri+prep time: 0.002171 sec (incl time to compute L)
compute C time: 0.001506 sec
reduce (C) time: 0.000005 sec
rate 46.01 million edges/sec (incl time for L=tril(A))
rate 66.10 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 0.001319 sec (nthreads: 8 speedup 3.26257)
tricount time: 0.001325 sec (saxpy method)
tri+prep time: 0.001985 sec (incl time to compute L)
compute C time: 0.001319 sec
reduce (C) time: 0.000006 sec
rate 50.33 million edges/sec (incl time for L=tril(A))
rate 75.40 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 100000 by 100000, nz: 19980330, method 0 time 2.496 sec

total time to read A matrix: 2.523121 sec

n 100000 # edges 9990165
U=triu(A) time: 0.018984 sec
L=tril(A) time: 0.020506 sec

------------------------------------- dot product method:
# triangles 1330131
L*U' time (dot): 10.037756 sec
tricount time: 10.065191 sec (dot product method)
tri+prep time: 10.104681 sec (incl time to compute L and U)
compute C time: 10.037756 sec
reduce (C) time: 0.027436 sec
rate 0.99 million edges/sec (incl time for U=triu(A))
rate 0.99 million edges/sec (just tricount itself)
L*U' time (dot): 5.268859 sec (nthreads: 2 speedup 1.90511)
tricount time: 5.287288 sec (dot product method)
tri+prep time: 5.326778 sec (incl time to compute L and U)
compute C time: 5.268859 sec
reduce (C) time: 0.018428 sec
rate 1.88 million edges/sec (incl time for U=triu(A))
rate 1.89 million edges/sec (just tricount itself)
L*U' time (dot): 3.710080 sec (nthreads: 4 speedup 2.70554)
tricount time: 3.724638 sec (dot product method)
tri+prep time: 3.764128 sec (incl time to compute L and U)
compute C time: 3.710080 sec
reduce (C) time: 0.014557 sec
rate 2.65 million edges/sec (incl time for U=triu(A))
rate 2.68 million edges/sec (just tricount itself)
L*U' time (dot): 2.599948 sec (nthreads: 8 speedup 3.86075)
tricount time: 2.615894 sec (dot product method)
tri+prep time: 2.655384 sec (incl time to compute L and U)
compute C time: 2.599948 sec
reduce (C) time: 0.015946 sec
rate 3.76 million edges/sec (incl time for U=triu(A))
rate 3.82 million edges/sec (just tricount itself)
L*U' time (dot): 10.711924 sec
tricount time: 10.739376 sec (dot product method)
tri+prep time: 10.778866 sec (incl time to compute L and U)
compute C time: 10.711924 sec
reduce (C) time: 0.027452 sec
rate 0.93 million edges/sec (incl time for U=triu(A))
rate 0.93 million edges/sec (just tricount itself)
L*U' time (dot): 6.001916 sec (nthreads: 2 speedup 1.78475)
tricount time: 6.019951 sec (dot product method)
tri+prep time: 6.059441 sec (incl time to compute L and U)
compute C time: 6.001916 sec
reduce (C) time: 0.018035 sec
rate 1.65 million edges/sec (incl time for U=triu(A))
rate 1.66 million edges/sec (just tricount itself)
L*U' time (dot): 3.885379 sec (nthreads: 4 speedup 2.75698)
tricount time: 3.899436 sec (dot product method)
tri+prep time: 3.938926 sec (incl time to compute L and U)
compute C time: 3.885379 sec
reduce (C) time: 0.014056 sec
rate 2.54 million edges/sec (incl time for U=triu(A))
rate 2.56 million edges/sec (just tricount itself)
L*U' time (dot): 2.636954 sec (nthreads: 8 speedup 4.06223)
tricount time: 2.652757 sec (dot product method)
tri+prep time: 2.692247 sec (incl time to compute L and U)
compute C time: 2.636954 sec
reduce (C) time: 0.015802 sec
rate 3.71 million edges/sec (incl time for U=triu(A))
rate 3.77 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 5.043538 sec
tricount time: 5.049500 sec (saxpy method)
tri+prep time: 5.070006 sec (incl time to compute L)
compute C time: 5.043538 sec
reduce (C) time: 0.005962 sec
rate 1.97 million edges/sec (incl time for L=tril(A))
rate 1.98 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 3.138652 sec (nthreads: 2 speedup 1.60691)
tricount time: 3.141832 sec (saxpy method)
tri+prep time: 3.162339 sec (incl time to compute L)
compute C time: 3.138652 sec
reduce (C) time: 0.003181 sec
rate 3.16 million edges/sec (incl time for L=tril(A))
rate 3.18 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 1.656651 sec (nthreads: 4 speedup 3.04442)
tricount time: 1.658998 sec (saxpy method)
tri+prep time: 1.679504 sec (incl time to compute L)
compute C time: 1.656651 sec
reduce (C) time: 0.002346 sec
rate 5.95 million edges/sec (incl time for L=tril(A))
rate 6.02 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 1.782248 sec (nthreads: 8 speedup 2.82988)
tricount time: 1.783162 sec (saxpy method)
tri+prep time: 1.803668 sec (incl time to compute L)
compute C time: 1.782248 sec
reduce (C) time: 0.000914 sec
rate 5.54 million edges/sec (incl time for L=tril(A))
rate 5.60 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 100000 by 100000, nz: 19980330, method 1 time 1.848 sec

total time to read A matrix: 1.877002 sec

n 100000 # edges 9990165
U=triu(A) time: 0.019503 sec
L=tril(A) time: 0.026980 sec

------------------------------------- dot product method:
# triangles 1330131
L*U' time (dot): 9.740372 sec
tricount time: 9.767869 sec (dot product method)
tri+prep time: 9.814351 sec (incl time to compute L and U)
compute C time: 9.740372 sec
reduce (C) time: 0.027498 sec
rate 1.02 million edges/sec (incl time for U=triu(A))
rate 1.02 million edges/sec (just tricount itself)
L*U' time (dot): 5.274905 sec (nthreads: 2 speedup 1.84655)
tricount time: 5.291900 sec (dot product method)
tri+prep time: 5.338383 sec (incl time to compute L and U)
compute C time: 5.274905 sec
reduce (C) time: 0.016996 sec
rate 1.87 million edges/sec (incl time for U=triu(A))
rate 1.89 million edges/sec (just tricount itself)
L*U' time (dot): 3.592400 sec (nthreads: 4 speedup 2.71138)
tricount time: 3.607020 sec (dot product method)
tri+prep time: 3.653502 sec (incl time to compute L and U)
compute C time: 3.592400 sec
reduce (C) time: 0.014619 sec
rate 2.73 million edges/sec (incl time for U=triu(A))
rate 2.77 million edges/sec (just tricount itself)
L*U' time (dot): 2.499505 sec (nthreads: 8 speedup 3.89692)
tricount time: 2.515554 sec (dot product method)
tri+prep time: 2.562037 sec (incl time to compute L and U)
compute C time: 2.499505 sec
reduce (C) time: 0.016050 sec
rate 3.90 million edges/sec (incl time for U=triu(A))
rate 3.97 million edges/sec (just tricount itself)
L*U' time (dot): 10.443740 sec
tricount time: 10.472049 sec (dot product method)
tri+prep time: 10.518531 sec (incl time to compute L and U)
compute C time: 10.443740 sec
reduce (C) time: 0.028309 sec
rate 0.95 million edges/sec (incl time for U=triu(A))
rate 0.95 million edges/sec (just tricount itself)
L*U' time (dot): 5.903907 sec (nthreads: 2 speedup 1.76895)
tricount time: 5.922306 sec (dot product method)
tri+prep time: 5.968789 sec (incl time to compute L and U)
compute C time: 5.903907 sec
reduce (C) time: 0.018399 sec
rate 1.67 million edges/sec (incl time for U=triu(A))
rate 1.69 million edges/sec (just tricount itself)
L*U' time (dot): 3.949521 sec (nthreads: 4 speedup 2.64431)
tricount time: 3.962544 sec (dot product method)
tri+prep time: 4.009026 sec (incl time to compute L and U)
compute C time: 3.949521 sec
reduce (C) time: 0.013023 sec
rate 2.49 million edges/sec (incl time for U=triu(A))
rate 2.52 million edges/sec (just tricount itself)
L*U' time (dot): 2.604668 sec (nthreads: 8 speedup 4.00962)
tricount time: 2.620189 sec (dot product method)
tri+prep time: 2.666671 sec (incl time to compute L and U)
compute C time: 2.604668 sec
reduce (C) time: 0.015521 sec
rate 3.75 million edges/sec (incl time for U=triu(A))
rate 3.81 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy): 4.623672 sec
tricount time: 4.629221 sec (saxpy method)
tri+prep time: 4.656200 sec (incl time to compute L)
compute C time: 4.623672 sec
reduce (C) time: 0.005549 sec
rate 2.15 million edges/sec (incl time for L=tril(A))
rate 2.16 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 2.570878 sec (nthreads: 2 speedup 1.79848)
tricount time: 2.574308 sec (saxpy method)
tri+prep time: 2.601288 sec (incl time to compute L)
compute C time: 2.570878 sec
reduce (C) time: 0.003430 sec
rate 3.84 million edges/sec (incl time for L=tril(A))
rate 3.88 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 1.508288 sec (nthreads: 4 speedup 3.06551)
tricount time: 1.510577 sec (saxpy method)
tri+prep time: 1.537557 sec (incl time to compute L)
compute C time: 1.508288 sec
reduce (C) time: 0.002289 sec
rate 6.50 million edges/sec (incl time for L=tril(A))
rate 6.61 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy): 1.565095 sec (nthreads: 8 speedup 2.95424)
tricount time: 1.578662 sec (saxpy method)
tri+prep time: 1.605642 sec (incl time to compute L)
compute C time: 1.565095 sec
reduce (C) time: 0.013567 sec
rate 6.22 million edges/sec (incl time for L=tril(A))
rate 6.33 million edges/sec (just tricount itself)