Lines Matching refs:sec

2 Wathen: nx 4 ny 4 n 65 nz 752 method 0, time: 0.000 sec
4 total time to read A matrix: 0.000254 sec
7 U=triu(A) time: 0.000028 sec
8 L=tril(A) time: 0.000007 sec
12 L*U' time (dot): 0.000057 sec
13 tricount time: 0.000061 sec (dot product method)
14 tri+prep time: 0.000097 sec (incl time to compute L and U)
15 compute C time: 0.000057 sec
16 reduce (C) time: 0.000005 sec
17 rate 3.89 million edges/sec (incl time for U=triu(A))
18 rate 6.13 million edges/sec (just tricount itself)
19 L*U' time (dot): 0.000014 sec (nthreads: 2 speedup 4.15599)
20 tricount time: 0.000015 sec (dot product method)
21 tri+prep time: 0.000051 sec (incl time to compute L and U)
22 compute C time: 0.000014 sec
23 reduce (C) time: 0.000002 sec
24 rate 7.42 million edges/sec (incl time for U=triu(A))
25 rate 24.38 million edges/sec (just tricount itself)
26 L*U' time (dot): 0.000011 sec (nthreads: 4 speedup 5.05627)
27 tricount time: 0.000013 sec (dot product method)
28 tri+prep time: 0.000048 sec (incl time to compute L and U)
29 compute C time: 0.000011 sec
30 reduce (C) time: 0.000002 sec
31 rate 7.84 million edges/sec (incl time for U=triu(A))
32 rate 29.50 million edges/sec (just tricount itself)
33 L*U' time (dot): 0.000011 sec (nthreads: 8 speedup 5.0794)
34 tricount time: 0.000013 sec (dot product method)
35 tri+prep time: 0.000048 sec (incl time to compute L and U)
36 compute C time: 0.000011 sec
37 reduce (C) time: 0.000002 sec
38 rate 7.85 million edges/sec (incl time for U=triu(A))
39 rate 29.65 million edges/sec (just tricount itself)
40 L*U' time (dot): 0.000016 sec
41 tricount time: 0.000018 sec (dot product method)
42 tri+prep time: 0.000053 sec (incl time to compute L and U)
43 compute C time: 0.000016 sec
44 reduce (C) time: 0.000002 sec
45 rate 7.07 million edges/sec (incl time for U=triu(A))
46 rate 20.95 million edges/sec (just tricount itself)
47 L*U' time (dot): 0.000012 sec (nthreads: 2 speedup 1.36091)
48 tricount time: 0.000013 sec (dot product method)
49 tri+prep time: 0.000049 sec (incl time to compute L and U)
50 compute C time: 0.000012 sec
51 reduce (C) time: 0.000002 sec
52 rate 7.72 million edges/sec (incl time for U=triu(A))
53 rate 27.87 million edges/sec (just tricount itself)
54 L*U' time (dot): 0.000012 sec (nthreads: 4 speedup 1.38573)
55 tricount time: 0.000013 sec (dot product method)
56 tri+prep time: 0.000049 sec (incl time to compute L and U)
57 compute C time: 0.000012 sec
58 reduce (C) time: 0.000002 sec
59 rate 7.75 million edges/sec (incl time for U=triu(A))
60 rate 28.26 million edges/sec (just tricount itself)
61 L*U' time (dot): 0.000012 sec (nthreads: 8 speedup 1.39356)
62 tricount time: 0.000013 sec (dot product method)
63 tri+prep time: 0.000048 sec (incl time to compute L and U)
64 compute C time: 0.000012 sec
65 reduce (C) time: 0.000002 sec
66 rate 7.77 million edges/sec (incl time for U=triu(A))
67 rate 28.54 million edges/sec (just tricount itself)
70 C<L>=L*L time (saxpy): 0.000051 sec
71 tricount time: 0.000052 sec (saxpy method)
72 tri+prep time: 0.000060 sec (incl time to compute L)
73 compute C time: 0.000051 sec
74 reduce (C) time: 0.000002 sec
75 rate 6.31 million edges/sec (incl time for L=tril(A))
76 rate 7.17 million edges/sec (just tricount itself)
77 C<L>=L*L time (saxpy): 0.000025 sec (nthreads: 2 speedup 2.00982)
78 tricount time: 0.000027 sec (saxpy method)
79 tri+prep time: 0.000034 sec (incl time to compute L)
80 compute C time: 0.000025 sec
81 reduce (C) time: 0.000001 sec
82 rate 11.11 million edges/sec (incl time for L=tril(A))
83 rate 14.05 million edges/sec (just tricount itself)
84 C<L>=L*L time (saxpy): 0.000022 sec (nthreads: 4 speedup 2.34526)
85 tricount time: 0.000023 sec (saxpy method)
86 tri+prep time: 0.000030 sec (incl time to compute L)
87 compute C time: 0.000022 sec
88 reduce (C) time: 0.000001 sec
89 rate 12.45 million edges/sec (incl time for L=tril(A))
90 rate 16.29 million edges/sec (just tricount itself)
91 C<L>=L*L time (saxpy): 0.000022 sec (nthreads: 8 speedup 2.29399)
92 tricount time: 0.000024 sec (saxpy method)
93 tri+prep time: 0.000031 sec (incl time to compute L)
94 compute C time: 0.000022 sec
95 reduce (C) time: 0.000002 sec
96 rate 12.23 million edges/sec (incl time for L=tril(A))
97 rate 15.90 million edges/sec (just tricount itself)
100 random 5 by 5, nz: 18, method 1 time 0.000 sec
102 total time to read A matrix: 0.000101 sec
105 U=triu(A) time: 0.000024 sec
106 L=tril(A) time: 0.000003 sec
110 L*U' time (dot): 0.000024 sec
111 tricount time: 0.000027 sec (dot product method)
112 tri+prep time: 0.000054 sec (incl time to compute L and U)
113 compute C time: 0.000024 sec
114 reduce (C) time: 0.000003 sec
115 rate 0.17 million edges/sec (incl time for U=triu(A))
116 rate 0.33 million edges/sec (just tricount itself)
117 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 4.80951)
118 tricount time: 0.000006 sec (dot product method)
119 tri+prep time: 0.000033 sec (incl time to compute L and U)
120 compute C time: 0.000005 sec
121 reduce (C) time: 0.000001 sec
122 rate 0.27 million edges/sec (incl time for U=triu(A))
123 rate 1.57 million edges/sec (just tricount itself)
124 L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 6.80389)
125 tricount time: 0.000004 sec (dot product method)
126 tri+prep time: 0.000031 sec (incl time to compute L and U)
127 compute C time: 0.000004 sec
128 reduce (C) time: 0.000000 sec
129 rate 0.29 million edges/sec (incl time for U=triu(A))
130 rate 2.26 million edges/sec (just tricount itself)
131 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 6.99995)
132 tricount time: 0.000004 sec (dot product method)
133 tri+prep time: 0.000031 sec (incl time to compute L and U)
134 compute C time: 0.000003 sec
135 reduce (C) time: 0.000000 sec
136 rate 0.29 million edges/sec (incl time for U=triu(A))
137 rate 2.34 million edges/sec (just tricount itself)
138 L*U' time (dot): 0.000005 sec
139 tricount time: 0.000005 sec (dot product method)
140 tri+prep time: 0.000032 sec (incl time to compute L and U)
141 compute C time: 0.000005 sec
142 reduce (C) time: 0.000001 sec
143 rate 0.28 million edges/sec (incl time for U=triu(A))
144 rate 1.74 million edges/sec (just tricount itself)
145 L*U' time (dot): 0.000004 sec (nthreads: 2 speedup 1.28269)
146 tricount time: 0.000004 sec (dot product method)
147 tri+prep time: 0.000031 sec (incl time to compute L and U)
148 compute C time: 0.000004 sec
149 reduce (C) time: 0.000000 sec
150 rate 0.29 million edges/sec (incl time for U=triu(A))
151 rate 2.27 million edges/sec (just tricount itself)
152 L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 1.23356)
153 tricount time: 0.000004 sec (dot product method)
154 tri+prep time: 0.000031 sec (incl time to compute L and U)
155 compute C time: 0.000004 sec
156 reduce (C) time: 0.000000 sec
157 rate 0.29 million edges/sec (incl time for U=triu(A))
158 rate 2.18 million edges/sec (just tricount itself)
159 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.37847)
160 tricount time: 0.000004 sec (dot product method)
161 tri+prep time: 0.000031 sec (incl time to compute L and U)
162 compute C time: 0.000003 sec
163 reduce (C) time: 0.000000 sec
164 rate 0.29 million edges/sec (incl time for U=triu(A))
165 rate 2.42 million edges/sec (just tricount itself)
168 C<L>=L*L time (saxpy): 0.000012 sec
169 tricount time: 0.000013 sec (saxpy method)
170 tri+prep time: 0.000016 sec (incl time to compute L)
171 compute C time: 0.000012 sec
172 reduce (C) time: 0.000001 sec
173 rate 0.56 million edges/sec (incl time for L=tril(A))
174 rate 0.69 million edges/sec (just tricount itself)
175 C<L>=L*L time (saxpy): 0.000003 sec (nthreads: 2 speedup 4.12281)
176 tricount time: 0.000003 sec (saxpy method)
177 tri+prep time: 0.000007 sec (incl time to compute L)
178 compute C time: 0.000003 sec
179 reduce (C) time: 0.000000 sec
180 rate 1.37 million edges/sec (incl time for L=tril(A))
181 rate 2.68 million edges/sec (just tricount itself)
182 C<L>=L*L time (saxpy): 0.000002 sec (nthreads: 4 speedup 5.12891)
183 tricount time: 0.000003 sec (saxpy method)
184 tri+prep time: 0.000006 sec (incl time to compute L)
185 compute C time: 0.000002 sec
186 reduce (C) time: 0.000000 sec
187 rate 1.50 million edges/sec (incl time for L=tril(A))
188 rate 3.25 million edges/sec (just tricount itself)
189 C<L>=L*L time (saxpy): 0.000003 sec (nthreads: 8 speedup 3.73818)
190 tricount time: 0.000004 sec (saxpy method)
191 tri+prep time: 0.000007 sec (incl time to compute L)
192 compute C time: 0.000003 sec
193 reduce (C) time: 0.000000 sec
194 rate 1.28 million edges/sec (incl time for L=tril(A))
195 rate 2.37 million edges/sec (just tricount itself)
200 total time to read A matrix: 0.000136 sec
203 U=triu(A) time: 0.000023 sec
204 L=tril(A) time: 0.000004 sec
208 L*U' time (dot): 0.000032 sec
209 tricount time: 0.000034 sec (dot product method)
210 tri+prep time: 0.000061 sec (incl time to compute L and U)
211 compute C time: 0.000032 sec
212 reduce (C) time: 0.000002 sec
213 rate 0.00 million edges/sec (incl time for U=triu(A))
214 rate 0.00 million edges/sec (just tricount itself)
215 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 6.70786)
216 tricount time: 0.000005 sec (dot product method)
217 tri+prep time: 0.000032 sec (incl time to compute L and U)
218 compute C time: 0.000005 sec
219 reduce (C) time: 0.000001 sec
220 rate 0.00 million edges/sec (incl time for U=triu(A))
221 rate 0.00 million edges/sec (just tricount itself)
222 L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 8.34564)
223 tricount time: 0.000004 sec (dot product method)
224 tri+prep time: 0.000031 sec (incl time to compute L and U)
225 compute C time: 0.000004 sec
226 reduce (C) time: 0.000000 sec
227 rate 0.00 million edges/sec (incl time for U=triu(A))
228 rate 0.00 million edges/sec (just tricount itself)
229 L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 8.56331)
230 tricount time: 0.000004 sec (dot product method)
231 tri+prep time: 0.000031 sec (incl time to compute L and U)
232 compute C time: 0.000004 sec
233 reduce (C) time: 0.000000 sec
234 rate 0.00 million edges/sec (incl time for U=triu(A))
235 rate 0.00 million edges/sec (just tricount itself)
236 L*U' time (dot): 0.000003 sec
237 tricount time: 0.000003 sec (dot product method)
238 tri+prep time: 0.000030 sec (incl time to compute L and U)
239 compute C time: 0.000003 sec
240 reduce (C) time: 0.000000 sec
241 rate 0.00 million edges/sec (incl time for U=triu(A))
242 rate 0.00 million edges/sec (just tricount itself)
243 L*U' time (dot): 0.000002 sec (nthreads: 2 speedup 1.32441)
244 tricount time: 0.000002 sec (dot product method)
245 tri+prep time: 0.000029 sec (incl time to compute L and U)
246 compute C time: 0.000002 sec
247 reduce (C) time: 0.000000 sec
248 rate 0.00 million edges/sec (incl time for U=triu(A))
249 rate 0.00 million edges/sec (just tricount itself)
250 L*U' time (dot): 0.000002 sec (nthreads: 4 speedup 1.26397)
251 tricount time: 0.000002 sec (dot product method)
252 tri+prep time: 0.000029 sec (incl time to compute L and U)
253 compute C time: 0.000002 sec
254 reduce (C) time: 0.000000 sec
255 rate 0.00 million edges/sec (incl time for U=triu(A))
256 rate 0.00 million edges/sec (just tricount itself)
257 L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 0.710361)
258 tricount time: 0.000004 sec (dot product method)
259 tri+prep time: 0.000031 sec (incl time to compute L and U)
260 compute C time: 0.000004 sec
261 reduce (C) time: 0.000000 sec
262 rate 0.00 million edges/sec (incl time for U=triu(A))
263 rate 0.00 million edges/sec (just tricount itself)
266 C<L>=L*L time (saxpy): 0.000026 sec
267 tricount time: 0.000026 sec (saxpy method)
268 tri+prep time: 0.000031 sec (incl time to compute L)
269 compute C time: 0.000026 sec
270 reduce (C) time: 0.000001 sec
271 rate 0.00 million edges/sec (incl time for L=tril(A))
272 rate 0.00 million edges/sec (just tricount itself)
273 C<L>=L*L time (saxpy): 0.000006 sec (nthreads: 2 speedup 3.9542)
274 tricount time: 0.000007 sec (saxpy method)
275 tri+prep time: 0.000011 sec (incl time to compute L)
276 compute C time: 0.000006 sec
277 reduce (C) time: 0.000000 sec
278 rate 0.00 million edges/sec (incl time for L=tril(A))
279 rate 0.00 million edges/sec (just tricount itself)
280 C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 4 speedup 6.0321)
281 tricount time: 0.000005 sec (saxpy method)
282 tri+prep time: 0.000009 sec (incl time to compute L)
283 compute C time: 0.000004 sec
284 reduce (C) time: 0.000000 sec
285 rate 0.00 million edges/sec (incl time for L=tril(A))
286 rate 0.00 million edges/sec (just tricount itself)
287 C<L>=L*L time (saxpy): 0.000005 sec (nthreads: 8 speedup 4.8894)
288 tricount time: 0.000006 sec (saxpy method)
289 tri+prep time: 0.000010 sec (incl time to compute L)
290 compute C time: 0.000005 sec
291 reduce (C) time: 0.000000 sec
292 rate 0.00 million edges/sec (incl time for L=tril(A))
293 rate 0.00 million edges/sec (just tricount itself)
298 total time to read A matrix: 0.000182 sec
301 U=triu(A) time: 0.000042 sec
302 L=tril(A) time: 0.000005 sec
306 L*U' time (dot): 0.000035 sec
307 tricount time: 0.000038 sec (dot product method)
308 tri+prep time: 0.000085 sec (incl time to compute L and U)
309 compute C time: 0.000035 sec
310 reduce (C) time: 0.000003 sec
311 rate 0.02 million edges/sec (incl time for U=triu(A))
312 rate 0.05 million edges/sec (just tricount itself)
313 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 6.83555)
314 tricount time: 0.000006 sec (dot product method)
315 tri+prep time: 0.000053 sec (incl time to compute L and U)
316 compute C time: 0.000005 sec
317 reduce (C) time: 0.000001 sec
318 rate 0.04 million edges/sec (incl time for U=triu(A))
319 rate 0.35 million edges/sec (just tricount itself)
320 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 11.2006)
321 tricount time: 0.000003 sec (dot product method)
322 tri+prep time: 0.000050 sec (incl time to compute L and U)
323 compute C time: 0.000003 sec
324 reduce (C) time: 0.000000 sec
325 rate 0.04 million edges/sec (incl time for U=triu(A))
326 rate 0.57 million edges/sec (just tricount itself)
327 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 10.4373)
328 tricount time: 0.000004 sec (dot product method)
329 tri+prep time: 0.000051 sec (incl time to compute L and U)
330 compute C time: 0.000003 sec
331 reduce (C) time: 0.000000 sec
332 rate 0.04 million edges/sec (incl time for U=triu(A))
333 rate 0.54 million edges/sec (just tricount itself)
334 L*U' time (dot): 0.000006 sec
335 tricount time: 0.000007 sec (dot product method)
336 tri+prep time: 0.000054 sec (incl time to compute L and U)
337 compute C time: 0.000006 sec
338 reduce (C) time: 0.000001 sec
339 rate 0.04 million edges/sec (incl time for U=triu(A))
340 rate 0.30 million edges/sec (just tricount itself)
341 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 1.09848)
342 tricount time: 0.000006 sec (dot product method)
343 tri+prep time: 0.000053 sec (incl time to compute L and U)
344 compute C time: 0.000005 sec
345 reduce (C) time: 0.000000 sec
346 rate 0.04 million edges/sec (incl time for U=triu(A))
347 rate 0.34 million edges/sec (just tricount itself)
348 L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 1.68923)
349 tricount time: 0.000004 sec (dot product method)
350 tri+prep time: 0.000051 sec (incl time to compute L and U)
351 compute C time: 0.000004 sec
352 reduce (C) time: 0.000000 sec
353 rate 0.04 million edges/sec (incl time for U=triu(A))
354 rate 0.51 million edges/sec (just tricount itself)
355 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.97691)
356 tricount time: 0.000003 sec (dot product method)
357 tri+prep time: 0.000050 sec (incl time to compute L and U)
358 compute C time: 0.000003 sec
359 reduce (C) time: 0.000000 sec
360 rate 0.04 million edges/sec (incl time for U=triu(A))
361 rate 0.60 million edges/sec (just tricount itself)
364 C<L>=L*L time (saxpy): 0.000022 sec
365 tricount time: 0.000023 sec (saxpy method)
366 tri+prep time: 0.000028 sec (incl time to compute L)
367 compute C time: 0.000022 sec
368 reduce (C) time: 0.000001 sec
369 rate 0.07 million edges/sec (incl time for L=tril(A))
370 rate 0.09 million edges/sec (just tricount itself)
371 C<L>=L*L time (saxpy): 0.000008 sec (nthreads: 2 speedup 2.89783)
372 tricount time: 0.000008 sec (saxpy method)
373 tri+prep time: 0.000013 sec (incl time to compute L)
374 compute C time: 0.000008 sec
375 reduce (C) time: 0.000001 sec
376 rate 0.15 million edges/sec (incl time for L=tril(A))
377 rate 0.24 million edges/sec (just tricount itself)
378 C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 4 speedup 4.92595)
379 tricount time: 0.000005 sec (saxpy method)
380 tri+prep time: 0.000010 sec (incl time to compute L)
381 compute C time: 0.000004 sec
382 reduce (C) time: 0.000000 sec
383 rate 0.20 million edges/sec (incl time for L=tril(A))
384 rate 0.42 million edges/sec (just tricount itself)
385 C<L>=L*L time (saxpy): 0.000008 sec (nthreads: 8 speedup 2.76205)
386 tricount time: 0.000008 sec (saxpy method)
387 tri+prep time: 0.000014 sec (incl time to compute L)
388 compute C time: 0.000008 sec
389 reduce (C) time: 0.000001 sec
390 rate 0.15 million edges/sec (incl time for L=tril(A))
391 rate 0.24 million edges/sec (just tricount itself)
396 total time to read A matrix: 0.000188 sec
399 U=triu(A) time: 0.000030 sec
400 L=tril(A) time: 0.000004 sec
404 L*U' time (dot): 0.000028 sec
405 tricount time: 0.000031 sec (dot product method)
406 tri+prep time: 0.000066 sec (incl time to compute L and U)
407 compute C time: 0.000028 sec
408 reduce (C) time: 0.000003 sec
409 rate 0.08 million edges/sec (incl time for U=triu(A))
410 rate 0.16 million edges/sec (just tricount itself)
411 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 5.73686)
412 tricount time: 0.000006 sec (dot product method)
413 tri+prep time: 0.000040 sec (incl time to compute L and U)
414 compute C time: 0.000005 sec
415 reduce (C) time: 0.000001 sec
416 rate 0.13 million edges/sec (incl time for U=triu(A))
417 rate 0.90 million edges/sec (just tricount itself)
418 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 8.37966)
419 tricount time: 0.000004 sec (dot product method)
420 tri+prep time: 0.000038 sec (incl time to compute L and U)
421 compute C time: 0.000003 sec
422 reduce (C) time: 0.000000 sec
423 rate 0.13 million edges/sec (incl time for U=triu(A))
424 rate 1.32 million edges/sec (just tricount itself)
425 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 8.68117)
426 tricount time: 0.000004 sec (dot product method)
427 tri+prep time: 0.000038 sec (incl time to compute L and U)
428 compute C time: 0.000003 sec
429 reduce (C) time: 0.000000 sec
430 rate 0.13 million edges/sec (incl time for U=triu(A))
431 rate 1.35 million edges/sec (just tricount itself)
432 L*U' time (dot): 0.000004 sec
433 tricount time: 0.000004 sec (dot product method)
434 tri+prep time: 0.000038 sec (incl time to compute L and U)
435 compute C time: 0.000004 sec
436 reduce (C) time: 0.000001 sec
437 rate 0.13 million edges/sec (incl time for U=triu(A))
438 rate 1.17 million edges/sec (just tricount itself)
439 L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.20088)
440 tricount time: 0.000004 sec (dot product method)
441 tri+prep time: 0.000038 sec (incl time to compute L and U)
442 compute C time: 0.000003 sec
443 reduce (C) time: 0.000000 sec
444 rate 0.13 million edges/sec (incl time for U=triu(A))
445 rate 1.40 million edges/sec (just tricount itself)
446 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.08018)
447 tricount time: 0.000004 sec (dot product method)
448 tri+prep time: 0.000038 sec (incl time to compute L and U)
449 compute C time: 0.000003 sec
450 reduce (C) time: 0.000000 sec
451 rate 0.13 million edges/sec (incl time for U=triu(A))
452 rate 1.27 million edges/sec (just tricount itself)
453 L*U' time (dot): 0.000002 sec (nthreads: 8 speedup 1.68545)
454 tricount time: 0.000003 sec (dot product method)
455 tri+prep time: 0.000037 sec (incl time to compute L and U)
456 compute C time: 0.000002 sec
457 reduce (C) time: 0.000001 sec
458 rate 0.14 million edges/sec (incl time for U=triu(A))
459 rate 1.79 million edges/sec (just tricount itself)
462 C<L>=L*L time (saxpy): 0.000008 sec
463 tricount time: 0.000009 sec (saxpy method)
464 tri+prep time: 0.000013 sec (incl time to compute L)
465 compute C time: 0.000008 sec
466 reduce (C) time: 0.000000 sec
467 rate 0.40 million edges/sec (incl time for L=tril(A))
468 rate 0.57 million edges/sec (just tricount itself)
469 C<L>=L*L time (saxpy): 0.000002 sec (nthreads: 2 speedup 3.33955)
470 tricount time: 0.000003 sec (saxpy method)
471 tri+prep time: 0.000007 sec (incl time to compute L)
472 compute C time: 0.000002 sec
473 reduce (C) time: 0.000000 sec
474 rate 0.75 million edges/sec (incl time for L=tril(A))
475 rate 1.75 million edges/sec (just tricount itself)
476 C<L>=L*L time (saxpy): 0.000002 sec (nthreads: 4 speedup 4.10873)
477 tricount time: 0.000002 sec (saxpy method)
478 tri+prep time: 0.000006 sec (incl time to compute L)
479 compute C time: 0.000002 sec
480 reduce (C) time: 0.000000 sec
481 rate 0.81 million edges/sec (incl time for L=tril(A))
482 rate 2.11 million edges/sec (just tricount itself)
483 C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 8 speedup 1.98804)
484 tricount time: 0.000005 sec (saxpy method)
485 tri+prep time: 0.000008 sec (incl time to compute L)
486 compute C time: 0.000004 sec
487 reduce (C) time: 0.000000 sec
488 rate 0.59 million edges/sec (incl time for L=tril(A))
489 rate 1.07 million edges/sec (just tricount itself)
494 total time to read A matrix: 0.000242 sec
497 U=triu(A) time: 0.000018 sec
498 L=tril(A) time: 0.000003 sec
502 L*U' time (dot): 0.000033 sec
503 tricount time: 0.000035 sec (dot product method)
504 tri+prep time: 0.000057 sec (incl time to compute L and U)
505 compute C time: 0.000033 sec
506 reduce (C) time: 0.000003 sec
507 rate 0.14 million edges/sec (incl time for U=triu(A))
508 rate 0.23 million edges/sec (just tricount itself)
509 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 6.33494)
510 tricount time: 0.000006 sec (dot product method)
511 tri+prep time: 0.000027 sec (incl time to compute L and U)
512 compute C time: 0.000005 sec
513 reduce (C) time: 0.000001 sec
514 rate 0.29 million edges/sec (incl time for U=triu(A))
515 rate 1.40 million edges/sec (just tricount itself)
516 L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 8.49876)
517 tricount time: 0.000004 sec (dot product method)
518 tri+prep time: 0.000026 sec (incl time to compute L and U)
519 compute C time: 0.000004 sec
520 reduce (C) time: 0.000000 sec
521 rate 0.31 million edges/sec (incl time for U=triu(A))
522 rate 1.90 million edges/sec (just tricount itself)
523 L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 9.15498)
524 tricount time: 0.000004 sec (dot product method)
525 tri+prep time: 0.000025 sec (incl time to compute L and U)
526 compute C time: 0.000004 sec
527 reduce (C) time: 0.000000 sec
528 rate 0.31 million edges/sec (incl time for U=triu(A))
529 rate 2.04 million edges/sec (just tricount itself)
530 L*U' time (dot): 0.000004 sec
531 tricount time: 0.000005 sec (dot product method)
532 tri+prep time: 0.000026 sec (incl time to compute L and U)
533 compute C time: 0.000004 sec
534 reduce (C) time: 0.000000 sec
535 rate 0.31 million edges/sec (incl time for U=triu(A))
536 rate 1.71 million edges/sec (just tricount itself)
537 L*U' time (dot): 0.000004 sec (nthreads: 2 speedup 1.09382)
538 tricount time: 0.000004 sec (dot product method)
539 tri+prep time: 0.000026 sec (incl time to compute L and U)
540 compute C time: 0.000004 sec
541 reduce (C) time: 0.000000 sec
542 rate 0.31 million edges/sec (incl time for U=triu(A))
543 rate 1.89 million edges/sec (just tricount itself)
544 L*U' time (dot): 0.000004 sec (nthreads: 4 speedup 1.21016)
545 tricount time: 0.000004 sec (dot product method)
546 tri+prep time: 0.000025 sec (incl time to compute L and U)
547 compute C time: 0.000004 sec
548 reduce (C) time: 0.000000 sec
549 rate 0.32 million edges/sec (incl time for U=triu(A))
550 rate 2.07 million edges/sec (just tricount itself)
551 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.26045)
552 tricount time: 0.000004 sec (dot product method)
553 tri+prep time: 0.000025 sec (incl time to compute L and U)
554 compute C time: 0.000003 sec
555 reduce (C) time: 0.000000 sec
556 rate 0.32 million edges/sec (incl time for U=triu(A))
557 rate 2.16 million edges/sec (just tricount itself)
560 C<L>=L*L time (saxpy): 0.000020 sec
561 tricount time: 0.000020 sec (saxpy method)
562 tri+prep time: 0.000024 sec (incl time to compute L)
563 compute C time: 0.000020 sec
564 reduce (C) time: 0.000000 sec
565 rate 0.34 million edges/sec (incl time for L=tril(A))
566 rate 0.40 million edges/sec (just tricount itself)
567 C<L>=L*L time (saxpy): 0.000005 sec (nthreads: 2 speedup 4.04016)
568 tricount time: 0.000005 sec (saxpy method)
569 tri+prep time: 0.000009 sec (incl time to compute L)
570 compute C time: 0.000005 sec
571 reduce (C) time: 0.000000 sec
572 rate 0.93 million edges/sec (incl time for L=tril(A))
573 rate 1.53 million edges/sec (just tricount itself)
574 C<L>=L*L time (saxpy): 0.000004 sec (nthreads: 4 speedup 4.86751)
575 tricount time: 0.000004 sec (saxpy method)
576 tri+prep time: 0.000008 sec (incl time to compute L)
577 compute C time: 0.000004 sec
578 reduce (C) time: 0.000000 sec
579 rate 1.03 million edges/sec (incl time for L=tril(A))
580 rate 1.83 million edges/sec (just tricount itself)
581 C<L>=L*L time (saxpy): 0.000005 sec (nthreads: 8 speedup 3.92911)
582 tricount time: 0.000005 sec (saxpy method)
583 tri+prep time: 0.000009 sec (incl time to compute L)
584 compute C time: 0.000005 sec
585 reduce (C) time: 0.000000 sec
586 rate 0.91 million edges/sec (incl time for L=tril(A))
587 rate 1.47 million edges/sec (just tricount itself)
592 total time to read A matrix: 0.000394 sec
595 U=triu(A) time: 0.000025 sec
596 L=tril(A) time: 0.000008 sec
600 L*U' time (dot): 0.000036 sec
601 tricount time: 0.000039 sec (dot product method)
602 tri+prep time: 0.000071 sec (incl time to compute L and U)
603 compute C time: 0.000036 sec
604 reduce (C) time: 0.000002 sec
605 rate 6.16 million edges/sec (incl time for U=triu(A))
606 rate 11.25 million edges/sec (just tricount itself)
607 L*U' time (dot): 0.000010 sec (nthreads: 2 speedup 3.71621)
608 tricount time: 0.000011 sec (dot product method)
609 tri+prep time: 0.000043 sec (incl time to compute L and U)
610 compute C time: 0.000010 sec
611 reduce (C) time: 0.000001 sec
612 rate 10.23 million edges/sec (incl time for U=triu(A))
613 rate 41.16 million edges/sec (just tricount itself)
614 L*U' time (dot): 0.000008 sec (nthreads: 4 speedup 4.44921)
615 tricount time: 0.000009 sec (dot product method)
616 tri+prep time: 0.000041 sec (incl time to compute L and U)
617 compute C time: 0.000008 sec
618 reduce (C) time: 0.000001 sec
619 rate 10.67 million edges/sec (incl time for U=triu(A))
620 rate 49.47 million edges/sec (just tricount itself)
621 L*U' time (dot): 0.000008 sec (nthreads: 8 speedup 4.49582)
622 tricount time: 0.000009 sec (dot product method)
623 tri+prep time: 0.000041 sec (incl time to compute L and U)
624 compute C time: 0.000008 sec
625 reduce (C) time: 0.000001 sec
626 rate 10.69 million edges/sec (incl time for U=triu(A))
627 rate 49.93 million edges/sec (just tricount itself)
628 L*U' time (dot): 0.000008 sec
629 tricount time: 0.000008 sec (dot product method)
630 tri+prep time: 0.000041 sec (incl time to compute L and U)
631 compute C time: 0.000008 sec
632 reduce (C) time: 0.000001 sec
633 rate 10.80 million edges/sec (incl time for U=triu(A))
634 rate 52.26 million edges/sec (just tricount itself)
635 L*U' time (dot): 0.000007 sec (nthreads: 2 speedup 1.081)
636 tricount time: 0.000008 sec (dot product method)
637 tri+prep time: 0.000040 sec (incl time to compute L and U)
638 compute C time: 0.000007 sec
639 reduce (C) time: 0.000001 sec
640 rate 10.99 million edges/sec (incl time for U=triu(A))
641 rate 57.20 million edges/sec (just tricount itself)
642 L*U' time (dot): 0.000007 sec (nthreads: 4 speedup 1.09474)
643 tricount time: 0.000008 sec (dot product method)
644 tri+prep time: 0.000040 sec (incl time to compute L and U)
645 compute C time: 0.000007 sec
646 reduce (C) time: 0.000001 sec
647 rate 11.03 million edges/sec (incl time for U=triu(A))
648 rate 58.17 million edges/sec (just tricount itself)
649 L*U' time (dot): 0.000007 sec (nthreads: 8 speedup 1.109)
650 tricount time: 0.000007 sec (dot product method)
651 tri+prep time: 0.000040 sec (incl time to compute L and U)
652 compute C time: 0.000007 sec
653 reduce (C) time: 0.000001 sec
654 rate 11.05 million edges/sec (incl time for U=triu(A))
655 rate 58.79 million edges/sec (just tricount itself)
658 C<L>=L*L time (saxpy): 0.000048 sec
659 tricount time: 0.000048 sec (saxpy method)
660 tri+prep time: 0.000056 sec (incl time to compute L)
661 compute C time: 0.000048 sec
662 reduce (C) time: 0.000001 sec
663 rate 7.82 million edges/sec (incl time for L=tril(A))
664 rate 9.06 million edges/sec (just tricount itself)
665 C<L>=L*L time (saxpy): 0.000028 sec (nthreads: 2 speedup 1.71429)
666 tricount time: 0.000028 sec (saxpy method)
667 tri+prep time: 0.000036 sec (incl time to compute L)
668 compute C time: 0.000028 sec
669 reduce (C) time: 0.000000 sec
670 rate 12.16 million edges/sec (incl time for L=tril(A))
671 rate 15.46 million edges/sec (just tricount itself)
672 C<L>=L*L time (saxpy): 0.000027 sec (nthreads: 4 speedup 1.75412)
673 tricount time: 0.000028 sec (saxpy method)
674 tri+prep time: 0.000035 sec (incl time to compute L)
675 compute C time: 0.000027 sec
676 reduce (C) time: 0.000000 sec
677 rate 12.39 million edges/sec (incl time for L=tril(A))
678 rate 15.83 million edges/sec (just tricount itself)
679 C<L>=L*L time (saxpy): 0.000030 sec (nthreads: 8 speedup 1.59233)
680 tricount time: 0.000030 sec (saxpy method)
681 tri+prep time: 0.000038 sec (incl time to compute L)
682 compute C time: 0.000030 sec
683 reduce (C) time: 0.000000 sec
684 rate 11.49 million edges/sec (incl time for L=tril(A))
685 rate 14.39 million edges/sec (just tricount itself)
690 total time to read A matrix: 0.000287 sec
693 U=triu(A) time: 0.000028 sec
694 L=tril(A) time: 0.000009 sec
698 L*U' time (dot): 0.000043 sec
699 tricount time: 0.000047 sec (dot product method)
700 tri+prep time: 0.000084 sec (incl time to compute L and U)
701 compute C time: 0.000043 sec
702 reduce (C) time: 0.000003 sec
703 rate 2.10 million edges/sec (incl time for U=triu(A))
704 rate 3.78 million edges/sec (just tricount itself)
705 L*U' time (dot): 0.000010 sec (nthreads: 2 speedup 4.16297)
706 tricount time: 0.000012 sec (dot product method)
707 tri+prep time: 0.000049 sec (incl time to compute L and U)
708 compute C time: 0.000010 sec
709 reduce (C) time: 0.000001 sec
710 rate 3.60 million edges/sec (incl time for U=triu(A))
711 rate 14.98 million edges/sec (just tricount itself)
712 L*U' time (dot): 0.000007 sec (nthreads: 4 speedup 5.94283)
713 tricount time: 0.000008 sec (dot product method)
714 tri+prep time: 0.000045 sec (incl time to compute L and U)
715 compute C time: 0.000007 sec
716 reduce (C) time: 0.000001 sec
717 rate 3.88 million edges/sec (incl time for U=triu(A))
718 rate 21.31 million edges/sec (just tricount itself)
719 L*U' time (dot): 0.000011 sec (nthreads: 8 speedup 4.00864)
720 tricount time: 0.000012 sec (dot product method)
721 tri+prep time: 0.000049 sec (incl time to compute L and U)
722 compute C time: 0.000011 sec
723 reduce (C) time: 0.000001 sec
724 rate 3.57 million edges/sec (incl time for U=triu(A))
725 rate 14.40 million edges/sec (just tricount itself)
726 L*U' time (dot): 0.000014 sec
727 tricount time: 0.000015 sec (dot product method)
728 tri+prep time: 0.000052 sec (incl time to compute L and U)
729 compute C time: 0.000014 sec
730 reduce (C) time: 0.000001 sec
731 rate 3.37 million edges/sec (incl time for U=triu(A))
732 rate 11.63 million edges/sec (just tricount itself)
733 L*U' time (dot): 0.000009 sec (nthreads: 2 speedup 1.56815)
734 tricount time: 0.000010 sec (dot product method)
735 tri+prep time: 0.000047 sec (incl time to compute L and U)
736 compute C time: 0.000009 sec
737 reduce (C) time: 0.000001 sec
738 rate 3.76 million edges/sec (incl time for U=triu(A))
739 rate 18.17 million edges/sec (just tricount itself)
740 L*U' time (dot): 0.000011 sec (nthreads: 4 speedup 1.21667)
741 tricount time: 0.000013 sec (dot product method)
742 tri+prep time: 0.000050 sec (incl time to compute L and U)
743 compute C time: 0.000011 sec
744 reduce (C) time: 0.000001 sec
745 rate 3.54 million edges/sec (incl time for U=triu(A))
746 rate 13.86 million edges/sec (just tricount itself)
747 L*U' time (dot): 0.000011 sec (nthreads: 8 speedup 1.20288)
748 tricount time: 0.000013 sec (dot product method)
749 tri+prep time: 0.000050 sec (incl time to compute L and U)
750 compute C time: 0.000011 sec
751 reduce (C) time: 0.000001 sec
752 rate 3.53 million edges/sec (incl time for U=triu(A))
753 rate 13.81 million edges/sec (just tricount itself)
756 C<L>=L*L time (saxpy): 0.000047 sec
757 tricount time: 0.000048 sec (saxpy method)
758 tri+prep time: 0.000057 sec (incl time to compute L)
759 compute C time: 0.000047 sec
760 reduce (C) time: 0.000001 sec
761 rate 3.06 million edges/sec (incl time for L=tril(A))
762 rate 3.66 million edges/sec (just tricount itself)
763 C<L>=L*L time (saxpy): 0.000019 sec (nthreads: 2 speedup 2.466)
764 tricount time: 0.000020 sec (saxpy method)
765 tri+prep time: 0.000029 sec (incl time to compute L)
766 compute C time: 0.000019 sec
767 reduce (C) time: 0.000001 sec
768 rate 6.04 million edges/sec (incl time for L=tril(A))
769 rate 8.93 million edges/sec (just tricount itself)
770 C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 4 speedup 3.50546)
771 tricount time: 0.000014 sec (saxpy method)
772 tri+prep time: 0.000023 sec (incl time to compute L)
773 compute C time: 0.000013 sec
774 reduce (C) time: 0.000001 sec
775 rate 7.51 million edges/sec (incl time for L=tril(A))
776 rate 12.58 million edges/sec (just tricount itself)
777 C<L>=L*L time (saxpy): 0.000015 sec (nthreads: 8 speedup 3.0676)
778 tricount time: 0.000016 sec (saxpy method)
779 tri+prep time: 0.000025 sec (incl time to compute L)
780 compute C time: 0.000015 sec
781 reduce (C) time: 0.000001 sec
782 rate 6.91 million edges/sec (incl time for L=tril(A))
783 rate 10.97 million edges/sec (just tricount itself)
788 total time to read A matrix: 0.073128 sec
791 U=triu(A) time: 0.000225 sec
792 L=tril(A) time: 0.000142 sec
796 L*U' time (dot): 0.013911 sec
797 tricount time: 0.014396 sec (dot product method)
798 tri+prep time: 0.014764 sec (incl time to compute L and U)
799 compute C time: 0.013911 sec
800 reduce (C) time: 0.000486 sec
801 rate 9.67 million edges/sec (incl time for U=triu(A))
802 rate 9.92 million edges/sec (just tricount itself)
803 L*U' time (dot): 0.006919 sec (nthreads: 2 speedup 2.01037)
804 tricount time: 0.007159 sec (dot product method)
805 tri+prep time: 0.007527 sec (incl time to compute L and U)
806 compute C time: 0.006919 sec
807 reduce (C) time: 0.000239 sec
808 rate 18.97 million edges/sec (incl time for U=triu(A))
809 rate 19.94 million edges/sec (just tricount itself)
810 L*U' time (dot): 0.003827 sec (nthreads: 4 speedup 3.63466)
811 tricount time: 0.004121 sec (dot product method)
812 tri+prep time: 0.004488 sec (incl time to compute L and U)
813 compute C time: 0.003827 sec
814 reduce (C) time: 0.000293 sec
815 rate 31.80 million edges/sec (incl time for U=triu(A))
816 rate 34.64 million edges/sec (just tricount itself)
817 L*U' time (dot): 0.005970 sec (nthreads: 8 speedup 2.33004)
818 tricount time: 0.006280 sec (dot product method)
819 tri+prep time: 0.006648 sec (incl time to compute L and U)
820 compute C time: 0.005970 sec
821 reduce (C) time: 0.000310 sec
822 rate 21.47 million edges/sec (incl time for U=triu(A))
823 rate 22.73 million edges/sec (just tricount itself)
824 L*U' time (dot): 0.015373 sec
825 tricount time: 0.015847 sec (dot product method)
826 tri+prep time: 0.016215 sec (incl time to compute L and U)
827 compute C time: 0.015373 sec
828 reduce (C) time: 0.000475 sec
829 rate 8.80 million edges/sec (incl time for U=triu(A))
830 rate 9.01 million edges/sec (just tricount itself)
831 L*U' time (dot): 0.007376 sec (nthreads: 2 speedup 2.08416)
832 tricount time: 0.007622 sec (dot product method)
833 tri+prep time: 0.007989 sec (incl time to compute L and U)
834 compute C time: 0.007376 sec
835 reduce (C) time: 0.000246 sec
836 rate 17.87 million edges/sec (incl time for U=triu(A))
837 rate 18.73 million edges/sec (just tricount itself)
838 L*U' time (dot): 0.004246 sec (nthreads: 4 speedup 3.62042)
839 tricount time: 0.004506 sec (dot product method)
840 tri+prep time: 0.004874 sec (incl time to compute L and U)
841 compute C time: 0.004246 sec
842 reduce (C) time: 0.000260 sec
843 rate 29.29 million edges/sec (incl time for U=triu(A))
844 rate 31.68 million edges/sec (just tricount itself)
845 L*U' time (dot): 0.006729 sec (nthreads: 8 speedup 2.28465)
846 tricount time: 0.007020 sec (dot product method)
847 tri+prep time: 0.007388 sec (incl time to compute L and U)
848 compute C time: 0.006729 sec
849 reduce (C) time: 0.000292 sec
850 rate 19.32 million edges/sec (incl time for U=triu(A))
851 rate 20.33 million edges/sec (just tricount itself)
854 C<L>=L*L time (saxpy): 0.014019 sec
855 tricount time: 0.014413 sec (saxpy method)
856 tri+prep time: 0.014556 sec (incl time to compute L)
857 compute C time: 0.014019 sec
858 reduce (C) time: 0.000394 sec
859 rate 9.81 million edges/sec (incl time for L=tril(A))
860 rate 9.90 million edges/sec (just tricount itself)
861 C<L>=L*L time (saxpy): 0.007036 sec (nthreads: 2 speedup 1.99254)
862 tricount time: 0.007225 sec (saxpy method)
863 tri+prep time: 0.007367 sec (incl time to compute L)
864 compute C time: 0.007036 sec
865 reduce (C) time: 0.000189 sec
866 rate 19.38 million edges/sec (incl time for L=tril(A))
867 rate 19.76 million edges/sec (just tricount itself)
868 C<L>=L*L time (saxpy): 0.004042 sec (nthreads: 4 speedup 3.46866)
869 tricount time: 0.004236 sec (saxpy method)
870 tri+prep time: 0.004378 sec (incl time to compute L)
871 compute C time: 0.004042 sec
872 reduce (C) time: 0.000194 sec
873 rate 32.60 million edges/sec (incl time for L=tril(A))
874 rate 33.70 million edges/sec (just tricount itself)
875 C<L>=L*L time (saxpy): 0.003180 sec (nthreads: 8 speedup 4.40848)
876 tricount time: 0.003398 sec (saxpy method)
877 tri+prep time: 0.003540 sec (incl time to compute L)
878 compute C time: 0.003180 sec
879 reduce (C) time: 0.000218 sec
880 rate 40.32 million edges/sec (incl time for L=tril(A))
881 rate 42.01 million edges/sec (just tricount itself)
886 total time to read A matrix: 0.000637 sec
889 U=triu(A) time: 0.000030 sec
890 L=tril(A) time: 0.000010 sec
894 L*U' time (dot): 0.000067 sec
895 tricount time: 0.000072 sec (dot product method)
896 tri+prep time: 0.000112 sec (incl time to compute L and U)
897 compute C time: 0.000067 sec
898 reduce (C) time: 0.000005 sec
899 rate 6.28 million edges/sec (incl time for U=triu(A))
900 rate 9.71 million edges/sec (just tricount itself)
901 L*U' time (dot): 0.000039 sec (nthreads: 2 speedup 1.74055)
902 tricount time: 0.000042 sec (dot product method)
903 tri+prep time: 0.000081 sec (incl time to compute L and U)
904 compute C time: 0.000039 sec
905 reduce (C) time: 0.000003 sec
906 rate 8.65 million edges/sec (incl time for U=triu(A))
907 rate 16.83 million edges/sec (just tricount itself)
908 L*U' time (dot): 0.000035 sec (nthreads: 4 speedup 1.92135)
909 tricount time: 0.000038 sec (dot product method)
910 tri+prep time: 0.000077 sec (incl time to compute L and U)
911 compute C time: 0.000035 sec
912 reduce (C) time: 0.000003 sec
913 rate 9.09 million edges/sec (incl time for U=triu(A))
914 rate 18.58 million edges/sec (just tricount itself)
915 L*U' time (dot): 0.000032 sec (nthreads: 8 speedup 2.0853)
916 tricount time: 0.000035 sec (dot product method)
917 tri+prep time: 0.000074 sec (incl time to compute L and U)
918 compute C time: 0.000032 sec
919 reduce (C) time: 0.000003 sec
920 rate 9.45 million edges/sec (incl time for U=triu(A))
921 rate 20.15 million edges/sec (just tricount itself)
922 L*U' time (dot): 0.000044 sec
923 tricount time: 0.000047 sec (dot product method)
924 tri+prep time: 0.000086 sec (incl time to compute L and U)
925 compute C time: 0.000044 sec
926 reduce (C) time: 0.000003 sec
927 rate 8.15 million edges/sec (incl time for U=triu(A))
928 rate 15.02 million edges/sec (just tricount itself)
929 L*U' time (dot): 0.000040 sec (nthreads: 2 speedup 1.10621)
930 tricount time: 0.000042 sec (dot product method)
931 tri+prep time: 0.000082 sec (incl time to compute L and U)
932 compute C time: 0.000040 sec
933 reduce (C) time: 0.000003 sec
934 rate 8.59 million edges/sec (incl time for U=triu(A))
935 rate 16.61 million edges/sec (just tricount itself)
936 L*U' time (dot): 0.000050 sec (nthreads: 4 speedup 0.871133)
937 tricount time: 0.000053 sec (dot product method)
938 tri+prep time: 0.000092 sec (incl time to compute L and U)
939 compute C time: 0.000050 sec
940 reduce (C) time: 0.000003 sec
941 rate 7.59 million edges/sec (incl time for U=triu(A))
942 rate 13.24 million edges/sec (just tricount itself)
943 L*U' time (dot): 0.000035 sec (nthreads: 8 speedup 1.24625)
944 tricount time: 0.000038 sec (dot product method)
945 tri+prep time: 0.000077 sec (incl time to compute L and U)
946 compute C time: 0.000035 sec
947 reduce (C) time: 0.000002 sec
948 rate 9.10 million edges/sec (incl time for U=triu(A))
949 rate 18.60 million edges/sec (just tricount itself)
952 C<L>=L*L time (saxpy): 0.000059 sec
953 tricount time: 0.000060 sec (saxpy method)
954 tri+prep time: 0.000070 sec (incl time to compute L)
955 compute C time: 0.000059 sec
956 reduce (C) time: 0.000002 sec
957 rate 10.03 million edges/sec (incl time for L=tril(A))
958 rate 11.64 million edges/sec (just tricount itself)
959 C<L>=L*L time (saxpy): 0.000035 sec (nthreads: 2 speedup 1.664)
960 tricount time: 0.000037 sec (saxpy method)
961 tri+prep time: 0.000046 sec (incl time to compute L)
962 compute C time: 0.000035 sec
963 reduce (C) time: 0.000001 sec
964 rate 15.16 million edges/sec (incl time for L=tril(A))
965 rate 19.14 million edges/sec (just tricount itself)
966 C<L>=L*L time (saxpy): 0.000032 sec (nthreads: 4 speedup 1.80297)
967 tricount time: 0.000034 sec (saxpy method)
968 tri+prep time: 0.000044 sec (incl time to compute L)
969 compute C time: 0.000032 sec
970 reduce (C) time: 0.000001 sec
971 rate 16.10 million edges/sec (incl time for L=tril(A))
972 rate 20.67 million edges/sec (just tricount itself)
973 C<L>=L*L time (saxpy): 0.000032 sec (nthreads: 8 speedup 1.81218)
974 tricount time: 0.000034 sec (saxpy method)
975 tri+prep time: 0.000043 sec (incl time to compute L)
976 compute C time: 0.000032 sec
977 reduce (C) time: 0.000002 sec
978 rate 16.13 million edges/sec (incl time for L=tril(A))
979 rate 20.72 million edges/sec (just tricount itself)
984 total time to read A matrix: 0.000214 sec
987 U=triu(A) time: 0.000016 sec
988 L=tril(A) time: 0.000005 sec
992 L*U' time (dot): 0.000021 sec
993 tricount time: 0.000022 sec (dot product method)
994 tri+prep time: 0.000043 sec (incl time to compute L and U)
995 compute C time: 0.000021 sec
996 reduce (C) time: 0.000001 sec
997 rate 2.86 million edges/sec (incl time for U=triu(A))
998 rate 5.53 million edges/sec (just tricount itself)
999 L*U' time (dot): 0.000004 sec (nthreads: 2 speedup 4.69135)
1000 tricount time: 0.000005 sec (dot product method)
1001 tri+prep time: 0.000026 sec (incl time to compute L and U)
1002 compute C time: 0.000004 sec
1003 reduce (C) time: 0.000000 sec
1004 rate 4.79 million edges/sec (incl time for U=triu(A))
1005 rate 25.15 million edges/sec (just tricount itself)
1006 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 6.61758)
1007 tricount time: 0.000003 sec (dot product method)
1008 tri+prep time: 0.000024 sec (incl time to compute L and U)
1009 compute C time: 0.000003 sec
1010 reduce (C) time: 0.000000 sec
1011 rate 5.07 million edges/sec (incl time for U=triu(A))
1012 rate 35.47 million edges/sec (just tricount itself)
1013 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 7.13425)
1014 tricount time: 0.000003 sec (dot product method)
1015 tri+prep time: 0.000024 sec (incl time to compute L and U)
1016 compute C time: 0.000003 sec
1017 reduce (C) time: 0.000000 sec
1018 rate 5.12 million edges/sec (incl time for U=triu(A))
1019 rate 37.80 million edges/sec (just tricount itself)
1020 L*U' time (dot): 0.000003 sec
1021 tricount time: 0.000004 sec (dot product method)
1022 tri+prep time: 0.000025 sec (incl time to compute L and U)
1023 compute C time: 0.000003 sec
1024 reduce (C) time: 0.000000 sec
1025 rate 4.98 million edges/sec (incl time for U=triu(A))
1026 rate 31.26 million edges/sec (just tricount itself)
1027 L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.15055)
1028 tricount time: 0.000003 sec (dot product method)
1029 tri+prep time: 0.000024 sec (incl time to compute L and U)
1030 compute C time: 0.000003 sec
1031 reduce (C) time: 0.000000 sec
1032 rate 5.09 million edges/sec (incl time for U=triu(A))
1033 rate 36.33 million edges/sec (just tricount itself)
1034 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.26994)
1035 tricount time: 0.000003 sec (dot product method)
1036 tri+prep time: 0.000024 sec (incl time to compute L and U)
1037 compute C time: 0.000003 sec
1038 reduce (C) time: 0.000000 sec
1039 rate 5.16 million edges/sec (incl time for U=triu(A))
1040 rate 39.87 million edges/sec (just tricount itself)
1041 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.35606)
1042 tricount time: 0.000003 sec (dot product method)
1043 tri+prep time: 0.000024 sec (incl time to compute L and U)
1044 compute C time: 0.000003 sec
1045 reduce (C) time: 0.000000 sec
1046 rate 5.19 million edges/sec (incl time for U=triu(A))
1047 rate 42.14 million edges/sec (just tricount itself)
1050 C<L>=L*L time (saxpy): 0.000023 sec
1051 tricount time: 0.000023 sec (saxpy method)
1052 tri+prep time: 0.000028 sec (incl time to compute L)
1053 compute C time: 0.000023 sec
1054 reduce (C) time: 0.000000 sec
1055 rate 4.44 million edges/sec (incl time for L=tril(A))
1056 rate 5.33 million edges/sec (just tricount itself)
1057 C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 2 speedup 1.72664)
1058 tricount time: 0.000013 sec (saxpy method)
1059 tri+prep time: 0.000018 sec (incl time to compute L)
1060 compute C time: 0.000013 sec
1061 reduce (C) time: 0.000000 sec
1062 rate 6.83 million edges/sec (incl time for L=tril(A))
1063 rate 9.19 million edges/sec (just tricount itself)
1064 C<L>=L*L time (saxpy): 0.000012 sec (nthreads: 4 speedup 1.85064)
1065 tricount time: 0.000012 sec (saxpy method)
1066 tri+prep time: 0.000017 sec (incl time to compute L)
1067 compute C time: 0.000012 sec
1068 reduce (C) time: 0.000000 sec
1069 rate 7.20 million edges/sec (incl time for L=tril(A))
1070 rate 9.85 million edges/sec (just tricount itself)
1071 C<L>=L*L time (saxpy): 0.000016 sec (nthreads: 8 speedup 1.40767)
1072 tricount time: 0.000016 sec (saxpy method)
1073 tri+prep time: 0.000021 sec (incl time to compute L)
1074 compute C time: 0.000016 sec
1075 reduce (C) time: 0.000000 sec
1076 rate 5.84 million edges/sec (incl time for L=tril(A))
1077 rate 7.48 million edges/sec (just tricount itself)
1082 total time to read A matrix: 0.000211 sec
1085 U=triu(A) time: 0.000019 sec
1086 L=tril(A) time: 0.000005 sec
1090 L*U' time (dot): 0.000026 sec
1091 tricount time: 0.000027 sec (dot product method)
1092 tri+prep time: 0.000051 sec (incl time to compute L and U)
1093 compute C time: 0.000026 sec
1094 reduce (C) time: 0.000002 sec
1095 rate 2.42 million edges/sec (incl time for U=triu(A))
1096 rate 4.49 million edges/sec (just tricount itself)
1097 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 5.61418)
1098 tricount time: 0.000005 sec (dot product method)
1099 tri+prep time: 0.000028 sec (incl time to compute L and U)
1100 compute C time: 0.000005 sec
1101 reduce (C) time: 0.000000 sec
1102 rate 4.34 million edges/sec (incl time for U=triu(A))
1103 rate 24.47 million edges/sec (just tricount itself)
1104 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 8.20958)
1105 tricount time: 0.000003 sec (dot product method)
1106 tri+prep time: 0.000027 sec (incl time to compute L and U)
1107 compute C time: 0.000003 sec
1108 reduce (C) time: 0.000000 sec
1109 rate 4.59 million edges/sec (incl time for U=triu(A))
1110 rate 35.33 million edges/sec (just tricount itself)
1111 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 8.76997)
1112 tricount time: 0.000003 sec (dot product method)
1113 tri+prep time: 0.000027 sec (incl time to compute L and U)
1114 compute C time: 0.000003 sec
1115 reduce (C) time: 0.000000 sec
1116 rate 4.62 million edges/sec (incl time for U=triu(A))
1117 rate 37.63 million edges/sec (just tricount itself)
1118 L*U' time (dot): 0.000003 sec
1119 tricount time: 0.000004 sec (dot product method)
1120 tri+prep time: 0.000027 sec (incl time to compute L and U)
1121 compute C time: 0.000003 sec
1122 reduce (C) time: 0.000000 sec
1123 rate 4.52 million edges/sec (incl time for U=triu(A))
1124 rate 31.65 million edges/sec (just tricount itself)
1125 L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.0719)
1126 tricount time: 0.000004 sec (dot product method)
1127 tri+prep time: 0.000027 sec (incl time to compute L and U)
1128 compute C time: 0.000003 sec
1129 reduce (C) time: 0.000000 sec
1130 rate 4.57 million edges/sec (incl time for U=triu(A))
1131 rate 34.42 million edges/sec (just tricount itself)
1132 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.18954)
1133 tricount time: 0.000003 sec (dot product method)
1134 tri+prep time: 0.000027 sec (incl time to compute L and U)
1135 compute C time: 0.000003 sec
1136 reduce (C) time: 0.000000 sec
1137 rate 4.62 million edges/sec (incl time for U=triu(A))
1138 rate 37.64 million edges/sec (just tricount itself)
1139 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 1.21036)
1140 tricount time: 0.000003 sec (dot product method)
1141 tri+prep time: 0.000027 sec (incl time to compute L and U)
1142 compute C time: 0.000003 sec
1143 reduce (C) time: 0.000000 sec
1144 rate 4.64 million edges/sec (incl time for U=triu(A))
1145 rate 38.43 million edges/sec (just tricount itself)
1148 C<L>=L*L time (saxpy): 0.000025 sec
1149 tricount time: 0.000025 sec (saxpy method)
1150 tri+prep time: 0.000030 sec (incl time to compute L)
1151 compute C time: 0.000025 sec
1152 reduce (C) time: 0.000000 sec
1153 rate 4.13 million edges/sec (incl time for L=tril(A))
1154 rate 4.87 million edges/sec (just tricount itself)
1155 C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 2 speedup 1.89479)
1156 tricount time: 0.000013 sec (saxpy method)
1157 tri+prep time: 0.000018 sec (incl time to compute L)
1158 compute C time: 0.000013 sec
1159 reduce (C) time: 0.000000 sec
1160 rate 6.84 million edges/sec (incl time for L=tril(A))
1161 rate 9.15 million edges/sec (just tricount itself)
1162 C<L>=L*L time (saxpy): 0.000012 sec (nthreads: 4 speedup 1.99589)
1163 tricount time: 0.000013 sec (saxpy method)
1164 tri+prep time: 0.000017 sec (incl time to compute L)
1165 compute C time: 0.000012 sec
1166 reduce (C) time: 0.000000 sec
1167 rate 7.11 million edges/sec (incl time for L=tril(A))
1168 rate 9.65 million edges/sec (just tricount itself)
1169 C<L>=L*L time (saxpy): 0.000016 sec (nthreads: 8 speedup 1.52101)
1170 tricount time: 0.000017 sec (saxpy method)
1171 tri+prep time: 0.000021 sec (incl time to compute L)
1172 compute C time: 0.000016 sec
1173 reduce (C) time: 0.000000 sec
1174 rate 5.80 million edges/sec (incl time for L=tril(A))
1175 rate 7.39 million edges/sec (just tricount itself)
1180 total time to read A matrix: 0.000179 sec
1183 U=triu(A) time: 0.000017 sec
1184 L=tril(A) time: 0.000004 sec
1188 L*U' time (dot): 0.000021 sec
1189 tricount time: 0.000023 sec (dot product method)
1190 tri+prep time: 0.000044 sec (incl time to compute L and U)
1191 compute C time: 0.000021 sec
1192 reduce (C) time: 0.000002 sec
1193 rate 2.32 million edges/sec (incl time for U=triu(A))
1194 rate 4.46 million edges/sec (just tricount itself)
1195 L*U' time (dot): 0.000005 sec (nthreads: 2 speedup 4.33904)
1196 tricount time: 0.000005 sec (dot product method)
1197 tri+prep time: 0.000026 sec (incl time to compute L and U)
1198 compute C time: 0.000005 sec
1199 reduce (C) time: 0.000001 sec
1200 rate 3.85 million edges/sec (incl time for U=triu(A))
1201 rate 18.71 million edges/sec (just tricount itself)
1202 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 6.50257)
1203 tricount time: 0.000004 sec (dot product method)
1204 tri+prep time: 0.000025 sec (incl time to compute L and U)
1205 compute C time: 0.000003 sec
1206 reduce (C) time: 0.000000 sec
1207 rate 4.13 million edges/sec (incl time for U=triu(A))
1208 rate 28.13 million edges/sec (just tricount itself)
1209 L*U' time (dot): 0.000003 sec (nthreads: 8 speedup 7.15529)
1210 tricount time: 0.000003 sec (dot product method)
1211 tri+prep time: 0.000024 sec (incl time to compute L and U)
1212 compute C time: 0.000003 sec
1213 reduce (C) time: 0.000000 sec
1214 rate 4.19 million edges/sec (incl time for U=triu(A))
1215 rate 30.93 million edges/sec (just tricount itself)
1216 L*U' time (dot): 0.000003 sec
1217 tricount time: 0.000004 sec (dot product method)
1218 tri+prep time: 0.000025 sec (incl time to compute L and U)
1219 compute C time: 0.000003 sec
1220 reduce (C) time: 0.000000 sec
1221 rate 4.13 million edges/sec (incl time for U=triu(A))
1222 rate 27.94 million edges/sec (just tricount itself)
1223 L*U' time (dot): 0.000003 sec (nthreads: 2 speedup 1.08112)
1224 tricount time: 0.000003 sec (dot product method)
1225 tri+prep time: 0.000024 sec (incl time to compute L and U)
1226 compute C time: 0.000003 sec
1227 reduce (C) time: 0.000000 sec
1228 rate 4.18 million edges/sec (incl time for U=triu(A))
1229 rate 30.56 million edges/sec (just tricount itself)
1230 L*U' time (dot): 0.000003 sec (nthreads: 4 speedup 1.16769)
1231 tricount time: 0.000003 sec (dot product method)
1232 tri+prep time: 0.000024 sec (incl time to compute L and U)
1233 compute C time: 0.000003 sec
1234 reduce (C) time: 0.000000 sec
1235 rate 4.22 million edges/sec (incl time for U=triu(A))
1236 rate 32.77 million edges/sec (just tricount itself)
1237 L*U' time (dot): 0.000004 sec (nthreads: 8 speedup 0.821643)
1238 tricount time: 0.000004 sec (dot product method)
1239 tri+prep time: 0.000025 sec (incl time to compute L and U)
1240 compute C time: 0.000004 sec
1241 reduce (C) time: 0.000000 sec
1242 rate 4.01 million edges/sec (incl time for U=triu(A))
1243 rate 23.28 million edges/sec (just tricount itself)
1246 C<L>=L*L time (saxpy): 0.000022 sec
1247 tricount time: 0.000022 sec (saxpy method)
1248 tri+prep time: 0.000026 sec (incl time to compute L)
1249 compute C time: 0.000022 sec
1250 reduce (C) time: 0.000000 sec
1251 rate 3.86 million edges/sec (incl time for L=tril(A))
1252 rate 4.59 million edges/sec (just tricount itself)
1253 C<L>=L*L time (saxpy): 0.000013 sec (nthreads: 2 speedup 1.7295)
1254 tricount time: 0.000013 sec (saxpy method)
1255 tri+prep time: 0.000017 sec (incl time to compute L)
1256 compute C time: 0.000013 sec
1257 reduce (C) time: 0.000000 sec
1258 rate 5.94 million edges/sec (incl time for L=tril(A))
1259 rate 7.86 million edges/sec (just tricount itself)
1260 C<L>=L*L time (saxpy): 0.000012 sec (nthreads: 4 speedup 1.76988)
1261 tricount time: 0.000013 sec (saxpy method)
1262 tri+prep time: 0.000017 sec (incl time to compute L)
1263 compute C time: 0.000012 sec
1264 reduce (C) time: 0.000000 sec
1265 rate 6.06 million edges/sec (incl time for L=tril(A))
1266 rate 8.07 million edges/sec (just tricount itself)
1267 C<L>=L*L time (saxpy): 0.000025 sec (nthreads: 8 speedup 0.885225)
1268 tricount time: 0.000025 sec (saxpy method)
1269 tri+prep time: 0.000029 sec (incl time to compute L)
1270 compute C time: 0.000025 sec
1271 reduce (C) time: 0.000000 sec
1272 rate 3.49 million edges/sec (incl time for L=tril(A))
1273 rate 4.07 million edges/sec (just tricount itself)
1278 total time to read A matrix: 0.029471 sec
1281 U=triu(A) time: 0.000174 sec
1282 L=tril(A) time: 0.000148 sec
1286 L*U' time (dot): 0.000362 sec
1287 tricount time: 0.000395 sec (dot product method)
1288 tri+prep time: 0.000716 sec (incl time to compute L and U)
1289 compute C time: 0.000362 sec
1290 reduce (C) time: 0.000032 sec
1291 rate 69.75 million edges/sec (incl time for U=triu(A))
1292 rate 126.53 million edges/sec (just tricount itself)
1293 L*U' time (dot): 0.000333 sec (nthreads: 2 speedup 1.08962)
1294 tricount time: 0.000363 sec (dot product method)
1295 tri+prep time: 0.000684 sec (incl time to compute L and U)
1296 compute C time: 0.000333 sec
1297 reduce (C) time: 0.000030 sec
1298 rate 72.98 million edges/sec (incl time for U=triu(A))
1299 rate 137.56 million edges/sec (just tricount itself)
1300 L*U' time (dot): 0.000273 sec (nthreads: 4 speedup 1.32884)
1301 tricount time: 0.000318 sec (dot product method)
1302 tri+prep time: 0.000639 sec (incl time to compute L and U)
1303 compute C time: 0.000273 sec
1304 reduce (C) time: 0.000045 sec
1305 rate 78.14 million edges/sec (incl time for U=triu(A))
1306 rate 157.11 million edges/sec (just tricount itself)
1307 L*U' time (dot): 0.002922 sec (nthreads: 8 speedup 0.124043)
1308 tricount time: 0.002957 sec (dot product method)
1309 tri+prep time: 0.003278 sec (incl time to compute L and U)
1310 compute C time: 0.002922 sec
1311 reduce (C) time: 0.000035 sec
1312 rate 15.23 million edges/sec (incl time for U=triu(A))
1313 rate 16.88 million edges/sec (just tricount itself)
1314 L*U' time (dot): 0.000374 sec
1315 tricount time: 0.000402 sec (dot product method)
1316 tri+prep time: 0.000723 sec (incl time to compute L and U)
1317 compute C time: 0.000374 sec
1318 reduce (C) time: 0.000028 sec
1319 rate 69.08 million edges/sec (incl time for U=triu(A))
1320 rate 124.32 million edges/sec (just tricount itself)
1321 L*U' time (dot): 0.000279 sec (nthreads: 2 speedup 1.33844)
1322 tricount time: 0.000307 sec (dot product method)
1323 tri+prep time: 0.000628 sec (incl time to compute L and U)
1324 compute C time: 0.000279 sec
1325 reduce (C) time: 0.000027 sec
1326 rate 79.52 million edges/sec (incl time for U=triu(A))
1327 rate 162.79 million edges/sec (just tricount itself)
1328 L*U' time (dot): 0.000236 sec (nthreads: 4 speedup 1.58021)
1329 tricount time: 0.000266 sec (dot product method)
1330 tri+prep time: 0.000587 sec (incl time to compute L and U)
1331 compute C time: 0.000236 sec
1332 reduce (C) time: 0.000030 sec
1333 rate 85.01 million edges/sec (incl time for U=triu(A))
1334 rate 187.59 million edges/sec (just tricount itself)
1335 L*U' time (dot): 0.001664 sec (nthreads: 8 speedup 0.224596)
1336 tricount time: 0.001696 sec (dot product method)
1337 tri+prep time: 0.002017 sec (incl time to compute L and U)
1338 compute C time: 0.001664 sec
1339 reduce (C) time: 0.000033 sec
1340 rate 24.75 million edges/sec (incl time for U=triu(A))
1341 rate 29.43 million edges/sec (just tricount itself)
1344 C<L>=L*L time (saxpy): 0.000412 sec
1345 tricount time: 0.000413 sec (saxpy method)
1346 tri+prep time: 0.000560 sec (incl time to compute L)
1347 compute C time: 0.000412 sec
1348 reduce (C) time: 0.000001 sec
1349 rate 89.11 million edges/sec (incl time for L=tril(A))
1350 rate 120.99 million edges/sec (just tricount itself)
1351 C<L>=L*L time (saxpy): 0.000348 sec (nthreads: 2 speedup 1.1835)
1352 tricount time: 0.000349 sec (saxpy method)
1353 tri+prep time: 0.000496 sec (incl time to compute L)
1354 compute C time: 0.000348 sec
1355 reduce (C) time: 0.000001 sec
1356 rate 100.62 million edges/sec (incl time for L=tril(A))
1357 rate 143.23 million edges/sec (just tricount itself)
1358 C<L>=L*L time (saxpy): 0.000373 sec (nthreads: 4 speedup 1.10476)
1359 tricount time: 0.000373 sec (saxpy method)
1360 tri+prep time: 0.000521 sec (incl time to compute L)
1361 compute C time: 0.000373 sec
1362 reduce (C) time: 0.000001 sec
1363 rate 95.85 million edges/sec (incl time for L=tril(A))
1364 rate 133.75 million edges/sec (just tricount itself)
1365 C<L>=L*L time (saxpy): 0.000377 sec (nthreads: 8 speedup 1.0916)
1366 tricount time: 0.000378 sec (saxpy method)
1367 tri+prep time: 0.000526 sec (incl time to compute L)
1368 compute C time: 0.000377 sec
1369 reduce (C) time: 0.000001 sec
1370 rate 94.97 million edges/sec (incl time for L=tril(A))
1371 rate 132.06 million edges/sec (just tricount itself)
1376 total time to read A matrix: 0.000275 sec
1379 U=triu(A) time: 0.000024 sec
1380 L=tril(A) time: 0.000007 sec
1384 L*U' time (dot): 0.000032 sec
1385 tricount time: 0.000035 sec (dot product method)
1386 tri+prep time: 0.000065 sec (incl time to compute L and U)
1387 compute C time: 0.000032 sec
1388 reduce (C) time: 0.000003 sec
1389 rate 4.41 million edges/sec (incl time for U=triu(A))
1390 rate 8.25 million edges/sec (just tricount itself)
1391 L*U' time (dot): 0.000011 sec (nthreads: 2 speedup 2.86698)
1392 tricount time: 0.000012 sec (dot product method)
1393 tri+prep time: 0.000043 sec (incl time to compute L and U)
1394 compute C time: 0.000011 sec
1395 reduce (C) time: 0.000001 sec
1396 rate 6.75 million edges/sec (incl time for U=triu(A))
1397 rate 23.50 million edges/sec (just tricount itself)
1398 L*U' time (dot): 0.000008 sec (nthreads: 4 speedup 3.78072)
1399 tricount time: 0.000009 sec (dot product method)
1400 tri+prep time: 0.000040 sec (incl time to compute L and U)
1401 compute C time: 0.000008 sec
1402 reduce (C) time: 0.000001 sec
1403 rate 7.26 million edges/sec (incl time for U=triu(A))
1404 rate 31.12 million edges/sec (just tricount itself)
1405 L*U' time (dot): 0.000007 sec (nthreads: 8 speedup 4.41994)
1406 tricount time: 0.000008 sec (dot product method)
1407 tri+prep time: 0.000038 sec (incl time to compute L and U)
1408 compute C time: 0.000007 sec
1409 reduce (C) time: 0.000001 sec
1410 rate 7.52 million edges/sec (incl time for U=triu(A))
1411 rate 36.41 million edges/sec (just tricount itself)
1412 L*U' time (dot): 0.000012 sec
1413 tricount time: 0.000013 sec (dot product method)
1414 tri+prep time: 0.000044 sec (incl time to compute L and U)
1415 compute C time: 0.000012 sec
1416 reduce (C) time: 0.000001 sec
1417 rate 6.56 million edges/sec (incl time for U=triu(A))
1418 rate 21.38 million edges/sec (just tricount itself)
1419 L*U' time (dot): 0.000010 sec (nthreads: 2 speedup 1.20171)
1420 tricount time: 0.000011 sec (dot product method)
1421 tri+prep time: 0.000041 sec (incl time to compute L and U)
1422 compute C time: 0.000010 sec
1423 reduce (C) time: 0.000001 sec
1424 rate 6.94 million edges/sec (incl time for U=triu(A))
1425 rate 25.94 million edges/sec (just tricount itself)
1426 L*U' time (dot): 0.000009 sec (nthreads: 4 speedup 1.38725)
1427 tricount time: 0.000010 sec (dot product method)
1428 tri+prep time: 0.000040 sec (incl time to compute L and U)
1429 compute C time: 0.000009 sec
1430 reduce (C) time: 0.000001 sec
1431 rate 7.19 million edges/sec (incl time for U=triu(A))
1432 rate 29.86 million edges/sec (just tricount itself)
1433 L*U' time (dot): 0.000008 sec (nthreads: 8 speedup 1.50481)
1434 tricount time: 0.000009 sec (dot product method)
1435 tri+prep time: 0.000039 sec (incl time to compute L and U)
1436 compute C time: 0.000008 sec
1437 reduce (C) time: 0.000001 sec
1438 rate 7.33 million edges/sec (incl time for U=triu(A))
1439 rate 32.36 million edges/sec (just tricount itself)
1442 C<L>=L*L time (saxpy): 0.000033 sec
1443 tricount time: 0.000034 sec (saxpy method)
1444 tri+prep time: 0.000041 sec (incl time to compute L)
1445 compute C time: 0.000033 sec
1446 reduce (C) time: 0.000001 sec
1447 rate 7.05 million edges/sec (incl time for L=tril(A))
1448 rate 8.43 million edges/sec (just tricount itself)
1449 C<L>=L*L time (saxpy): 0.000016 sec (nthreads: 2 speedup 2.01304)
1450 tricount time: 0.000017 sec (saxpy method)
1451 tri+prep time: 0.000024 sec (incl time to compute L)
1452 compute C time: 0.000016 sec
1453 reduce (C) time: 0.000001 sec
1454 rate 12.08 million edges/sec (incl time for L=tril(A))
1455 rate 16.79 million edges/sec (just tricount itself)
1456 C<L>=L*L time (saxpy): 0.000014 sec (nthreads: 4 speedup 2.44745)
1457 tricount time: 0.000014 sec (saxpy method)
1458 tri+prep time: 0.000021 sec (incl time to compute L)
1459 compute C time: 0.000014 sec
1460 reduce (C) time: 0.000001 sec
1461 rate 13.81 million edges/sec (incl time for L=tril(A))
1462 rate 20.31 million edges/sec (just tricount itself)
1463 C<L>=L*L time (saxpy): 0.000014 sec (nthreads: 8 speedup 2.43042)
1464 tricount time: 0.000014 sec (saxpy method)
1465 tri+prep time: 0.000021 sec (incl time to compute L)
1466 compute C time: 0.000014 sec
1467 reduce (C) time: 0.000001 sec
1468 rate 13.68 million edges/sec (incl time for L=tril(A))
1469 rate 20.04 million edges/sec (just tricount itself)
1472 Wathen: nx 200 ny 200 n 120801 nz 1762400 method 0, time: 0.166 sec
1474 total time to read A matrix: 0.168617 sec
1477 U=triu(A) time: 0.002978 sec
1478 L=tril(A) time: 0.002865 sec
1482 L*U' time (dot): 0.029921 sec
1483 tricount time: 0.032427 sec (dot product method)
1484 tri+prep time: 0.038270 sec (incl time to compute L and U)
1485 compute C time: 0.029921 sec
1486 reduce (C) time: 0.002506 sec
1487 rate 23.03 million edges/sec (incl time for U=triu(A))
1488 rate 27.18 million edges/sec (just tricount itself)
1489 L*U' time (dot): 0.011246 sec (nthreads: 2 speedup 2.66055)
1490 tricount time: 0.012483 sec (dot product method)
1491 tri+prep time: 0.018327 sec (incl time to compute L and U)
1492 compute C time: 0.011246 sec
1493 reduce (C) time: 0.001237 sec
1494 rate 48.08 million edges/sec (incl time for U=triu(A))
1495 rate 70.59 million edges/sec (just tricount itself)
1496 L*U' time (dot): 0.008601 sec (nthreads: 4 speedup 3.47878)
1497 tricount time: 0.009216 sec (dot product method)
1498 tri+prep time: 0.015059 sec (incl time to compute L and U)
1499 compute C time: 0.008601 sec
1500 reduce (C) time: 0.000615 sec
1501 rate 58.52 million edges/sec (incl time for U=triu(A))
1502 rate 95.62 million edges/sec (just tricount itself)
1503 L*U' time (dot): 0.006418 sec (nthreads: 8 speedup 4.66232)
1504 tricount time: 0.006907 sec (dot product method)
1505 tri+prep time: 0.012751 sec (incl time to compute L and U)
1506 compute C time: 0.006418 sec
1507 reduce (C) time: 0.000490 sec
1508 rate 69.11 million edges/sec (incl time for U=triu(A))
1509 rate 127.57 million edges/sec (just tricount itself)
1510 L*U' time (dot): 0.023521 sec
1511 tricount time: 0.026194 sec (dot product method)
1512 tri+prep time: 0.032037 sec (incl time to compute L and U)
1513 compute C time: 0.023521 sec
1514 reduce (C) time: 0.002673 sec
1515 rate 27.51 million edges/sec (incl time for U=triu(A))
1516 rate 33.64 million edges/sec (just tricount itself)
1517 L*U' time (dot): 0.011493 sec (nthreads: 2 speedup 2.04665)
1518 tricount time: 0.012796 sec (dot product method)
1519 tri+prep time: 0.018639 sec (incl time to compute L and U)
1520 compute C time: 0.011493 sec
1521 reduce (C) time: 0.001303 sec
1522 rate 47.28 million edges/sec (incl time for U=triu(A))
1523 rate 68.87 million edges/sec (just tricount itself)
1524 L*U' time (dot): 0.006705 sec (nthreads: 4 speedup 3.50821)
1525 tricount time: 0.007384 sec (dot product method)
1526 tri+prep time: 0.013228 sec (incl time to compute L and U)
1527 compute C time: 0.006705 sec
1528 reduce (C) time: 0.000680 sec
1529 rate 66.62 million edges/sec (incl time for U=triu(A))
1530 rate 119.34 million edges/sec (just tricount itself)
1531 L*U' time (dot): 0.009763 sec (nthreads: 8 speedup 2.4093)
1532 tricount time: 0.010669 sec (dot product method)
1533 tri+prep time: 0.016512 sec (incl time to compute L and U)
1534 compute C time: 0.009763 sec
1535 reduce (C) time: 0.000906 sec
1536 rate 53.37 million edges/sec (incl time for U=triu(A))
1537 rate 82.59 million edges/sec (just tricount itself)
1540 C<L>=L*L time (saxpy): 0.026566 sec
1541 tricount time: 0.028627 sec (saxpy method)
1542 tri+prep time: 0.031492 sec (incl time to compute L)
1543 compute C time: 0.026566 sec
1544 reduce (C) time: 0.002061 sec
1545 rate 27.98 million edges/sec (incl time for L=tril(A))
1546 rate 30.78 million edges/sec (just tricount itself)
1547 C<L>=L*L time (saxpy): 0.022131 sec (nthreads: 2 speedup 1.2004)
1548 tricount time: 0.023573 sec (saxpy method)
1549 tri+prep time: 0.026438 sec (incl time to compute L)
1550 compute C time: 0.022131 sec
1551 reduce (C) time: 0.001442 sec
1552 rate 33.33 million edges/sec (incl time for L=tril(A))
1553 rate 37.38 million edges/sec (just tricount itself)
1554 C<L>=L*L time (saxpy): 0.011668 sec (nthreads: 4 speedup 2.27679)
1555 tricount time: 0.012288 sec (saxpy method)
1556 tri+prep time: 0.015153 sec (incl time to compute L)
1557 compute C time: 0.011668 sec
1558 reduce (C) time: 0.000620 sec
1559 rate 58.15 million edges/sec (incl time for L=tril(A))
1560 rate 71.71 million edges/sec (just tricount itself)
1561 C<L>=L*L time (saxpy): 0.016841 sec (nthreads: 8 speedup 1.57751)
1562 tricount time: 0.018066 sec (saxpy method)
1563 tri+prep time: 0.020931 sec (incl time to compute L)
1564 compute C time: 0.016841 sec
1565 reduce (C) time: 0.001225 sec
1566 rate 42.10 million edges/sec (incl time for L=tril(A))
1567 rate 48.78 million edges/sec (just tricount itself)
1570 random 10000 by 10000, nz: 199768, method 0 time 0.027 sec
1572 total time to read A matrix: 0.028004 sec
1575 U=triu(A) time: 0.000362 sec
1576 L=tril(A) time: 0.000234 sec
1580 L*U' time (dot): 0.011664 sec
1581 tricount time: 0.011843 sec (dot product method)
1582 tri+prep time: 0.012439 sec (incl time to compute L and U)
1583 compute C time: 0.011664 sec
1584 reduce (C) time: 0.000179 sec
1585 rate 8.03 million edges/sec (incl time for U=triu(A))
1586 rate 8.43 million edges/sec (just tricount itself)
1587 L*U' time (dot): 0.005893 sec (nthreads: 2 speedup 1.97936)
1588 tricount time: 0.006089 sec (dot product method)
1589 tri+prep time: 0.006686 sec (incl time to compute L and U)
1590 compute C time: 0.005893 sec
1591 reduce (C) time: 0.000196 sec
1592 rate 14.94 million edges/sec (incl time for U=triu(A))
1593 rate 16.40 million edges/sec (just tricount itself)
1594 L*U' time (dot): 0.003444 sec (nthreads: 4 speedup 3.387)
1595 tricount time: 0.003609 sec (dot product method)
1596 tri+prep time: 0.004206 sec (incl time to compute L and U)
1597 compute C time: 0.003444 sec
1598 reduce (C) time: 0.000165 sec
1599 rate 23.75 million edges/sec (incl time for U=triu(A))
1600 rate 27.67 million edges/sec (just tricount itself)
1601 L*U' time (dot): 0.002678 sec (nthreads: 8 speedup 4.35594)
1602 tricount time: 0.002885 sec (dot product method)
1603 tri+prep time: 0.003481 sec (incl time to compute L and U)
1604 compute C time: 0.002678 sec
1605 reduce (C) time: 0.000207 sec
1606 rate 28.69 million edges/sec (incl time for U=triu(A))
1607 rate 34.63 million edges/sec (just tricount itself)
1608 L*U' time (dot): 0.012640 sec
1609 tricount time: 0.012779 sec (dot product method)
1610 tri+prep time: 0.013376 sec (incl time to compute L and U)
1611 compute C time: 0.012640 sec
1612 reduce (C) time: 0.000139 sec
1613 rate 7.47 million edges/sec (incl time for U=triu(A))
1614 rate 7.82 million edges/sec (just tricount itself)
1615 L*U' time (dot): 0.004852 sec (nthreads: 2 speedup 2.60499)
1616 tricount time: 0.004964 sec (dot product method)
1617 tri+prep time: 0.005561 sec (incl time to compute L and U)
1618 compute C time: 0.004852 sec
1619 reduce (C) time: 0.000112 sec
1620 rate 17.96 million edges/sec (incl time for U=triu(A))
1621 rate 20.12 million edges/sec (just tricount itself)
1622 L*U' time (dot): 0.002892 sec (nthreads: 4 speedup 4.37131)
1623 tricount time: 0.002976 sec (dot product method)
1624 tri+prep time: 0.003572 sec (incl time to compute L and U)
1625 compute C time: 0.002892 sec
1626 reduce (C) time: 0.000085 sec
1627 rate 27.96 million edges/sec (incl time for U=triu(A))
1628 rate 33.56 million edges/sec (just tricount itself)
1629 L*U' time (dot): 0.004180 sec (nthreads: 8 speedup 3.02402)
1630 tricount time: 0.004349 sec (dot product method)
1631 tri+prep time: 0.004946 sec (incl time to compute L and U)
1632 compute C time: 0.004180 sec
1633 reduce (C) time: 0.000169 sec
1634 rate 20.20 million edges/sec (incl time for U=triu(A))
1635 rate 22.97 million edges/sec (just tricount itself)
1638 C<L>=L*L time (saxpy): 0.003369 sec
1639 tricount time: 0.003378 sec (saxpy method)
1640 tri+prep time: 0.003612 sec (incl time to compute L)
1641 compute C time: 0.003369 sec
1642 reduce (C) time: 0.000009 sec
1643 rate 27.65 million edges/sec (incl time for L=tril(A))
1644 rate 29.57 million edges/sec (just tricount itself)
1645 C<L>=L*L time (saxpy): 0.002108 sec (nthreads: 2 speedup 1.59874)
1646 tricount time: 0.002115 sec (saxpy method)
1647 tri+prep time: 0.002349 sec (incl time to compute L)
1648 compute C time: 0.002108 sec
1649 reduce (C) time: 0.000007 sec
1650 rate 42.53 million edges/sec (incl time for L=tril(A))
1651 rate 47.24 million edges/sec (just tricount itself)
1652 C<L>=L*L time (saxpy): 0.001484 sec (nthreads: 4 speedup 2.27006)
1653 tricount time: 0.001490 sec (saxpy method)
1654 tri+prep time: 0.001724 sec (incl time to compute L)
1655 compute C time: 0.001484 sec
1656 reduce (C) time: 0.000006 sec
1657 rate 57.92 million edges/sec (incl time for L=tril(A))
1658 rate 67.02 million edges/sec (just tricount itself)
1659 C<L>=L*L time (saxpy): 0.005230 sec (nthreads: 8 speedup 0.644297)
1660 tricount time: 0.005238 sec (saxpy method)
1661 tri+prep time: 0.005472 sec (incl time to compute L)
1662 compute C time: 0.005230 sec
1663 reduce (C) time: 0.000008 sec
1664 rate 18.25 million edges/sec (incl time for L=tril(A))
1665 rate 19.07 million edges/sec (just tricount itself)
1668 random 10000 by 10000, nz: 199768, method 1 time 0.017 sec
1670 total time to read A matrix: 0.017593 sec
1673 U=triu(A) time: 0.000807 sec
1674 L=tril(A) time: 0.000660 sec
1678 L*U' time (dot): 0.014539 sec
1679 tricount time: 0.014694 sec (dot product method)
1680 tri+prep time: 0.016162 sec (incl time to compute L and U)
1681 compute C time: 0.014539 sec
1682 reduce (C) time: 0.000156 sec
1683 rate 6.18 million edges/sec (incl time for U=triu(A))
1684 rate 6.80 million edges/sec (just tricount itself)
1685 L*U' time (dot): 0.005467 sec (nthreads: 2 speedup 2.65947)
1686 tricount time: 0.005544 sec (dot product method)
1687 tri+prep time: 0.007011 sec (incl time to compute L and U)
1688 compute C time: 0.005467 sec
1689 reduce (C) time: 0.000077 sec
1690 rate 14.25 million edges/sec (incl time for U=triu(A))
1691 rate 18.02 million edges/sec (just tricount itself)
1692 L*U' time (dot): 0.003181 sec (nthreads: 4 speedup 4.57045)
1693 tricount time: 0.003257 sec (dot product method)
1694 tri+prep time: 0.004724 sec (incl time to compute L and U)
1695 compute C time: 0.003181 sec
1696 reduce (C) time: 0.000076 sec
1697 rate 21.14 million edges/sec (incl time for U=triu(A))
1698 rate 30.67 million edges/sec (just tricount itself)
1699 L*U' time (dot): 0.002482 sec (nthreads: 8 speedup 5.85712)
1700 tricount time: 0.002570 sec (dot product method)
1701 tri+prep time: 0.004037 sec (incl time to compute L and U)
1702 compute C time: 0.002482 sec
1703 reduce (C) time: 0.000088 sec
1704 rate 24.74 million edges/sec (incl time for U=triu(A))
1705 rate 38.87 million edges/sec (just tricount itself)
1706 L*U' time (dot): 0.013548 sec
1707 tricount time: 0.013735 sec (dot product method)
1708 tri+prep time: 0.015202 sec (incl time to compute L and U)
1709 compute C time: 0.013548 sec
1710 reduce (C) time: 0.000187 sec
1711 rate 6.57 million edges/sec (incl time for U=triu(A))
1712 rate 7.27 million edges/sec (just tricount itself)
1713 L*U' time (dot): 0.005883 sec (nthreads: 2 speedup 2.30282)
1714 tricount time: 0.006074 sec (dot product method)
1715 tri+prep time: 0.007542 sec (incl time to compute L and U)
1716 compute C time: 0.005883 sec
1717 reduce (C) time: 0.000191 sec
1718 rate 13.24 million edges/sec (incl time for U=triu(A))
1719 rate 16.44 million edges/sec (just tricount itself)
1720 L*U' time (dot): 0.003481 sec (nthreads: 4 speedup 3.89243)
1721 tricount time: 0.003664 sec (dot product method)
1722 tri+prep time: 0.005131 sec (incl time to compute L and U)
1723 compute C time: 0.003481 sec
1724 reduce (C) time: 0.000183 sec
1725 rate 19.47 million edges/sec (incl time for U=triu(A))
1726 rate 27.26 million edges/sec (just tricount itself)
1727 L*U' time (dot): 0.002990 sec (nthreads: 8 speedup 4.53042)
1728 tricount time: 0.003239 sec (dot product method)
1729 tri+prep time: 0.004706 sec (incl time to compute L and U)
1730 compute C time: 0.002990 sec
1731 reduce (C) time: 0.000249 sec
1732 rate 21.22 million edges/sec (incl time for U=triu(A))
1733 rate 30.84 million edges/sec (just tricount itself)
1736 C<L>=L*L time (saxpy): 0.004303 sec
1737 tricount time: 0.004314 sec (saxpy method)
1738 tri+prep time: 0.004974 sec (incl time to compute L)
1739 compute C time: 0.004303 sec
1740 reduce (C) time: 0.000011 sec
1741 rate 20.08 million edges/sec (incl time for L=tril(A))
1742 rate 23.15 million edges/sec (just tricount itself)
1743 C<L>=L*L time (saxpy): 0.002223 sec (nthreads: 2 speedup 1.93561)
1744 tricount time: 0.002230 sec (saxpy method)
1745 tri+prep time: 0.002890 sec (incl time to compute L)
1746 compute C time: 0.002223 sec
1747 reduce (C) time: 0.000008 sec
1748 rate 34.56 million edges/sec (incl time for L=tril(A))
1749 rate 44.78 million edges/sec (just tricount itself)
1750 C<L>=L*L time (saxpy): 0.001506 sec (nthreads: 4 speedup 2.8577)
1751 tricount time: 0.001511 sec (saxpy method)
1752 tri+prep time: 0.002171 sec (incl time to compute L)
1753 compute C time: 0.001506 sec
1754 reduce (C) time: 0.000005 sec
1755 rate 46.01 million edges/sec (incl time for L=tril(A))
1756 rate 66.10 million edges/sec (just tricount itself)
1757 C<L>=L*L time (saxpy): 0.001319 sec (nthreads: 8 speedup 3.26257)
1758 tricount time: 0.001325 sec (saxpy method)
1759 tri+prep time: 0.001985 sec (incl time to compute L)
1760 compute C time: 0.001319 sec
1761 reduce (C) time: 0.000006 sec
1762 rate 50.33 million edges/sec (incl time for L=tril(A))
1763 rate 75.40 million edges/sec (just tricount itself)
1766 random 100000 by 100000, nz: 19980330, method 0 time 2.496 sec
1768 total time to read A matrix: 2.523121 sec
1771 U=triu(A) time: 0.018984 sec
1772 L=tril(A) time: 0.020506 sec
1776 L*U' time (dot): 10.037756 sec
1777 tricount time: 10.065191 sec (dot product method)
1778 tri+prep time: 10.104681 sec (incl time to compute L and U)
1779 compute C time: 10.037756 sec
1780 reduce (C) time: 0.027436 sec
1781 rate 0.99 million edges/sec (incl time for U=triu(A))
1782 rate 0.99 million edges/sec (just tricount itself)
1783 L*U' time (dot): 5.268859 sec (nthreads: 2 speedup 1.90511)
1784 tricount time: 5.287288 sec (dot product method)
1785 tri+prep time: 5.326778 sec (incl time to compute L and U)
1786 compute C time: 5.268859 sec
1787 reduce (C) time: 0.018428 sec
1788 rate 1.88 million edges/sec (incl time for U=triu(A))
1789 rate 1.89 million edges/sec (just tricount itself)
1790 L*U' time (dot): 3.710080 sec (nthreads: 4 speedup 2.70554)
1791 tricount time: 3.724638 sec (dot product method)
1792 tri+prep time: 3.764128 sec (incl time to compute L and U)
1793 compute C time: 3.710080 sec
1794 reduce (C) time: 0.014557 sec
1795 rate 2.65 million edges/sec (incl time for U=triu(A))
1796 rate 2.68 million edges/sec (just tricount itself)
1797 L*U' time (dot): 2.599948 sec (nthreads: 8 speedup 3.86075)
1798 tricount time: 2.615894 sec (dot product method)
1799 tri+prep time: 2.655384 sec (incl time to compute L and U)
1800 compute C time: 2.599948 sec
1801 reduce (C) time: 0.015946 sec
1802 rate 3.76 million edges/sec (incl time for U=triu(A))
1803 rate 3.82 million edges/sec (just tricount itself)
1804 L*U' time (dot): 10.711924 sec
1805 tricount time: 10.739376 sec (dot product method)
1806 tri+prep time: 10.778866 sec (incl time to compute L and U)
1807 compute C time: 10.711924 sec
1808 reduce (C) time: 0.027452 sec
1809 rate 0.93 million edges/sec (incl time for U=triu(A))
1810 rate 0.93 million edges/sec (just tricount itself)
1811 L*U' time (dot): 6.001916 sec (nthreads: 2 speedup 1.78475)
1812 tricount time: 6.019951 sec (dot product method)
1813 tri+prep time: 6.059441 sec (incl time to compute L and U)
1814 compute C time: 6.001916 sec
1815 reduce (C) time: 0.018035 sec
1816 rate 1.65 million edges/sec (incl time for U=triu(A))
1817 rate 1.66 million edges/sec (just tricount itself)
1818 L*U' time (dot): 3.885379 sec (nthreads: 4 speedup 2.75698)
1819 tricount time: 3.899436 sec (dot product method)
1820 tri+prep time: 3.938926 sec (incl time to compute L and U)
1821 compute C time: 3.885379 sec
1822 reduce (C) time: 0.014056 sec
1823 rate 2.54 million edges/sec (incl time for U=triu(A))
1824 rate 2.56 million edges/sec (just tricount itself)
1825 L*U' time (dot): 2.636954 sec (nthreads: 8 speedup 4.06223)
1826 tricount time: 2.652757 sec (dot product method)
1827 tri+prep time: 2.692247 sec (incl time to compute L and U)
1828 compute C time: 2.636954 sec
1829 reduce (C) time: 0.015802 sec
1830 rate 3.71 million edges/sec (incl time for U=triu(A))
1831 rate 3.77 million edges/sec (just tricount itself)
1834 C<L>=L*L time (saxpy): 5.043538 sec
1835 tricount time: 5.049500 sec (saxpy method)
1836 tri+prep time: 5.070006 sec (incl time to compute L)
1837 compute C time: 5.043538 sec
1838 reduce (C) time: 0.005962 sec
1839 rate 1.97 million edges/sec (incl time for L=tril(A))
1840 rate 1.98 million edges/sec (just tricount itself)
1841 C<L>=L*L time (saxpy): 3.138652 sec (nthreads: 2 speedup 1.60691)
1842 tricount time: 3.141832 sec (saxpy method)
1843 tri+prep time: 3.162339 sec (incl time to compute L)
1844 compute C time: 3.138652 sec
1845 reduce (C) time: 0.003181 sec
1846 rate 3.16 million edges/sec (incl time for L=tril(A))
1847 rate 3.18 million edges/sec (just tricount itself)
1848 C<L>=L*L time (saxpy): 1.656651 sec (nthreads: 4 speedup 3.04442)
1849 tricount time: 1.658998 sec (saxpy method)
1850 tri+prep time: 1.679504 sec (incl time to compute L)
1851 compute C time: 1.656651 sec
1852 reduce (C) time: 0.002346 sec
1853 rate 5.95 million edges/sec (incl time for L=tril(A))
1854 rate 6.02 million edges/sec (just tricount itself)
1855 C<L>=L*L time (saxpy): 1.782248 sec (nthreads: 8 speedup 2.82988)
1856 tricount time: 1.783162 sec (saxpy method)
1857 tri+prep time: 1.803668 sec (incl time to compute L)
1858 compute C time: 1.782248 sec
1859 reduce (C) time: 0.000914 sec
1860 rate 5.54 million edges/sec (incl time for L=tril(A))
1861 rate 5.60 million edges/sec (just tricount itself)
1864 random 100000 by 100000, nz: 19980330, method 1 time 1.848 sec
1866 total time to read A matrix: 1.877002 sec
1869 U=triu(A) time: 0.019503 sec
1870 L=tril(A) time: 0.026980 sec
1874 L*U' time (dot): 9.740372 sec
1875 tricount time: 9.767869 sec (dot product method)
1876 tri+prep time: 9.814351 sec (incl time to compute L and U)
1877 compute C time: 9.740372 sec
1878 reduce (C) time: 0.027498 sec
1879 rate 1.02 million edges/sec (incl time for U=triu(A))
1880 rate 1.02 million edges/sec (just tricount itself)
1881 L*U' time (dot): 5.274905 sec (nthreads: 2 speedup 1.84655)
1882 tricount time: 5.291900 sec (dot product method)
1883 tri+prep time: 5.338383 sec (incl time to compute L and U)
1884 compute C time: 5.274905 sec
1885 reduce (C) time: 0.016996 sec
1886 rate 1.87 million edges/sec (incl time for U=triu(A))
1887 rate 1.89 million edges/sec (just tricount itself)
1888 L*U' time (dot): 3.592400 sec (nthreads: 4 speedup 2.71138)
1889 tricount time: 3.607020 sec (dot product method)
1890 tri+prep time: 3.653502 sec (incl time to compute L and U)
1891 compute C time: 3.592400 sec
1892 reduce (C) time: 0.014619 sec
1893 rate 2.73 million edges/sec (incl time for U=triu(A))
1894 rate 2.77 million edges/sec (just tricount itself)
1895 L*U' time (dot): 2.499505 sec (nthreads: 8 speedup 3.89692)
1896 tricount time: 2.515554 sec (dot product method)
1897 tri+prep time: 2.562037 sec (incl time to compute L and U)
1898 compute C time: 2.499505 sec
1899 reduce (C) time: 0.016050 sec
1900 rate 3.90 million edges/sec (incl time for U=triu(A))
1901 rate 3.97 million edges/sec (just tricount itself)
1902 L*U' time (dot): 10.443740 sec
1903 tricount time: 10.472049 sec (dot product method)
1904 tri+prep time: 10.518531 sec (incl time to compute L and U)
1905 compute C time: 10.443740 sec
1906 reduce (C) time: 0.028309 sec
1907 rate 0.95 million edges/sec (incl time for U=triu(A))
1908 rate 0.95 million edges/sec (just tricount itself)
1909 L*U' time (dot): 5.903907 sec (nthreads: 2 speedup 1.76895)
1910 tricount time: 5.922306 sec (dot product method)
1911 tri+prep time: 5.968789 sec (incl time to compute L and U)
1912 compute C time: 5.903907 sec
1913 reduce (C) time: 0.018399 sec
1914 rate 1.67 million edges/sec (incl time for U=triu(A))
1915 rate 1.69 million edges/sec (just tricount itself)
1916 L*U' time (dot): 3.949521 sec (nthreads: 4 speedup 2.64431)
1917 tricount time: 3.962544 sec (dot product method)
1918 tri+prep time: 4.009026 sec (incl time to compute L and U)
1919 compute C time: 3.949521 sec
1920 reduce (C) time: 0.013023 sec
1921 rate 2.49 million edges/sec (incl time for U=triu(A))
1922 rate 2.52 million edges/sec (just tricount itself)
1923 L*U' time (dot): 2.604668 sec (nthreads: 8 speedup 4.00962)
1924 tricount time: 2.620189 sec (dot product method)
1925 tri+prep time: 2.666671 sec (incl time to compute L and U)
1926 compute C time: 2.604668 sec
1927 reduce (C) time: 0.015521 sec
1928 rate 3.75 million edges/sec (incl time for U=triu(A))
1929 rate 3.81 million edges/sec (just tricount itself)
1932 C<L>=L*L time (saxpy): 4.623672 sec
1933 tricount time: 4.629221 sec (saxpy method)
1934 tri+prep time: 4.656200 sec (incl time to compute L)
1935 compute C time: 4.623672 sec
1936 reduce (C) time: 0.005549 sec
1937 rate 2.15 million edges/sec (incl time for L=tril(A))
1938 rate 2.16 million edges/sec (just tricount itself)
1939 C<L>=L*L time (saxpy): 2.570878 sec (nthreads: 2 speedup 1.79848)
1940 tricount time: 2.574308 sec (saxpy method)
1941 tri+prep time: 2.601288 sec (incl time to compute L)
1942 compute C time: 2.570878 sec
1943 reduce (C) time: 0.003430 sec
1944 rate 3.84 million edges/sec (incl time for L=tril(A))
1945 rate 3.88 million edges/sec (just tricount itself)
1946 C<L>=L*L time (saxpy): 1.508288 sec (nthreads: 4 speedup 3.06551)
1947 tricount time: 1.510577 sec (saxpy method)
1948 tri+prep time: 1.537557 sec (incl time to compute L)
1949 compute C time: 1.508288 sec
1950 reduce (C) time: 0.002289 sec
1951 rate 6.50 million edges/sec (incl time for L=tril(A))
1952 rate 6.61 million edges/sec (just tricount itself)
1953 C<L>=L*L time (saxpy): 1.565095 sec (nthreads: 8 speedup 2.95424)
1954 tricount time: 1.578662 sec (saxpy method)
1955 tri+prep time: 1.605642 sec (incl time to compute L)
1956 compute C time: 1.565095 sec
1957 reduce (C) time: 0.013567 sec
1958 rate 6.22 million edges/sec (incl time for L=tril(A))
1959 rate 6.33 million edges/sec (just tricount itself)
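
The `speedup` figures above appear to be the 1-thread kernel time divided by the n-thread kernel time for the same method; this is inferred from the logged numbers, not from the benchmark's source. The sketch below parses kernel-timing lines in the format shown and recomputes that ratio for comparison with the logged value. The names `TIME_RE`, `parse_timings`, and `recompute_speedups` are hypothetical helpers and not part of the benchmark. The sample lines are copied from the last saxpy block above.

```python
import re

# Matches kernel-timing lines such as:
#   "L*U' time (dot): 0.011246 sec (nthreads: 2 speedup 2.66055)"
#   "C<L>=L*L time (saxpy): 4.623672 sec"
# Lines like "tricount time: ..." or "reduce (C) time: ..." do not match.
TIME_RE = re.compile(
    r"time \((?P<method>dot|saxpy)\):\s+(?P<sec>[\d.]+) sec"
    r"(?: \(nthreads: (?P<nthreads>\d+) speedup (?P<speedup>[\d.]+)\))?"
)

def parse_timings(lines):
    """Yield (method, nthreads, seconds, logged_speedup) per kernel-timing line."""
    for line in lines:
        m = TIME_RE.search(line)
        if m is None:
            continue
        # Assumption: a line without an "nthreads" note is the 1-thread run.
        nthreads = int(m["nthreads"]) if m["nthreads"] else 1
        logged = float(m["speedup"]) if m["speedup"] else None
        yield m["method"], nthreads, float(m["sec"]), logged

def recompute_speedups(lines):
    """Return (nthreads, recomputed, logged) using the latest 1-thread time as baseline."""
    results = []
    baseline = None
    for _method, nthreads, sec, logged in parse_timings(lines):
        if logged is None:
            baseline = sec  # start of a new 1/2/4/8-thread group
        elif baseline is not None:
            results.append((nthreads, baseline / sec, logged))
    return results

sample = [
    "C<L>=L*L time (saxpy): 4.623672 sec",
    "C<L>=L*L time (saxpy): 2.570878 sec (nthreads: 2 speedup 1.79848)",
    "C<L>=L*L time (saxpy): 1.508288 sec (nthreads: 4 speedup 3.06551)",
    "C<L>=L*L time (saxpy): 1.565095 sec (nthreads: 8 speedup 2.95424)",
]

for nthreads, recomputed, logged in recompute_speedups(sample):
    print(f"nthreads {nthreads}: recomputed {recomputed:.4f}, logged {logged:.4f}")
```

Because the baseline resets at every line that lacks an `nthreads` note, the same function can be run over the whole log. A recomputed ratio below 1, as in the 8-thread saxpy run on the 10000 by 10000 matrix with method 0, means that run was slower than its single-threaded baseline.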