--------------------------------------------------------------
Wathen: nx 4 ny 4 n 65 nz 752 method 0, time: 0.000 sec

total time to read A matrix:       0.000254 sec

n 65 # edges 376
U=triu(A) time:        0.000028 sec
L=tril(A) time:        0.000007 sec

------------------------------------- dot product method:
# triangles 872
L*U' time (dot):         0.000057 sec
tricount time:         0.000061 sec (dot product method)
tri+prep time:         0.000097 sec (incl time to compute L and U)
compute C time:        0.000057 sec
reduce (C) time:       0.000005 sec
rate     3.89 million edges/sec (incl time for U=triu(A))
rate     6.13 million edges/sec (just tricount itself)
L*U' time (dot):         0.000014 sec (nthreads: 2 speedup 4.15599)
tricount time:         0.000015 sec (dot product method)
tri+prep time:         0.000051 sec (incl time to compute L and U)
compute C time:        0.000014 sec
reduce (C) time:       0.000002 sec
rate     7.42 million edges/sec (incl time for U=triu(A))
rate    24.38 million edges/sec (just tricount itself)
L*U' time (dot):         0.000011 sec (nthreads: 4 speedup 5.05627)
tricount time:         0.000013 sec (dot product method)
tri+prep time:         0.000048 sec (incl time to compute L and U)
compute C time:        0.000011 sec
reduce (C) time:       0.000002 sec
rate     7.84 million edges/sec (incl time for U=triu(A))
rate    29.50 million edges/sec (just tricount itself)
L*U' time (dot):         0.000011 sec (nthreads: 8 speedup 5.0794)
tricount time:         0.000013 sec (dot product method)
tri+prep time:         0.000048 sec (incl time to compute L and U)
compute C time:        0.000011 sec
reduce (C) time:       0.000002 sec
rate     7.85 million edges/sec (incl time for U=triu(A))
rate    29.65 million edges/sec (just tricount itself)
L*U' time (dot):         0.000016 sec
tricount time:         0.000018 sec (dot product method)
tri+prep time:         0.000053 sec (incl time to compute L and U)
compute C time:        0.000016 sec
reduce (C) time:       0.000002 sec
rate     7.07 million edges/sec (incl time for U=triu(A))
rate    20.95 million edges/sec (just tricount itself)
L*U' time (dot):         0.000012 sec (nthreads: 2 speedup 1.36091)
tricount time:         0.000013 sec (dot product method)
tri+prep time:         0.000049 sec (incl time to compute L and U)
compute C time:        0.000012 sec
reduce (C) time:       0.000002 sec
rate     7.72 million edges/sec (incl time for U=triu(A))
rate    27.87 million edges/sec (just tricount itself)
L*U' time (dot):         0.000012 sec (nthreads: 4 speedup 1.38573)
tricount time:         0.000013 sec (dot product method)
tri+prep time:         0.000049 sec (incl time to compute L and U)
compute C time:        0.000012 sec
reduce (C) time:       0.000002 sec
rate     7.75 million edges/sec (incl time for U=triu(A))
rate    28.26 million edges/sec (just tricount itself)
L*U' time (dot):         0.000012 sec (nthreads: 8 speedup 1.39356)
tricount time:         0.000013 sec (dot product method)
tri+prep time:         0.000048 sec (incl time to compute L and U)
compute C time:        0.000012 sec
reduce (C) time:       0.000002 sec
rate     7.77 million edges/sec (incl time for U=triu(A))
rate    28.54 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000051 sec
tricount time:         0.000052 sec (saxpy method)
tri+prep time:         0.000060 sec (incl time to compute L)
compute C time:        0.000051 sec
reduce (C) time:       0.000002 sec
rate     6.31 million edges/sec (incl time for L=tril(A))
rate     7.17 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000025 sec (nthreads: 2 speedup 2.00982)
tricount time:         0.000027 sec (saxpy method)
tri+prep time:         0.000034 sec (incl time to compute L)
compute C time:        0.000025 sec
reduce (C) time:       0.000001 sec
rate    11.11 million edges/sec (incl time for L=tril(A))
rate    14.05 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000022 sec (nthreads: 4 speedup 2.34526)
tricount time:         0.000023 sec (saxpy method)
tri+prep time:         0.000030 sec (incl time to compute L)
compute C time:        0.000022 sec
reduce (C) time:       0.000001 sec
rate    12.45 million edges/sec (incl time for L=tril(A))
rate    16.29 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000022 sec (nthreads: 8 speedup 2.29399)
tricount time:         0.000024 sec (saxpy method)
tri+prep time:         0.000031 sec (incl time to compute L)
compute C time:        0.000022 sec
reduce (C) time:       0.000002 sec
rate    12.23 million edges/sec (incl time for L=tril(A))
rate    15.90 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 5 by 5, nz: 18, method 1 time 0.000 sec

total time to read A matrix:       0.000101 sec

n 5 # edges 9
U=triu(A) time:        0.000024 sec
L=tril(A) time:        0.000003 sec

------------------------------------- dot product method:
# triangles 7
L*U' time (dot):         0.000024 sec
tricount time:         0.000027 sec (dot product method)
tri+prep time:         0.000054 sec (incl time to compute L and U)
compute C time:        0.000024 sec
reduce (C) time:       0.000003 sec
rate     0.17 million edges/sec (incl time for U=triu(A))
rate     0.33 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 4.80951)
tricount time:         0.000006 sec (dot product method)
tri+prep time:         0.000033 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000001 sec
rate     0.27 million edges/sec (incl time for U=triu(A))
rate     1.57 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 4 speedup 6.80389)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.29 million edges/sec (incl time for U=triu(A))
rate     2.26 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 6.99995)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.29 million edges/sec (incl time for U=triu(A))
rate     2.34 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec
tricount time:         0.000005 sec (dot product method)
tri+prep time:         0.000032 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000001 sec
rate     0.28 million edges/sec (incl time for U=triu(A))
rate     1.74 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 2 speedup 1.28269)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.29 million edges/sec (incl time for U=triu(A))
rate     2.27 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 4 speedup 1.23356)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.29 million edges/sec (incl time for U=triu(A))
rate     2.18 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 1.37847)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.29 million edges/sec (incl time for U=triu(A))
rate     2.42 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000012 sec
tricount time:         0.000013 sec (saxpy method)
tri+prep time:         0.000016 sec (incl time to compute L)
compute C time:        0.000012 sec
reduce (C) time:       0.000001 sec
rate     0.56 million edges/sec (incl time for L=tril(A))
rate     0.69 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000003 sec (nthreads: 2 speedup 4.12281)
tricount time:         0.000003 sec (saxpy method)
tri+prep time:         0.000007 sec (incl time to compute L)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     1.37 million edges/sec (incl time for L=tril(A))
rate     2.68 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000002 sec (nthreads: 4 speedup 5.12891)
tricount time:         0.000003 sec (saxpy method)
tri+prep time:         0.000006 sec (incl time to compute L)
compute C time:        0.000002 sec
reduce (C) time:       0.000000 sec
rate     1.50 million edges/sec (incl time for L=tril(A))
rate     3.25 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000003 sec (nthreads: 8 speedup 3.73818)
tricount time:         0.000004 sec (saxpy method)
tri+prep time:         0.000007 sec (incl time to compute L)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     1.28 million edges/sec (incl time for L=tril(A))
rate     2.37 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 3 by 3, 0 entries, from stdin

total time to read A matrix:       0.000136 sec

n 3 # edges 0
U=triu(A) time:        0.000023 sec
L=tril(A) time:        0.000004 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot):         0.000032 sec
tricount time:         0.000034 sec (dot product method)
tri+prep time:         0.000061 sec (incl time to compute L and U)
compute C time:        0.000032 sec
reduce (C) time:       0.000002 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 6.70786)
tricount time:         0.000005 sec (dot product method)
tri+prep time:         0.000032 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000001 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 4 speedup 8.34564)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 8 speedup 8.56331)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000030 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000002 sec (nthreads: 2 speedup 1.32441)
tricount time:         0.000002 sec (dot product method)
tri+prep time:         0.000029 sec (incl time to compute L and U)
compute C time:        0.000002 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000002 sec (nthreads: 4 speedup 1.26397)
tricount time:         0.000002 sec (dot product method)
tri+prep time:         0.000029 sec (incl time to compute L and U)
compute C time:        0.000002 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 8 speedup 0.710361)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000031 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for U=triu(A))
rate     0.00 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000026 sec
tricount time:         0.000026 sec (saxpy method)
tri+prep time:         0.000031 sec (incl time to compute L)
compute C time:        0.000026 sec
reduce (C) time:       0.000001 sec
rate     0.00 million edges/sec (incl time for L=tril(A))
rate     0.00 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000006 sec (nthreads: 2 speedup 3.9542)
tricount time:         0.000007 sec (saxpy method)
tri+prep time:         0.000011 sec (incl time to compute L)
compute C time:        0.000006 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for L=tril(A))
rate     0.00 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000004 sec (nthreads: 4 speedup 6.0321)
tricount time:         0.000005 sec (saxpy method)
tri+prep time:         0.000009 sec (incl time to compute L)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for L=tril(A))
rate     0.00 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000005 sec (nthreads: 8 speedup 4.8894)
tricount time:         0.000006 sec (saxpy method)
tri+prep time:         0.000010 sec (incl time to compute L)
compute C time:        0.000005 sec
reduce (C) time:       0.000000 sec
rate     0.00 million edges/sec (incl time for L=tril(A))
rate     0.00 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 4 by 4, 4 entries, from stdin

total time to read A matrix:       0.000182 sec

n 4 # edges 2
U=triu(A) time:        0.000042 sec
L=tril(A) time:        0.000005 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot):         0.000035 sec
tricount time:         0.000038 sec (dot product method)
tri+prep time:         0.000085 sec (incl time to compute L and U)
compute C time:        0.000035 sec
reduce (C) time:       0.000003 sec
rate     0.02 million edges/sec (incl time for U=triu(A))
rate     0.05 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 6.83555)
tricount time:         0.000006 sec (dot product method)
tri+prep time:         0.000053 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000001 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.35 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 11.2006)
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000050 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.57 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 10.4373)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000051 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.54 million edges/sec (just tricount itself)
L*U' time (dot):         0.000006 sec
tricount time:         0.000007 sec (dot product method)
tri+prep time:         0.000054 sec (incl time to compute L and U)
compute C time:        0.000006 sec
reduce (C) time:       0.000001 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.30 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 1.09848)
tricount time:         0.000006 sec (dot product method)
tri+prep time:         0.000053 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000000 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.34 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 4 speedup 1.68923)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000051 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.51 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 1.97691)
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000050 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.04 million edges/sec (incl time for U=triu(A))
rate     0.60 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000022 sec
tricount time:         0.000023 sec (saxpy method)
tri+prep time:         0.000028 sec (incl time to compute L)
compute C time:        0.000022 sec
reduce (C) time:       0.000001 sec
rate     0.07 million edges/sec (incl time for L=tril(A))
rate     0.09 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000008 sec (nthreads: 2 speedup 2.89783)
tricount time:         0.000008 sec (saxpy method)
tri+prep time:         0.000013 sec (incl time to compute L)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate     0.15 million edges/sec (incl time for L=tril(A))
rate     0.24 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000004 sec (nthreads: 4 speedup 4.92595)
tricount time:         0.000005 sec (saxpy method)
tri+prep time:         0.000010 sec (incl time to compute L)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.20 million edges/sec (incl time for L=tril(A))
rate     0.42 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000008 sec (nthreads: 8 speedup 2.76205)
tricount time:         0.000008 sec (saxpy method)
tri+prep time:         0.000014 sec (incl time to compute L)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate     0.15 million edges/sec (incl time for L=tril(A))
rate     0.24 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 4 by 4, 10 entries, from stdin

total time to read A matrix:       0.000188 sec

n 4 # edges 5
U=triu(A) time:        0.000030 sec
L=tril(A) time:        0.000004 sec

------------------------------------- dot product method:
# triangles 2
L*U' time (dot):         0.000028 sec
tricount time:         0.000031 sec (dot product method)
tri+prep time:         0.000066 sec (incl time to compute L and U)
compute C time:        0.000028 sec
reduce (C) time:       0.000003 sec
rate     0.08 million edges/sec (incl time for U=triu(A))
rate     0.16 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 5.73686)
tricount time:         0.000006 sec (dot product method)
tri+prep time:         0.000040 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000001 sec
rate     0.13 million edges/sec (incl time for U=triu(A))
rate     0.90 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 8.37966)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000038 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.13 million edges/sec (incl time for U=triu(A))
rate     1.32 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 8.68117)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000038 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.13 million edges/sec (incl time for U=triu(A))
rate     1.35 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000038 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000001 sec
rate     0.13 million edges/sec (incl time for U=triu(A))
rate     1.17 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 2 speedup 1.20088)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000038 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.13 million edges/sec (incl time for U=triu(A))
rate     1.40 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 1.08018)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000038 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.13 million edges/sec (incl time for U=triu(A))
rate     1.27 million edges/sec (just tricount itself)
L*U' time (dot):         0.000002 sec (nthreads: 8 speedup 1.68545)
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000037 sec (incl time to compute L and U)
compute C time:        0.000002 sec
reduce (C) time:       0.000001 sec
rate     0.14 million edges/sec (incl time for U=triu(A))
rate     1.79 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000008 sec
tricount time:         0.000009 sec (saxpy method)
tri+prep time:         0.000013 sec (incl time to compute L)
compute C time:        0.000008 sec
reduce (C) time:       0.000000 sec
rate     0.40 million edges/sec (incl time for L=tril(A))
rate     0.57 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000002 sec (nthreads: 2 speedup 3.33955)
tricount time:         0.000003 sec (saxpy method)
tri+prep time:         0.000007 sec (incl time to compute L)
compute C time:        0.000002 sec
reduce (C) time:       0.000000 sec
rate     0.75 million edges/sec (incl time for L=tril(A))
rate     1.75 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000002 sec (nthreads: 4 speedup 4.10873)
tricount time:         0.000002 sec (saxpy method)
tri+prep time:         0.000006 sec (incl time to compute L)
compute C time:        0.000002 sec
reduce (C) time:       0.000000 sec
rate     0.81 million edges/sec (incl time for L=tril(A))
rate     2.11 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000004 sec (nthreads: 8 speedup 1.98804)
tricount time:         0.000005 sec (saxpy method)
tri+prep time:         0.000008 sec (incl time to compute L)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.59 million edges/sec (incl time for L=tril(A))
rate     1.07 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 7 by 7, 16 entries, from stdin

total time to read A matrix:       0.000242 sec

n 7 # edges 8
U=triu(A) time:        0.000018 sec
L=tril(A) time:        0.000003 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot):         0.000033 sec
tricount time:         0.000035 sec (dot product method)
tri+prep time:         0.000057 sec (incl time to compute L and U)
compute C time:        0.000033 sec
reduce (C) time:       0.000003 sec
rate     0.14 million edges/sec (incl time for U=triu(A))
rate     0.23 million edges/sec (just tricount itself)
L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 6.33494)
tricount time:         0.000006 sec (dot product method)
tri+prep time:         0.000027 sec (incl time to compute L and U)
compute C time:        0.000005 sec
reduce (C) time:       0.000001 sec
rate     0.29 million edges/sec (incl time for U=triu(A))
rate     1.40 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 4 speedup 8.49876)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000026 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.31 million edges/sec (incl time for U=triu(A))
rate     1.90 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 8 speedup 9.15498)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000025 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.31 million edges/sec (incl time for U=triu(A))
rate     2.04 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec
tricount time:         0.000005 sec (dot product method)
tri+prep time:         0.000026 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.31 million edges/sec (incl time for U=triu(A))
rate     1.71 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 2 speedup 1.09382)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000026 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.31 million edges/sec (incl time for U=triu(A))
rate     1.89 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 4 speedup 1.21016)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000025 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     0.32 million edges/sec (incl time for U=triu(A))
rate     2.07 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 1.26045)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000025 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     0.32 million edges/sec (incl time for U=triu(A))
rate     2.16 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000020 sec
tricount time:         0.000020 sec (saxpy method)
tri+prep time:         0.000024 sec (incl time to compute L)
compute C time:        0.000020 sec
reduce (C) time:       0.000000 sec
rate     0.34 million edges/sec (incl time for L=tril(A))
rate     0.40 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000005 sec (nthreads: 2 speedup 4.04016)
tricount time:         0.000005 sec (saxpy method)
tri+prep time:         0.000009 sec (incl time to compute L)
compute C time:        0.000005 sec
reduce (C) time:       0.000000 sec
rate     0.93 million edges/sec (incl time for L=tril(A))
rate     1.53 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000004 sec (nthreads: 4 speedup 4.86751)
tricount time:         0.000004 sec (saxpy method)
tri+prep time:         0.000008 sec (incl time to compute L)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     1.03 million edges/sec (incl time for L=tril(A))
rate     1.83 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000005 sec (nthreads: 8 speedup 3.92911)
tricount time:         0.000005 sec (saxpy method)
tri+prep time:         0.000009 sec (incl time to compute L)
compute C time:        0.000005 sec
reduce (C) time:       0.000000 sec
rate     0.91 million edges/sec (incl time for L=tril(A))
rate     1.47 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 304 by 304, 876 entries, from stdin

total time to read A matrix:       0.000394 sec

n 304 # edges 438
U=triu(A) time:        0.000025 sec
L=tril(A) time:        0.000008 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot):         0.000036 sec
tricount time:         0.000039 sec (dot product method)
tri+prep time:         0.000071 sec (incl time to compute L and U)
compute C time:        0.000036 sec
reduce (C) time:       0.000002 sec
rate     6.16 million edges/sec (incl time for U=triu(A))
rate    11.25 million edges/sec (just tricount itself)
L*U' time (dot):         0.000010 sec (nthreads: 2 speedup 3.71621)
tricount time:         0.000011 sec (dot product method)
tri+prep time:         0.000043 sec (incl time to compute L and U)
compute C time:        0.000010 sec
reduce (C) time:       0.000001 sec
rate    10.23 million edges/sec (incl time for U=triu(A))
rate    41.16 million edges/sec (just tricount itself)
L*U' time (dot):         0.000008 sec (nthreads: 4 speedup 4.44921)
tricount time:         0.000009 sec (dot product method)
tri+prep time:         0.000041 sec (incl time to compute L and U)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate    10.67 million edges/sec (incl time for U=triu(A))
rate    49.47 million edges/sec (just tricount itself)
L*U' time (dot):         0.000008 sec (nthreads: 8 speedup 4.49582)
tricount time:         0.000009 sec (dot product method)
tri+prep time:         0.000041 sec (incl time to compute L and U)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate    10.69 million edges/sec (incl time for U=triu(A))
rate    49.93 million edges/sec (just tricount itself)
L*U' time (dot):         0.000008 sec
tricount time:         0.000008 sec (dot product method)
tri+prep time:         0.000041 sec (incl time to compute L and U)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate    10.80 million edges/sec (incl time for U=triu(A))
rate    52.26 million edges/sec (just tricount itself)
L*U' time (dot):         0.000007 sec (nthreads: 2 speedup 1.081)
tricount time:         0.000008 sec (dot product method)
tri+prep time:         0.000040 sec (incl time to compute L and U)
compute C time:        0.000007 sec
reduce (C) time:       0.000001 sec
rate    10.99 million edges/sec (incl time for U=triu(A))
rate    57.20 million edges/sec (just tricount itself)
L*U' time (dot):         0.000007 sec (nthreads: 4 speedup 1.09474)
tricount time:         0.000008 sec (dot product method)
tri+prep time:         0.000040 sec (incl time to compute L and U)
compute C time:        0.000007 sec
reduce (C) time:       0.000001 sec
rate    11.03 million edges/sec (incl time for U=triu(A))
rate    58.17 million edges/sec (just tricount itself)
L*U' time (dot):         0.000007 sec (nthreads: 8 speedup 1.109)
tricount time:         0.000007 sec (dot product method)
tri+prep time:         0.000040 sec (incl time to compute L and U)
compute C time:        0.000007 sec
reduce (C) time:       0.000001 sec
rate    11.05 million edges/sec (incl time for U=triu(A))
rate    58.79 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000048 sec
tricount time:         0.000048 sec (saxpy method)
tri+prep time:         0.000056 sec (incl time to compute L)
compute C time:        0.000048 sec
662reduce (C) time:       0.000001 sec
663rate     7.82 million edges/sec (incl time for L=tril(A))
664rate     9.06 million edges/sec (just tricount itself)
665C<L>=L*L time (saxpy):         0.000028 sec (nthreads: 2 speedup 1.71429)
666tricount time:         0.000028 sec (saxpy method)
667tri+prep time:         0.000036 sec (incl time to compute L)
668compute C time:        0.000028 sec
669reduce (C) time:       0.000000 sec
670rate    12.16 million edges/sec (incl time for L=tril(A))
671rate    15.46 million edges/sec (just tricount itself)
672C<L>=L*L time (saxpy):         0.000027 sec (nthreads: 4 speedup 1.75412)
673tricount time:         0.000028 sec (saxpy method)
674tri+prep time:         0.000035 sec (incl time to compute L)
675compute C time:        0.000027 sec
676reduce (C) time:       0.000000 sec
677rate    12.39 million edges/sec (incl time for L=tril(A))
678rate    15.83 million edges/sec (just tricount itself)
679C<L>=L*L time (saxpy):         0.000030 sec (nthreads: 8 speedup 1.59233)
680tricount time:         0.000030 sec (saxpy method)
681tri+prep time:         0.000038 sec (incl time to compute L)
682compute C time:        0.000030 sec
683reduce (C) time:       0.000000 sec
684rate    11.49 million edges/sec (incl time for L=tril(A))
685rate    14.39 million edges/sec (just tricount itself)
686
687--------------------------------------------------------------
688matrix 48 by 48, 352 entries, from stdin
689
690total time to read A matrix:       0.000287 sec
691
692n 48 # edges 176
693U=triu(A) time:        0.000028 sec
694L=tril(A) time:        0.000009 sec
695
696------------------------------------- dot product method:
697# triangles 160
698L*U' time (dot):         0.000043 sec
699tricount time:         0.000047 sec (dot product method)
700tri+prep time:         0.000084 sec (incl time to compute L and U)
701compute C time:        0.000043 sec
702reduce (C) time:       0.000003 sec
703rate     2.10 million edges/sec (incl time for U=triu(A))
704rate     3.78 million edges/sec (just tricount itself)
705L*U' time (dot):         0.000010 sec (nthreads: 2 speedup 4.16297)
706tricount time:         0.000012 sec (dot product method)
707tri+prep time:         0.000049 sec (incl time to compute L and U)
708compute C time:        0.000010 sec
709reduce (C) time:       0.000001 sec
710rate     3.60 million edges/sec (incl time for U=triu(A))
711rate    14.98 million edges/sec (just tricount itself)
712L*U' time (dot):         0.000007 sec (nthreads: 4 speedup 5.94283)
713tricount time:         0.000008 sec (dot product method)
714tri+prep time:         0.000045 sec (incl time to compute L and U)
715compute C time:        0.000007 sec
716reduce (C) time:       0.000001 sec
717rate     3.88 million edges/sec (incl time for U=triu(A))
718rate    21.31 million edges/sec (just tricount itself)
719L*U' time (dot):         0.000011 sec (nthreads: 8 speedup 4.00864)
720tricount time:         0.000012 sec (dot product method)
721tri+prep time:         0.000049 sec (incl time to compute L and U)
722compute C time:        0.000011 sec
723reduce (C) time:       0.000001 sec
724rate     3.57 million edges/sec (incl time for U=triu(A))
725rate    14.40 million edges/sec (just tricount itself)
726L*U' time (dot):         0.000014 sec
727tricount time:         0.000015 sec (dot product method)
728tri+prep time:         0.000052 sec (incl time to compute L and U)
729compute C time:        0.000014 sec
730reduce (C) time:       0.000001 sec
731rate     3.37 million edges/sec (incl time for U=triu(A))
732rate    11.63 million edges/sec (just tricount itself)
733L*U' time (dot):         0.000009 sec (nthreads: 2 speedup 1.56815)
734tricount time:         0.000010 sec (dot product method)
735tri+prep time:         0.000047 sec (incl time to compute L and U)
736compute C time:        0.000009 sec
737reduce (C) time:       0.000001 sec
738rate     3.76 million edges/sec (incl time for U=triu(A))
739rate    18.17 million edges/sec (just tricount itself)
740L*U' time (dot):         0.000011 sec (nthreads: 4 speedup 1.21667)
741tricount time:         0.000013 sec (dot product method)
742tri+prep time:         0.000050 sec (incl time to compute L and U)
743compute C time:        0.000011 sec
744reduce (C) time:       0.000001 sec
745rate     3.54 million edges/sec (incl time for U=triu(A))
746rate    13.86 million edges/sec (just tricount itself)
747L*U' time (dot):         0.000011 sec (nthreads: 8 speedup 1.20288)
748tricount time:         0.000013 sec (dot product method)
749tri+prep time:         0.000050 sec (incl time to compute L and U)
750compute C time:        0.000011 sec
751reduce (C) time:       0.000001 sec
752rate     3.53 million edges/sec (incl time for U=triu(A))
753rate    13.81 million edges/sec (just tricount itself)
754
755----------------------------------- saxpy method:
756C<L>=L*L time (saxpy):         0.000047 sec
757tricount time:         0.000048 sec (saxpy method)
758tri+prep time:         0.000057 sec (incl time to compute L)
759compute C time:        0.000047 sec
760reduce (C) time:       0.000001 sec
761rate     3.06 million edges/sec (incl time for L=tril(A))
762rate     3.66 million edges/sec (just tricount itself)
763C<L>=L*L time (saxpy):         0.000019 sec (nthreads: 2 speedup 2.466)
764tricount time:         0.000020 sec (saxpy method)
765tri+prep time:         0.000029 sec (incl time to compute L)
766compute C time:        0.000019 sec
767reduce (C) time:       0.000001 sec
768rate     6.04 million edges/sec (incl time for L=tril(A))
769rate     8.93 million edges/sec (just tricount itself)
770C<L>=L*L time (saxpy):         0.000013 sec (nthreads: 4 speedup 3.50546)
771tricount time:         0.000014 sec (saxpy method)
772tri+prep time:         0.000023 sec (incl time to compute L)
773compute C time:        0.000013 sec
774reduce (C) time:       0.000001 sec
775rate     7.51 million edges/sec (incl time for L=tril(A))
776rate    12.58 million edges/sec (just tricount itself)
777C<L>=L*L time (saxpy):         0.000015 sec (nthreads: 8 speedup 3.0676)
778tricount time:         0.000016 sec (saxpy method)
779tri+prep time:         0.000025 sec (incl time to compute L)
780compute C time:        0.000015 sec
781reduce (C) time:       0.000001 sec
782rate     6.91 million edges/sec (incl time for L=tril(A))
783rate    10.97 million edges/sec (just tricount itself)
784
785--------------------------------------------------------------
786matrix 4884 by 4884, 285494 entries, from stdin
787
788total time to read A matrix:       0.073128 sec
789
790n 4884 # edges 142747
791U=triu(A) time:        0.000225 sec
792L=tril(A) time:        0.000142 sec
793
794------------------------------------- dot product method:
795# triangles 1512964
796L*U' time (dot):         0.013911 sec
797tricount time:         0.014396 sec (dot product method)
798tri+prep time:         0.014764 sec (incl time to compute L and U)
799compute C time:        0.013911 sec
800reduce (C) time:       0.000486 sec
801rate     9.67 million edges/sec (incl time for U=triu(A))
802rate     9.92 million edges/sec (just tricount itself)
803L*U' time (dot):         0.006919 sec (nthreads: 2 speedup 2.01037)
804tricount time:         0.007159 sec (dot product method)
805tri+prep time:         0.007527 sec (incl time to compute L and U)
806compute C time:        0.006919 sec
807reduce (C) time:       0.000239 sec
808rate    18.97 million edges/sec (incl time for U=triu(A))
809rate    19.94 million edges/sec (just tricount itself)
810L*U' time (dot):         0.003827 sec (nthreads: 4 speedup 3.63466)
811tricount time:         0.004121 sec (dot product method)
812tri+prep time:         0.004488 sec (incl time to compute L and U)
813compute C time:        0.003827 sec
814reduce (C) time:       0.000293 sec
815rate    31.80 million edges/sec (incl time for U=triu(A))
816rate    34.64 million edges/sec (just tricount itself)
817L*U' time (dot):         0.005970 sec (nthreads: 8 speedup 2.33004)
818tricount time:         0.006280 sec (dot product method)
819tri+prep time:         0.006648 sec (incl time to compute L and U)
820compute C time:        0.005970 sec
821reduce (C) time:       0.000310 sec
822rate    21.47 million edges/sec (incl time for U=triu(A))
823rate    22.73 million edges/sec (just tricount itself)
824L*U' time (dot):         0.015373 sec
825tricount time:         0.015847 sec (dot product method)
826tri+prep time:         0.016215 sec (incl time to compute L and U)
827compute C time:        0.015373 sec
828reduce (C) time:       0.000475 sec
829rate     8.80 million edges/sec (incl time for U=triu(A))
830rate     9.01 million edges/sec (just tricount itself)
831L*U' time (dot):         0.007376 sec (nthreads: 2 speedup 2.08416)
832tricount time:         0.007622 sec (dot product method)
833tri+prep time:         0.007989 sec (incl time to compute L and U)
834compute C time:        0.007376 sec
835reduce (C) time:       0.000246 sec
836rate    17.87 million edges/sec (incl time for U=triu(A))
837rate    18.73 million edges/sec (just tricount itself)
838L*U' time (dot):         0.004246 sec (nthreads: 4 speedup 3.62042)
839tricount time:         0.004506 sec (dot product method)
840tri+prep time:         0.004874 sec (incl time to compute L and U)
841compute C time:        0.004246 sec
842reduce (C) time:       0.000260 sec
843rate    29.29 million edges/sec (incl time for U=triu(A))
844rate    31.68 million edges/sec (just tricount itself)
845L*U' time (dot):         0.006729 sec (nthreads: 8 speedup 2.28465)
846tricount time:         0.007020 sec (dot product method)
847tri+prep time:         0.007388 sec (incl time to compute L and U)
848compute C time:        0.006729 sec
849reduce (C) time:       0.000292 sec
850rate    19.32 million edges/sec (incl time for U=triu(A))
851rate    20.33 million edges/sec (just tricount itself)
852
853----------------------------------- saxpy method:
854C<L>=L*L time (saxpy):         0.014019 sec
855tricount time:         0.014413 sec (saxpy method)
856tri+prep time:         0.014556 sec (incl time to compute L)
857compute C time:        0.014019 sec
858reduce (C) time:       0.000394 sec
859rate     9.81 million edges/sec (incl time for L=tril(A))
860rate     9.90 million edges/sec (just tricount itself)
861C<L>=L*L time (saxpy):         0.007036 sec (nthreads: 2 speedup 1.99254)
862tricount time:         0.007225 sec (saxpy method)
863tri+prep time:         0.007367 sec (incl time to compute L)
864compute C time:        0.007036 sec
865reduce (C) time:       0.000189 sec
866rate    19.38 million edges/sec (incl time for L=tril(A))
867rate    19.76 million edges/sec (just tricount itself)
868C<L>=L*L time (saxpy):         0.004042 sec (nthreads: 4 speedup 3.46866)
869tricount time:         0.004236 sec (saxpy method)
870tri+prep time:         0.004378 sec (incl time to compute L)
871compute C time:        0.004042 sec
872reduce (C) time:       0.000194 sec
873rate    32.60 million edges/sec (incl time for L=tril(A))
874rate    33.70 million edges/sec (just tricount itself)
875C<L>=L*L time (saxpy):         0.003180 sec (nthreads: 8 speedup 4.40848)
876tricount time:         0.003398 sec (saxpy method)
877tri+prep time:         0.003540 sec (incl time to compute L)
878compute C time:        0.003180 sec
879reduce (C) time:       0.000218 sec
880rate    40.32 million edges/sec (incl time for L=tril(A))
881rate    42.01 million edges/sec (just tricount itself)
882
883--------------------------------------------------------------
884matrix 183 by 183, 1402 entries, from stdin
885
886total time to read A matrix:       0.000637 sec
887
888n 183 # edges 701
889U=triu(A) time:        0.000030 sec
890L=tril(A) time:        0.000010 sec
891
892------------------------------------- dot product method:
893# triangles 863
894L*U' time (dot):         0.000067 sec
895tricount time:         0.000072 sec (dot product method)
896tri+prep time:         0.000112 sec (incl time to compute L and U)
897compute C time:        0.000067 sec
898reduce (C) time:       0.000005 sec
899rate     6.28 million edges/sec (incl time for U=triu(A))
900rate     9.71 million edges/sec (just tricount itself)
901L*U' time (dot):         0.000039 sec (nthreads: 2 speedup 1.74055)
902tricount time:         0.000042 sec (dot product method)
903tri+prep time:         0.000081 sec (incl time to compute L and U)
904compute C time:        0.000039 sec
905reduce (C) time:       0.000003 sec
906rate     8.65 million edges/sec (incl time for U=triu(A))
907rate    16.83 million edges/sec (just tricount itself)
908L*U' time (dot):         0.000035 sec (nthreads: 4 speedup 1.92135)
909tricount time:         0.000038 sec (dot product method)
910tri+prep time:         0.000077 sec (incl time to compute L and U)
911compute C time:        0.000035 sec
912reduce (C) time:       0.000003 sec
913rate     9.09 million edges/sec (incl time for U=triu(A))
914rate    18.58 million edges/sec (just tricount itself)
915L*U' time (dot):         0.000032 sec (nthreads: 8 speedup 2.0853)
916tricount time:         0.000035 sec (dot product method)
917tri+prep time:         0.000074 sec (incl time to compute L and U)
918compute C time:        0.000032 sec
919reduce (C) time:       0.000003 sec
920rate     9.45 million edges/sec (incl time for U=triu(A))
921rate    20.15 million edges/sec (just tricount itself)
922L*U' time (dot):         0.000044 sec
923tricount time:         0.000047 sec (dot product method)
924tri+prep time:         0.000086 sec (incl time to compute L and U)
925compute C time:        0.000044 sec
926reduce (C) time:       0.000003 sec
927rate     8.15 million edges/sec (incl time for U=triu(A))
928rate    15.02 million edges/sec (just tricount itself)
929L*U' time (dot):         0.000040 sec (nthreads: 2 speedup 1.10621)
930tricount time:         0.000042 sec (dot product method)
931tri+prep time:         0.000082 sec (incl time to compute L and U)
932compute C time:        0.000040 sec
933reduce (C) time:       0.000003 sec
934rate     8.59 million edges/sec (incl time for U=triu(A))
935rate    16.61 million edges/sec (just tricount itself)
936L*U' time (dot):         0.000050 sec (nthreads: 4 speedup 0.871133)
937tricount time:         0.000053 sec (dot product method)
938tri+prep time:         0.000092 sec (incl time to compute L and U)
939compute C time:        0.000050 sec
940reduce (C) time:       0.000003 sec
941rate     7.59 million edges/sec (incl time for U=triu(A))
942rate    13.24 million edges/sec (just tricount itself)
943L*U' time (dot):         0.000035 sec (nthreads: 8 speedup 1.24625)
944tricount time:         0.000038 sec (dot product method)
945tri+prep time:         0.000077 sec (incl time to compute L and U)
946compute C time:        0.000035 sec
947reduce (C) time:       0.000002 sec
948rate     9.10 million edges/sec (incl time for U=triu(A))
949rate    18.60 million edges/sec (just tricount itself)
950
951----------------------------------- saxpy method:
952C<L>=L*L time (saxpy):         0.000059 sec
953tricount time:         0.000060 sec (saxpy method)
954tri+prep time:         0.000070 sec (incl time to compute L)
955compute C time:        0.000059 sec
956reduce (C) time:       0.000002 sec
957rate    10.03 million edges/sec (incl time for L=tril(A))
958rate    11.64 million edges/sec (just tricount itself)
959C<L>=L*L time (saxpy):         0.000035 sec (nthreads: 2 speedup 1.664)
960tricount time:         0.000037 sec (saxpy method)
961tri+prep time:         0.000046 sec (incl time to compute L)
962compute C time:        0.000035 sec
963reduce (C) time:       0.000001 sec
964rate    15.16 million edges/sec (incl time for L=tril(A))
965rate    19.14 million edges/sec (just tricount itself)
966C<L>=L*L time (saxpy):         0.000032 sec (nthreads: 4 speedup 1.80297)
967tricount time:         0.000034 sec (saxpy method)
968tri+prep time:         0.000044 sec (incl time to compute L)
969compute C time:        0.000032 sec
970reduce (C) time:       0.000001 sec
971rate    16.10 million edges/sec (incl time for L=tril(A))
972rate    20.67 million edges/sec (just tricount itself)
973C<L>=L*L time (saxpy):         0.000032 sec (nthreads: 8 speedup 1.81218)
974tricount time:         0.000034 sec (saxpy method)
975tri+prep time:         0.000043 sec (incl time to compute L)
976compute C time:        0.000032 sec
977reduce (C) time:       0.000002 sec
978rate    16.13 million edges/sec (incl time for L=tril(A))
979rate    20.72 million edges/sec (just tricount itself)
980
981--------------------------------------------------------------
982matrix 63 by 63, 246 entries, from stdin
983
984total time to read A matrix:       0.000214 sec
985
986n 63 # edges 123
987U=triu(A) time:        0.000016 sec
988L=tril(A) time:        0.000005 sec
989
990------------------------------------- dot product method:
991# triangles 0
992L*U' time (dot):         0.000021 sec
993tricount time:         0.000022 sec (dot product method)
994tri+prep time:         0.000043 sec (incl time to compute L and U)
995compute C time:        0.000021 sec
996reduce (C) time:       0.000001 sec
997rate     2.86 million edges/sec (incl time for U=triu(A))
998rate     5.53 million edges/sec (just tricount itself)
999L*U' time (dot):         0.000004 sec (nthreads: 2 speedup 4.69135)
1000tricount time:         0.000005 sec (dot product method)
1001tri+prep time:         0.000026 sec (incl time to compute L and U)
1002compute C time:        0.000004 sec
1003reduce (C) time:       0.000000 sec
1004rate     4.79 million edges/sec (incl time for U=triu(A))
1005rate    25.15 million edges/sec (just tricount itself)
1006L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 6.61758)
1007tricount time:         0.000003 sec (dot product method)
1008tri+prep time:         0.000024 sec (incl time to compute L and U)
1009compute C time:        0.000003 sec
1010reduce (C) time:       0.000000 sec
1011rate     5.07 million edges/sec (incl time for U=triu(A))
1012rate    35.47 million edges/sec (just tricount itself)
1013L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 7.13425)
1014tricount time:         0.000003 sec (dot product method)
1015tri+prep time:         0.000024 sec (incl time to compute L and U)
1016compute C time:        0.000003 sec
1017reduce (C) time:       0.000000 sec
1018rate     5.12 million edges/sec (incl time for U=triu(A))
1019rate    37.80 million edges/sec (just tricount itself)
1020L*U' time (dot):         0.000003 sec
1021tricount time:         0.000004 sec (dot product method)
1022tri+prep time:         0.000025 sec (incl time to compute L and U)
1023compute C time:        0.000003 sec
1024reduce (C) time:       0.000000 sec
1025rate     4.98 million edges/sec (incl time for U=triu(A))
1026rate    31.26 million edges/sec (just tricount itself)
1027L*U' time (dot):         0.000003 sec (nthreads: 2 speedup 1.15055)
1028tricount time:         0.000003 sec (dot product method)
1029tri+prep time:         0.000024 sec (incl time to compute L and U)
1030compute C time:        0.000003 sec
1031reduce (C) time:       0.000000 sec
1032rate     5.09 million edges/sec (incl time for U=triu(A))
1033rate    36.33 million edges/sec (just tricount itself)
1034L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 1.26994)
1035tricount time:         0.000003 sec (dot product method)
1036tri+prep time:         0.000024 sec (incl time to compute L and U)
1037compute C time:        0.000003 sec
1038reduce (C) time:       0.000000 sec
1039rate     5.16 million edges/sec (incl time for U=triu(A))
1040rate    39.87 million edges/sec (just tricount itself)
1041L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 1.35606)
1042tricount time:         0.000003 sec (dot product method)
1043tri+prep time:         0.000024 sec (incl time to compute L and U)
1044compute C time:        0.000003 sec
1045reduce (C) time:       0.000000 sec
1046rate     5.19 million edges/sec (incl time for U=triu(A))
1047rate    42.14 million edges/sec (just tricount itself)
1048
1049----------------------------------- saxpy method:
1050C<L>=L*L time (saxpy):         0.000023 sec
1051tricount time:         0.000023 sec (saxpy method)
1052tri+prep time:         0.000028 sec (incl time to compute L)
1053compute C time:        0.000023 sec
1054reduce (C) time:       0.000000 sec
1055rate     4.44 million edges/sec (incl time for L=tril(A))
1056rate     5.33 million edges/sec (just tricount itself)
1057C<L>=L*L time (saxpy):         0.000013 sec (nthreads: 2 speedup 1.72664)
1058tricount time:         0.000013 sec (saxpy method)
1059tri+prep time:         0.000018 sec (incl time to compute L)
1060compute C time:        0.000013 sec
1061reduce (C) time:       0.000000 sec
1062rate     6.83 million edges/sec (incl time for L=tril(A))
1063rate     9.19 million edges/sec (just tricount itself)
1064C<L>=L*L time (saxpy):         0.000012 sec (nthreads: 4 speedup 1.85064)
1065tricount time:         0.000012 sec (saxpy method)
1066tri+prep time:         0.000017 sec (incl time to compute L)
1067compute C time:        0.000012 sec
1068reduce (C) time:       0.000000 sec
1069rate     7.20 million edges/sec (incl time for L=tril(A))
1070rate     9.85 million edges/sec (just tricount itself)
1071C<L>=L*L time (saxpy):         0.000016 sec (nthreads: 8 speedup 1.40767)
1072tricount time:         0.000016 sec (saxpy method)
1073tri+prep time:         0.000021 sec (incl time to compute L)
1074compute C time:        0.000016 sec
1075reduce (C) time:       0.000000 sec
1076rate     5.84 million edges/sec (incl time for L=tril(A))
1077rate     7.48 million edges/sec (just tricount itself)
1078
1079--------------------------------------------------------------
1080matrix 63 by 63, 246 entries, from stdin
1081
1082total time to read A matrix:       0.000211 sec
1083
1084n 63 # edges 123
1085U=triu(A) time:        0.000019 sec
1086L=tril(A) time:        0.000005 sec
1087
1088------------------------------------- dot product method:
1089# triangles 0
1090L*U' time (dot):         0.000026 sec
1091tricount time:         0.000027 sec (dot product method)
1092tri+prep time:         0.000051 sec (incl time to compute L and U)
1093compute C time:        0.000026 sec
1094reduce (C) time:       0.000002 sec
1095rate     2.42 million edges/sec (incl time for U=triu(A))
1096rate     4.49 million edges/sec (just tricount itself)
1097L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 5.61418)
1098tricount time:         0.000005 sec (dot product method)
1099tri+prep time:         0.000028 sec (incl time to compute L and U)
1100compute C time:        0.000005 sec
1101reduce (C) time:       0.000000 sec
1102rate     4.34 million edges/sec (incl time for U=triu(A))
1103rate    24.47 million edges/sec (just tricount itself)
1104L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 8.20958)
1105tricount time:         0.000003 sec (dot product method)
1106tri+prep time:         0.000027 sec (incl time to compute L and U)
1107compute C time:        0.000003 sec
1108reduce (C) time:       0.000000 sec
1109rate     4.59 million edges/sec (incl time for U=triu(A))
1110rate    35.33 million edges/sec (just tricount itself)
1111L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 8.76997)
1112tricount time:         0.000003 sec (dot product method)
1113tri+prep time:         0.000027 sec (incl time to compute L and U)
1114compute C time:        0.000003 sec
1115reduce (C) time:       0.000000 sec
1116rate     4.62 million edges/sec (incl time for U=triu(A))
1117rate    37.63 million edges/sec (just tricount itself)
1118L*U' time (dot):         0.000003 sec
1119tricount time:         0.000004 sec (dot product method)
1120tri+prep time:         0.000027 sec (incl time to compute L and U)
1121compute C time:        0.000003 sec
1122reduce (C) time:       0.000000 sec
1123rate     4.52 million edges/sec (incl time for U=triu(A))
1124rate    31.65 million edges/sec (just tricount itself)
1125L*U' time (dot):         0.000003 sec (nthreads: 2 speedup 1.0719)
1126tricount time:         0.000004 sec (dot product method)
1127tri+prep time:         0.000027 sec (incl time to compute L and U)
1128compute C time:        0.000003 sec
1129reduce (C) time:       0.000000 sec
1130rate     4.57 million edges/sec (incl time for U=triu(A))
1131rate    34.42 million edges/sec (just tricount itself)
1132L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 1.18954)
1133tricount time:         0.000003 sec (dot product method)
1134tri+prep time:         0.000027 sec (incl time to compute L and U)
1135compute C time:        0.000003 sec
1136reduce (C) time:       0.000000 sec
1137rate     4.62 million edges/sec (incl time for U=triu(A))
1138rate    37.64 million edges/sec (just tricount itself)
1139L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 1.21036)
1140tricount time:         0.000003 sec (dot product method)
1141tri+prep time:         0.000027 sec (incl time to compute L and U)
1142compute C time:        0.000003 sec
1143reduce (C) time:       0.000000 sec
1144rate     4.64 million edges/sec (incl time for U=triu(A))
1145rate    38.43 million edges/sec (just tricount itself)
1146
1147----------------------------------- saxpy method:
1148C<L>=L*L time (saxpy):         0.000025 sec
1149tricount time:         0.000025 sec (saxpy method)
1150tri+prep time:         0.000030 sec (incl time to compute L)
1151compute C time:        0.000025 sec
1152reduce (C) time:       0.000000 sec
1153rate     4.13 million edges/sec (incl time for L=tril(A))
1154rate     4.87 million edges/sec (just tricount itself)
1155C<L>=L*L time (saxpy):         0.000013 sec (nthreads: 2 speedup 1.89479)
1156tricount time:         0.000013 sec (saxpy method)
1157tri+prep time:         0.000018 sec (incl time to compute L)
1158compute C time:        0.000013 sec
1159reduce (C) time:       0.000000 sec
1160rate     6.84 million edges/sec (incl time for L=tril(A))
1161rate     9.15 million edges/sec (just tricount itself)
1162C<L>=L*L time (saxpy):         0.000012 sec (nthreads: 4 speedup 1.99589)
1163tricount time:         0.000013 sec (saxpy method)
1164tri+prep time:         0.000017 sec (incl time to compute L)
1165compute C time:        0.000012 sec
1166reduce (C) time:       0.000000 sec
1167rate     7.11 million edges/sec (incl time for L=tril(A))
1168rate     9.65 million edges/sec (just tricount itself)
1169C<L>=L*L time (saxpy):         0.000016 sec (nthreads: 8 speedup 1.52101)
1170tricount time:         0.000017 sec (saxpy method)
1171tri+prep time:         0.000021 sec (incl time to compute L)
1172compute C time:        0.000016 sec
1173reduce (C) time:       0.000000 sec
1174rate     5.80 million edges/sec (incl time for L=tril(A))
1175rate     7.39 million edges/sec (just tricount itself)
1176
1177--------------------------------------------------------------
1178matrix 78 by 78, 204 entries, from stdin
1179
1180total time to read A matrix:       0.000179 sec
1181
1182n 78 # edges 102
1183U=triu(A) time:        0.000017 sec
1184L=tril(A) time:        0.000004 sec
1185
1186------------------------------------- dot product method:
1187# triangles 0
1188L*U' time (dot):         0.000021 sec
1189tricount time:         0.000023 sec (dot product method)
1190tri+prep time:         0.000044 sec (incl time to compute L and U)
1191compute C time:        0.000021 sec
1192reduce (C) time:       0.000002 sec
1193rate     2.32 million edges/sec (incl time for U=triu(A))
1194rate     4.46 million edges/sec (just tricount itself)
1195L*U' time (dot):         0.000005 sec (nthreads: 2 speedup 4.33904)
1196tricount time:         0.000005 sec (dot product method)
1197tri+prep time:         0.000026 sec (incl time to compute L and U)
1198compute C time:        0.000005 sec
1199reduce (C) time:       0.000001 sec
1200rate     3.85 million edges/sec (incl time for U=triu(A))
rate    18.71 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 6.50257)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000025 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     4.13 million edges/sec (incl time for U=triu(A))
rate    28.13 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 8 speedup 7.15529)
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000024 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     4.19 million edges/sec (incl time for U=triu(A))
rate    30.93 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000025 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     4.13 million edges/sec (incl time for U=triu(A))
rate    27.94 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 2 speedup 1.08112)
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000024 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     4.18 million edges/sec (incl time for U=triu(A))
rate    30.56 million edges/sec (just tricount itself)
L*U' time (dot):         0.000003 sec (nthreads: 4 speedup 1.16769)
tricount time:         0.000003 sec (dot product method)
tri+prep time:         0.000024 sec (incl time to compute L and U)
compute C time:        0.000003 sec
reduce (C) time:       0.000000 sec
rate     4.22 million edges/sec (incl time for U=triu(A))
rate    32.77 million edges/sec (just tricount itself)
L*U' time (dot):         0.000004 sec (nthreads: 8 speedup 0.821643)
tricount time:         0.000004 sec (dot product method)
tri+prep time:         0.000025 sec (incl time to compute L and U)
compute C time:        0.000004 sec
reduce (C) time:       0.000000 sec
rate     4.01 million edges/sec (incl time for U=triu(A))
rate    23.28 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000022 sec
tricount time:         0.000022 sec (saxpy method)
tri+prep time:         0.000026 sec (incl time to compute L)
compute C time:        0.000022 sec
reduce (C) time:       0.000000 sec
rate     3.86 million edges/sec (incl time for L=tril(A))
rate     4.59 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000013 sec (nthreads: 2 speedup 1.7295)
tricount time:         0.000013 sec (saxpy method)
tri+prep time:         0.000017 sec (incl time to compute L)
compute C time:        0.000013 sec
reduce (C) time:       0.000000 sec
rate     5.94 million edges/sec (incl time for L=tril(A))
rate     7.86 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000012 sec (nthreads: 4 speedup 1.76988)
tricount time:         0.000013 sec (saxpy method)
tri+prep time:         0.000017 sec (incl time to compute L)
compute C time:        0.000012 sec
reduce (C) time:       0.000000 sec
rate     6.06 million edges/sec (incl time for L=tril(A))
rate     8.07 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000025 sec (nthreads: 8 speedup 0.885225)
tricount time:         0.000025 sec (saxpy method)
tri+prep time:         0.000029 sec (incl time to compute L)
compute C time:        0.000025 sec
reduce (C) time:       0.000000 sec
rate     3.49 million edges/sec (incl time for L=tril(A))
rate     4.07 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 982 by 982, 99840 entries, from stdin

total time to read A matrix:       0.029471 sec

n 982 # edges 49920
U=triu(A) time:        0.000174 sec
L=tril(A) time:        0.000148 sec

------------------------------------- dot product method:
# triangles 0
L*U' time (dot):         0.000362 sec
tricount time:         0.000395 sec (dot product method)
tri+prep time:         0.000716 sec (incl time to compute L and U)
compute C time:        0.000362 sec
reduce (C) time:       0.000032 sec
rate    69.75 million edges/sec (incl time for U=triu(A))
rate   126.53 million edges/sec (just tricount itself)
L*U' time (dot):         0.000333 sec (nthreads: 2 speedup 1.08962)
tricount time:         0.000363 sec (dot product method)
tri+prep time:         0.000684 sec (incl time to compute L and U)
compute C time:        0.000333 sec
reduce (C) time:       0.000030 sec
rate    72.98 million edges/sec (incl time for U=triu(A))
rate   137.56 million edges/sec (just tricount itself)
L*U' time (dot):         0.000273 sec (nthreads: 4 speedup 1.32884)
tricount time:         0.000318 sec (dot product method)
tri+prep time:         0.000639 sec (incl time to compute L and U)
compute C time:        0.000273 sec
reduce (C) time:       0.000045 sec
rate    78.14 million edges/sec (incl time for U=triu(A))
rate   157.11 million edges/sec (just tricount itself)
L*U' time (dot):         0.002922 sec (nthreads: 8 speedup 0.124043)
tricount time:         0.002957 sec (dot product method)
tri+prep time:         0.003278 sec (incl time to compute L and U)
compute C time:        0.002922 sec
reduce (C) time:       0.000035 sec
rate    15.23 million edges/sec (incl time for U=triu(A))
rate    16.88 million edges/sec (just tricount itself)
L*U' time (dot):         0.000374 sec
tricount time:         0.000402 sec (dot product method)
tri+prep time:         0.000723 sec (incl time to compute L and U)
compute C time:        0.000374 sec
reduce (C) time:       0.000028 sec
rate    69.08 million edges/sec (incl time for U=triu(A))
rate   124.32 million edges/sec (just tricount itself)
L*U' time (dot):         0.000279 sec (nthreads: 2 speedup 1.33844)
tricount time:         0.000307 sec (dot product method)
tri+prep time:         0.000628 sec (incl time to compute L and U)
compute C time:        0.000279 sec
reduce (C) time:       0.000027 sec
rate    79.52 million edges/sec (incl time for U=triu(A))
rate   162.79 million edges/sec (just tricount itself)
L*U' time (dot):         0.000236 sec (nthreads: 4 speedup 1.58021)
tricount time:         0.000266 sec (dot product method)
tri+prep time:         0.000587 sec (incl time to compute L and U)
compute C time:        0.000236 sec
reduce (C) time:       0.000030 sec
rate    85.01 million edges/sec (incl time for U=triu(A))
rate   187.59 million edges/sec (just tricount itself)
L*U' time (dot):         0.001664 sec (nthreads: 8 speedup 0.224596)
tricount time:         0.001696 sec (dot product method)
tri+prep time:         0.002017 sec (incl time to compute L and U)
compute C time:        0.001664 sec
reduce (C) time:       0.000033 sec
rate    24.75 million edges/sec (incl time for U=triu(A))
rate    29.43 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000412 sec
tricount time:         0.000413 sec (saxpy method)
tri+prep time:         0.000560 sec (incl time to compute L)
compute C time:        0.000412 sec
reduce (C) time:       0.000001 sec
rate    89.11 million edges/sec (incl time for L=tril(A))
rate   120.99 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000348 sec (nthreads: 2 speedup 1.1835)
tricount time:         0.000349 sec (saxpy method)
tri+prep time:         0.000496 sec (incl time to compute L)
compute C time:        0.000348 sec
reduce (C) time:       0.000001 sec
rate   100.62 million edges/sec (incl time for L=tril(A))
rate   143.23 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000373 sec (nthreads: 4 speedup 1.10476)
tricount time:         0.000373 sec (saxpy method)
tri+prep time:         0.000521 sec (incl time to compute L)
compute C time:        0.000373 sec
reduce (C) time:       0.000001 sec
rate    95.85 million edges/sec (incl time for L=tril(A))
rate   133.75 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000377 sec (nthreads: 8 speedup 1.0916)
tricount time:         0.000378 sec (saxpy method)
tri+prep time:         0.000526 sec (incl time to compute L)
compute C time:        0.000377 sec
reduce (C) time:       0.000001 sec
rate    94.97 million edges/sec (incl time for L=tril(A))
rate   132.06 million edges/sec (just tricount itself)

--------------------------------------------------------------
matrix 67 by 67, 574 entries, from stdin

total time to read A matrix:       0.000275 sec

n 67 # edges 287
U=triu(A) time:        0.000024 sec
L=tril(A) time:        0.000007 sec

------------------------------------- dot product method:
# triangles 120
L*U' time (dot):         0.000032 sec
tricount time:         0.000035 sec (dot product method)
tri+prep time:         0.000065 sec (incl time to compute L and U)
compute C time:        0.000032 sec
reduce (C) time:       0.000003 sec
rate     4.41 million edges/sec (incl time for U=triu(A))
rate     8.25 million edges/sec (just tricount itself)
L*U' time (dot):         0.000011 sec (nthreads: 2 speedup 2.86698)
tricount time:         0.000012 sec (dot product method)
tri+prep time:         0.000043 sec (incl time to compute L and U)
compute C time:        0.000011 sec
reduce (C) time:       0.000001 sec
rate     6.75 million edges/sec (incl time for U=triu(A))
rate    23.50 million edges/sec (just tricount itself)
L*U' time (dot):         0.000008 sec (nthreads: 4 speedup 3.78072)
tricount time:         0.000009 sec (dot product method)
tri+prep time:         0.000040 sec (incl time to compute L and U)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate     7.26 million edges/sec (incl time for U=triu(A))
rate    31.12 million edges/sec (just tricount itself)
L*U' time (dot):         0.000007 sec (nthreads: 8 speedup 4.41994)
tricount time:         0.000008 sec (dot product method)
tri+prep time:         0.000038 sec (incl time to compute L and U)
compute C time:        0.000007 sec
reduce (C) time:       0.000001 sec
rate     7.52 million edges/sec (incl time for U=triu(A))
rate    36.41 million edges/sec (just tricount itself)
L*U' time (dot):         0.000012 sec
tricount time:         0.000013 sec (dot product method)
tri+prep time:         0.000044 sec (incl time to compute L and U)
compute C time:        0.000012 sec
reduce (C) time:       0.000001 sec
rate     6.56 million edges/sec (incl time for U=triu(A))
rate    21.38 million edges/sec (just tricount itself)
L*U' time (dot):         0.000010 sec (nthreads: 2 speedup 1.20171)
tricount time:         0.000011 sec (dot product method)
tri+prep time:         0.000041 sec (incl time to compute L and U)
compute C time:        0.000010 sec
reduce (C) time:       0.000001 sec
rate     6.94 million edges/sec (incl time for U=triu(A))
rate    25.94 million edges/sec (just tricount itself)
L*U' time (dot):         0.000009 sec (nthreads: 4 speedup 1.38725)
tricount time:         0.000010 sec (dot product method)
tri+prep time:         0.000040 sec (incl time to compute L and U)
compute C time:        0.000009 sec
reduce (C) time:       0.000001 sec
rate     7.19 million edges/sec (incl time for U=triu(A))
rate    29.86 million edges/sec (just tricount itself)
L*U' time (dot):         0.000008 sec (nthreads: 8 speedup 1.50481)
tricount time:         0.000009 sec (dot product method)
tri+prep time:         0.000039 sec (incl time to compute L and U)
compute C time:        0.000008 sec
reduce (C) time:       0.000001 sec
rate     7.33 million edges/sec (incl time for U=triu(A))
rate    32.36 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.000033 sec
tricount time:         0.000034 sec (saxpy method)
tri+prep time:         0.000041 sec (incl time to compute L)
compute C time:        0.000033 sec
reduce (C) time:       0.000001 sec
rate     7.05 million edges/sec (incl time for L=tril(A))
rate     8.43 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000016 sec (nthreads: 2 speedup 2.01304)
tricount time:         0.000017 sec (saxpy method)
tri+prep time:         0.000024 sec (incl time to compute L)
compute C time:        0.000016 sec
reduce (C) time:       0.000001 sec
rate    12.08 million edges/sec (incl time for L=tril(A))
rate    16.79 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000014 sec (nthreads: 4 speedup 2.44745)
tricount time:         0.000014 sec (saxpy method)
tri+prep time:         0.000021 sec (incl time to compute L)
compute C time:        0.000014 sec
reduce (C) time:       0.000001 sec
rate    13.81 million edges/sec (incl time for L=tril(A))
rate    20.31 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.000014 sec (nthreads: 8 speedup 2.43042)
tricount time:         0.000014 sec (saxpy method)
tri+prep time:         0.000021 sec (incl time to compute L)
compute C time:        0.000014 sec
reduce (C) time:       0.000001 sec
rate    13.68 million edges/sec (incl time for L=tril(A))
rate    20.04 million edges/sec (just tricount itself)

--------------------------------------------------------------
Wathen: nx 200 ny 200 n 120801 nz 1762400 method 0, time: 0.166 sec

total time to read A matrix:       0.168617 sec

n 120801 # edges 881200
U=triu(A) time:        0.002978 sec
L=tril(A) time:        0.002865 sec

------------------------------------- dot product method:
# triangles 2160400
L*U' time (dot):         0.029921 sec
tricount time:         0.032427 sec (dot product method)
tri+prep time:         0.038270 sec (incl time to compute L and U)
compute C time:        0.029921 sec
reduce (C) time:       0.002506 sec
rate    23.03 million edges/sec (incl time for U=triu(A))
rate    27.18 million edges/sec (just tricount itself)
L*U' time (dot):         0.011246 sec (nthreads: 2 speedup 2.66055)
tricount time:         0.012483 sec (dot product method)
tri+prep time:         0.018327 sec (incl time to compute L and U)
compute C time:        0.011246 sec
reduce (C) time:       0.001237 sec
rate    48.08 million edges/sec (incl time for U=triu(A))
rate    70.59 million edges/sec (just tricount itself)
L*U' time (dot):         0.008601 sec (nthreads: 4 speedup 3.47878)
tricount time:         0.009216 sec (dot product method)
tri+prep time:         0.015059 sec (incl time to compute L and U)
compute C time:        0.008601 sec
reduce (C) time:       0.000615 sec
rate    58.52 million edges/sec (incl time for U=triu(A))
rate    95.62 million edges/sec (just tricount itself)
L*U' time (dot):         0.006418 sec (nthreads: 8 speedup 4.66232)
tricount time:         0.006907 sec (dot product method)
tri+prep time:         0.012751 sec (incl time to compute L and U)
compute C time:        0.006418 sec
reduce (C) time:       0.000490 sec
rate    69.11 million edges/sec (incl time for U=triu(A))
rate   127.57 million edges/sec (just tricount itself)
L*U' time (dot):         0.023521 sec
tricount time:         0.026194 sec (dot product method)
tri+prep time:         0.032037 sec (incl time to compute L and U)
compute C time:        0.023521 sec
reduce (C) time:       0.002673 sec
rate    27.51 million edges/sec (incl time for U=triu(A))
rate    33.64 million edges/sec (just tricount itself)
L*U' time (dot):         0.011493 sec (nthreads: 2 speedup 2.04665)
tricount time:         0.012796 sec (dot product method)
tri+prep time:         0.018639 sec (incl time to compute L and U)
compute C time:        0.011493 sec
reduce (C) time:       0.001303 sec
rate    47.28 million edges/sec (incl time for U=triu(A))
rate    68.87 million edges/sec (just tricount itself)
L*U' time (dot):         0.006705 sec (nthreads: 4 speedup 3.50821)
tricount time:         0.007384 sec (dot product method)
tri+prep time:         0.013228 sec (incl time to compute L and U)
compute C time:        0.006705 sec
reduce (C) time:       0.000680 sec
rate    66.62 million edges/sec (incl time for U=triu(A))
rate   119.34 million edges/sec (just tricount itself)
L*U' time (dot):         0.009763 sec (nthreads: 8 speedup 2.4093)
tricount time:         0.010669 sec (dot product method)
tri+prep time:         0.016512 sec (incl time to compute L and U)
compute C time:        0.009763 sec
reduce (C) time:       0.000906 sec
rate    53.37 million edges/sec (incl time for U=triu(A))
rate    82.59 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.026566 sec
tricount time:         0.028627 sec (saxpy method)
tri+prep time:         0.031492 sec (incl time to compute L)
compute C time:        0.026566 sec
reduce (C) time:       0.002061 sec
rate    27.98 million edges/sec (incl time for L=tril(A))
rate    30.78 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.022131 sec (nthreads: 2 speedup 1.2004)
tricount time:         0.023573 sec (saxpy method)
tri+prep time:         0.026438 sec (incl time to compute L)
compute C time:        0.022131 sec
reduce (C) time:       0.001442 sec
rate    33.33 million edges/sec (incl time for L=tril(A))
rate    37.38 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.011668 sec (nthreads: 4 speedup 2.27679)
tricount time:         0.012288 sec (saxpy method)
tri+prep time:         0.015153 sec (incl time to compute L)
compute C time:        0.011668 sec
reduce (C) time:       0.000620 sec
rate    58.15 million edges/sec (incl time for L=tril(A))
rate    71.71 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.016841 sec (nthreads: 8 speedup 1.57751)
tricount time:         0.018066 sec (saxpy method)
tri+prep time:         0.020931 sec (incl time to compute L)
compute C time:        0.016841 sec
reduce (C) time:       0.001225 sec
rate    42.10 million edges/sec (incl time for L=tril(A))
rate    48.78 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 10000 by 10000, nz: 199768, method 0 time 0.027 sec

total time to read A matrix:       0.028004 sec

n 10000 # edges 99884
U=triu(A) time:        0.000362 sec
L=tril(A) time:        0.000234 sec

------------------------------------- dot product method:
# triangles 1357
L*U' time (dot):         0.011664 sec
tricount time:         0.011843 sec (dot product method)
tri+prep time:         0.012439 sec (incl time to compute L and U)
compute C time:        0.011664 sec
reduce (C) time:       0.000179 sec
rate     8.03 million edges/sec (incl time for U=triu(A))
rate     8.43 million edges/sec (just tricount itself)
L*U' time (dot):         0.005893 sec (nthreads: 2 speedup 1.97936)
tricount time:         0.006089 sec (dot product method)
tri+prep time:         0.006686 sec (incl time to compute L and U)
compute C time:        0.005893 sec
reduce (C) time:       0.000196 sec
rate    14.94 million edges/sec (incl time for U=triu(A))
rate    16.40 million edges/sec (just tricount itself)
L*U' time (dot):         0.003444 sec (nthreads: 4 speedup 3.387)
tricount time:         0.003609 sec (dot product method)
tri+prep time:         0.004206 sec (incl time to compute L and U)
compute C time:        0.003444 sec
reduce (C) time:       0.000165 sec
rate    23.75 million edges/sec (incl time for U=triu(A))
rate    27.67 million edges/sec (just tricount itself)
L*U' time (dot):         0.002678 sec (nthreads: 8 speedup 4.35594)
tricount time:         0.002885 sec (dot product method)
tri+prep time:         0.003481 sec (incl time to compute L and U)
compute C time:        0.002678 sec
reduce (C) time:       0.000207 sec
rate    28.69 million edges/sec (incl time for U=triu(A))
rate    34.63 million edges/sec (just tricount itself)
L*U' time (dot):         0.012640 sec
tricount time:         0.012779 sec (dot product method)
tri+prep time:         0.013376 sec (incl time to compute L and U)
compute C time:        0.012640 sec
reduce (C) time:       0.000139 sec
rate     7.47 million edges/sec (incl time for U=triu(A))
rate     7.82 million edges/sec (just tricount itself)
L*U' time (dot):         0.004852 sec (nthreads: 2 speedup 2.60499)
tricount time:         0.004964 sec (dot product method)
tri+prep time:         0.005561 sec (incl time to compute L and U)
compute C time:        0.004852 sec
reduce (C) time:       0.000112 sec
rate    17.96 million edges/sec (incl time for U=triu(A))
rate    20.12 million edges/sec (just tricount itself)
L*U' time (dot):         0.002892 sec (nthreads: 4 speedup 4.37131)
tricount time:         0.002976 sec (dot product method)
tri+prep time:         0.003572 sec (incl time to compute L and U)
compute C time:        0.002892 sec
reduce (C) time:       0.000085 sec
rate    27.96 million edges/sec (incl time for U=triu(A))
rate    33.56 million edges/sec (just tricount itself)
L*U' time (dot):         0.004180 sec (nthreads: 8 speedup 3.02402)
tricount time:         0.004349 sec (dot product method)
tri+prep time:         0.004946 sec (incl time to compute L and U)
compute C time:        0.004180 sec
reduce (C) time:       0.000169 sec
rate    20.20 million edges/sec (incl time for U=triu(A))
rate    22.97 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.003369 sec
tricount time:         0.003378 sec (saxpy method)
tri+prep time:         0.003612 sec (incl time to compute L)
compute C time:        0.003369 sec
reduce (C) time:       0.000009 sec
rate    27.65 million edges/sec (incl time for L=tril(A))
rate    29.57 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.002108 sec (nthreads: 2 speedup 1.59874)
tricount time:         0.002115 sec (saxpy method)
tri+prep time:         0.002349 sec (incl time to compute L)
compute C time:        0.002108 sec
reduce (C) time:       0.000007 sec
rate    42.53 million edges/sec (incl time for L=tril(A))
rate    47.24 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.001484 sec (nthreads: 4 speedup 2.27006)
tricount time:         0.001490 sec (saxpy method)
tri+prep time:         0.001724 sec (incl time to compute L)
compute C time:        0.001484 sec
reduce (C) time:       0.000006 sec
rate    57.92 million edges/sec (incl time for L=tril(A))
rate    67.02 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.005230 sec (nthreads: 8 speedup 0.644297)
tricount time:         0.005238 sec (saxpy method)
tri+prep time:         0.005472 sec (incl time to compute L)
compute C time:        0.005230 sec
reduce (C) time:       0.000008 sec
rate    18.25 million edges/sec (incl time for L=tril(A))
rate    19.07 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 10000 by 10000, nz: 199768, method 1 time 0.017 sec

total time to read A matrix:       0.017593 sec

n 10000 # edges 99884
U=triu(A) time:        0.000807 sec
L=tril(A) time:        0.000660 sec

------------------------------------- dot product method:
# triangles 1357
L*U' time (dot):         0.014539 sec
tricount time:         0.014694 sec (dot product method)
tri+prep time:         0.016162 sec (incl time to compute L and U)
compute C time:        0.014539 sec
reduce (C) time:       0.000156 sec
rate     6.18 million edges/sec (incl time for U=triu(A))
rate     6.80 million edges/sec (just tricount itself)
L*U' time (dot):         0.005467 sec (nthreads: 2 speedup 2.65947)
tricount time:         0.005544 sec (dot product method)
tri+prep time:         0.007011 sec (incl time to compute L and U)
compute C time:        0.005467 sec
reduce (C) time:       0.000077 sec
rate    14.25 million edges/sec (incl time for U=triu(A))
rate    18.02 million edges/sec (just tricount itself)
L*U' time (dot):         0.003181 sec (nthreads: 4 speedup 4.57045)
tricount time:         0.003257 sec (dot product method)
tri+prep time:         0.004724 sec (incl time to compute L and U)
compute C time:        0.003181 sec
reduce (C) time:       0.000076 sec
rate    21.14 million edges/sec (incl time for U=triu(A))
rate    30.67 million edges/sec (just tricount itself)
L*U' time (dot):         0.002482 sec (nthreads: 8 speedup 5.85712)
tricount time:         0.002570 sec (dot product method)
tri+prep time:         0.004037 sec (incl time to compute L and U)
compute C time:        0.002482 sec
reduce (C) time:       0.000088 sec
rate    24.74 million edges/sec (incl time for U=triu(A))
rate    38.87 million edges/sec (just tricount itself)
L*U' time (dot):         0.013548 sec
tricount time:         0.013735 sec (dot product method)
tri+prep time:         0.015202 sec (incl time to compute L and U)
compute C time:        0.013548 sec
reduce (C) time:       0.000187 sec
rate     6.57 million edges/sec (incl time for U=triu(A))
rate     7.27 million edges/sec (just tricount itself)
L*U' time (dot):         0.005883 sec (nthreads: 2 speedup 2.30282)
tricount time:         0.006074 sec (dot product method)
tri+prep time:         0.007542 sec (incl time to compute L and U)
compute C time:        0.005883 sec
reduce (C) time:       0.000191 sec
rate    13.24 million edges/sec (incl time for U=triu(A))
rate    16.44 million edges/sec (just tricount itself)
L*U' time (dot):         0.003481 sec (nthreads: 4 speedup 3.89243)
tricount time:         0.003664 sec (dot product method)
tri+prep time:         0.005131 sec (incl time to compute L and U)
compute C time:        0.003481 sec
reduce (C) time:       0.000183 sec
rate    19.47 million edges/sec (incl time for U=triu(A))
rate    27.26 million edges/sec (just tricount itself)
L*U' time (dot):         0.002990 sec (nthreads: 8 speedup 4.53042)
tricount time:         0.003239 sec (dot product method)
tri+prep time:         0.004706 sec (incl time to compute L and U)
compute C time:        0.002990 sec
reduce (C) time:       0.000249 sec
rate    21.22 million edges/sec (incl time for U=triu(A))
rate    30.84 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         0.004303 sec
tricount time:         0.004314 sec (saxpy method)
tri+prep time:         0.004974 sec (incl time to compute L)
compute C time:        0.004303 sec
reduce (C) time:       0.000011 sec
rate    20.08 million edges/sec (incl time for L=tril(A))
rate    23.15 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.002223 sec (nthreads: 2 speedup 1.93561)
tricount time:         0.002230 sec (saxpy method)
tri+prep time:         0.002890 sec (incl time to compute L)
compute C time:        0.002223 sec
reduce (C) time:       0.000008 sec
rate    34.56 million edges/sec (incl time for L=tril(A))
rate    44.78 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.001506 sec (nthreads: 4 speedup 2.8577)
tricount time:         0.001511 sec (saxpy method)
tri+prep time:         0.002171 sec (incl time to compute L)
compute C time:        0.001506 sec
reduce (C) time:       0.000005 sec
rate    46.01 million edges/sec (incl time for L=tril(A))
rate    66.10 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         0.001319 sec (nthreads: 8 speedup 3.26257)
tricount time:         0.001325 sec (saxpy method)
tri+prep time:         0.001985 sec (incl time to compute L)
compute C time:        0.001319 sec
reduce (C) time:       0.000006 sec
rate    50.33 million edges/sec (incl time for L=tril(A))
rate    75.40 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 100000 by 100000, nz: 19980330, method 0 time 2.496 sec

total time to read A matrix:       2.523121 sec

n 100000 # edges 9990165
U=triu(A) time:        0.018984 sec
L=tril(A) time:        0.020506 sec

------------------------------------- dot product method:
# triangles 1330131
L*U' time (dot):        10.037756 sec
tricount time:        10.065191 sec (dot product method)
tri+prep time:        10.104681 sec (incl time to compute L and U)
compute C time:       10.037756 sec
reduce (C) time:       0.027436 sec
rate     0.99 million edges/sec (incl time for U=triu(A))
rate     0.99 million edges/sec (just tricount itself)
L*U' time (dot):         5.268859 sec (nthreads: 2 speedup 1.90511)
tricount time:         5.287288 sec (dot product method)
tri+prep time:         5.326778 sec (incl time to compute L and U)
compute C time:        5.268859 sec
reduce (C) time:       0.018428 sec
rate     1.88 million edges/sec (incl time for U=triu(A))
rate     1.89 million edges/sec (just tricount itself)
L*U' time (dot):         3.710080 sec (nthreads: 4 speedup 2.70554)
tricount time:         3.724638 sec (dot product method)
tri+prep time:         3.764128 sec (incl time to compute L and U)
compute C time:        3.710080 sec
reduce (C) time:       0.014557 sec
rate     2.65 million edges/sec (incl time for U=triu(A))
rate     2.68 million edges/sec (just tricount itself)
L*U' time (dot):         2.599948 sec (nthreads: 8 speedup 3.86075)
tricount time:         2.615894 sec (dot product method)
tri+prep time:         2.655384 sec (incl time to compute L and U)
compute C time:        2.599948 sec
reduce (C) time:       0.015946 sec
rate     3.76 million edges/sec (incl time for U=triu(A))
rate     3.82 million edges/sec (just tricount itself)
L*U' time (dot):        10.711924 sec
tricount time:        10.739376 sec (dot product method)
tri+prep time:        10.778866 sec (incl time to compute L and U)
compute C time:       10.711924 sec
reduce (C) time:       0.027452 sec
rate     0.93 million edges/sec (incl time for U=triu(A))
rate     0.93 million edges/sec (just tricount itself)
L*U' time (dot):         6.001916 sec (nthreads: 2 speedup 1.78475)
tricount time:         6.019951 sec (dot product method)
tri+prep time:         6.059441 sec (incl time to compute L and U)
compute C time:        6.001916 sec
reduce (C) time:       0.018035 sec
rate     1.65 million edges/sec (incl time for U=triu(A))
rate     1.66 million edges/sec (just tricount itself)
L*U' time (dot):         3.885379 sec (nthreads: 4 speedup 2.75698)
tricount time:         3.899436 sec (dot product method)
tri+prep time:         3.938926 sec (incl time to compute L and U)
compute C time:        3.885379 sec
reduce (C) time:       0.014056 sec
rate     2.54 million edges/sec (incl time for U=triu(A))
rate     2.56 million edges/sec (just tricount itself)
L*U' time (dot):         2.636954 sec (nthreads: 8 speedup 4.06223)
tricount time:         2.652757 sec (dot product method)
tri+prep time:         2.692247 sec (incl time to compute L and U)
compute C time:        2.636954 sec
reduce (C) time:       0.015802 sec
rate     3.71 million edges/sec (incl time for U=triu(A))
rate     3.77 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         5.043538 sec
tricount time:         5.049500 sec (saxpy method)
tri+prep time:         5.070006 sec (incl time to compute L)
compute C time:        5.043538 sec
reduce (C) time:       0.005962 sec
rate     1.97 million edges/sec (incl time for L=tril(A))
rate     1.98 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         3.138652 sec (nthreads: 2 speedup 1.60691)
tricount time:         3.141832 sec (saxpy method)
tri+prep time:         3.162339 sec (incl time to compute L)
compute C time:        3.138652 sec
reduce (C) time:       0.003181 sec
rate     3.16 million edges/sec (incl time for L=tril(A))
rate     3.18 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         1.656651 sec (nthreads: 4 speedup 3.04442)
tricount time:         1.658998 sec (saxpy method)
tri+prep time:         1.679504 sec (incl time to compute L)
compute C time:        1.656651 sec
reduce (C) time:       0.002346 sec
rate     5.95 million edges/sec (incl time for L=tril(A))
rate     6.02 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         1.782248 sec (nthreads: 8 speedup 2.82988)
tricount time:         1.783162 sec (saxpy method)
tri+prep time:         1.803668 sec (incl time to compute L)
compute C time:        1.782248 sec
reduce (C) time:       0.000914 sec
rate     5.54 million edges/sec (incl time for L=tril(A))
rate     5.60 million edges/sec (just tricount itself)

--------------------------------------------------------------
random 100000 by 100000, nz: 19980330, method 1 time 1.848 sec

total time to read A matrix:       1.877002 sec

n 100000 # edges 9990165
U=triu(A) time:        0.019503 sec
L=tril(A) time:        0.026980 sec

------------------------------------- dot product method:
# triangles 1330131
L*U' time (dot):         9.740372 sec
tricount time:         9.767869 sec (dot product method)
tri+prep time:         9.814351 sec (incl time to compute L and U)
compute C time:        9.740372 sec
reduce (C) time:       0.027498 sec
rate     1.02 million edges/sec (incl time for U=triu(A))
rate     1.02 million edges/sec (just tricount itself)
L*U' time (dot):         5.274905 sec (nthreads: 2 speedup 1.84655)
tricount time:         5.291900 sec (dot product method)
tri+prep time:         5.338383 sec (incl time to compute L and U)
compute C time:        5.274905 sec
reduce (C) time:       0.016996 sec
rate     1.87 million edges/sec (incl time for U=triu(A))
rate     1.89 million edges/sec (just tricount itself)
L*U' time (dot):         3.592400 sec (nthreads: 4 speedup 2.71138)
tricount time:         3.607020 sec (dot product method)
tri+prep time:         3.653502 sec (incl time to compute L and U)
compute C time:        3.592400 sec
reduce (C) time:       0.014619 sec
rate     2.73 million edges/sec (incl time for U=triu(A))
rate     2.77 million edges/sec (just tricount itself)
L*U' time (dot):         2.499505 sec (nthreads: 8 speedup 3.89692)
tricount time:         2.515554 sec (dot product method)
tri+prep time:         2.562037 sec (incl time to compute L and U)
compute C time:        2.499505 sec
reduce (C) time:       0.016050 sec
rate     3.90 million edges/sec (incl time for U=triu(A))
rate     3.97 million edges/sec (just tricount itself)
L*U' time (dot):        10.443740 sec
tricount time:        10.472049 sec (dot product method)
tri+prep time:        10.518531 sec (incl time to compute L and U)
compute C time:       10.443740 sec
reduce (C) time:       0.028309 sec
rate     0.95 million edges/sec (incl time for U=triu(A))
rate     0.95 million edges/sec (just tricount itself)
L*U' time (dot):         5.903907 sec (nthreads: 2 speedup 1.76895)
tricount time:         5.922306 sec (dot product method)
tri+prep time:         5.968789 sec (incl time to compute L and U)
compute C time:        5.903907 sec
reduce (C) time:       0.018399 sec
rate     1.67 million edges/sec (incl time for U=triu(A))
rate     1.69 million edges/sec (just tricount itself)
L*U' time (dot):         3.949521 sec (nthreads: 4 speedup 2.64431)
tricount time:         3.962544 sec (dot product method)
tri+prep time:         4.009026 sec (incl time to compute L and U)
compute C time:        3.949521 sec
reduce (C) time:       0.013023 sec
rate     2.49 million edges/sec (incl time for U=triu(A))
rate     2.52 million edges/sec (just tricount itself)
L*U' time (dot):         2.604668 sec (nthreads: 8 speedup 4.00962)
tricount time:         2.620189 sec (dot product method)
tri+prep time:         2.666671 sec (incl time to compute L and U)
compute C time:        2.604668 sec
reduce (C) time:       0.015521 sec
rate     3.75 million edges/sec (incl time for U=triu(A))
rate     3.81 million edges/sec (just tricount itself)

----------------------------------- saxpy method:
C<L>=L*L time (saxpy):         4.623672 sec
tricount time:         4.629221 sec (saxpy method)
tri+prep time:         4.656200 sec (incl time to compute L)
compute C time:        4.623672 sec
reduce (C) time:       0.005549 sec
rate     2.15 million edges/sec (incl time for L=tril(A))
rate     2.16 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         2.570878 sec (nthreads: 2 speedup 1.79848)
tricount time:         2.574308 sec (saxpy method)
tri+prep time:         2.601288 sec (incl time to compute L)
compute C time:        2.570878 sec
reduce (C) time:       0.003430 sec
rate     3.84 million edges/sec (incl time for L=tril(A))
rate     3.88 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         1.508288 sec (nthreads: 4 speedup 3.06551)
tricount time:         1.510577 sec (saxpy method)
tri+prep time:         1.537557 sec (incl time to compute L)
compute C time:        1.508288 sec
reduce (C) time:       0.002289 sec
rate     6.50 million edges/sec (incl time for L=tril(A))
rate     6.61 million edges/sec (just tricount itself)
C<L>=L*L time (saxpy):         1.565095 sec (nthreads: 8 speedup 2.95424)
tricount time:         1.578662 sec (saxpy method)
tri+prep time:         1.605642 sec (incl time to compute L)
compute C time:        1.565095 sec
reduce (C) time:       0.013567 sec
rate     6.22 million edges/sec (incl time for L=tril(A))
rate     6.33 million edges/sec (just tricount itself)