1The \eslmod{cluster} module implements a generalized, efficient
2discrete single linkage clustering algorithm.
3
4The clustering algorithm tests for links on the fly, thus avoiding
5construction of an $O(N^2)$ adjacency matrix. This results in an
6algorithm of $O(N)$ memory, $O(N^2)$ time worst-case complexity for
7$N$ vertices. Average case behavior typically scales much better than
8this, as efficiently as $O(N)$ for a densely connected graph that
9forms a single cluster.
10
11In order to work on generalized vertices of any data type, the
12implementation uses an interface akin to that of the C \ccode{qsort()}
13utility: the caller provides a void pointer to an untyped array of
14vertices, the number of vertices, and the size of each vertex data
15element, and a function that can take untyped pointers to two vertices
16and compute whether they are linked or not.
17
18The API is summarized in Table~\ref{tbl:cluster_api}. Only the
19\eslmod{easel} module is required.
20
21% Table generated by autodoc -t esl_cluster.c (so don't edit here, edit esl_cluster.c:)
22\begin{table}[hbp]
23\begin{center}
24{\small
25\begin{tabular}{|ll|}\hline
26\hyperlink{func:esl_cluster_SingleLinkage()}{\ccode{esl\_cluster\_SingleLinkage()}} & Generalized single linkage clustering.\\
27\hline
28\end{tabular}
29}
30\end{center}
31\caption{The \eslmod{cluster} API.}
32\label{tbl:cluster_api}
33\end{table}
34
35\subsection{Example of using the msacluster API}
36
37An example of clustering some numbers together, according to their
38difference:
39
40\input{cexcerpts/cluster_example}
41
42The thing to pay most attention to here is the mechanism of dealing
43with vertices via generic untyped pointers; in particular, the way the
44caller-provided linkage-determining function takes its \ccode{void *}
45arguments and immediately casts them back to data types that the
46caller wants to use in computing whether the two vertices are linked.
47
48In the example here, the linkage function needs only one parameter
49from the caller, so a pointer to \ccode{threshold} itself is passed
50into the API. If your linkage function needs more parameters, you
51would define a structure that bundles them together, then pass a
52pointer to that structure into \ccode{esl\_cluster\_SingleLinkage()}.
53
54