1/* tag: Tom Lord Tue Dec 4 14:57:15 2001 (bitsets-data-sheet.doc) 2 */ 3/************************************************************************ 4 *(h0 "Data Sheet for the Hackerlab Bitsets Functions" 5 * :appendix) 6 * 7 insert*/ 8 9Package: 10 Hackerlab Bitsets (bitset data structures) 11 12Supplier: 13 regexps.com 14 15Function: 16 17 Hackerlab Bitsets provides functions for manipulating ordinary 18 bitsets, and bitset trees (very large, sparsely populated 19 bitsets). 20 21 22Key Features: 23 24 Fast and accurate. 25 26 Clean and simple "classic C" interface. 27 28 Extensive test suite included. 29 30 Postscript and HTML documentation included. 31 32 A complete set of bitset functions. These operations are 33 provided for both ordinary bitsets and bitset trees: 34 35 alloc 36 free 37 dup 38 is_member 39 is_equal 40 is_subset 41 is_empty 42 is_empty_range 43 is_full 44 is_full_range 45 adjoin 46 remove 47 toggle 48 clear 49 clear_range 50 fill 51 fill_range 52 complement 53 assign 54 union 55 intersection 56 difference 57 revdifference 58 xor 59 population 60 population_range 61 ffs ("find first set") 62 ffs_range 63 ffc ("find first clear") 64 ffc_range 65 66 67 68Licensing: 69 70 Hackerlab Bitsets is part of the Hackerlab C Library which is 71 distributed under the terms of the GNU General Public License, 72 Version 2, as published by the Free Software Foundation. 73 74 Alternative licenses (such as licenses permitting binary-only 75 re-distribution) can be purchased from regexps.com. 76 77 78Prerequisites: 79 80 standard C compiler 81 Posix libc.a (standard C library) 82 GNU Make 83 84 85Recommended and Disrecommended Applications: 86 87 Recommended for any application needing a fast and complete 88 implementation of the "standard" bitset operations. 89 90 Sparse bitset support is ideal for representing sets 91 of Unicode code points. 92 93 94Limitations: 95 96 This is the first public (non-prototype) release. 97 98 99Contribution to Executable Size: 100 101 Ordinary Bitsets 102 ---------------- 103 104 On a Pentium based machine, running gcc (egcs version 105 2.91.66) we compiled this simple (nonsense) program and 106 linked it against the Hackerlab C Library: 107 108 109 int 110 main (int argc, char * argv[]) 111 { 112 bitset_alloc (); bitset_free (); bitset_dup (); 113 bitset_is_member (); bitset_is_equal (); 114 bitset_is_subset (); bitset_is_empty (); 115 bitset_is_empty_range (); bitset_is_full (); 116 bitset_is_full_range (); bitset_adjoin (); 117 bitset_remove (); bitset_toggle (); bitset_clear (); 118 bitset_clear_range (); bitset_fill (); 119 bitset_fill_range (); bitset_complement (); 120 bitset_assign (); bitset_union (); 121 bitset_intersection (); bitset_difference (); 122 bitset_revdifference (); bitset_xor (); 123 bitset_population (); bitset_population_range (); 124 bitset_ffs (); bitset_ffs_range (); 125 bitset_ffc (); bitset_ffc_range (); 126 } 127 128 Both the library and the program were compiled with the "-g" 129 option. 130 131 Total executable size: 132 133 text data 134 16681 4864 135 136 The following list of object files from the Hackerlab 137 C library were linked into the executable: 138 139 bitset.o alloc-limits.o mem.o must-malloc.o panic.o 140 str.o cvt.o panic-exit.o char-class.o 141 142 The contribution of those files to the executable size is: 143 144 text data 145 15405 4672 146 147 148 149 Bitset Trees (Large Sparsely Populated Bitsets) 150 ----------------------------------------------- 151 152 For these, we used the (nonsense) program: 153 154 int 155 main (int argc, char * argv[]) 156 { 157 bits_alloc (); bits_free (); bits_dup (); 158 bits_is_member (); bits_is_equal (); 159 bits_is_subset (); bits_is_empty (); 160 bits_is_empty_range (); bits_is_full (); 161 bits_is_full_range (); bits_adjoin (); 162 bits_remove (); bits_toggle (); bits_clear (); 163 bits_clear_range (); bits_fill (); 164 bits_fill_range (); bits_complement (); 165 bits_assign (); bits_union (); 166 bits_intersection (); bits_difference (); 167 bits_revdifference (); bits_xor (); 168 bits_population (); bits_population_range (); 169 bits_ffs (); bits_ffs_range (); 170 bits_ffc (); bits_ffc_range (); 171 } 172 173 Total executable size: 174 175 text data 176 39097 4864 177 178 The following list of object files from the Hackerlab 179 C library were linked into the executable: 180 181 bits.o bitset-tree.o bitset.o alloc-limits.o mem.o 182 must-malloc.o panic.o str.o cvt.o panic-exit.o 183 char-class.o 184 185 The contribution of those files to the executable size is: 186 187 text data 188 37814 4672 189 190 191External Linkage Dependencies: 192 193 194 When compiled under FreeBSD 3.0, the simple programs used to 195 test executable sizes depends on the following symbols 196 defined in "libc": 197 198 _exit 199 free 200 malloc 201 realloc 202 write 203 204 205Accuracy: 206 207 The Hackerlab Bitsets library passes all of the tests 208 described here: 209 210 To test the ordinary bitset functions, we wrote a Scheme 211 program to automatically generate test cases. The tests 212 generated exercise all of the bitset functions, testing a 213 variety of bitset sizes and boundary conditions. The Scheme 214 program uses a list-based set library to compute the expected 215 results of these tests and generates an answer key. Over 216 32,000 tests are generated. 217 218 To test bitset trees, we wrote a program that performs a 219 randomly selected sequence of bitset operations on both an 220 ordinary bitset and on an equivalent bitset tree. Parameters 221 are randomly selected to exercise a variety of special cases 222 such as boundary conditions. All bitset operations are 223 tested. The ordinary and bitset tree results of each 224 operation are compared for equality. 225 226 The totality of Hackerlab C Library tests, including the tests 227 described here and tests of other subsystems that use the 228 bitset library, achieve 98.9% test coverage (measured in lines 229 of source executed) for the ordinary bitset functions, 86.8% 230 coverage for bitset tree functions. The code not tested is 231 primarily code concerned with cleanly returning error codes in 232 the (presumably unlikely) case of memory allocation failure. 233 234 235Execution Speed: 236 237 There are no industry-wide standards for measuring the 238 performance of bitset operations and we know of similar 239 library against which to make comparative measurements. 240 241 The implementation is conceptually straightforward, and 242 its performance characteristics can be accurately inferred 243 from a description of the implementation. 244 245 The ordinary bitset functions perform straightforward bit 246 manipulations using C operators. Bitsets are represented as 247 an array of subsets, each subset containing (up to) the number 248 of bits in a machine word. Most functions which operate on a 249 sequence of contiguous bits contain a loop which operates on 250 an entire word at a time, with special cases for the 251 boundaries of the loop. 252 253 Bitset trees are recursively represented as a tree of bitset 254 trees, with ordinary bitsets as leaf nodes. Subtrees 255 which are all 0 or all 1 are (in most circumstances) replaced 256 by a NULL or -1 pointer (which speeds up operations on those 257 subtrees). 258 259 260Run-time Allocation Requirements: 261 262 Ordinary bitsets are stored as a contiguous array of bits. 263 264 Bitset trees afford programmers to make explicit space/time 265 trade-offs by determining the depth and branching structure 266 of the trees. 267 268 269Support: 270 271 To purchase an alternative license, request additional 272 features, or for any kind of support assistance, you can 273 contact us at "hackerlab@regexps.com" or via our web site 274 "www.regexps.com". We can also be reached at (412) 401-5204 275 or at: 276 277 regexps.com 278 1717 Murray Ave. P.M.B. #10 279 Pittsburgh, PA 15217 280 281 Available support includes (but is not limited to): 282 283 - porting assistance 284 - customized extensions 285 - consultation concerning regexp-intensive applications 286 - help with undocumented interfaces 287 - bug fixing 288 289 In most cases, support is offered as a commercial 290 service. 291 292 293/*end-insert 294 */ 295 296 297