=head1 NAME

Cache::BDB - An object caching wrapper around BerkeleyDB

=head1 SYNOPSIS

 use Cache::BDB;
 my %options = (
    cache_root => "/tmp/caches",
    namespace => "Some::Namespace",
    default_expires_in => 300, # seconds
 );

 my $cache = Cache::BDB->new(%options);

 #
 # [myshellprompt:~]$ find /tmp/caches
 # /tmp/caches/Some::Namespace/
 # /tmp/caches/Some::Namespace/Some::Namespace.db
 # /tmp/caches/Some::Namespace/__db.001
 # /tmp/caches/Some::Namespace/__db.002
 # /tmp/caches/Some::Namespace/__db.003
 #

 $cache->namespace(); # returns "Some::Namespace", read only
 $cache->default_expires_in(); # returns 300
 $cache->default_expires_in(600); # change it to 600

 $cache->set(1, \%some_hash);
 $cache->set('foo', 'bar');
 $cache->set(20, $obj, 10);

 $cache->add(21, 'whatever'); # works, nothing with the key '21' set yet.
 $cache->add(21, 'coffeepot'); # fails, can only add() something that hasn't
                               # yet been set

 $cache->replace(21, 'shoelace'); # replaces the data 'whatever' with
                                  # 'shoelace'
 $cache->replace(7, 'tattoo'); # fails, key/value pair was never set() or
                               # add()ed previously

 my $h = $cache->get(1); # $h and \%some_hash contain the same data
 my $bar = $cache->get('foo'); # $bar eq 'bar';
 my $obj = $cache->get(20); # returns the blessed object

 $cache->count() == 4; # keys 1, 'foo', 20 and 21
 # assuming 10 seconds has passed ...
 $cache->is_expired(20); # returns true ..
 $cache->purge();
 $cache->get(20); # returns undef
 $cache->count() == 3;

 my $hr = $cache->get_bulk();

 # $hr = {1 => {contents_of => '%some_hash'},
 #        foo => 'bar',
 #        21 => 'shoelace' };

 $cache->close(); # close the cache object

=head1 DESCRIPTION

This module implements a caching layer around BerkeleyDB
for object persistence. It implements the basic methods necessary to
add, retrieve, and remove objects. The main advantage over other
caching modules is performance.
I've attempted to stick with a
B<Cache::Cache>-like interface as much as possible, though it may differ
here and there.

=head1 DEPENDENCIES

I've been developing using a very recent version of Berkeley DB
(v4.4.20) and BerkeleyDB (v0.27). I'm pretty sure that most of the
functionality the module relies on is available in Berkeley DB version
3 and higher, but so far I have not tested with older versions. I'm
open to making version specific concessions if necessary. If at all
possible, I would advise you to upgrade both Berkeley DB and
BerkeleyDB to their latest respective versions.

Cache::BDB currently serializes everything it stores with Storable.

=head1 PERFORMANCE

The intent of this module is to supply great performance with a
reasonably feature rich API. There is no way this module can compete
with, say, using BerkeleyDB directly, and if you don't need any kind
of expiration, automatic purging, etc, that will more than likely be
much faster. If you'd like to compare the speed of some other caching
modules, have a look at
B<http://cpan.robm.fastmail.fm/cache_perf.html>. I've included a
patch which adds Cache::BDB to the benchmark.

=head1 LOCKING

All Cache::BDB environments are opened with the DB_INIT_CDB
flag. This enables multiple-reader/single-writer locking handled
entirely by the Berkeley DB internals at either the database or
environment level. See
http://www.sleepycat.com/docs/ref/cam/intro.html for more information
on what this means for locking.

Important: it is a bad idea to share a single Cache::BDB object across
multiple processes or threads. Doing so is bound to cause you
pain. Instead, have your thread/process instantiate its own Cache::BDB
object. It is safe to have them all pointing at the same cache file.

=head1 CACHE FILES

For every new B<Cache::BDB> object, a Berkeley DB Environment is
created (or reused if it already exists).
This means that even for a
single cache object, at least four files need to be created: three for
the environment and at least one for the actual data in the cache. It's
possible for multiple cache database files to share a single
environment, and it's also possible for multiple cache databases to
share a single database file. See the SYNOPSIS above for a quick view
of what you are likely to find on the filesystem for a
cache. Cache::BDB uses BerkeleyDB exclusively with regard to files, so
if you have questions about what's in those files, you might
familiarize yourself further with Berkeley DB.

=head1 USAGE

=over 4

=item B<new>(%options)

=item * cache_root

Specify the top level directory to store cache and related files
in. This parameter is required. Keep in mind that B<Cache::BDB> uses a
B<BerkeleyDB> environment object so more than one file will be written
for each cache.

=item * cache_file

If you want to tell B<Cache::BDB> exactly which file to use for your
cache, specify it here. This parameter is required if you plan to use
the env_lock option and/or if you want to have multiple logical
databases (namespaces) in a single physical file. If unspecified,
B<Cache::BDB> will create its database file using the
B<namespace>. B<cache_file> should be relative to your cache_root, not
fully-qualified, i.e.

  my %options = ( cache_root => '/some/location/for/caching/',
                  cache_file => 'whatever.db',
                  namespace => 'MyObjects');

This gives you, among other files, /some/location/for/caching/whatever.db.
Your logical database inside of 'whatever.db' will be named 'MyObjects'.
If you were to then instantiate another Cache::BDB with the following:

  my %options = ( cache_root => '/some/location/for/caching/',
                  cache_file => 'whatever.db',
                  namespace => 'MyOtherObjects');

You would now have two logical caches in one physical file, which is
ok, but see B<namespace> below for a better idea.

=item * namespace

Your B<namespace> tells B<Cache::BDB> where to store cache data under
the B<cache_root> if no B<cache_file> is specified or what to call the
database in the multi-database file if B<cache_file> is specified. It
is a required parameter. For clarity, it might be best to instantiate
B<Cache::BDB> objects like so:

  my $namespace = 'MyObjects';
  my %options = ( cache_root => "/some/location/for/caching/$namespace",
                  namespace => $namespace );

Unlike the examples given above under cache_file, this allows you to
locate a single cache type in its own directory, which gives you more
flexibility to nuke it wholesale or move things around a little.

=item * type

Cache::BDB allows you to select the type of Berkeley DB storage
mechanism to use. Your choices are Hash, Btree, and Recno. Queue isn't
supported. I haven't tested the three supported types extensively. The
default, if unspecified, is Btree, and this is probably good enough
for most applications. Note that if a cache is created as one type it
must remain that type. If you instantiate a Cache::BDB object with one
type (or use the default), and then attempt to connect to the same
cache with a newly instantiated object that uses a different type, you
will get a warning, and Cache::BDB will be nice and connect you to the
cache with its original type.
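
For example, a cache backed by a Hash database could be requested
like this. This is a sketch only: the path and namespace are made up,
and it assumes the type is passed as a plain string using one of the
names listed above.

  use Cache::BDB;

  my $cache = Cache::BDB->new(
      cache_root => '/tmp/caches',       # illustrative path
      namespace  => 'Some::Namespace',   # illustrative namespace
      type       => 'Hash',              # assumed spelling: Hash, Btree or Recno
  );
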

Important: before Berkeley DB 4.4.x it was not possible to
shrink the physical size of a database file, which means that,
technically, your cache files would never get smaller even if you
deleted everything from them. As of 4.4.x this is possible, but it
only works with the Btree type, which may be a good reason to stick
with Btree. See B<DATABASE SIZE> below for the versions required and
for how Cache::BDB uses this.

For more info, see http://www.sleepycat.com/docs/ref/am_conf/intro.html.

=item * env_lock

If multiple databases (same or different files) are opened using the
same Berkeley DB environment, it's possible to turn on environment
level locking rather than file level locking. This may be advantageous
if you have two separate but related caches. By passing in the
env_lock parameter with any true value, the environment will be
created in such a way that any databases created under its control
will all lock whenever Berkeley DB attempts a read/write lock. This
flag must be specified for every database opened under this
environment. Note: this is very untested in Cache::BDB, and I don't
know how necessary it is.

=item * default_expires_in

Time (in seconds) that cached objects should live. If set to 0,
objects never expire. See B<set> to enable a per-object value.

=item * auto_purge_interval

Interval (in seconds) at which expired objects are purged by one or
both of the B<auto_purge> types (get/set). If set to 0, auto purge is
disabled. Note, of course, that objects won't actually be purged until
some event takes place that will call purge (set or get), so
if this is set to 300 but no gets or sets are called for more than 300
seconds, the expired items won't have been purged yet.
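
As a minimal sketch of how the purge options fit together, the
following asks for expired items to be purged when B<set> is called,
using the option names documented in this list. The path, namespace
and numbers are illustrative only, not recommendations.

  use Cache::BDB;

  my $cache = Cache::BDB->new(
      cache_root          => '/tmp/caches',      # illustrative path
      namespace           => 'Some::Namespace',  # illustrative namespace
      default_expires_in  => 60,                 # items live for one minute
      auto_purge_interval => 300,                # purge interval, in seconds
      auto_purge_on_set   => 1,                  # let set() trigger the purge
  );
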

=item * auto_purge_on_set

If this item is true and B<auto_purge_interval> is greater than 0,
calling the B<set> method will first purge any expired records from
the cache.

=item * auto_purge_on_get

If this item is true and B<auto_purge_interval> is greater than 0,
calling the B<get> method will first purge any expired records from
the cache.

=item * purge_on_init

If set to a true value, purge will be called before the constructor returns.

=item * purge_on_destroy

If set to a true value, purge will be called before the object goes
out of scope.

=item * clear_on_init

If set to a true value, clear will be called before the constructor returns.

=item * clear_on_destroy

If set to a true value, clear will be called before the object goes
out of scope.

=item * disable_compact

Disable database compactions for clear, purge, delete and remove
methods. See B<DATABASE SIZE> below for more information on database
compaction.

=item * disable_auto_purge

As a courtesy, Cache::BDB will automatically remove() any expired
cache item you get() before returning undef. This is handy if you
don't feel the need to do a lot of explicit cache purging, but if you
only want purge, remove, delete or clear to actually delete cache
items, you can disable this functionality by passing in
disable_auto_purge with any true value.

=back

=over 4

=item B<close>()

Explicitly close the connection to the cache. A good idea. Essentially
the same as undef'ing the object (explicitly calls DESTROY).

=item B<namespace>()

This read only method returns the namespace that the cache object is
currently associated with.

=item B<auto_purge_interval>($seconds)

Set/get the length of time (in seconds) that the cache object will
wait before calling one or both of the B<auto_purge> methods.
If set
to 0, automatic purging is disabled.

=item B<auto_purge_on_set>(1/0)

Enable/disable auto purge when B<set> is called.

=item B<auto_purge_on_get>(1/0)

Enable/disable auto purge when B<get> is called.

=item B<set>($key, $value, [$seconds])

Store an item ($value) with the associated $key. Time to live (in
seconds) can be optionally set with a third argument. Returns true on success.

=item B<add>($key, $value, [$seconds])

Like B<set>, but only stores the item if the key doesn't already exist
in the cache.

=item B<replace>($key, $value, [$seconds])

Like B<set>, but only stores the item if the key already exists in
the cache.

=item B<get>($key)

Locate and return the data associated with $key. Returns the object
associated with $key or undef if the data doesn't exist. If
B<auto_purge_on_get> is enabled, the cache will be purged before
attempting to locate the item.

=item B<get_bulk>()

Returns a hash reference containing every unexpired item from the
cache, keyed on its cache id. This can be useful if your keys aren't
always available or if you just want to use the cache as a convenient
way to dump data in chunks.

The result looks something like this:

  my $h = $cache->get_bulk();

  # $h = { 123 => "bird and bee",
  #        456 => "monkeys with sticks",
  #        789 => "take whats mine",
  #      };

=item B<remove>($key)

Removes the cache element specified by $key if it exists. Returns true
for success.

=item B<delete>($key)

Same as remove()

=item B<clear>()

Completely clear out the cache and compact the underlying
database. Returns the number of cached items removed.

=item B<count>()

Returns the number of items in the cache.

=item B<size>()

Return the size (in bytes) of all the cached items. This call relies
on the availability of B<Devel::Size>.
If it's not found, you'll get a
warning and size() will simply return 0. Currently the size is
calculated every time this is called by using
B<Devel::Size::total_size>, so it may be expensive for large
caches. In the future size-aware options and functionality may be
available, but for now you'll need to implement this outside of
Cache::BDB if you need it.

=item B<purge>()

Purge expired items from the cache. Returns the number of items purged.

=item B<is_expired>($key)

Returns true if the data pointed to by $key is expired based on its
stored expiration time. Returns false if the data isn't expired *or* if the
data doesn't exist.

=back

=head1 DATABASE SIZE

(See http://www.sleepycat.com/docs/ref/am_misc/diskspace.html)

Before Berkeley DB release 4.4 it was not possible to return freed
space in a database file. This means that no matter how many items you
delete, your file will still retain its size, and continue to grow as
you add more items. The only way to get the file size back down was to
dump the database to a file and reload it into a new database
file. This may or may not be a problem for your application, but keep
in mind that your cache will continue to get bigger and, for example,
your operating system may have a maximum file size limit.

In 4.4, Sleepycat introduced the ability to free unused
space. BerkeleyDB 0.29 exposes this functionality in the perl
wrapper. If you are using these versions or better and have chosen the
Btree database type (the default for Cache::BDB), your caches will
automatically be compacted when items are purged, removed/deleted, or
if clear is called. You can disable the automatic compaction of cache
files by initializing your Cache::BDB object with the disable_compact
parameter set to any true value.
In my tests so far, however, database
compaction does not appear to affect performance significantly, and
may save you from a headache down the road.

=head1 AUTHOR

Josh Rotenberg, C<< <joshrotenberg at gmail.com> >>

=head1 TODO

* Make data storage scheme configurable (Storable, YAML, Data::Dumper,
  or callback based)

* Split storage between meta and data for faster operations on meta data.

* Add some size/count aware features.

* Create some examples.

* Fix fork()'ing tests.

=head1 BUGS

Please report any bugs or feature requests to C<bug-cache-bdb at
rt.cpan.org>, or through the web interface at
L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Cache-BDB>. I will
be notified, and then you'll automatically be notified of progress on
your bug as I make changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Cache::BDB

You can also look for information at:

=over 4

=item * AnnoCPAN: Annotated CPAN documentation

L<http://annocpan.org/dist/Cache-BDB>

=item * CPAN Ratings

L<http://cpanratings.perl.org/d/Cache-BDB>

=item * RT: CPAN's request tracker

L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Cache-BDB>

=item * Search CPAN

L<http://search.cpan.org/dist/Cache-BDB>

=back

=head1 SEE ALSO

BerkeleyDB

=head1 ACKNOWLEDGEMENTS

Baldur Kristinsson
Sandy Jensen

=head1 COPYRIGHT & LICENSE

Copyright 2006 Josh Rotenberg, all rights reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=cut
