purge
=====

The purge tool is a kind of magnifying glass into your squid-2 cache. You
can use purge to have a look at what URLs are stored in which file within
your cache. The purge tool can also be used to release objects whose URLs
match user-specified regular expressions. A more troublesome feature is the
ability to remove files squid does not seem to know about any longer.

        USE AT YOUR OWN RISK! NO GUARANTEES, WHATSOEVER! DON'T BLAME US!
                        YOU HAVE BEEN WARNED!



compilation
===========

Purge has been successfully compiled under the following OSes:

        SYSTEM          g++     native
        ------          ---     ------
        Solaris 2.7     yes     CC
        IRIX 6.5        yes     CC -n32
        Linux 2.0.36    yes     (g++ IS native)
        FreeBSD 4.x     yes     gmake port must be installed
                                (g++ IS supported)

The recent move of the Linux community towards glibc2 may cause some
trouble, though. The compilation requires GNU make; no other make will work
correctly. The source distribution contains all files checked into the
revision control repository. Therefore, you will need to install GNU RCS
first (which in turn needs the GNU diffutils).

The repository also contains the prototypical Perl implementation. The user
interface of the C++ implementation has changed a little compared to the
Perl one. You will have to state at least one regular expression for purge
to start working. Also, to print all URLs in the cache, you will need to
specify the "-e ." regular expression.

In order to compile the purge tool, untar the source distribution and
change into the purge directory. With RCS and GNU make installed, just say
"make". GNU make will automagically retrieve all necessary files from the
repository and create the binary.

Systems not stated above will need to retrieve the makefile (use "co -l
Makefile" for this) and add their own platform-specific definitions to
section [2] in the makefile.



squid preparation
=================

In order to use purge for real PURGEs, you will have to enable this feature
in squid. By default, PURGE is disabled. Be careful about whom you grant
the PURGE ability, otherwise total strangers might wipe your cache
content. The following lines will need to be added to your squid.conf
(you may want to add further networks to the src_local ACL):

        acl purge method PURGE
        acl src_local src 127.0.0.0/8
        http_access allow purge src_local
        http_access deny purge

Reconfigure or restart (preferred) your squid after changing the
configuration file.



modes of operation
==================

$Id: purge.cc,v 1.15 2000/09/21 09:05:56 cached Exp $
Usage:  purge [-a] [-c cf] [-d l] [-(f|F) fn | -(e|E) re] [-p h[:p]]
        [-P #] [-s] [-v] [-C dir [-H]] [-n]

 -a     display a little rotating thingy to indicate that I am alive (tty only).
 -c c   squid.conf location, default "/usr/local/etc/squid/squid.conf".
 -C dir base directory for content extraction (copy-out mode).
 -d l   debug level, an OR of different debug options.
 -e re  single regular expression per -e instance (use quotes!).
 -E re  single case sensitive regular expression like -e.
 -f fn  name of textfile containing one regular expression per line.
 -F fn  name of textfile like -f containing case sensitive REs.
 -H     prepend HTTP reply header to destination files in copy-out mode.
 -n     do not fork() when using more than one cache_dir.
 -p h:p cache runs on host h and optional port p, default is localhost:3128.
 -P #   if 0, just print matches; otherwise OR the following purge modes:
          0x01 really send PURGE to the cache.
          0x02 remove all caches files reported as 404 (not found).
          0x04 remove all weird (inaccessible or too small) cache files.
        0 and 1 are recommended - slow rebuild your cache with other modes.
 -s     show all options after option parsing, but before really starting.
 -v     show more information about the file, e.g. MD5, timestamps and flags.

--- &< snip, snip ---

-a        is a kind of "I am alive" flag. It can only be activated if
          your stdout is a tty. If active, it will display a little
          rotating line to indicate that there is actually something
          happening. You should not use this switch if you capture
          your stdout in a file, or if your expression list produces
          many matches. The -a flag is also incompatible with the
          (default) multi cache_dir mode.

          default: off
          See also: -n

-c cf     CHANGED!
          This option lets you specify the location of the squid.conf file.
          Purge now understands more than one cache_dir, and does so
          by parsing Squid's configuration file. It knows about both ways
          of specifying a cache_dir in Squid-2, and will automatically try
          to use the correct one.

          default: /usr/local/etc/squid/squid.conf

-C dir    If you want to rescue files from your cache, you need to specify
          the directory into which the files will be copied. Please note
          that purge will try to recreate the original server's directory
          structure. This switch also activates copy-out mode. Please do
          not use copy-out mode with any purge mode (-P) other than 0.

          For instance, if you specified "-C /tmp", Purge will try to
          recreate /tmp/www.server.1/url/path/file, and so forth.

          default: off
          See also: -H, -P

-d l      lets you specify a debug level. Different bits are reserved for
          different output.

          default: 0

-e re     The "-e" options let you specify one regular expression on the
-E re     command line. This is useful if there is only a handful you
          want to check. Please remember to escape the shell metachars
          used in your regular expression. The use of single quotes
          around your expression is recommended. The capital letter
          version is case sensitive, the lowercase version is not.

          default: (no default)

-f fn     If you have more than a handful of expressions, or want to check
-F fn     the same set at regular intervals, the file option might be more
          useful to you. Each line in the text file will be regarded as
          one regular expression. Again, the capital letter version is
          case sensitive, the lowercase version is not.

          default: (no default)

-H        If in copy-out mode (see: -C), you can specify to keep the
          HTTP header in the recreated file.

          default: off
          See also: -C

-n        By specifying the "-n" switch, you tell Purge to process
          one cache_dir after another, instead of doing things in parallel.
          If you have more than one cache_dir in your configuration,
          Purge will by default fork off a worker process for each
          cache_dir to do the checks at optimum speed, assuming a decently
          designed cache. Since parallel execution puts quite some load on
          the system and its controllers, it is sometimes preferable to use
          fewer resources, though it will take longer.

          default: parallel mode for more than one cache_dir

-p h[:p]  Some cache admins (e.g. me) use a port other than 3128. The
          purge tool will need to connect to your cache in order to send
          the PURGE request (see -P). This option lets you specify the
          host and port to connect to. The port is optional. The port
          can be a name (check your /etc/services) or a number. It is
          separated from the host name portion by a single colon, no
          spaces allowed.

          default: localhost:3128

-P #      If you want to do more than just print your cache content, you
          will need to specify this option. Each bit is reserved for a
          different action. Only the use of the LSB is recommended, the
          rest should be considered experimental.

          no bit set:  just print
          bit#0 set:   send PURGE for matches
          bit#1 set:   unlink object file for 404 not found PURGEs
          bit#2 set:   unlink weird object files

          If you use a value other than 0 or 1, you will need to slow
          rebuild your cache content. A warning message will remind you
          of that. If you use bit#1, all unsuccessful PURGEs will result
          in the object file in your cache directory being removed, because
          squid does not seem to know about it any longer. Beware that
          asyncio might try to remove it after the purge tool has, and will
          then complain bitterly. Bit#1 only makes sense if bit#0 is also
          set, otherwise it has no effect (since the HTTP status 404 is
          never returned).

          Bit#2 is reserved for strange files which do not even contain
          a URL. Beware that these files may indicate a new object squid
          currently intends to swap onto disk. If the file suddenly goes
          away, or has been removed when squid tries to fetch the object,
          squid will complain bitterly. You must slow rebuild your cache if
          you use this option.

          It is recommended that if you dare to use bit#1 or bit#2, you
          grant only the purge tool access to your squid, e.g. by
          moving the HTTP and ICP listening ports of squid to a different
          non-standard location during the purge.

          default: 0 (just print)

-s        If you specify this switch, all command line parameters will be
          shown after they have been parsed.

          default: off

-v        Be verbose in the things reported about the file. See the output
          section below.


output
======

In regular mode, the output of purge consists of four columns. If the
URL contains unencoded whitespace, it may look as if there are more
columns, but the last one is the URI.

 #  name    meaning
 -  ------  -----------------------------------------------------------
 1  file    name of cache file examined which matches the re.
 2  status  return result of purge request, " 0" in print mode.
 3  size    object size including stored headers, not file size.
 4  uri     perceived uri

Example for non-verbose output in print-mode:

/cache3/00/00/0000004A 0 5682 http://graphics.userfriendly.org/images/slovenia.gif

In verbose mode, additional columns are inserted before the uri. Timestamps
are reported in hexadecimal notation, following Squid's convention of -1
for "no such timestamp" and -2 for "unparsable timestamp".

 #  name    meaning
 -  ------  -----------------------------------------------------------
 1  file    name of cache file examined which matches the re.
 2  status  return result of purge request, " 0" in print mode "-P 0".
 3  size    object size including stored headers, not file size.
 4  md5     MD5 of URI from file, or "(no_md5_data_available)" string.
 5  ts      UTC value of Date: header in hex notation
 6  lr      UTC of last time the object was referenced
 7  ex      UTC of Expires: header
 8  lm      UTC of Last-Modified: header
 9  flags   Value of objects flags field in hex, see: Programmers Guide
10  refcnt  number of times the object was referenced.
11  uri     STORE_META_URL uri or "strange_file"

Example for verbose output in print-mode:

/cache1/00/00/000000B7 0 406 7CFCB1D319F158ADC9CFD991BB8F6DCE 397d449b 39bf677b ffffffff 3820abfc 0460 1 http://www.netscape.com/images/nc_vera_tile.gif


hexd
====

The hexd tool lets you conveniently hex dump a file in both hex char and
display char columns. Hexd assumes only that characters 0-31, 127-159 and
255 are not printable.


$ ./hexd /cache1/00/00/000000B7 | less -r

00000000: 03 00 00 00 6D 03 00 00-00 10 7C FC B1 D3 19 F1 ....m.....|ü±Ó.ñ
00000010: 58 AD C9 CF D9 91 BB 8F-6D CE 05 00 00 00 18 39 X­ÉÏÙ.».mÎ.....9
00000020: 7D 44 9B 39 BF 67 7B FF-FF FF FF 38 20 AB FC 00 }D.9¿g{....8 «ü.
00000030: 00 00 00 00 01 04 60 04-00 00 00 30 68 74 74 70 ......`....0http
00000040: 3A 2F 2F 77 77 77 2E 6E-65 74 73 63 61 70 65 2E ://www.netscape.
00000050: 63 6F 6D 2F 69 6D 61 67-65 73 2F 6E 63 5F 76 65 com/images/nc_ve
00000060: 72 61 5F 74 69 6C 65 2E-67 69 66 00 08 48 54 54 ra_tile.gif..HTT
00000070: 50 2F 31 2E 30 20 32 30-30 20 4F 4B 0D 0A 53 65 P/1.0 200 OK..Se
00000080: 72 76 65 72 3A 20 4E 65-74 73 63 61 70 65 2D 45 rver: Netscape-E
00000090: 6E 74 65 72 70 72 69 73-65 2F 33 2E 36 0D 0A 44 nterprise/3.6..D
000000A0: 61 74 65 3A 20 54 75 65-2C 20 32 35 20 4A 75 6C ate: Tue, 25 Jul
000000B0: 20 32 30 30 30 20 30 37-3A 34 31 3A 31 35 20 47  2000 07:41:15 G
000000C0: 4D 54 0D 0A 43 6F 6E 74-65 6E 74 2D 54 79 70 65 MT..Content-Type
000000D0: 3A 20 69 6D 61 67 65 2F-67 69 66 0D 0A 4C 61 73 : image/gif..Las
000000E0: 74 2D 4D 6F 64 69 66 69-65 64 3A 20 57 65 64 2C t-Modified: Wed,
000000F0: 20 30 33 20 4E 6F 76 20-31 39 39 39 20 32 31 3A  03 Nov 1999 21:
00000100: 34 31 3A 31 36 20 47 4D-54 0D 0A 43 6F 6E 74 65 41:16 GMT..Conte
00000110: 6E 74 2D 4C 65 6E 67 74-68 3A 20 36 37 0D 0A 41 nt-Length: 67..A
00000120: 63 63 65 70 74 2D 52 61-6E 67 65 73 3A 20 62 79 ccept-Ranges: by
00000130: 74 65 73 0D 0A 41 67 65-3A 20 31 38 32 37 31 33 tes..Age: 182713
00000140: 0D 0A 58 2D 43 61 63 68-65 3A 20 48 49 54 20 66 ..X-Cache: HIT f
00000150: 72 6F 6D 20 63 73 2D 68-61 6E 34 2E 77 69 6E 2D rom cs-han4.win-
00000160: 69 70 2E 64 66 6E 2E 64-65 0D 0A 58 2D 43 61 63 ip.dfn.de..X-Cac
00000170: 68 65 2D 4C 6F 6F 6B 75-70 3A 20 48 49 54 20 66 he-Lookup: HIT f
00000180: 72 6F 6D 20 63 73 2D 68-61 6E 34 2E 77 69 6E 2D rom cs-han4.win-
00000190: 69 70 2E 64 66 6E 2E 64-65 3A 38 30 38 31 0D 0A ip.dfn.de:8081..
000001A0: 50 72 6F 78 79 2D 43 6F-6E 6E 65 63 74 69 6F 6E Proxy-Connection
000001B0: 3A 20 6B 65 65 70 2D 61-6C 69 76 65 0D 0A 0D 0A : keep-alive....
000001C0: 47 49 46 38 39 61 01 00-26 00 A2 00 00 00 00 00 GIF89a..&.¢.....
000001D0: FF FF FF 00 33 66 33 66-99 FF FF FF 00 00 00 00 ....3f3f........
000001E0: 00 00 00 00 00 21 F9 04-01 00 00 04 00 2C 00 00 .....!ù......,..
000001F0: 00 00 01 00 26 00 00 03-08 38 A2 BC DE F0 C9 A8 ....&....8¢¼ÞðÉ¨
00000200: 12 00 3B                                        ..;



limitations
===========

o Purge does not slow rebuild the cache for you.

o It is still relatively slow, especially if your machine is low on memory
  and/or unable to hold all OS directory cache entries in main memory.

o Purge should never be used on "busy" caches with purge modes higher
  than 1.


TODO
====

1) Use the stat() result on weird files to have a look at their ctime and
   mtime. If they are younger than, let's say, 30 seconds, they were just
   created by squid and should not be removed.

2) Add a query before purging objects or removing files, and add another
   option to remove nagging for the experienced user.

3) The reported object size may be off by one.
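
TODO item 1 could be approached roughly as follows. This is only a sketch
and not code taken from purge: the helper names old_enough() and
safe_to_unlink() and the constant MIN_AGE_SECONDS are made up for
illustration, and the 30 second threshold is simply the figure suggested
above. It relies on nothing beyond POSIX stat().

```cpp
#include <sys/stat.h>
#include <algorithm>
#include <ctime>

// Hypothetical threshold: the "let's say 30 seconds" from TODO item 1.
static const time_t MIN_AGE_SECONDS = 30;

// Pure decision helper (hypothetical, not part of purge): a file counts as
// old enough when the newer of its ctime and mtime lies at least min_age
// seconds before 'now'.
bool old_enough(time_t changed, time_t modified, time_t now,
                time_t min_age = MIN_AGE_SECONDS)
{
  const time_t newest = std::max(changed, modified);
  return (now - newest) >= min_age;
}

// Returns true only if stat() succeeds and the file is old enough.
// A file that cannot be examined is never reported as safe to remove.
bool safe_to_unlink(const char* path, time_t now)
{
  struct stat st;
  if (stat(path, &st) != 0) return false;
  return old_enough(st.st_ctime, st.st_mtime, now);
}
```

A caller handling bit#2 of -P would then ask safe_to_unlink(path,
time(0)) before removing a weird file, and leave the file alone when the
answer is false. Keeping the time comparison separate from the stat()
call makes the rule easy to check without touching a real cache.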