purge
=====

The purge tool is a kind of magnifying glass into your squid-2 cache. You
can use purge to have a look at which URLs are stored in which files within
your cache. The purge tool can also be used to release objects whose URLs
match user-specified regular expressions. A more troublesome feature is the
ability to remove files squid does not seem to know about any longer.

    USE AT YOUR OWN RISK! NO GUARANTEES, WHATSOEVER! DON'T BLAME US!
			 YOU HAVE BEEN WARNED!



compilation
===========

Purge has been successfully compiled under the following OSes:

	SYSTEM		g++	native
	------		---	------
	Solaris 2.7	yes	CC
	IRIX 6.5	yes	CC -n32
	Linux 2.0.36	yes	(g++ IS native)
	FreeBSD 4.x	yes	gmake port must be installed
				(g++ IS supported)

The recent move of the Linux community towards glibc2 may cause some
trouble, though. Compilation requires GNU make; no other make will work
correctly. The source distribution contains all files as checked into the
revision control repository. Therefore, you will need to install GNU RCS
first (which in turn needs the GNU diffutils).

The repository also contains the prototypical Perl implementation. The user
interface of the C++ implementation changed a little compared to the
Perl one. You will have to state at least one regular expression for purge
to start working. Also, to print all URLs in the cache, you will need to
specify the "-e ." regular expression.

In order to compile the purge tool, untar the source distribution and
change into the purge directory. With RCS and GNU make installed, just say
"make". GNU make will automagically retrieve all necessary files from the
repository and create the binary.

Systems not listed above will need to retrieve the makefile (use "co -l
Makefile" for this) and add their own platform-specific definitions to
section [2] in the makefile.



squid preparation
=================

In order to use purge for real PURGEs, you will have to enable this feature
in squid. By default, PURGE is disabled. You should watch closely for whom
you enable the PURGE ability, otherwise a total stranger just might wipe
your cache content. The following lines will need to be added to your
squid.conf (you may want to add further networks to the src_local ACL):

	acl purge method PURGE
	acl src_local src 127.0.0.0/8
	http_access allow purge src_local
	http_access deny  purge

Reconfigure or restart (preferred) your squid after changing the
configuration file.
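
The ACL lines above decide who may issue a PURGE. As a rough illustration
of what such a request looks like on the wire, here is a minimal Python
sketch. The function names are hypothetical, the request is the plain
HTTP/1.0 form squid is assumed to accept, and the purge tool itself may
build its request differently.

```python
import socket

def build_purge_request(url: str) -> bytes:
    """Build a bare HTTP/1.0 PURGE request for an absolute URL (assumed form)."""
    return f"PURGE {url} HTTP/1.0\r\nAccept: */*\r\n\r\n".encode("ascii")

def parse_status(reply: bytes) -> int:
    """Extract the numeric status code from the first line of an HTTP reply."""
    status_line = reply.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])

def send_purge(url: str, host: str = "localhost", port: int = 3128) -> int:
    """Send the request to the cache and return the HTTP status it answers."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_purge_request(url))
        reply = sock.recv(4096)
    return parse_status(reply)
```

With the ACLs shown, a call such as send_purge("http://www.example.com/")
from localhost is expected to yield 200 if the object was released and 404
if squid did not know the object; from any other address the request
should be denied.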



modes of operation
==================

$Id: purge.cc,v 1.15 2000/09/21 09:05:56 cached Exp $
Usage:  purge   [-a] [-c cf] [-d l] [-(f|F) fn | -(e|E) re] [-p h[:p]]
                [-P #] [-s] [-v] [-C dir [-H]] [-n]

 -a     display a little rotating thingy to indicate that I am alive (tty only).
 -c c   squid.conf location, default "/usr/local/etc/squid/squid.conf".
 -C dir base directory for content extraction (copy-out mode).
 -d l   debug level, an OR of different debug options.
 -e re  single regular expression per -e instance (use quotes!).
 -E re  single case sensitive regular expression like -e.
 -f fn  name of textfile containing one regular expression per line.
 -F fn  name of textfile like -f containing case sensitive REs.
 -H     prepend HTTP reply header to destination files in copy-out mode.
 -n     do not fork() when using more than one cache_dir.
 -p h:p cache runs on host h and optional port p, default is localhost:3128.
 -P #   if 0, just print matches; otherwise OR the following purge modes:
           0x01 really send PURGE to the cache.
           0x02 remove all caches files reported as 404 (not found).
           0x04 remove all weird (inaccessible or too small) cache files.
        0 and 1 are recommended - slow rebuild your cache with other modes.
 -s     show all options after option parsing, but before really starting.
 -v     show more information about the file, e.g. MD5, timestamps and flags.

--- &< snip, snip ---

-a	is a kind of "I am alive" flag. It can only be activated if
	your stdout is a tty. If active, it will display a little
	rotating line to indicate that there is actually something
	happening. You should not use this switch if you capture
	your stdout in a file, or if your expression list produces
	many matches. The -a flag is also incompatible with the
	(default) multi cache_dir mode.

	default: off
	See also: -n

-c cf	CHANGED!
	This option lets you specify the location of the squid.conf file.
	Purge now understands more than one cache_dir, and does so
	by parsing Squid's configuration file. It knows about both ways
	of Squid-2 cache_dir specifications, and will automatically try
	to use the correct one.

	default: /usr/local/etc/squid/squid.conf
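
As an illustration of what "both ways" could mean in practice, the
following Python sketch pulls the directories out of a squid.conf text. It
assumes the two forms are "cache_dir <path> <size> <L1> <L2>" and
"cache_dir <type> <path> <size> <L1> <L2>", and it tells them apart by a
leading slash. Both the rule and the function name are assumptions of this
sketch, not taken from the purge source.

```python
def parse_cache_dirs(conf_text: str) -> list:
    """Return the directory named on every cache_dir line of a squid.conf."""
    dirs = []
    for line in conf_text.splitlines():
        # Strip comments, then split the directive into whitespace fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0] != "cache_dir":
            continue
        if fields[1].startswith("/"):
            # Assumed older form: the path directly follows the keyword.
            dirs.append(fields[1])
        elif len(fields) >= 3:
            # Assumed newer form: a store type comes first, then the path.
            dirs.append(fields[2])
    return dirs
```

Each directory found this way corresponds to one cache_dir that purge
would have to walk.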

-C dir	if you want to rescue files from your cache, you need to specify
	the directory into which the files will be copied. Please note
	that purge will try to recreate the original server's directory
	structure. This switch also activates copy-out mode. Please do
	not use copy-out mode with any purge mode (-P) other than 0.

	For instance, if you specify "-C /tmp", Purge will try to
	recreate /tmp/www.server.1/url/path/file, and so forth.

	default: off
	See also: -H, -P
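
The mapping from URL to destination file in copy-out mode can be pictured
with the short Python sketch below. It only mirrors the /tmp example given
above; the function name is hypothetical, and how purge really treats
query strings, trailing slashes or ".." segments is not known here, so the
filtering shown is an assumption.

```python
import posixpath
from urllib.parse import urlsplit

def copy_out_path(base_dir: str, url: str) -> str:
    """Map a cached URL to a file below base_dir: <base>/<host>/<url path>."""
    parts = urlsplit(url)
    # Drop empty, "." and ".." segments so the result stays below base_dir.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    return posixpath.join(base_dir, parts.hostname or "unknown_host", *segments)
```

For the example above, copy_out_path("/tmp",
"http://www.server.1/url/path/file") gives /tmp/www.server.1/url/path/file.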

-d l	lets you specify a debug level. Different bits are reserved for
	different output.

	default: 0

-e re	the "-e" options let you specify one regular expression on the
-E re	command line. This is useful if there are only a handful you
	want to check. Please remember to escape the shell metacharacters
	used in your regular expression. The use of single quotes
	around your expression is recommended. The capital letter
	version is case sensitive, the lower case version is not.

	default: (no default)

-f fn	if you have more than a handful of expressions, or want to check
-F fn	the same set at regular intervals, the file option might be more
	useful to you. Each line in the text file will be regarded as
	one regular expression. Again, the capital letter version is
	case sensitive, the lower case version is not.

	default: (no default)
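
The difference between the lower case and capital letter options can be
sketched in Python as follows. This is only an illustration: Python's re
syntax is not identical to the regular expression library purge is built
with, the function names are made up, and skipping empty lines is an
assumption.

```python
import re

def compile_patterns(lines, case_sensitive=False):
    """Compile one regular expression per non-empty line."""
    flags = 0 if case_sensitive else re.IGNORECASE
    patterns = []
    for line in lines:
        expr = line.rstrip("\r\n")
        if expr:  # skipping empty lines is an assumption of this sketch
            patterns.append(re.compile(expr, flags))
    return patterns

def matches_any(url, patterns):
    """True if at least one pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)
```

Passing the lines of an expression file with case_sensitive=True
corresponds to -F, with the default to -f. A single-element list plays the
role of one -E or -e option.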

-H	if in copy-out mode (see: -C), you can choose to keep the
	HTTP header in the recreated file.

	default: off
	See also: -C

-n	by specifying the "-n" switch, you tell Purge to process
	one cache_dir after another, instead of doing things in parallel.
	If you have more than one cache_dir in your configuration,
	Purge will by default fork off a worker process for each
	cache_dir to do the checks for optimum speed - assuming a
	decently designed cache. Since parallel execution will put quite
	some load on the system and its controllers, it is sometimes
	preferable to use fewer resources, though it will take longer.

	default: parallel mode for more than one cache_dir

-p h[:p] Some cache admins (e.g. me) use a different port than 3128. The
	purge tool will need to connect to your cache in order to send
	the PURGE request (see -P). This option lets you specify the
	host and port to connect to. The port is optional. The port
	can be a name (check your /etc/services) or number. It is
	separated from the host name portion by a single colon, no
	spaces allowed.

	default: localhost:3128
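
A minimal Python sketch of how such a host[:port] argument might be taken
apart is shown below. The function name is hypothetical. The lookup of a
symbolic port uses the system services database, which is assumed to be
what "check your /etc/services" refers to.

```python
import socket

def parse_hostport(spec: str, default_port: int = 3128):
    """Split 'host[:port]' into (host, port); the port may be a service name."""
    host, sep, port = spec.partition(":")
    if not sep:
        return host, default_port
    if port.isdigit():
        return host, int(port)
    # Not a number: look the name up in the services database.
    return host, socket.getservbyname(port, "tcp")
```

parse_hostport("localhost") falls back to port 3128, while
parse_hostport("cache.example.org:8080") uses the explicit number.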

-P #	If you want to do more than just print your cache content, you
	will need to specify this option. Each bit is reserved for a
	different action. Only the use of the LSB is recommended, the
	rest should be considered experimental.

		no bit set:	just print
		bit#0 set:	send PURGE for matches
		bit#1 set:	unlink object file for 404 not found PURGEs
		bit#2 set:	unlink weird object files

	If you use a value other than 0 or 1, you will need to slow
	rebuild your cache content. A warning message will remind you
	of that. If you use bit#1, all unsuccessful PURGEs will result
	in the object file in your cache directory being removed, because
	squid does not seem to know about it any longer. Beware that
	asyncio might try to remove it after the purge tool, and thus
	complain bitterly. Bit#1 only makes sense if bit#0 is also
	set, otherwise it has no effect (since the HTTP status 404 is
	never returned).

	Bit#2 is reserved for strange files which do not even contain
	a URL. Beware that these files may indicate a new object squid
	currently intends to swap onto disk. If the file suddenly goes
	away, or is removed when squid tries to fetch the object, squid
	will complain bitterly. You must slow rebuild your cache if
	you use this option.

	It is recommended that, if you dare to use bit#1 or bit#2, you
	grant only the purge tool access to your squid, e.g. move the
	HTTP and ICP listening ports of squid to a different
	non-standard location during the purge.

	default: 0 (just print)
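
The bit combinations described for -P can be summarised in a few lines of
Python. The constant and function names are invented for this sketch; only
the bit values and their consequences are taken from the text above.

```python
PURGE_SEND = 0x01          # bit#0: send PURGE for matches
PURGE_UNLINK_404 = 0x02    # bit#1: unlink object file for 404 PURGEs
PURGE_UNLINK_WEIRD = 0x04  # bit#2: unlink weird object files

def describe_mode(mode: int) -> dict:
    """Translate a -P value into the actions it would trigger."""
    send = bool(mode & PURGE_SEND)
    return {
        "print_only": mode == 0,
        "send_purge": send,
        # bit#1 has no effect without bit#0, since no 404 is ever returned.
        "unlink_404": send and bool(mode & PURGE_UNLINK_404),
        "unlink_weird": bool(mode & PURGE_UNLINK_WEIRD),
        # Any value other than 0 or 1 calls for a slow rebuild.
        "needs_slow_rebuild": mode not in (0, 1),
    }
```

For example, describe_mode(2) reports that nothing is unlinked for 404s,
yet a slow rebuild is still called for, which matches the warning above
for any value other than 0 or 1.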

-s	If you specify this switch, all command line parameters will be
	shown after they were parsed.

	default: off

-v	be verbose in the things reported about the file. See the output
	section below.


output
======

In regular mode, the output of purge consists of four columns. If the
URL contains unencoded whitespace, it may look as if there are more
columns, but everything from the fourth column onward is the URI.

 # name   meaning
 - ------ -----------------------------------------------------------
 1 file   name of cache file examined which matches the re.
 2 status return result of purge request, "  0" in print mode.
 3 size   object size including stored headers, not file size.
 4 uri    perceived uri

Example for non-verbose output in print-mode:

/cache3/00/00/0000004A   0     5682 http://graphics.userfriendly.org/images/slovenia.gif
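
If you post-process this output with a script, splitting on whitespace at
most three times keeps a URI with embedded blanks in one piece. The
following Python sketch (the function name is hypothetical) shows the idea
for the non-verbose format.

```python
def parse_line(line: str):
    """Split one non-verbose output line into (file, status, size, uri)."""
    # maxsplit=3 leaves any whitespace inside the URI untouched.
    file_name, status, size, uri = line.rstrip("\n").split(None, 3)
    return file_name, int(status), int(size), uri
```

Applied to the example line above, it returns the file name
/cache3/00/00/0000004A, status 0, size 5682 and the URI.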

In verbose mode, additional columns are inserted before the uri. Time
stamps are reported in hexadecimal notation, following Squid's standard
of reporting "no such timestamp" as -1 and "unparsable timestamp" as -2.

 # name   meaning
 - ------ -----------------------------------------------------------
 1 file   name of cache file examined which matches the re.
 2 status return result of purge request, "  0" in print mode "-P 0".
 3 size   object size including stored headers, not file size.
 4 md5    MD5 of URI from file, or "(no_md5_data_available)" string.
 5 ts     UTC value of Date: header in hex notation
 6 lr     UTC of last time the object was referenced
 7 ex     UTC of Expires: header
 8 lm     UTC of Last-Modified: header
 9 flags  Value of object's flags field in hex, see: Programmers Guide
10 refcnt number of times the object was referenced.
11 uri    STORE_META_URL uri or "strange_file"

Example for verbose output in print-mode:

/cache1/00/00/000000B7   0      406 7CFCB1D319F158ADC9CFD991BB8F6DCE 397d449b 39bf677b ffffffff 3820abfc 0460     1  http://www.netscape.com/images/nc_vera_tile.gif
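
The hexadecimal time stamps are easier to read once converted. The Python
sketch below decodes them, assuming they are 32-bit values (as the eight
hex digits in the example suggest), so that ffffffff stands for -1 and
fffffffe for -2. The function names and the dictionary keys are
hypothetical; in particular "lm" is this sketch's label for the
Last-Modified column.

```python
from datetime import datetime, timezone

def decode_timestamp(hex_field: str):
    """Turn a hex time stamp column into a UTC datetime or a marker string."""
    value = int(hex_field, 16)
    if value >= 0x80000000:  # reinterpret as signed 32-bit (assumption)
        value -= 0x100000000
    if value == -1:
        return "no such timestamp"
    if value == -2:
        return "unparsable timestamp"
    return datetime.fromtimestamp(value, tz=timezone.utc)

def parse_verbose_line(line: str) -> dict:
    """Split one verbose output line into its eleven columns."""
    f = line.rstrip("\n").split(None, 10)
    return {
        "file": f[0], "status": int(f[1]), "size": int(f[2]), "md5": f[3],
        "ts": decode_timestamp(f[4]), "lr": decode_timestamp(f[5]),
        "ex": decode_timestamp(f[6]), "lm": decode_timestamp(f[7]),
        "flags": int(f[8], 16), "refcnt": int(f[9]), "uri": f[10],
    }
```

For the example line above, the ts column 397d449b decodes to 2000-07-25
07:41:15 UTC and the ex column ffffffff to "no such timestamp".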


hexd
====

The hexd tool lets you conveniently hex dump a file in both hex char and
display char columns. Hexd only assumes that characters 0-31, 127-159 and
255 are not printable.
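
The printable-character rule can be sketched in Python as follows. This is
not the hexd source: the line layout (eight-digit offset, sixteen bytes
with a dash after the eighth, then the character column) is inferred from
hexd's sample output, and the function names are made up.

```python
def printable(byte: int) -> bool:
    """Apply the rule above: 0-31, 127-159 and 255 are not printable."""
    return not (byte <= 31 or 127 <= byte <= 159 or byte == 255)

def hexd_lines(data: bytes):
    """Yield dump lines: offset, hex columns, then display characters."""
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        left = " ".join(f"{b:02X}" for b in chunk[:8])
        right = " ".join(f"{b:02X}" for b in chunk[8:])
        hex_part = left + ("-" + right if right else "")
        # Bytes 160-254 are shown as the matching Latin-1 characters.
        text = "".join(chr(b) if printable(b) else "." for b in chunk)
        yield f"{offset:08X}: {hex_part:<47}  {text}"
```

Printing every line of hexd_lines(open(path, "rb").read()) gives a dump in
the same spirit as hexd's.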


$ ./hexd /cache1/00/00/000000B7 | less -r

00000000: 03 00 00 00 6D 03 00 00-00 10 7C FC B1 D3 19 F1  ....m.....|ü±Ó.ñ
00000010: 58 AD C9 CF D9 91 BB 8F-6D CE 05 00 00 00 18 39  X­ÉÏÙ.».mÎ.....9
00000020: 7D 44 9B 39 BF 67 7B FF-FF FF FF 38 20 AB FC 00  }D.9¿g{....8 «ü.
00000030: 00 00 00 00 01 04 60 04-00 00 00 30 68 74 74 70  ......`....0http
00000040: 3A 2F 2F 77 77 77 2E 6E-65 74 73 63 61 70 65 2E  ://www.netscape.
00000050: 63 6F 6D 2F 69 6D 61 67-65 73 2F 6E 63 5F 76 65  com/images/nc_ve
00000060: 72 61 5F 74 69 6C 65 2E-67 69 66 00 08 48 54 54  ra_tile.gif..HTT
00000070: 50 2F 31 2E 30 20 32 30-30 20 4F 4B 0D 0A 53 65  P/1.0 200 OK..Se
00000080: 72 76 65 72 3A 20 4E 65-74 73 63 61 70 65 2D 45  rver: Netscape-E
00000090: 6E 74 65 72 70 72 69 73-65 2F 33 2E 36 0D 0A 44  nterprise/3.6..D
000000A0: 61 74 65 3A 20 54 75 65-2C 20 32 35 20 4A 75 6C  ate: Tue, 25 Jul
000000B0: 20 32 30 30 30 20 30 37-3A 34 31 3A 31 35 20 47   2000 07:41:15 G
000000C0: 4D 54 0D 0A 43 6F 6E 74-65 6E 74 2D 54 79 70 65  MT..Content-Type
000000D0: 3A 20 69 6D 61 67 65 2F-67 69 66 0D 0A 4C 61 73  : image/gif..Las
000000E0: 74 2D 4D 6F 64 69 66 69-65 64 3A 20 57 65 64 2C  t-Modified: Wed,
000000F0: 20 30 33 20 4E 6F 76 20-31 39 39 39 20 32 31 3A   03 Nov 1999 21:
00000100: 34 31 3A 31 36 20 47 4D-54 0D 0A 43 6F 6E 74 65  41:16 GMT..Conte
00000110: 6E 74 2D 4C 65 6E 67 74-68 3A 20 36 37 0D 0A 41  nt-Length: 67..A
00000120: 63 63 65 70 74 2D 52 61-6E 67 65 73 3A 20 62 79  ccept-Ranges: by
00000130: 74 65 73 0D 0A 41 67 65-3A 20 31 38 32 37 31 33  tes..Age: 182713
00000140: 0D 0A 58 2D 43 61 63 68-65 3A 20 48 49 54 20 66  ..X-Cache: HIT f
00000150: 72 6F 6D 20 63 73 2D 68-61 6E 34 2E 77 69 6E 2D  rom cs-han4.win-
00000160: 69 70 2E 64 66 6E 2E 64-65 0D 0A 58 2D 43 61 63  ip.dfn.de..X-Cac
00000170: 68 65 2D 4C 6F 6F 6B 75-70 3A 20 48 49 54 20 66  he-Lookup: HIT f
00000180: 72 6F 6D 20 63 73 2D 68-61 6E 34 2E 77 69 6E 2D  rom cs-han4.win-
00000190: 69 70 2E 64 66 6E 2E 64-65 3A 38 30 38 31 0D 0A  ip.dfn.de:8081..
000001A0: 50 72 6F 78 79 2D 43 6F-6E 6E 65 63 74 69 6F 6E  Proxy-Connection
000001B0: 3A 20 6B 65 65 70 2D 61-6C 69 76 65 0D 0A 0D 0A  : keep-alive....
000001C0: 47 49 46 38 39 61 01 00-26 00 A2 00 00 00 00 00  GIF89a..&.¢.....
000001D0: FF FF FF 00 33 66 33 66-99 FF FF FF 00 00 00 00  ....3f3f........
000001E0: 00 00 00 00 00 21 F9 04-01 00 00 04 00 2C 00 00  .....!ù......,..
000001F0: 00 00 01 00 26 00 00 03-08 38 A2 BC DE F0 C9 A8  ....&....8¢¼ÞðÉ¨
00000200: 12 00 3B                                         ..;
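
The first 0x6D bytes of this dump hold the metadata that purge reports in
verbose mode. The Python sketch below walks them as a sequence of
(type, length, value) records. The layout is inferred only from this dump
and the verbose example: one leading byte, a four-byte header length, then
records of one type byte plus a four-byte length. Big-endian byte order is
assumed because it reproduces the values shown, and it may differ on other
hosts. The names are hypothetical.

```python
import struct

# Bytes 0x00-0x6C of the dump, copied from the hex columns.
HEADER = bytes.fromhex(
    "03 00 00 00 6D 03 00 00 00 10 7C FC B1 D3 19 F1"
    "58 AD C9 CF D9 91 BB 8F 6D CE 05 00 00 00 18 39"
    "7D 44 9B 39 BF 67 7B FF FF FF FF 38 20 AB FC 00"
    "00 00 00 00 01 04 60 04 00 00 00 30 68 74 74 70"
    "3A 2F 2F 77 77 77 2E 6E 65 74 73 63 61 70 65 2E"
    "63 6F 6D 2F 69 6D 61 67 65 73 2F 6E 63 5F 76 65"
    "72 61 5F 74 69 6C 65 2E 67 69 66 00 08"
)

def parse_meta(data: bytes):
    """Return (header_length, [(type, value), ...]) for a swap file header."""
    header_len = struct.unpack_from(">i", data, 1)[0]
    entries, pos = [], 5
    # Stop when fewer than five bytes (type + length) remain in the header.
    while pos + 5 <= header_len:
        kind, length = struct.unpack_from(">Bi", data, pos)
        entries.append((kind, data[pos + 5:pos + 5 + length]))
        pos += 5 + length
    return header_len, entries
```

On this dump the walk yields three records: type 3 with 16 bytes equal to
the MD5 of the verbose example, type 5 with 24 bytes that start with the
four time stamps, and type 4 with the NUL-terminated URL. The header
length of 109 (0x6D) is exactly where the HTTP reply begins.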



limitations
===========

o Purge does not slow rebuild the cache for you.

o It is still relatively slow, especially if your machine is low on memory
and/or unable to hold all OS directory cache entries in main memory.

o Purge should never be used on "busy" caches with purge modes higher than 1.


TODO
====

1) Use the stat() result on weird files to have a look at their ctime and
   mtime. If they are younger than, let's say, 30 seconds, they were just
   created by squid and should not be removed.

2) Add a query before purging objects or removing files, and add another
   option to turn off the nagging for the experienced user.

3) The reported object size may be off by one.
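
The age check proposed in item 1 could look roughly like the Python sketch
below. It is only an illustration of the idea: the function name is
invented, the 30 second threshold is the one suggested in that item, and
treating a file as fresh when either ctime or mtime is recent is an
assumption.

```python
import os
import time

def recently_touched(path, max_age=30.0, now=None):
    """True if the file's ctime or mtime is less than max_age seconds old."""
    st = os.stat(path)
    if now is None:
        now = time.time()
    # The younger of the two time stamps decides.
    return now - max(st.st_ctime, st.st_mtime) < max_age
```

A weird file for which this returns True would be left alone, since squid
may just have created it.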