The Webalizer - A log file analysis program -- DNS information

The Webalizer can perform reverse DNS lookups, and fully supports
both IPv4 and IPv6 addressing schemes. This document attempts to
explain how it works, and some things that you should be aware of
when using the DNS lookup features.

Note: The Reverse DNS feature may be enabled or disabled at compile
      time. DNS lookup code is enabled by default. You can run The
      Webalizer using the '-vV' command line options to determine what
      options are enabled in the version you are using.


How it works
------------

DNS lookups are made against a DNS cache file containing IP addresses
and resolved names. If an IP address is not found in the cache file,
it will be left as an IP address. For lookups to take place, a cache
file MUST be specified when the Webalizer is run, either using the
'-D' command line switch or the "DNSCache" configuration file keyword.
If no cache file is specified, no DNS lookups will be attempted. The
cache file can be made three different ways.

1) You can have the Webalizer pre-process the specified log file at
   run-time, creating the cache file before processing the log file
   normally. This is done by setting the number of DNS Children
   processes to run, either by using the '-N' command line switch or
   the "DNSChildren" configuration keyword. This will cause the
   Webalizer to spawn the specified number of processes which will
   be used to do reverse DNS lookups. Generally, a larger number
   of processes will result in faster resolution of the log, however
   if set too high it may cause overall system degradation. A setting
   of between 5 and 20 should be acceptable, and there is a maximum
   limit of 100. If used, a cache filename MUST be specified also,
   using either the '-D' command line switch, or the "DNSCache"
   configuration keyword.
   Using this method, normal processing will continue only after all
   IP addresses have been processed and the cache file has been
   created/updated.

2) You can pre-process the log file as a standalone process, creating
   the cache file that will be used later by the Webalizer. This is
   done by running the Webalizer with a name of 'webazolver' (i.e., the
   name 'webazolver' is a symbolic link to 'webalizer') and specifying
   the cache filename (either with '-D' or DNSCache). If the number
   of child processes is not given, the default of 5 will be used. In
   this mode, the log will be read and processed, creating a DNS cache
   file or updating an existing one, and the program will then exit
   without any further processing.

3) You can use The Webalizer (DNS) Cache file Manager program 'wcmgr'
   to create and manipulate a cache file. A blank cache file can be
   created and populated later, or data for the cache file can be
   imported from tab delimited text files. See the wcmgr(1) man page
   for usage information.


Run-time DNS cache file creation/update
---------------------------------------

The creation/update of a DNS cache file at run-time occurs as follows:

1) The log file is read, creating a list of all IP addresses that are
   not already cached (or cached but expired) and need to be resolved.
   Addresses are expired based on the TTL value specified using the
   'CacheTTL' configuration option or after 7 days (default) if no TTL
   is specified.

2) The specified number of child processes are forked, and are used
   to perform DNS lookups.

3) Each IP address is given, one at a time, to the next available child
   process until all IP addresses have been processed. Each child will
   update the cache file when a result is returned. This may be either
   a resolved name or a failed lookup, in which case the address will be
   left unresolved.
   Unresolved addresses are not normally cached, but can be if enabled
   using the 'CacheIPs' configuration file keyword.

4) Once all IP addresses have been processed and the cache file updated,
   the Webalizer will process the log normally. Each record it finds
   that has an unresolved IP address will be looked up in the cache file
   to see if a hostname is available (i.e., was previously found).

Because there may be a significant amount of time between building the
initial list of unresolved IP addresses and normal processing, the
Webalizer should not be run against live log files (i.e., a log file
that is actively being written to by a server), otherwise there may be
additional records present that were not resolved.


Stand-Alone DNS cache file creation/update
------------------------------------------

The creation/update of the DNS cache file, when run in stand-alone mode,
occurs as follows:

1) The log file is read, creating a list of all IP addresses that are
   not already cached (or cached but expired) and need to be resolved.

2) The specified number of child processes are forked, and are used
   to perform DNS lookups. If the number of processes was not specified,
   the default of 5 will be used.

3) Each IP address is given, one at a time, to the next available child
   process until all IP addresses have been processed. Each child will
   update the cache file when a result is returned.

4) Once all IP addresses have been processed and the cache file updated,
   the program will terminate without any further processing.


Larger sites may prefer to use a stand-alone process to create the DNS
cache file, and then run the Webalizer against the cache file. This
allows a single cache file to be used for many virtual hosts, and reduces
the processing needed if many sites are being processed. The Webalizer
can be used in stand-alone mode by running it as 'webazolver'.
When run in this fashion, it will only create the cache file and then
exit without any further processing. A cache filename MUST be specified,
however unlike when running the Webalizer normally, the number of child
processes does not have to be given (it will default to 5). All normal
configuration and command line options are recognized, however many
of them will simply be ignored. This allows the use of a standard
configuration file for both normal use and stand-alone use.


Examples:
---------

webalizer -c test.conf -N 10 -D dns_cache.db /var/log/my_www_log

   This will use the configuration file 'test.conf' to obtain normal
   configuration options such as hostname and output directory. It
   will then either create or update the file 'dns_cache.db' in the
   default output directory (using 10 child processes) based on the
   IP addresses it finds in the log /var/log/my_www_log, and then
   process that log file normally.


webalizer -o out -D dns_cache.db /var/log/my_www_log

   This will process the log file /var/log/my_www_log, resolving IP
   addresses from the cache file 'dns_cache.db' found in the default
   output directory "out". The cache file must be present as it will
   not be created with this command.


for i in /var/log/*/access_log; do
   webazolver -N 20 -D /var/lib/dns_cache.db "$i"
done

   The above is an example of how to run through multiple log files
   creating a single DNS cache file. This might typically be used on
   a larger site that has many virtual hosts, each keeping its log
   files in a separate directory. It will process each access_log it
   finds in /var/log/* and create a cache file (/var/lib/dns_cache.db).
   This cache file can then be used to process the logs normally with
   the Webalizer in a read-only fashion (see next example).


for i in /etc/webalizer/*.conf; do webalizer -c "$i" -D /etc/cache.db; done

   This will process each configuration file found in /etc/webalizer,
   using the DNS cache file /etc/cache.db. This will also typically be
   used on a larger site with multiple hosts. Each configuration file
   will specify a site specific log file, hostname, output directory, etc.
   The cache file used will typically be created using a command similar
   to the one previous to this example.


Cache File Maintenance
----------------------

The Webalizer DNS cache files generally require very little or no
special attention. There are times though when some maintenance
is required, such as occasional purging of very old cache entries.
The Webalizer never removes a record once it's inserted into the
cache. If a record expires based on its timestamp, the next time
that address is seen in a log, its name is looked up again and the
timestamp is updated. However, there will always be addresses that
are never seen again, which will cause the cache files to continue
to grow in size over time. On extremely busy sites or sites that
attract many one time visitors, the cache file may grow extremely
large, yet only contain a small number of valid entries. Using
The Webalizer (DNS) Cache file Manager ('wcmgr'), cache files can
be purged, removing expired entries and shrinking the file size.
A TTL (time to live) value can be specified, so the length of time
an entry remains in the cache can be varied depending on individual
site requirements. In addition to purging cache files, 'wcmgr' can
also be used to list cache file contents, import/export cache data,
lookup/add/delete individual entries and gather overall statistics
regarding the cache file (number of records, number expired, etc.).

To purge a cache file using 'wcmgr', an example command would be:

wcmgr -p31 /path/to/dns.cache

This would purge the 'dns.cache' cache file of any records that are
over 31 days old, and would reclaim the space that those records
were using in the file. If you would like to see the records that
get purged, adding the command line option '-v' (verbose) will cause
the program to print each entry and its age as it is removed.
You can also use 'wcmgr' to display statistics on cache files
to aid in determining when a cache file should be purged. See the
'wcmgr' man page (wcmgr.1) for additional information on the various
options available.


Stupid Cache Tricks
-------------------

The DNS cache files used by The Webalizer allow for efficient IP address
to name translations. Resolved names are normally generated by using an
existing DNS name server to query the address, either locally or over
the Internet. However, using The Webalizer (DNS) Cache file Manager,
almost any IP address to name translation can be included in the cache.
One such example would be for mapping local network addresses to real
names, even though those addresses may not have real DNS entries on the
network (or may be 'local' addresses prohibited from use on the Internet).
A simple tab delimited text file can be created and imported into a cache
for use by The Webalizer, which will then be used to convert the local
IP addresses to real names. Additional configuration options for The
Webalizer can then be used as they normally would be. For example,
consider a small business with 10 computers and a DSL router to the
Internet. Each machine on the local network would use a private IP
address that would not be resolved using an external (public) DNS
server, so would always be reported by The Webalizer as
'unknown/unresolved'.
A simple cache file could be created to map those unresolved addresses
into more meaningful names, which could then be further processed by
the Webalizer. An example might look something like:

# Local machines
192.168.123.254   0   0   gw.widgetsareus.lan
192.168.123.253   0   0   mail.widgetsareus.lan
192.168.123.250   0   0   sales.widgetsareus.lan
192.168.123.240   0   0   service.widgetsareus.lan
192.168.123.237   0   0   mgr.widgetsareus.lan
192.168.123.235   0   0   support1.widgetsareus.lan
192.168.123.234   0   0   support2.widgetsareus.lan
192.168.123.232   0   0   pres.widgetsareus.lan
192.168.123.230   0   0   vp.widgetsareus.lan
192.168.123.225   0   0   reception.widgetsareus.lan
192.168.123.224   0   0   finance.widgetsareus.lan
127.0.0.1         0   1   127.0.0.1


There are a couple of things here that should be noted. The first
is that the timestamps (first zero on each line above) are set to
zero. This tells The Webalizer that these cached entries are to
be considered 'permanent', and should never be expired (infinite
TTL or time to live). The second thing to note is that the resolved
names are using a non-standard TLD (top level domain) of '.lan'.
The Webalizer will map this special TLD to mean "Local Network" in
its reports, which allows local traffic to be grouped separately
from normal Internet traffic. Lastly, you may notice that the
last line of the file contains an entry with the same IP address
where a name should be. This entry will prevent the Webalizer
from ever trying to look up 127.0.0.1, which is the 'localhost'
address, when it is found in a log. The second number after the IP
address (1) tells the Webalizer that it is an unresolved entry, not
a resolved hostname (i.e., it has no name). Entries such as this one
can be used to reduce DNS lookups on addresses that are known not to
resolve.
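
As a rough sketch of how such an import file might be generated and
sanity-checked before handing it to 'wcmgr', the shell fragment below
writes a few permanent entries and verifies that every line has four
tab separated fields. The field order (address, timestamp, unresolved
flag, name) is taken from the example above. The file name and the
'add_entry' helper are hypothetical and only used for illustration,
and the import step itself is left out, so consult wcmgr(1) for the
actual import option.

```shell
#!/bin/sh
# Sketch only: build a tab delimited file of local address mappings.
# Field order follows the example above:
#   IP address <TAB> timestamp <TAB> unresolved flag <TAB> name
# A timestamp of 0 marks an entry as permanent (never expired); an
# unresolved flag of 1 marks an address that has no name.

import_file="${TMPDIR:-/tmp}/local_hosts.$$.txt"   # hypothetical name

# add_entry IP FLAG NAME: append one permanent (timestamp 0) record.
add_entry() {
    printf '%s\t0\t%s\t%s\n' "$1" "$2" "$3" >> "$import_file"
}

: > "$import_file"                                 # start with an empty file
add_entry 192.168.123.254 0 gw.widgetsareus.lan
add_entry 192.168.123.253 0 mail.widgetsareus.lan
add_entry 127.0.0.1       1 127.0.0.1              # never look up localhost

# Count lines that do not have exactly four tab separated fields.
bad_lines=$(awk -F '\t' 'NF != 4 { n++ } END { print n + 0 }' "$import_file")
entries=$(wc -l < "$import_file" | tr -d ' ')
echo "entries: $entries, malformed: $bad_lines"
```

Checking the field count first is a cheap way to catch entries that
were typed with spaces instead of tabs, which would no longer be tab
delimited.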


Considerations
--------------

Processing of live log files is discouraged, as any log records written
between the time of DNS resolution and normal processing will not have
been resolved.

If you are using STDIN for the input stream (log file) and have run-time
DNS cache file creation/update enabled, the program will exit after the
cache file has been created/updated and no output will be produced. If
you must use STDIN for the input log, you will need to process the stream
twice, once to create/update the cache file, and again to produce the
reports. The reason for this is that stream inputs from STDIN cannot
be 'rewound' to the beginning like files can, so must be given twice.

Cached DNS addresses have a default TTL (time to live) of 7 days. This
may now be changed using the CacheTTL config file keyword to any value
from 1 to 100 (days). You may also now specify if unresolved addresses
should be stored in the DNS cache. Normally, unresolved IP addresses
are NOT saved in the cache and are looked up each time the program is
run.

There is an absolute maximum of 100 child processes that may be created,
however the actual number of children should be significantly less than
the maximum. Typical usage should be between 5 and 20.

Special thanks to Henning P. Schmiedehausen <hps@tanstaafl.de> for the
original dns-resolver code he submitted, which was the basis for this
implementation. Also thanks to Jose Carlos Medeiros for the initial IPv6
support code.