
                        =============================
                        p0f v3: passive fingerprinter
                        =============================

                    http://lcamtuf.coredump.cx/p0f3.shtml

         Copyright (C) 2012 by Michal Zalewski <lcamtuf@coredump.cx>


---------------
1. What's this?
---------------

P0f is a tool that utilizes an array of sophisticated, purely passive traffic
fingerprinting mechanisms to identify the players behind any incidental TCP/IP
communications (often as little as a single normal SYN) without interfering in
any way.

Some of its capabilities include:

  - Highly scalable and extremely fast identification of the operating system
    and software on both endpoints of a vanilla TCP connection - especially in
    settings where NMap probes are blocked, too slow, unreliable, or would
    simply set off alarms.

  - Measurement of system uptime and network hookup, distance (including
    topology behind NAT or packet filters), and so on.

  - Automated detection of connection sharing / NAT, load balancing, and
    application-level proxying setups.

  - Detection of dishonest clients / servers that forge declarative statements
    such as X-Mailer or User-Agent.

The tool can be operated in the foreground or as a daemon, and offers a simple
real-time API for third-party components that wish to obtain additional
information about the actors they are talking to.

Common uses for p0f include reconnaissance during penetration tests; routine
network monitoring; detection of unauthorized network interconnects in corporate
environments; providing signals for abuse-prevention tools; and miscellaneous
forensics.

A snippet of typical p0f output may look like this:

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn) ]-
|
| client   = 1.2.3.4
| os       = Windows XP
| dist     = 8
| params   = none
| raw_sig  = 4:120+8:0:1452:65535,0:mss,nop,nop,sok:df,id+:0
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn+ack) ]-
|
| server   = 4.3.2.1
| os       = Linux 3.x
| dist     = 0
| params   = none
| raw_sig  = 4:64+0:0:1460:mss*10,0:mss,nop,nop,sok:df:0
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (mtu) ]-
|
| client   = 1.2.3.4
| link     = DSL
| raw_mtu  = 1492
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (uptime) ]-
|
| client   = 1.2.3.4
| uptime   = 0 days 11 hrs 16 min (modulo 198 days)
| raw_freq = 250.00 Hz
|
`----

A live demonstration can be seen here:

http://lcamtuf.coredump.cx/p0f3/

--------------------
2. How does it work?
--------------------

A vast majority of metrics used by p0f were invented specifically for this tool,
and include data extracted from IPv4 and IPv6 headers, TCP headers, the dynamics
of the TCP handshake, and the contents of application-level payloads.

For TCP/IP, the tool fingerprints the client-originating SYN packet and the
first SYN+ACK response from the server, paying attention to factors such as the
ordering of TCP options, the relation between maximum segment size and window
size, the progression of TCP timestamps, and the state of about a dozen possible
implementation quirks (e.g. non-zero values in "must be zero" fields).

The metrics used for application-level traffic vary from one module to another;
where possible, the tool relies on signals such as the ordering or syntax of
HTTP headers or SMTP commands, rather than any declarative statements such as
User-Agent. Application-level fingerprinting modules currently support HTTP.
Before the tool leaves "beta", I want to add SMTP and FTP. Other protocols,
such as POP3, IMAP, SSH, and SSL, may follow.

The list of all the measured parameters is reviewed in section 5 later on.
Some of the analysis also happens on a higher level: inconsistencies in the
data collected from various sources, or in the data from the same source
obtained over time, may be indicative of address translation, proxying, or
just plain trickery. For example, a system where TCP timestamps jump back
and forth, or where TTLs and MTUs change subtly, is probably a NAT device.

-------------------------------
3. How do I compile and use it?
-------------------------------

To compile p0f, try running './build.sh'; if that fails, you will probably be
given some tips about the probable cause. If the tips are useless, send me a
mean-spirited mail.

It is also possible to build a debug binary ('./build.sh debug'), in which case,
verbose packet parsing and signature matching information will be written to
stderr. This is useful when troubleshooting problems, but that's about it.

The tool should compile cleanly under any reasonably new version of Linux,
FreeBSD, OpenBSD, MacOS X, and so forth. You can also build it on Windows using
cygwin and winpcap. I have not tested it on all possible varieties of un*x, but
if there are issues, they should be fairly superficial.

Once you have the binary compiled, you should be aware of the following
command-line options:

  -f fname   - reads fingerprint database (p0f.fp) from the specified location.
               See section 5 for more information about the contents of this
               file.

               The default location is ./p0f.fp. If you want to install p0f, you
               may want to change FP_FILE in config.h to /usr/local/etc/p0f.fp.

  -i iface   - asks p0f to listen on a specific network interface. On un*x, you
               should reference the interface by name (e.g., eth0). On Windows,
               you can use adapter index instead (0, 1, 2...).

               Multiple -i parameters are not supported; you need to run
               separate instances of p0f for that. On Linux, you can specify
               'any' to access a pseudo-device that combines the traffic on
               all other interfaces; the only limitation is that libpcap will
               not recognize VLAN-tagged frames in this mode, which may be
               an issue in some of the more exotic setups.

               If you do not specify an interface, libpcap will probably pick
               the first working interface in your system.

  -L         - lists all available network interfaces, then quits. Particularly
               useful on Windows, where the system-generated interface names
               are impossible to memorize.

  -r fname   - instead of listening for live traffic, reads pcap captures from
               the specified file. The data can be collected with tcpdump or any
               other compatible tool. Make sure that snapshot length (-s
               option in tcpdump) is large enough not to truncate packets; the
               default may be too small.

               As with -i, only one -r option can be specified at any given
               time.

  -o fname   - appends grep-friendly log data to the specified file. The log
               contains all observations made by p0f about every matching
               connection, and may grow large; plan accordingly.

               Only one instance of p0f should be writing to a particular file
               at any given time; where supported, advisory locking is used to
               avoid problems.

  -s fname   - listens for API queries on the specified filesystem socket. This
               allows other programs to ask p0f about its current thoughts about
               a particular host. More information about the API protocol can be
               found in section 4 below.

               Only one instance of p0f can be listening on a particular socket
               at any given time. The mode is also incompatible with -r.

  -d         - runs p0f in daemon mode: the program will fork into background
               and continue writing to the specified log file or API socket. It
               will continue running until killed, until the listening interface
               is shut down, or until some other fatal error is encountered.

               This mode requires either -o or -s to be specified.

               To continue capturing p0f debug output and error messages (but
               not signatures), redirect stderr to another non-TTY destination,
               e.g.:

               ./p0f -o /var/log/p0f.log -d 2>>/var/log/p0f.error

               Note that if -d is specified and stderr points to a TTY, error
               messages will be lost.

  -u user    - causes p0f to drop privileges, switching to the specified user
               and chroot()ing itself to said user's home directory.

               This mode is *highly* advisable (but not required) on un*x
               systems, especially in daemon mode. See section 7 for more info.

More arcane settings (you probably don't need to touch these):

  -p         - puts the interface specified with -i in promiscuous mode. If
               supported by the firmware, the card will also process frames not
               addressed to it.

  -S num     - sets the maximum number of simultaneous API connections. The
               default is 20; the upper cap is 100.

  -m c,h     - sets the maximum number of connections (c) and hosts (h) to be
               tracked at the same time (default: c = 1,000, h = 10,000). Once
               the limit is reached, the oldest 10% of entries get pruned to
               make room for new data.

               This setting effectively controls the memory footprint of p0f.
               The cost of tracking a single host is under 400 bytes; active
               connections have a worst-case footprint of about 18 kB. High
               limits have some CPU impact, too, by virtue of complicating
               data lookups in the cache.

               NOTE: P0f tracks connections only until the handshake is done,
               and if protocol-level fingerprinting is possible, until the
               first few kilobytes of data have been exchanged. This means that
               most connections are dropped from the cache in under 5 seconds;
               consequently, the 'c' variable can be much lower than the real
               number of parallel connections happening on the wire.

  -t c,h     - sets the timeout for collecting signatures for any connection
               (c); and for purging idle hosts from in-memory cache (h). The
               first parameter is given in seconds, and defaults to 30 s; the
               second one is in minutes, and defaults to 120 min.

               The first value must be just high enough to reliably capture
               SYN, SYN+ACK, and the initial few kB of traffic. Low-performance
               sites may want to increase it slightly.

               The second value governs for how long API queries about a
               previously seen host can be made; and what's the maximum interval
               between signatures to still trigger NAT detection and so on.
               Raising it is usually not advisable; lowering it to 5-10 minutes
               may make sense for high-traffic servers, where it is possible to
               see several unrelated visitors subsequently obtaining the same
               dynamic IP from their ISP.
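
As a rough worked example of the -m memory note above, the per-entry figures
can be turned into an upper bound for a given pair of limits. This is only a
sketch: it assumes every tracked connection hits the quoted 18 kB worst case
and every host costs the full 400 bytes, and the function name is made up for
illustration.

```python
# Rough upper bound on p0f cache memory, based on the per-entry figures
# quoted for the -m option. Both constants are approximations.
HOST_BYTES = 400          # "under 400 bytes" per tracked host
CONN_BYTES = 18 * 1024    # "about 18 kB" worst case per tracked connection


def cache_upper_bound(max_conn: int, max_hosts: int) -> int:
    """Return an approximate worst-case cache size in bytes for '-m c,h'."""
    return max_conn * CONN_BYTES + max_hosts * HOST_BYTES


# Default limits: c = 1,000 connections, h = 10,000 hosts.
default_bytes = cache_upper_bound(1_000, 10_000)
print(f"{default_bytes / 2**20:.1f} MiB")
```

With the default limits this comes to a little over 21 MiB. Actual usage is
likely lower, since the connection figure is a worst case.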

Well, that's about it. You probably need to run the tool as root. Some of the
most common use cases:

# ./p0f -i eth0

# ./p0f -i eth0 -d -u p0f-user -o /var/log/p0f.log

# ./p0f -r some_capture.cap

The greppable log format (-o) uses pipe ('|') as a delimiter, with name=value
pairs describing the signature in a manner very similar to the pretty-printed
output generated on stdout:

[2012/01/04 10:26:14] mod=mtu|cli=1.2.3.4/1234|srv=4.3.2.1/80|subj=cli|link=DSL|raw_mtu=1492

The 'mod' parameter identifies the subsystem that generated the entry; the
'cli' and 'srv' parameters always describe the direction in which the TCP
session is established; and 'subj' describes which of these two parties is
actually being fingerprinted.
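
The log format described above is easy to take apart in a script. The sketch
below is not part of p0f; it assumes that every entry starts with a bracketed
timestamp, and that values never contain a literal '|' (the text above does
not say how such a character would be escaped). The function name is
hypothetical.

```python
import re

# "[timestamp] name=value|name=value|..."
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<body>.*)$")


def parse_log_line(line: str) -> dict:
    """Split one -o log entry into a dict of its name=value pairs."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a p0f log line: %r" % line)
    record = {"timestamp": match.group("ts")}
    for pair in match.group("body").split("|"):
        # Split on the first '=' only, so values may contain '='.
        name, _, value = pair.partition("=")
        record[name] = value
    return record


sample = ("[2012/01/04 10:26:14] mod=mtu|cli=1.2.3.4/1234|srv=4.3.2.1/80|"
          "subj=cli|link=DSL|raw_mtu=1492")
entry = parse_log_line(sample)
print(entry["mod"], entry["subj"], entry["link"])
```

All values are kept as strings; converting fields such as raw_mtu to integers
is left to the caller.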

Command-line options may be followed by a single parameter containing a
pcap-style traffic filtering rule. This allows you to reject some of the less
interesting packets for performance or privacy reasons. Simple examples include:

  'dst net 10.0.0.0/8 and port 80'

  'not src host 10.1.2.3'

  'port 22 or port 443'

You can read more about the supported syntax by doing 'man pcap-filter'; if
that fails, try this URL:

  http://www.manpagez.com/man/7/pcap-filter/

Filters work both for online capture (-i) and for previously collected data
produced by any other tool (-r).

-------------
4. API access
-------------

The API allows other applications running on the same system to get p0f's
current opinion about a particular host. This is useful for integrating it with
spam filters, web apps, and so on.

Clients are welcome to connect to the unix socket specified with -s using the
SOCK_STREAM protocol, and may issue any number of fixed-length queries. The
queries will be answered in the order they are received.

Note that there is no response caching, nor any software limits in place on
p0f's end, so it is your responsibility to write reasonably well-behaved
clients.

Queries have exactly 21 bytes. The format is:

  - Magic dword (0x50304601), in native endian of the platform.

  - Address type byte: 4 for IPv4, 6 for IPv6.

  - 16 bytes of address data, network endian. IPv4 addresses should be
    aligned to the left.
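
The query layout above can be packed in a few lines. This is a minimal sketch,
not the reference client (see p0f-client.c for that). The function name is
hypothetical, and padding the unused 12 bytes of an IPv4 query with zeros is
an assumption; the text only requires the address to be left-aligned.

```python
import ipaddress
import struct

P0F_QUERY_MAGIC = 0x50304601


def build_query(addr: str) -> bytes:
    """Pack a 21-byte API query for an IPv4 or IPv6 address string."""
    ip = ipaddress.ip_address(addr)
    # ip.packed is already in network byte order; IPv4 is left-aligned
    # and (by assumption) zero-padded to 16 bytes.
    packed = ip.packed.ljust(16, b"\x00")
    # "=" selects native byte order with no padding between fields:
    # 4-byte magic, 1-byte address type, 16 bytes of address data.
    return struct.pack("=IB16s", P0F_QUERY_MAGIC, ip.version, packed)


query = build_query("1.2.3.4")
print(len(query))
```

The resulting bytes would be written to a socket.AF_UNIX / SOCK_STREAM socket
connected to the path given with -s.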

To such a query, p0f responds with:

  - Another magic dword (0x50304602), native endian.

  - Status dword: 0x00 for 'bad query', 0x10 for 'OK', and 0x20 for 'no match'.

  - Host information, valid only if status is 'OK' (byte width in square
    brackets):

    [4]  first_seen  - unix time (seconds) of first observation of the host.

    [4]  last_seen   - unix time (seconds) of most recent traffic.

    [4]  total_conn  - total number of connections seen.

    [4]  uptime_min  - calculated system uptime, in minutes. Zero if not known.

    [4]  up_mod_days - uptime wrap-around interval, in days.

    [4]  last_nat    - time of the most recent detection of IP sharing (NAT,
                       load balancing, proxying). Zero if never detected.

    [4]  last_chg    - time of the most recent individual OS mismatch (e.g.,
                       due to multiboot or IP reuse).

    [2]  distance    - system distance (derived from TTL; -1 if no data).

    [1]  bad_sw      - p0f thinks the User-Agent or Server strings aren't
                       accurate. The value of 1 means OS difference (possibly
                       due to proxying), while 2 means an outright mismatch.

                       NOTE: If User-Agent is not present at all, this value
                       stays at 0.

    [1]  os_match_q  - OS match quality: 0 for a normal match; 1 for fuzzy
                       (e.g., TTL or DF difference); 2 for a generic signature;
                       and 3 for both.

    [32] os_name     - NUL-terminated name of the most recent positively matched
                       OS. If OS not known, os_name[0] is NUL.

                       NOTE: If the host is first seen using a known system and
                       then switches to an unknown one, this field is not
                       reset.

    [32] os_flavor   - OS version. May be empty if no data.

    [32] http_name   - most recent positively identified HTTP application
                       (e.g. 'Firefox').

    [32] http_flavor - version of the HTTP application, if any.

    [32] link_type   - network link type, if recognized.

    [32] language    - system language, if recognized.
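
Adding up the widths listed above gives a 232-byte reply (two dwords plus 224
bytes of host information). The sketch below decodes such a record. Several
details are assumptions rather than facts from this document: that the fields
are packed back to back with no padding, that the 4-byte fields are unsigned
while 'distance' is signed, and that a full-size record is sent even when the
status is not 'OK'. api.h is the authoritative definition.

```python
import struct

P0F_RESP_MAGIC = 0x50304602
STATUS = {0x00: "bad query", 0x10: "OK", 0x20: "no match"}

# magic, status, seven 4-byte fields, 2-byte distance, two 1-byte fields,
# six 32-byte strings; "=" means native byte order, no padding.
RESP = struct.Struct("=II7IhBB32s32s32s32s32s32s")
FIELDS = ("first_seen", "last_seen", "total_conn", "uptime_min",
          "up_mod_days", "last_nat", "last_chg", "distance", "bad_sw",
          "os_match_q", "os_name", "os_flavor", "http_name", "http_flavor",
          "link_type", "language")


def parse_response(data: bytes) -> dict:
    """Decode one API reply; host fields are included only for status 'OK'."""
    magic, status, *values = RESP.unpack(data)
    if magic != P0F_RESP_MAGIC:
        raise ValueError("unexpected magic 0x%08x" % magic)
    result = {"status": STATUS.get(status, "unknown")}
    if status == 0x10:
        for name, value in zip(FIELDS, values):
            if isinstance(value, bytes):
                # Strings are NUL-terminated within their 32-byte slot.
                value = value.split(b"\x00", 1)[0].decode("ascii", "replace")
            result[name] = value
    return result


print(RESP.size)
```

A client would read exactly RESP.size bytes from the socket for every query
it sent, then pass them to parse_response().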

A simple reference implementation of an API client is provided in p0f-client.c.
Implementations in C / C++ may reuse api.h from p0f source code, too.

Developers using the API should be aware of several important constraints:

  - The maximum number of simultaneous API connections is capped at 20. The
    limit may be adjusted with the -S parameter, but rampant parallelism may
    lead to poorly controlled latency; consider a single query pipeline,
    possibly with prioritization and caching.

  - The maximum number of hosts and connections tracked at any given time is
    subject to configurable limits. You should look at your traffic stats and
    see if the defaults are suitable.

    You should also keep in mind that whenever you are subject to an ongoing
    DDoS or SYN spoofing DoS attack, p0f may end up dropping entries faster
    than you could query for them. It's that or running out of memory, so
    don't fret.

  - Cache entries with no activity for more than 120 minutes will be dropped
    even if the cache is nearly empty. The timeout is adjustable with -t, but
    you should not use the API to obtain ancient data; if you routinely need to
    go back hours or days, parse the logs instead of wasting RAM.

-----------------------
5. Fingerprint database
-----------------------

Whenever p0f obtains a fingerprint from the observed traffic, it defers to
the data read from p0f.fp to identify the operating system and obtain some
ancillary data needed for other analysis tasks. The fingerprint database is a
simple text file where lines starting with ; are ignored.

== Module specification ==

The file is split into sections based on the type of traffic the fingerprints
apply to. Section identifiers are enclosed in square brackets, like so:

[module:direction]

  module     - the name of the fingerprinting module (e.g. 'tcp' or 'http').

  direction  - the direction of fingerprinted traffic: 'request' (from client to
               server) or 'response' (from server to client).

               For the TCP module, 'request' matches the initial SYN; and
               'response' matches SYN+ACK.

The 'direction' part is omitted for MTU signatures, as they work equally well
both ways.

== Signature groups ==

The actual signatures must be preceded by a 'label' line, describing the
fingerprinted software:

label = type:class:name:flavor

  type       - some signatures in p0f.fp offer broad, last-resort matching for
               less researched corner cases. The goal there is to give an
               answer slightly better than "unknown", but less precise than
               what the user may be expecting.

               Normal, reasonably specific signatures that can't be radically
               improved should have their type specified as 's'; while generic,
               last-resort ones should be tagged with 'g'.

               Note that generic signatures are considered only if no specific
               matches are found in the database.

  class      - the tool needs to distinguish between OS-identifying signatures
               (only one of which should be matched for any given host) and
               signatures that just identify user applications (many of which
               may be seen concurrently).

               To assist with this, OS-specific signatures should specify the
               OS architecture family here (e.g., 'win', 'unix', 'cisco'); while
               application-related sigs (NMap, MSIE, Apache) should use a
               special value of '!'.

               Most TCP signatures are OS-specific, and should have OS family
               defined. Other signatures, such as HTTP, should use '!' unless
               the fingerprinted component is deeply intertwined with the
               platform (e.g., Windows Update).

               NOTE: To avoid variations (e.g. 'win' and 'windows' or 'unix'
               and 'linux'), all classes need to be pre-registered using a
               'classes' directive, seen near the beginning of p0f.fp.

  name       - a human-readable short name for what the fingerprint actually
               helps identify - say, 'Linux', 'Sendmail', or 'NMap'. The tool
               doesn't care about the exact value, but requires consistency - so
               don't switch between 'Internet Explorer' and 'MSIE', or 'MacOS'
               and 'Mac OS'.

  flavor     - anything you want to say to further qualify the observation. Can
               be the version of the identified software, or a description of
               what the application seems to be doing (e.g. 'SYN scan' for NMap).

               NOTE: Don't be too specific: if you have a signature for Apache
               2.2.16, but have no reason to suspect that other recent versions
               behave in a radically different way, just say '2.x'.

P0f uses labels to group similar signatures that may plausibly be generated by
the same system or application; variation between signatures that share a label
should not be considered a strong signal for NAT detection.

To further assist the tool in deciding which OS and application combinations are
reasonable, and which ones are indicative of foul play, any 'label' line for
applications (class '!') should be followed by a comma-delimited list of OS
names or @-prefixed OS architecture classes on which this software is known to
be used. For example:

label = s:!:Uncle John's Networked ls Utility:2.3.0.1
sys   = Linux,FreeBSD,OpenBSD

...or:

label = s:!:Mom's Homestyle Browser:1.x
sys   = @unix,@win

The label can be followed by any number of module-specific signatures; all of
them will be linked to the most recent label, and will be reported the same
way.
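
To make the 'label' and 'sys' layout concrete, here is a small parsing sketch.
It is not how p0f itself reads the file; the names Label, parse_label and
parse_sys are invented for this example, and allowing ':' inside the flavor
field is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Label(NamedTuple):
    generic: bool   # True for type 'g', False for type 's'
    os_class: str   # e.g. 'unix', 'win', or '!' for applications
    name: str
    flavor: str


def parse_label(value: str) -> Label:
    """Parse the right-hand side of a 'label = type:class:name:flavor' line."""
    # Split on the first three colons only (assumption: flavor may hold ':').
    sig_type, os_class, name, flavor = value.split(":", 3)
    if sig_type not in ("s", "g"):
        raise ValueError("type must be 's' or 'g', got %r" % sig_type)
    return Label(sig_type == "g", os_class, name, flavor)


def parse_sys(value: str) -> Tuple[List[str], List[str]]:
    """Split a 'sys' list into (OS names, @-prefixed OS classes)."""
    items = [item.strip() for item in value.split(",") if item.strip()]
    names = [item for item in items if not item.startswith("@")]
    classes = [item[1:] for item in items if item.startswith("@")]
    return names, classes


label = parse_label("s:!:Mom's Homestyle Browser:1.x")
print(label.name, parse_sys("@unix,@win"))
```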

All sections except for 'name' are omitted for [mtu] signatures, which do not
convey any OS-specific information, and just describe link types.

== MTU signatures ==

Many operating systems derive the maximum segment size specified in TCP options
from the MTU of their network interface; that value, in turn, normally depends
on the design of the link-layer protocol. A different MTU is associated with
PPPoE, a different one with IPSec, and a different one with Juniper VPN.

The format of the signatures in the [mtu] section is exceedingly simple,
consisting just of a description and a list of values:

label = Ethernet
sig   = 1500

These will be matched for any wildcard MSS TCP packets (see below) not generated
by userspace TCP tools.
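
The relation between MSS and MTU can be illustrated with a short calculation.
The header sizes used here are the standard minimums (20 bytes for IPv4, 40
for IPv6, 20 for TCP) and are my assumption about how the two values relate;
this document does not spell out the formula. They are at least consistent
with the sample output in section 1, where an MSS of 1452 appears alongside
raw_mtu = 1492. The lookup table holds only the one entry shown above.

```python
from typing import Optional

# Only the entry shown in the text; a real table would come from p0f.fp.
MTU_LABELS = {1500: "Ethernet"}


def mss_to_mtu(mss: int, ip_version: int = 4) -> int:
    """Estimate link MTU from TCP MSS, assuming option-free headers."""
    ip_header = 20 if ip_version == 4 else 40
    tcp_header = 20
    return mss + ip_header + tcp_header


def guess_link(mss: int, ip_version: int = 4) -> Optional[str]:
    """Return the link label for an MSS, or None if the MTU is not listed."""
    return MTU_LABELS.get(mss_to_mtu(mss, ip_version))


print(mss_to_mtu(1460), guess_link(1460))
print(mss_to_mtu(1452), guess_link(1452))
```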

== TCP signatures ==

For TCP traffic, signature layout is as follows:

sig = ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass

  ver        - signature for IPv4 ('4'), IPv6 ('6'), or both ('*').

               NEW SIGNATURES: P0f documents the protocol observed on the wire,
               but you should replace it with '*' unless you have observed some
               actual differences between IPv4 and IPv6 traffic, or unless the
               software supports only one of these versions to begin with.

  ittl       - initial TTL used by the OS. Almost all operating systems use
               64, 128, or 255; ancient versions of Windows sometimes used
               32, and several obscure systems sometimes resort to odd values
               such as 60.

               NEW SIGNATURES: P0f will usually suggest something, using the
               format of 'observed_ttl+distance' (e.g. 54+10). Consider using
               traceroute to check that the distance is accurate, then sum up
               the values. If initial TTL can't be guessed, p0f will output
               'nnn+?', and you need to use traceroute to estimate the '?'.

               A handful of userspace tools will generate random TTLs. In these
               cases, determine maximum initial TTL and then add a - suffix to
               the value to avoid confusion.

  olen       - length of IPv4 options or IPv6 extension headers. Usually zero
               for normal IPv4 traffic; always zero for IPv6 due to the
               limitations of libpcap.

               NEW SIGNATURES: Copy p0f output literally.

  mss        - maximum segment size, if specified in TCP options. Special value
               of '*' can be used to denote that MSS varies depending on the
               parameters of sender's network link, and should not be a part of
               the signature. In this case, MSS will be used to guess the
               type of network hookup according to the [mtu] rules.

               NEW SIGNATURES: Use '*' for any commodity OSes where MSS is
               around 1300 - 1500, unless you know for sure that it's fixed.
               If the value is outside that range, you can probably copy it
               literally.

  wsize      - window size. Can be expressed as a fixed value, but many
               operating systems set it to a multiple of MSS or MTU, or a
               multiple of some random integer. P0f automatically detects these
               cases, and allows notation such as 'mss*4', 'mtu*4', or '%8192'
               to be used. Wildcard ('*') is possible too.

               NEW SIGNATURES: Copy p0f output literally. If frequent variations
               are seen, look for obvious patterns. If there are no patterns,
               '*' is a possible alternative.

  scale      - window scaling factor, if specified in TCP options. Fixed value
               or '*'.

               NEW SIGNATURES: Copy literally, unless the value varies randomly.
               Many systems alternate between 2 or 3 scaling factors, in which
               case, it's better to have several 'sig' lines, rather than a
               wildcard.

  olayout    - comma-delimited layout and ordering of TCP options, if any. This
               is one of the most valuable TCP fingerprinting signals. Supported
               values:

               eol+n  - explicit end of options, followed by n bytes of padding
               nop    - no-op option
               mss    - maximum segment size
               ws     - window scaling
               sok    - selective ACK permitted
               sack   - selective ACK (should not be seen)
               ts     - timestamp
               ?n     - unknown option ID n

               NEW SIGNATURES: Copy this string literally.

  quirks     - comma-delimited properties and quirks observed in IP or TCP
               headers:

               df     - "don't fragment" set (probably PMTUD); ignored for IPv6
               id+    - DF set but IPID non-zero; ignored for IPv6
               id-    - DF not set but IPID is zero; ignored for IPv6
               ecn    - explicit congestion notification support
               0+     - "must be zero" field not zero; ignored for IPv6
               flow   - non-zero IPv6 flow ID; ignored for IPv4

               seq-   - sequence number is zero
               ack+   - ACK number is non-zero, but ACK flag not set
               ack-   - ACK number is zero, but ACK flag set
               uptr+  - URG pointer is non-zero, but URG flag not set
               urgf+  - URG flag used
               pushf+ - PUSH flag used

               ts1-   - own timestamp specified as zero
               ts2+   - non-zero peer timestamp on initial SYN
               opt+   - trailing non-zero data in options segment
               exws   - excessive window scaling factor (> 14)
               bad    - malformed TCP options

               If a signature scoped to both IPv4 and IPv6 contains quirks valid
               for just one of these protocols, such quirks will be ignored for
               packets using the other protocol. For example, any combination
               of 'df', 'id+', and 'id-' is always matched by any IPv6 packet.

               NEW SIGNATURES: Copy literally.

  pclass     - payload size classification: '0' for zero, '+' for non-zero,
               '*' for any. The packets we fingerprint right now normally have
               no payloads, but some corner cases exist.

               NEW SIGNATURES: Copy literally.

NOTE: The TCP module allows some fuzziness when an exact match can't be found:
'df' and 'id+' quirks are allowed to disappear; 'id-' or 'ecn' may appear; and
TTLs can change.
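
The field layout above can be split mechanically. The following sketch breaks
a TCP signature into its named parts and checks a window size against the
'wsize' notations listed earlier. It is an illustration only, not p0f's own
matcher: TcpSig, parse_tcp_sig and window_matches are made-up names, '%N' is
read as "window is a multiple of N", and the 'mtu*N' form is deliberately left
out because it needs the link MTU.

```python
from typing import List, NamedTuple


class TcpSig(NamedTuple):
    ver: str
    ittl: str
    olen: str
    mss: str
    wsize: str
    scale: str
    olayout: List[str]
    quirks: List[str]
    pclass: str


def _split_list(text: str) -> List[str]:
    """Split a comma-delimited field; an empty field gives an empty list."""
    return [item for item in text.split(",") if item]


def parse_tcp_sig(value: str) -> TcpSig:
    """Parse 'ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass'."""
    fields = value.split(":")
    if len(fields) != 8:
        raise ValueError("expected 8 colon-separated fields, got %d" % len(fields))
    ver, ittl, olen, mss, win, olayout, quirks, pclass = fields
    wsize, _, scale = win.partition(",")
    return TcpSig(ver, ittl, olen, mss, wsize, scale,
                  _split_list(olayout), _split_list(quirks), pclass)


def window_matches(spec: str, window: int, mss: int) -> bool:
    """Check an observed window size against a 'wsize' specification."""
    if spec == "*":
        return True
    if spec.startswith("mss*"):
        return window == mss * int(spec[4:])
    if spec.startswith("mtu*"):
        raise NotImplementedError("mtu*N needs the link MTU; not covered here")
    if spec.startswith("%"):
        return window % int(spec[1:]) == 0
    return window == int(spec)


# raw_sig taken from the sample SYN+ACK output in section 1.
sig = parse_tcp_sig("4:64+0:0:1460:mss*10,0:mss,nop,nop,sok:df:0")
print(sig.olayout, sig.quirks, window_matches(sig.wsize, 14600, 1460))
```

Note that a raw_sig reported by p0f carries 'ittl' in the observed_ttl+distance
form; the parser keeps it as a string rather than guessing the initial TTL.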
629
630To gather new SYN ('request') signatures, simply connect to the fingerprinted
631system, and p0f will provide you with the necessary data. To gather SYN+ACK
632('response') signatures, you should use the bundled p0f-sendsyn utility while p0f
633is running in the background; creating them manually is not advisable.
634
635== HTTP signatures ==
636
637A special directive should appear at the beginning of the [http:request]
638section, structured the following way:
639
640ua_os = Linux,Windows,iOS=[iPad],iOS=[iPhone],Mac OS X,...
641
642This list should specify OS names that should be looked for within the
643User-Agent string if the string is otherwise deemed to be honest. This input
644is not used for fingerprinting, but aids NAT detection in some useful ways.
645
646The names have to match the names used in 'sig' specifiers across p0f.fp. If a
647particular name used by p0f differs from what typically appears in User-Agent,
648the name=[string] syntax may be used to define any number of aliases.
649
650Other than that, HTTP signatures for GET and HEAD requests have the following
651layout:
652
653sig = ver:horder:habsent:expsw
654
655  ver        - 0 for HTTP/1.0, 1 for HTTP/1.1, or '*' for any.
656
657               NEW SIGNATURES: Copy the value literally, unless you have a
658               specific reason to do otherwise.
659
660  horder     - comma-separated, ordered list of headers that should appear in
661               matching traffic. Substrings to match within each of these
662               headers may be specified using a name=[value] notation.
663
664               The signature will be matched even if other headers appear in
665               between, as long as the list itself is matched in the specified
666               sequence.
667
668               Headers that usually do appear in the traffic, but may go away
669               (e.g. Accept-Language if the user has no languages defined, or
670               Referer if no referring site exists) should be prefixed with '?',
671               e.g. "?Referer". P0f will accept their disappearance, but will
672               not allow them to appear at any other location.
673
674               NEW SIGNATURES: Review the list and remove any headers that
675               appear to be irrelevant to the fingerprinted software, and mark
676               transient ones with '?'. Remove header values that do not add
677               anything to the signature, or are request- or user-specific.
678               In particular, pay attention to Accept, Accept-Language, and
679               Accept-Charset, as they are highly specific to request type
680               and user settings.
681
682               P0f automatically removes some headers, prefixes others with '?',
683               and inhibits the value of fields such as 'Referer' or 'Cookie' -
684               but this is not a substitute for manual review.
685
686               NOTE: Server signatures may differ depending on the request
687               (HTTP/1.1 versus 1.0, keep-alive versus one-shot, etc) and on the
688               returned resource (e.g., CGI versus static content). Play around,
689               browse to several URLs, also try curl and wget.
690
691  habsent    - comma-separated list of headers that must *not* appear in
692               matching traffic. This is particularly useful for noting the
693               absence of standard headers (e.g. 'Host'), or for differentiating
694               between otherwise very similar signatures.
695
696               NEW SIGNATURES: P0f will automatically highlight the absence of
697               any normally present headers; other entries may be added where
698               necessary.
699
700  expsw      - expected substring in 'User-Agent' or 'Server'. This is not
701               used to match traffic, and merely serves to detect dishonest
702               software. If you want to explicitly match User-Agent, you need
703               to do this in the 'horder' section, e.g.:
704
705               User-Agent=[Firefox]
706
Any of these sections except for 'ver' may be blank.
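
The layout above can be sketched in a few lines of Python. This is a simplified
reading of the rules described in this section, not p0f's actual matcher: all
function names are hypothetical, header names are compared case-sensitively by
assumption, and 'expsw' is parsed but deliberately not used for matching.

```python
def split_outside_brackets(text, sep):
    """Split text on sep, ignoring separators inside [...] values."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_header_list(field):
    """Turn 'Host,?Referer,Accept=[*/*]' into (name, value, optional)."""
    entries = []
    for item in split_outside_brackets(field, ","):
        item = item.strip()
        if not item:
            continue
        optional = item.startswith("?")
        if optional:
            item = item[1:]
        name, sep, rest = item.partition("=[")
        value = rest[:-1] if sep and rest.endswith("]") else None
        entries.append((name, value, optional))
    return entries


def parse_http_sig(sig):
    """Parse 'ver:horder:habsent:expsw' into a dictionary."""
    parts = split_outside_brackets(sig, ":")
    if len(parts) < 4:
        raise ValueError("expected ver:horder:habsent:expsw")
    ver, horder, habsent = parts[:3]
    return {
        "ver": ver,
        "horder": parse_header_list(horder),
        "habsent": [name for name, _, _ in parse_header_list(habsent)],
        "expsw": ":".join(parts[3:]),
    }


def matches(sig, version, headers):
    """headers is the ordered list of (name, value) pairs in a request."""
    if sig["ver"] != "*" and sig["ver"] != version:
        return False
    names = [name for name, _ in headers]
    if any(name in names for name in sig["habsent"]):
        return False
    pos = 0
    for name, value, optional in sig["horder"]:
        idx = next((i for i in range(pos, len(headers))
                    if headers[i][0] == name), None)
        if idx is None:
            # '?' headers may vanish, but must not show up elsewhere.
            if optional and name not in names:
                continue
            return False
        if value is not None and value not in headers[idx][1]:
            return False
        pos = idx + 1
    return True


sig = parse_http_sig("1:Host,User-Agent,?Referer,Accept=[*/*]:Keep-Alive:Firefox/")
request = [("Host", "example.com"), ("X-Extra", "1"),
           ("User-Agent", "Mozilla/5.0 Firefox/10.0"), ("Accept", "text/html,*/*")]
print(matches(sig, "1", request))
```

The bracket-aware splitter is there because name=[value] substrings may
themselves contain commas or colons.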
708
709There are many protocol-level quirks that p0f could be detecting - for example,
710the use of non-standard newlines, or missing or extra spacing between header
711field names and values. There is also some information to be gathered from
712responses to OPTIONS or POST. That said, it does not seem to be worth the
713effort: the protocol is so verbose, and implemented so arbitrarily, that we are
714getting more than enough information just with a simple GET / HEAD fingerprint.
715
716== SMTP signatures ==
717
718   *** NOT IMPLEMENTED YET ***
719
720== FTP signatures ==
721
722   *** NOT IMPLEMENTED YET ***
723
724----------------
7256. NAT detection
726----------------
727
728In addition to fairly straightforward measurements of intrinsic properties of
729a single TCP session, p0f also tries to compare signatures across sessions to
730detect client-side connection sharing (NAT, HTTP proxies) or server-side load
731balancing.
732
733This is done in two steps: the first significant deviation usually prompts a
"host change" entry (which may also be indicative of multi-boot, address reuse,
735or other one-off events); and a persistent pattern of changes prompts an
736"ip sharing" notification later on.
737
738All of these messages are accompanied by a set of reason codes:
739
740  os_sig       - the OS detected right now doesn't match the OS detected earlier
741                 on.
742
743  sig_diff     - no definite OS detection data available, but protocol-level
744                 characteristics have changed drastically (e.g., different
745                 TCP option layout).
746
747  app_vs_os    - the application detected running on the host is not supposed
748                 to work on the host's operating system.
749
750  x_known      - the signature progressed from known to unknown, or vice versa.
751
752The following additional codes are specific to TCP:
753
754  tstamp       - TCP timestamps went back or jumped forward.
755
756  ttl          - TTL values have changed.
757
758  port         - source port number has decreased.
759
760  mtu          - system MTU has changed.
761
762  fuzzy        - the precision with which a TCP signature is matched has
763                 changed.
764
The following codes are also issued by the HTTP module:
766
767  via          - data explicitly includes Via / X-Forwarded-For.
768
769  us_vs_os     - OS fingerprint doesn't match User-Agent data, and the
770                 User-Agent value otherwise looks honest.
771
772  app_srv_lb   - server application signatures change, suggesting load
773                 balancing.
774
775  date         - server-advertised date changes inconsistently.
776
777Different reasons have different weights, balanced to keep p0f very sensitive
even to very homogeneous environments behind NAT. If you end up seeing false
779positives or other detection problems in your environment, please let me know!
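
To make the TCP-specific codes more concrete, here is a minimal sketch that
compares two observations of the same source address and lists the codes that
would apply. The TcpObservation fields, the default clock rate, and the
max_jump_s tolerance are all hypothetical; p0f's real checks and the weights it
assigns to each reason are more involved and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TcpObservation:
    """Hypothetical summary of one connection from a given source."""
    time_s: float   # capture time, in seconds
    ttl: int        # observed TTL
    src_port: int   # source port number
    mtu: int        # MTU derived from the signature
    tstamp: int     # TCP timestamp value


def tcp_reason_codes(prev, cur, ts_hz=1000.0, max_jump_s=5.0):
    """Return the TCP reason codes triggered between prev and cur."""
    codes = []
    # Where the timestamp should be if the same clock kept ticking.
    expected = prev.tstamp + (cur.time_s - prev.time_s) * ts_hz
    went_back = cur.tstamp < prev.tstamp
    jumped = cur.tstamp - expected > max_jump_s * ts_hz
    if went_back or jumped:
        codes.append("tstamp")
    if cur.ttl != prev.ttl:
        codes.append("ttl")
    if cur.src_port < prev.src_port:
        codes.append("port")
    if cur.mtu != prev.mtu:
        codes.append("mtu")
    return codes


first = TcpObservation(0.0, 64, 40000, 1500, 100000)
later = TcpObservation(1.0, 57, 1200, 1500, 5000)
print(tcp_reason_codes(first, later))
```

Any single code can have an innocent explanation (for example, source ports
wrapping around), which is why a weighted combination is more useful than any
one check on its own.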
780
781-----------
7827. Security
783-----------
784
785You should treat the output from this tool as advisory; the fingerprinting can
be gamed with some minor effort, and it's also possible to evade it altogether
787(e.g. with excessive IP fragmentation or bad TCP checksums). Plan accordingly.
788
P0f should be reasonably secure to operate as a daemon. That said, un*x
790users should employ the -u option to drop privileges and chroot() when running
791the tool continuously. This greatly minimizes the consequences of any mishaps -
792and mishaps in C just tend to happen.
793
794To make this step meaningful, the user you are running p0f as should be
795completely unprivileged, and should have an empty, read-only home directory. For
796example, you can do:
797
798# useradd -d /var/empty/p0f -M -r -s /bin/nologin p0f-user
799# mkdir -p -m 755 /var/empty/p0f
800
801Please don't put the p0f binary itself, or any other valuable assets, inside
802that user's home directory; and certainly do not use any generic locations such
803as / or /bin/ in lieu of a proper home.
804
805P0f running in the background should be fairly difficult to DoS, especially
806compared to any real TCP services it will be watching. Nevertheless, there are
807so many deployment-specific factors at play that you should always preemptively
808stress-test your setup, and see how it behaves.
809
810Other than that, let's talk filesystem security. When using the tool in the
API mode (-s), the listening socket is always re-created with 666
812permissions, so that applications running as other uids can query it at will.
813If you want to preserve the privacy of captured traffic in a multi-user system,
814please ensure that the socket is created in a directory with finer-grained
815permissions; or change API_MODE in config.h.
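
One way to sanity-check the directory chosen for the socket is sketched below.
The helper name is hypothetical, and the check only looks at the "other"
permission bits of the directory itself; it does not inspect parent
directories, ACLs, or group membership.

```python
import os
import stat


def dir_is_private(path):
    """True if path is a directory that 'other' users cannot access."""
    mode = os.stat(path).st_mode
    return stat.S_ISDIR(mode) and (mode & stat.S_IRWXO) == 0


# Example: a mode 750 directory passes, a mode 755 directory does not.
print(dir_is_private("."))
```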
816
The default file mode for binary log data (-o) is 600, on the assumption that
818others probably don't need access to historical data; if you need to share logs,
819you can pre-create the file or change LOG_MODE in config.h.
820
821Don't build p0f, and do not store its source, binary, configuration files, logs,
822or query sockets in world-writable locations such as /tmp (or any
823subdirectories created therein).
824
825Last but not least, please do not attempt to make p0f setuid, or otherwise
grant it privileges higher than those of the calling user. Neither the tool
827itself, nor the third-party components it depends on, are designed to keep rogue
less-privileged callers at bay. If you use /etc/sudoers to list p0f as the only
829program that user X should be able to run as root, that user will probably be
830able to compromise your system. The same goes for many other uses of sudo, by
831the way.
832
833--------------
8348. Limitations
835--------------
836
837Here are some of the known issues you may run into:
838
839== General ==
840
8411) RST, ACK, and other experimental fingerprinting modes offered in p0f v2 are
842   no longer supported in v3. This is because they proved to have very low
843   specificity. The consequence is that you can no longer fingerprint
844   "connection refused" responses.
845
8462) API queries or daemon execution are not supported when reading offline pcaps.
847   While there may be some fringe use cases for that, offline pcaps use a
848   much simpler event loop, and so supporting these features would require some
849   extra effort.
850
8513) P0f needs to observe at least about 25 milliseconds worth of qualifying
852   traffic to estimate system uptime. This means that if you're testing it over
853   loopback or LAN, you may need to let it see more than one connection.
854
855   Systems with extremely slow timestamp clocks may need longer acquisition
856   periods (up to several seconds); very fast clocks (over 1.5 kHz) are rejected
857   completely on account of being prohibited by the RFC. Almost all OSes are
858   between 100 Hz and 1 kHz, which should work fine.
859
8604) Some systems vary SYN+ACK responses based on the contents of the initial SYN,
861   sometimes removing TCP options not supported by the other endpoint.
862   Unfortunately, there is no easy way to account for this, so several SYN+ACK
863   signatures may be required per system. The bundled p0f-sendsyn utility helps
864   with collecting them.
865
866   Another consequence of this is that you will sometimes see server uptime only
   if your own system has RFC1323 timestamps enabled. Linux has done that since
868   version 2.2; on Windows, you need version 7 or newer. Client uptimes are not
869   affected.
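
The uptime estimate mentioned in item 3 boils down to simple arithmetic on two
TCP timestamp samples. The sketch below uses the 25 ms and 1.5 kHz limits
quoted above; the function and constant names are hypothetical, it assumes the
timestamp counter started near zero at boot, and it skips refinements such as
counter wrap-around or rounding the rate to common clock values.

```python
MIN_WAIT_S = 0.025   # about 25 ms of qualifying traffic, per item 3
MAX_HZ = 1500.0      # clocks faster than 1.5 kHz are rejected


def estimate_uptime(t1, ts1, t2, ts2):
    """Return (hz, uptime_seconds), or None if the samples are unusable.

    t1, t2   - capture times of two packets from the same host, in seconds
    ts1, ts2 - TCP timestamp values carried by those packets
    """
    elapsed = t2 - t1
    ticks = ts2 - ts1
    if elapsed < MIN_WAIT_S or ticks <= 0:
        return None          # not enough traffic yet, or clock went back
    hz = ticks / elapsed
    if hz > MAX_HZ:
        return None          # implausibly fast timestamp clock
    return hz, ts2 / hz


# Two samples one second apart on a 1 kHz clock.
print(estimate_uptime(0.0, 1_000_000, 1.0, 1_001_000))
```

With samples only a few milliseconds apart, as is typical on loopback or a LAN,
the function returns None, which is why more than one connection may be needed.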
870
871== Windows port ==
872
8731) API sockets do not work on Windows. This is due to a limitation of winpcap;
874   see live_event_loop(...) in p0f.c for more info.
875
8762) The chroot() jail (-u) on Windows doesn't offer any real security. This is
877   due to the limitations of cygwin.
878
8793) The p0f-sendsyn utility doesn't work because of the limited capabilities of
880   Windows raw sockets (this should be relatively easy to fix if there are any
881   users who care).
882
883---------------------------
8849. Acknowledgments and more
885---------------------------
886
887P0f is made possible thanks to the contributions of several good souls,
888including:
889
890  Phil Ames
891  Jannich Brendle
892  Matthew Dempsky
893  Jason DePriest
894  Dalibor Dukic
895  Mark Martinec
896  Damien Miller
897  Josh Newton
898  Nibbler
899  Bernhard Rabe
900  Chris John Riley
901  Sebastian Roschke
902  Peter Valchev
903  Jeff Weisberg
904  Anthony Howe
905  Tomoyuki Murakami
906  Michael Petch
907
908If you wish to help, the most immediate way to do so is to simply gather new
909signatures, especially from less popular or older platforms (servers, networking
910equipment, portable / embedded / specialty OSes, etc).
911
912Problems? Suggestions? Complaints? Compliments? You can reach the author at
913<lcamtuf@coredump.cx>. The author is very lonely and appreciates your mail.
914