1=pod
2
3=head1 NAME
4
5B<rwuniq> - Bin SiLK Flow records by a key and print each bin's volume
6
7=head1 SYNOPSIS
8
9  rwuniq --fields=KEY [--values=VALUES]
10        [{--threshold=MIN-MAX | --threshold=MIN}]
11        [--presorted-input] [--sort-output]
12        [{--bin-time=SECONDS | --bin-time}]
13        [--timestamp-format=FORMAT] [--epoch-time]
14        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
15        [--integer-sensors] [--integer-tcp-flags]
16        [--no-titles] [--no-columns] [--column-separator=CHAR]
17        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
18        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
19        [--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
20        [{--legacy-timestamps | --legacy-timestamps={1,0}}]
21        [--all-counts] [{--bytes | --bytes=MIN | --bytes=MIN-MAX}]
22        [{--packets | --packets=MIN | --packets=MIN-MAX}]
23        [{--flows | --flows=MIN | --flows=MIN-MAX}]
24        [--stime] [--etime]
25        [{--sip-distinct | --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
26        [{--dip-distinct | --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
27        [--ipv6-policy={ignore,asv4,mix,force,only}]
28        [--site-config-file=FILENAME]
29        [--plugin=PLUGIN [--plugin=PLUGIN ...]]
30        [--python-file=PATH [--python-file=PATH ...]]
31        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
32        [--pmap-column-width=NUM]
33        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
34
35  rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
36        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help
37
38  rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
39        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
40
41  rwuniq --version
42
43=head1 DESCRIPTION
44
45B<rwuniq> reads SiLK Flow records and groups them by a key composed of
46user-specified attributes of the flows.  For each group (or bin), a
47collection of user-specified I<aggregate values> is computed; these
48values are typically related to the volume of the bin, such as the sum
49of the bytes fields for all records that match the key.  Once all the
50SiLK Flow records are read, the key fields and the aggregate values
51are printed.  For some of the built-in aggregate values, it is
52possible to limit the output to the bins where the aggregate value
53meets a user-specified minimum and/or maximum.
54
55There is no need to sort the input to B<rwuniq> since B<rwuniq>
56normally rearranges the records as they are read.  To have B<rwuniq>
57sort its output, use the B<--sort-output> switch.
58
59B<rwuniq> reads SiLK Flow records from the files named on the command
60line or from the standard input when no file names are specified and
61B<--xargs> is not present.  To read the standard input in addition to
62the named files, use C<-> or C<stdin> as a file name.  If an input
63file name ends in C<.gz>, the file is uncompressed as it is read.
64When the B<--xargs> switch is provided, B<rwuniq> reads the names of
65the files to process from the named text file or from the standard
66input if no file name argument is provided to the switch.  The input
67to B<--xargs> must contain one file name per line.
68
69The user must provide the B<--fields> switch to select the flow
70attribute(s) (or field(s)) that comprise the key for each bin.  The
71available fields are similar to those supported by B<rwcut(1)>; see
72the description of the B<--fields> switch in the L</OPTIONS> section
73below for the details.  The list of fields can be extended by loading
74PySiLK files (see B<silkpython(3)>) or plug-ins (B<silk-plugin(3)>).
75The fields are printed in the order in which they occur in the
76B<--fields> switch.  The size of the key is limited to 256 octets.  A
77larger key more quickly uses the available the memory leading to
78slower performance.
79
80The aggregate value(s) to compute for each bin are also chosen by the
81user.  As with the key fields, the user can extend the list of
82aggregate fields by using PySiLK or plug-ins.  Specify the aggregate
83fields with the B<--values> switch; the aggregate fields are printed
84in the order they occur in the B<--values> switch.  If the user does
85not provide B<--values> or a B<--threshold> switch (described next),
86B<rwuniq> defaults to computing the number of flow records for each
87bin.  As with the key fields, requesting more aggregate values slows
88performance.
89
90The B<--threshold> switch (added in SiLK 3.17.0) allows the user to
91print only bins where a value field is within a certain range.  The
92switch's argument contains the name of the value field, an equals
93sign, the minimum value (start of the range), and optionally a hyphen
94and the maximum value (end of the range); e.g.,
95C<--threshold=bytes=1000-2000>.  The upper bound is unlimited when no
96maximum is specified.  The B<--threshold> switch may be repeated to
97set multiple thresholds, and only those bins that meet all thresholds
98are printed.  Each field named by B<--threshold> is appended to the
99set of aggregate value fields unless that field was named in the
100B<--values> switch.
101
102The B<--presorted-input> switch may allow B<rwuniq> to process data
103more efficiently by causing B<rwuniq> to assume the input has been
104previously sorted with the B<rwsort(1)> command.  With this switch,
105B<rwuniq> typically does not need large amounts of memory because it does not
106bin each flow; instead, it keeps a running summation and outputs the
107bin whenever the key changes.  For the output to be meaningful,
108B<rwsort> and B<rwuniq> I<must> be invoked with the same B<--fields>
109value.  When multiple input files are specified and
110B<--presorted-input> is given, B<rwuniq> merge-sorts the flow
111records from the input files.  B<rwuniq> typically runs faster if
112you do I<not> include the B<--presorted-input> switch when counting
113distinct values, even when reading sorted input.  Finally, you
114may get unusual results with B<--presorted-input> when the B<--fields>
115switch contains multiple time-related key fields (C<sTime>,
116C<duration>, C<eTime>), or when the time-related key is not the final
117key listed in B<--fields>; see the L</NOTES> section for details.
118
119B<rwuniq> attempts to keep all key and aggregate value data in the
120computer's memory.  If B<rwuniq> runs out of memory, the current key
121and aggregate value data is written to a temporary file.  Once all
122input has been processed, the data from the temporary files is merged
123to produce the final output.  By default, these temporary files are
124stored in the F</tmp> directory.  Because these files can be large, it
125is strongly recommended that F</tmp> I<not> be used as the temporary
126directory.  To modify the temporary directory used by B<rwuniq>,
127provide the B<--temp-directory> switch, set the SILK_TMPDIR
128environment variable, or set the TMPDIR environment variable.
129
130=head1 OPTIONS
131
132Option names may be abbreviated if the abbreviation is unique or is an
133exact match for an option.  A parameter to an option may be specified
134as B<--arg>=I<param> or B<--arg> I<param>, though the first form is
135required for options that take optional parameters.
136
137The B<--fields> switch is required.  B<rwuniq> fails when it is
138not provided.
139
140=over 4
141
142=item B<--fields>=I<KEY>
143
144I<KEY> contains the list of flow attributes (a.k.a. fields or columns)
145that make up the key into which flows are binned.  The columns are
146displayed in the order the fields are specified.  Each field may be
147specified once only.  I<KEY> is a comma separated list of field-names,
148field-integers, and ranges of field-integers; a range is specified by
149separating the start and end of the range with a hyphen (B<->).
150Field-names are case insensitive.  Example:
151
152 --fields=stime,10,1-5
153
154There is no default value for the B<--fields> switch; the switch must
155be specified.
156
157The complete list of built-in fields that the SiLK tool suite supports
158follows, though note that not all fields are present in all SiLK file
159formats; when a field is not present, its value is 0.
160
161=over 4
162
163=item sIP,1
164
165source IP address
166
167=item dIP,2
168
169destination IP address
170
171=item sPort,3
172
173source port for TCP and UDP, or equivalent
174
175=item dPort,4
176
177destination port for TCP and UDP, or equivalent.  See note at C<iType>.
178
179=item protocol,5
180
181IP protocol
182
183=item packets,pkts,6
184
185packet count
186
187=item bytes,7
188
189byte count
190
191=item flags,8
192
193bit-wise OR of TCP flags over all packets
194
195=item sTime,9
196
197starting time of flow (seconds resolution unless B<--bin-time>
198includes fractional seconds). When the time-related fields
199C<sTime>,C<duration>,C<eTime> are all in use, B<rwuniq> ignores the
200final time field when binning the records.
201
202=item duration,10
203
204duration of flow (seconds resolution unless B<--bin-time> includes
205fractional seconds).  This field is not adjusted by B<--bin-time>
206unless B<--fields> includes both C<sTime> and C<eTime>.  See note at
207C<sTime,9>.
208
209=item eTime,11
210
211end time of flow (seconds resolution unless B<--bin-time> includes
212fractional seconds).  See note at C<sTime,9>.
213
214=item sensor,12
215
216name or ID of the sensor where the flow was collected
217
218=item class,20
219
220class assigned to the flow by B<rwflowpack(8)>.  Binning by C<class>
221and/or C<type> equates to binning by the integer value used internally
222to represent the class/type pair.  When B<--fields> contains C<class>
223but not C<type>, B<rwuniq>'s output contains multiple rows with the
224same value(s) for the key field(s).
225
226=item type,21
227
228type assigned to the flow by B<rwflowpack(8)>.  See note on previous
229entry.
230
231=item iType
232
233the ICMP type value for ICMP or ICMPv6 flows and empty (numerically
234zero) for non-ICMP flows.  Internally, SiLK stores the ICMP type and
235code in the C<dPort> field.  To avoid getting very odd results, either
236do not use the C<dPort> field when your key includes ICMP field(s) or
237be certain to include the C<protocol> field as part of your key.  This
238field was introduced in SiLK 3.8.1.
239
240=item iCode
241
242the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP
243flows.  See note at C<iType>.
244
245=item icmpTypeCode,25
246
247equivalent to C<iType>,C<iCode> when used in B<--fields>.  This field
248may not be mixed with C<iType> or C<iCode>, and this field is
249deprecated as of SiLK 3.8.1.  As of SiLK 3.8.1, C<icmpTypeCode> may no
250longer be used as the argument to the C<Distinct:> value field; the
251C<dPort> field provides an equivalent result as long as the input
252is limited to ICMP flow records.
253
254=back
255
256Many SiLK file formats do not store the following fields and their
257values are always be 0; they are listed here for completeness:
258
259=over 4
260
261=item in,13
262
263router SNMP input interface or vlanId if packing tools were
264configured to capture it (see B<sensor.conf(5)>)
265
266=item out,14
267
268router SNMP output interface or postVlanId
269
270=item nhIP,15
271
272router next hop IP
273
274=back
275
276SiLK can store flows generated by enhanced collection software that
277provides more information than NetFlow v5.  These flows may support
278some or all of these additional fields; for flows without this
279additional information, the field's value is always 0.
280
281=over 4
282
283=item initialFlags,26
284
285TCP flags on first packet in the flow
286
287=item sessionFlags,27
288
289bit-wise OR of TCP flags over all packets except the first in the flow
290
291=item attributes,28
292
293flow attributes set by the flow generator:
294
295=over 4
296
297=item C<S>
298
299all the packets in this flow record are exactly the same size
300
301=item C<F>
302
303flow generator saw additional packets in this flow following a packet
304with a FIN flag (excluding ACK packets)
305
306=item C<T>
307
308flow generator prematurely created a record for a long-running
309connection due to a timeout.  (When the flow generator B<yaf(1)> is
310run with the B<--silk> switch, it prematurely creates a flow and
311mark it with C<T> if the byte count of the flow cannot be stored in a
31232-bit value.)
313
314=item C<C>
315
316flow generator created this flow as a continuation of long-running
317connection, where the previous flow for this connection met a timeout
318(or a byte threshold in the case of B<yaf>).
319
320=back
321
322Consider a long-running ssh session that exceeds the flow generator's
323I<active> timeout.  (This is the active timeout since the flow
324generator creates a flow for a connection that still has activity).
325The flow generator will create multiple flow records for this ssh
326session, each spanning some portion of the total session.  The first
327flow record will be marked with a C<T> indicating that it hit the
328timeout.  The second through next-to-last records will be marked with
329C<TC> indicating that this flow both timed out and is a continuation
330of a flow that timed out.  The final flow will be marked with a C<C>,
331indicating that it was created as a continuation of an active flow.
332
333=item application,29
334
335guess as to the content of the flow.  Some software that generates flow
336records from packet data, such as B<yaf>, will inspect the contents of
337the packets that make up a flow and use traffic signatures to label
338the content of the flow.  SiLK calls this label the I<application>;
339B<yaf> refers to it as the I<appLabel>.  The application is the port
340number that is traditionally used for that type of traffic (see the
341F</etc/services> file on most UNIX systems).  For example, traffic
342that the flow generator recognizes as FTP will have a value of 21,
343even if that traffic is being routed through the standard HTTP/web
344S<port (80)>.
345
346=back
347
348The following fields provide a way to label the IPs or ports on a
349record.  These fields require external files to provide the mapping
350from the IP or port to the label:
351
352=over 4
353
354=item sType,16
355
356for the source IP address, the value 0 if the address is non-routable,
3571 if it is internal, or 2 if it is routable and external.  Uses the
358mapping file specified by the SILK_ADDRESS_TYPES environment variable,
359or the F<address_types.pmap> mapping file, as described in
360B<addrtype(3)>.
361
362=item dType,17
363
364as B<sType> for the destination IP address
365
366=item scc,18
367
368for the source IP address, a two-letter country code abbreviation
369denoting the country where that IP address is located.  Uses the
370mapping file specified by the SILK_COUNTRY_CODES environment variable,
371or the F<country_codes.pmap> mapping file, as described in
372B<ccfilter(3)>.  The abbreviations are those defined by ISO 3166-1
373(see for example L<https://www.iso.org/iso-3166-country-codes.html>
374or L<https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>) or the
375following special codes: B<--> N/A (e.g. private and experimental
376reserved addresses); B<a1> anonymous proxy; B<a2> satellite provider;
377B<o1> other
378
379=item dcc,19
380
381as B<scc> for the destination IP
382
383=item src-I<map-name>
384
385label contained in the prefix map file associated with I<map-name>.
386If the prefix map is for IP addresses, the label is that associated
387with the source IP address.  If the prefix map is for protocol/port
388pairs, the label is that associated with the protocol and source port.
389See also the description of the B<--pmap-file> switch below and the
390B<pmapfilter(3)> manual page.
391
392=item dst-I<map-name>
393
394as B<src-I<map-name>> for the destination IP address or the protocol
395and destination port.
396
397=item sval
398
399as B<src-I<map-name>> when no map-name is associated with the prefix
400map file
401
402=item dval
403
404as B<dst-I<map-name>> when no map-name is associated with the prefix
405map file
406
407=back
408
409Finally, the list of built-in fields may be augmented by the run-time
410loading of PySiLK code or plug-ins written in C (also called shared
411object files or dynamic libraries), as described by the
412B<--python-file> and B<--plugin> switches.
413
414=for comment
415##########################################################################
416# Whew!  We've finally reached the end of the --fields help
417
418=item B<--values>=I<VALUES>
419
420Specify the aggregate values to compute for each bin as a comma
421separated list of names.  Names are case insensitive.  When the
422B<--threshold> switch specifies an aggregate value field that does
423appear in I<VALUES>, that field is appended to I<VALUES>.  When
424neither the B<--values> switch nor any B<--threshold> switch is
425specified, B<rwuniq> counts the number of flow records for each bin.
426The aggregate fields are printed in the order they occur in I<VALUES>.
427The names of the built-in value fields follow.  This list can be
428augmented through the use of PySiLK and plug-ins.
429
430=over 4
431
432=item Records
433
434Count the number of flow records that mapped to each bin.
435
436=item Packets
437
438Sum the number of packets across all records that mapped to each bin.
439
440=item Bytes
441
442Sum the number of bytes across all records that mapped to each bin.
443
444=item sTime-Earliest
445
446Keep track of the earliest start time (minimum time) seen across all
447records that mapped to each bin, in seconds resolution.  The
448B<--bin-time> switch does not normally affect this value; however,
449this value uses milliseconds resolution when B<--bin-time> includes
450fractional seconds.
451
452=item eTime-Latest
453
454Keep track of the latest end time (maximum time) seen across all
455records that mapped to each bin, in seconds resolution.  The
456B<--bin-time> switch does not normally affect this value; however,
457this value uses milliseconds resolution when B<--bin-time> includes
458fractional seconds.
459
460=item sIP-Distinct
461
462Count the number of distinct source IP addresses that were seen for
463each bin, an alias for Distinct:sIP.
464
465=item dIP-Distinct
466
467Count the number of distinct destination IP addresses that were seen
468for each bin, an alias for Distinct:dIP.
469
470=item Distinct:I<KEY_FIELD>
471
472Count the number of distinct values for I<KEY_FIELD>, where
473I<KEY_FIELD> is any field that can be used as an argument to
474B<--fields> except C<icmpTypeCode>.  For example, C<Distinct:sPort>
475counts the number of distinct source ports for each bin.  When this
476aggregate value field is used, the specified I<KEY_FIELD> cannot be
477present in the argument to B<--fields>.
478
479=item Flows
480
481Count the number of flow records that mapped to each bin; an alias for
482Records.
483
484=back
485
486=item B<--plugin>=I<PLUGIN>
487
488Augment the list of key fields and/or aggregate value fields by using
489run-time loading of the plug-in (shared object) whose path is
490I<PLUGIN>.  The switch may be repeated to load multiple plug-ins.  The
491creation of plug-ins is described in the B<silk-plugin(3)> manual
492page.  When I<PLUGIN> does not contain a slash (C</>), B<rwuniq>
493attempts to find a file named I<PLUGIN> in the directories listed in
494the L</FILES> section.  If B<rwuniq> finds the file, it uses that
495path.  If I<PLUGIN> contains a slash or if B<rwuniq> does not find the
496file, B<rwuniq> relies on your operating system's B<dlopen(3)> call to
497find the file.  When the SILK_PLUGIN_DEBUG environment variable is
498non-empty, B<rwuniq> prints status messages to the standard error as
499it attempts to find and open each of its plug-ins.
500
501=item B<--threshold>=I<VALUE_FIELD>B<=>I<MIN>B<->I<MAX>
502
503=item B<--threshold>=I<VALUE_FIELD>B<=>I<MIN>
504
505Limit the output of B<rwuniq> to the bins where the value of the
506aggregate value field I<VALUE_FIELD> is not less than I<MIN> and not
507more than I<MAX>.  If I<MAX> is not given, limit the output to the
508bins where the value of I<VALUE_FIELD> is at least I<MIN>.  The
509I<VALUE_FIELD> argument is case insensitive and may be abbreviated to
510the shortest unique prefix.  This switch may be repeated to set
511thresholds for multiple fields, and B<rwuniq> only prints bins that
512meet all thresholds.  A I<MIN> of 0 is treated as 1.  If
513I<VALUE_FIELD> is not present in the argument to the B<--values>
514switch, it is appended to those aggregate values.  I<VALUE_FIELD> may
515be B<Records> (or B<Flows)>, B<Packets>, B<Bytes>, B<sIP-Distinct>,
516B<dIP-Distinct>, or B<Distinct:>I<KEY_FIELD>.  Setting thresholds for
517aggregate value fields defined by plug-ins is not supported.  I<Since
518SiLK 3.17.0.>
519
520=back
521
522Miscellaneous options:
523
524=over 4
525
526=item B<--presorted-input>
527
528Cause B<rwuniq> to assume that it is reading sorted input; i.e., that
529B<rwuniq>'s input file(s) were generated by B<rwsort(1)> using the
530I<exact same> value for the B<--fields> switch.  When no distinct
531counts are being computed, B<rwuniq> can process its input without
532needing to write temporary files.  When multiple input files are
533specified, B<rwuniq> merge-sorts the flow records from the input
534files.  See the L</NOTES> section for issues that may occur when using
535B<--presorted-input>.
536
537=item B<--sort-output>
538
539Cause B<rwuniq> to present the output in sorted numerical order.  The
540key B<rwuniq> uses for sorting is the same key it uses to index each
541bin.
542
543=item B<--bin-time>=I<SECONDS>
544
545=item B<--bin-time>
546
547Adjust the times in the key fields C<sTime> and C<eTime> to appear on
548I<SECONDS>-second boundaries (the floor of the time is used).  As of
549SiLK 3.17.0, I<SECONDS> may be a fractional value of 0.001 or greater,
550and B<rwuniq> uses millisecond timestamps when I<SECONDS> includes a
551fractional value that is non-zero.  When this switch is not specified,
552times appear on 1-second boundaries.  When the switch is used but no
553argument is given, B<rwuniq> uses 60-second time bins.  (When the
554start-time is the only key field and time binning is desired, consider
555using B<rwcount(1)> instead.)
556
557=item B<--timestamp-format>=I<FORMAT>
558
559Specify the format and/or timezone to use when printing timestamps.
560When this switch is not specified, the SILK_TIMESTAMP_FORMAT
561environment variable is checked for a default format and/or timezone.
562If it is empty or contains invalid values, timestamps are printed in
563the default format, and the timezone is UTC unless SiLK was compiled
564with local timezone support.  I<FORMAT> is a comma-separated list of a
565format and/or a timezone.  The format is one of:
566
567=over 4
568
569=item default
570
571Print the timestamps as C<I<YYYY>/I<MM>/I<DD>TI<hh>:I<mm>:I<ss>>.
572
573=item iso
574
575Print the timestamps as S<C<I<YYYY>-I<MM>-I<DD> I<hh>:I<mm>:I<ss>>>.
576
577=item m/d/y
578
579Print the timestamps as S<C<I<MM>/I<DD>/I<YYYY> I<hh>:I<mm>:I<ss>>>.
580
581=item epoch
582
583Print the timestamps as the number of seconds since 00:00:00 UTC on
5841970-01-01.
585
586=back
587
588When a timezone is specified, it is used regardless of the default
589timezone support compiled into SiLK.  The timezone is one of:
590
591=over 4
592
593=item utc
594
595Use Coordinated Universal Time to print timestamps.
596
597=item local
598
599Use the TZ environment variable or the local timezone.
600
601=back
602
603=item B<--epoch-time>
604
605Print timestamps as epoch time (number of seconds since midnight GMT
606on 1970-01-01).  This switch is equivalent to
607B<--timestamp-format=epoch>, it is deprecated as of SiLK 3.0.0, and it
608will be removed in the SiLK 4.0 release.
609
610=item B<--ip-format>=I<FORMAT>
611
612Specify how IP addresses are printed, where I<FORMAT> is a
613comma-separated list of the arguments described below.  When this
614switch is not specified, the SILK_IP_FORMAT environment variable is
615checked for a value and that format is used if it is valid.  The
616default I<FORMAT> is C<canonical>.  I<Since SiLK 3.7.0.>
617
618=over 4
619
620=item canonical
621
622Print IP addresses in the canonical format.  If the key only contains
623IPv4 addresses, use dot-separated decimal (C<192.0.2.1>).  Otherwise,
624use colon-separated hexadecimal (C<2001:db8::1>) or a mixed IPv4-IPv6
625representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96
626netblock, e.g., C<::ffff:192.0.2.1>) and IPv4-compatible IPv6
627addresses (the ::/96 netblock other than ::/127, e.g.,
628C<::192.0.2.1>).
629
630=item no-mixed
631
632Print IP addresses in the canonical format (C<192.0.2.1> or
633C<2001:db8::1>) but do not used the mixed IPv4-IPv6 representations.
634For example, use C<::ffff:c000:201> instead of C<::ffff:192.0.2.1>.
635I<Since SiLK 3.17.0>.
636
637=item decimal
638
639Print IP addresses as integers in decimal format.  For example, print
640C<192.0.2.1> and C<2001:db8::1> as C<3221225985> and
641C<42540766411282592856903984951653826561>, respectively.
642
643=item hexadecimal
644
645Print IP addresses as integers in hexadecimal format.  For example,
646print C<192.0.2.1> and C<2001:db8::1> as C<c00000201> and
647C<20010db8000000000000000000000001>, respectively.
648
649=item zero-padded
650
651Make all IP address strings contain the same number of characters by
652padding numbers with leading zeros.  For example, print C<192.0.2.1>
653and C<2001:db8::1> as C<192.000.002.001> and
654C<2001:0db8:0000:0000:0000:0000:0000:0001>, respectively.  For IPv6
655addresses, this setting implies C<no-mixed>, so that
656C<::ffff:192.0.2.1> is printed as
657C<0000:0000:0000:0000:0000:ffff:c000:0201>.  As of SiLK 3.17.0, may be
658combined with any of the above, including C<decimal> and
659C<hexadecimal>.
660
661=back
662
663The following arguments modify certain IP addresses prior to printing.
664These arguments may be combined with the above formats.
665
666=over 4
667
668=item map-v4
669
670Change IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the
671::ffff:0:0/96 netblock) prior to formatting.  I<Since SiLK 3.17.0>.
672
673=item unmap-v6
674
675When the key contains IPv6 addresses, change any IPv4-mapped IPv6
676addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses
677prior to formatting.  I<Since SiLK 3.17.0>.
678
679=back
680
681The following argument is also available:
682
683=over 4
684
685=item force-ipv6
686
687Set I<FORMAT> to C<map-v4>,C<no-mixed>.
688
689=back
690
691=item B<--integer-ips>
692
693Print IP addresses as integers.  This switch is equivalent to
694B<--ip-format=decimal>, it is deprecated as of SiLK 3.7.0, and it will
695be removed in the SiLK 4.0 release.
696
697=item B<--zero-pad-ips>
698
699Print IP addresses as fully-expanded, zero-padded values in their
700canonical form.  This switch is equivalent to
701B<--ip-format=zero-padded>, it is deprecated as of SiLK 3.7.0, and it
702will be removed in the SiLK 4.0 release.
703
704=item B<--integer-sensors>
705
706Print the integer ID of the sensor rather than its name.
707
708=item B<--integer-tcp-flags>
709
710Print the TCP flag fields (flags, initialFlags, sessionFlags) as an
711integer value.  Typically, the characters C<F,S,R,P,A,U,E,C> are used
712to represent the TCP flags.
713
714=item B<--no-titles>
715
716Turn off column titles.  By default, titles are printed.
717
718=item B<--no-columns>
719
720Disable fixed-width columnar output.
721
722=item B<--column-separator>=I<C>
723
724Use specified character between columns and after the final column.
725When this switch is not specified, the default of 'B<|>' is used.
726
727=item B<--no-final-delimiter>
728
729Do not print the column separator after the final column.  Normally a
730delimiter is printed.
731
732=item B<--delimited>
733
734=item B<--delimited>=I<C>
735
736Run as if B<--no-columns> B<--no-final-delimiter> B<--column-sep>=I<C>
737had been specified.  That is, disable fixed-width columnar output; if
738character I<C> is provided, it is used as the delimiter between
739columns instead of the default 'B<|>'.
740
741=item B<--print-filenames>
742
743Print to the standard error the names of input files as they are
744opened.
745
746=item B<--copy-input>=I<PATH>
747
748Copy all binary SiLK Flow records read as input to the specified file
749or named pipe.  I<PATH> may be C<stdout> or C<-> to write flows to the
750standard output as long as the B<--output-path> switch is specified to
751redirect B<rwuniq>'s textual output to a different location.
752
753=item B<--output-path>=I<PATH>
754
755Write the textual output to I<PATH>, where I<PATH> is a filename, a
756named pipe, the keyword C<stderr> to write the output to the standard
757error, or the keyword C<stdout> or C<-> to write the output to the
758standard output (and bypass the paging program).  If I<PATH> names an
759existing file, B<rwuniq> exits with an error unless the SILK_CLOBBER
760environment variable is set, in which case I<PATH> is overwritten.  If
761this switch is not given, the output is either sent to the pager or
762written to the standard output.
763
764=item B<--pager>=I<PAGER_PROG>
765
766When output is to a terminal, invoke the program I<PAGER_PROG> to view
767the output one screen full at a time.  This switch overrides the
768SILK_PAGER environment variable, which in turn overrides the PAGER
769variable.  If the B<--output-path> switch is given or if the value of
770the pager is determined to be the empty string, no paging is performed
771and all output is written to the terminal.
772
773=item B<--ipv6-policy>=I<POLICY>
774
775Determine how IPv4 and IPv6 flows are handled when SiLK has been
776compiled with IPv6 support.  When the switch is not provided, the
777SILK_IPV6_POLICY environment variable is checked for a policy.  If it
778is also unset or contains an invalid policy, the I<POLICY> is
779B<mix>.  When SiLK has not been compiled with IPv6 support, IPv6
780flows are always ignored, regardless of the value passed to this
781switch or in the SILK_IPV6_POLICY variable.  The supported values for
782I<POLICY> are:
783
784=over 4
785
786=item ignore
787
788Ignore any flow record marked as IPv6, regardless of the IP addresses
789it contains.
790
791=item asv4
792
793Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96
794netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all
795other IPv6 flow records.
796
797=item mix
798
799Process the input as a mixture of IPv4 and IPv6 flow records.  When an
800IP address is used as part of the key or value, this policy is
801equivalent to B<force>.
802
803=item force
804
805Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the
806::ffff:0:0/96 netblock.
807
808=item only
809
810Process only flow records that are marked as IPv6 and ignore IPv4 flow
811records in the input.
812
813=back
814
815=item B<--temp-directory>=I<DIR_PATH>
816
817Specify the name of the directory in which to store data files
818temporarily when the memory is not large enough to store all the bins
819and their aggregate values.  This switch overrides the directory
820specified in the SILK_TMPDIR environment variable, which overrides the
821directory specified in the TMPDIR variable, which overrides the
822default, F</tmp>.
823
824=item B<--site-config-file>=I<FILENAME>
825
826Read the SiLK site configuration from the named file I<FILENAME>.
827When this switch is not provided, B<rwuniq> searches for the site
828configuration file in the locations specified in the L</FILES>
829section.
830
831=item B<--legacy-timestamps>
832
833=item B<--legacy-timestamps>=I<NUM>
834
835When I<NUM> is not specified or is 1, this switch is equivalent to
836B<--timestamp-format=m/d/y>.  Otherwise, the switch has no effect.
837This switch is deprecated as of SiLK 3.0.0, and it will be removed in
838the SiLK 4.0 release.
839
840=item B<--xargs>
841
842=item B<--xargs>=I<FILENAME>
843
844Read the names of the input files from I<FILENAME> or from the
845standard input if I<FILENAME> is not provided.  The input is expected
846to have one filename per line.  B<rwuniq> opens each named file in
847turn and reads records from it as if the filenames had been listed on
848the command line.
849
850=item B<--help>
851
852Print the available options and exit.  Specifying switches that add
853new fields, values, or additional switches before B<--help> allows
854the output to include descriptions of those fields or switches.
855
856=item B<--help-fields>
857
858Print the description and alias(es) of each field and value and exit.
859Specifying switches that add new fields before B<--help-fields>
860allows the output to include descriptions of those fields.
861
862=item B<--version>
863
864Print the version number and information about how SiLK was
865configured, then exit the application.
866
867=item B<--pmap-file>=I<PATH>
868
869=item B<--pmap-file>=I<MAPNAME>:I<PATH>
870
871Load the prefix map file located at I<PATH> and create fields named
872src-I<map-name> and dst-I<map-name> where I<map-name> is either the
873I<MAPNAME> part of the argument or the map-name specified when the
874file was created (see B<rwpmapbuild(1)>).  If no map-name is
875available, B<rwuniq> names the fields C<sval> and C<dval>.  Specify
876I<PATH> as C<-> or C<stdin> to read from the standard input.  The
877switch may be repeated to load multiple prefix map files, but each
878prefix map must use a unique map-name.  The B<--pmap-file> switch(es)
879must precede the B<--fields> switch.  See also B<pmapfilter(3)>.
880
881=item B<--pmap-column-width>=I<NUM>
882
883When printing a label associated with a prefix map, this switch gives
884the maximum number of characters to use when displaying the textual
885value of the field.
886
887=item B<--python-file>=I<PATH>
888
889When the SiLK Python plug-in is used, B<rwuniq> reads the Python code
890from the file I<PATH> to define additional fields that can be used as
891part of the key or as an aggregate value.  This file should call
892B<register_field()> for each field it wishes to define.  For details
893and examples, see the B<silkpython(3)> and B<pysilk(3)> manual pages.
894
895=back
896
897=head2 Deprecated volume switches
898
899These options add the named aggregate field(s) to B<--values> if the
900field is not present.  When an argument is specified, the switch is
901equivalent to a B<--threshold> switch.  Use of these switches is
902deprecated.
903
904=over 4
905
906=item B<--all-counts>
907
908Append the following fields to the argument of the B<--values> switch
909unless the field is already present: B<Bytes>, B<Packets>, B<Records>,
910B<sTime-Earliest>, and B<eTime-Latest>.  Deprecated since SiLK 2.0.0.
911
912=item B<--bytes>
913
914Append B<Bytes> to the argument of the B<--values> switch unless it is
915already present.  Deprecated since SiLK 2.0.0.
916
917=item B<--bytes>=I<MIN>
918
919Add B<--threshold=bytes>=I<MIN> to the options.  Deprecated since SiLK
9203.17.0.
921
922=item B<--bytes>=I<MIN>-I<MAX>
923
924Add B<--threshold=bytes>=I<MIN>-I<MAX> to the options.  Deprecated
925since SiLK 3.17.0.
926
927=item B<--packets>
928
929Append B<Packets> to the argument of the B<--values> switch unless it
930is already present.  Deprecated since SiLK 2.0.0.
931
932=item B<--packets>=I<MIN>
933
934Add B<--threshold=packets>=I<MIN> to the options.  Deprecated since
935SiLK 3.17.0.
936
937=item B<--packets>=I<MIN>-I<MAX>
938
939Add B<--threshold=packets>=I<MIN>-I<MAX> to the options.  Deprecated
940since SiLK 3.17.0.
941
942=item B<--flows>
943
944Append B<Records> to the argument of the B<--values> switch unless it
945is already present.  Deprecated since SiLK 2.0.0.
946
947=item B<--flows>=I<MIN>
948
949Add B<--threshold=records>=I<MIN> to the options.  Deprecated since
950SiLK 3.17.0.
951
952=item B<--flows>=I<MIN>-I<MAX>
953
954Add B<--threshold=records>=I<MIN>-I<MAX> to the options.  Deprecated
955since SiLK 3.17.0.
956
957=item B<--sip-distinct>
958
959Append B<Distinct:sIP> to the argument of the B<--values> switch
960unless it is already present.  Deprecated since SiLK 2.0.0.
961
962=item B<--sip-distinct>=I<MIN>
963
964Add B<--threshold=distinct:sip>=I<MIN> to the options.  Deprecated
965since SiLK 3.17.0.
966
967=item B<--sip-distinct>=I<MIN>-I<MAX>
968
969Add B<--threshold=distinct:sip>=I<MIN>-I<MAX> to the options.
970Deprecated since SiLK 3.17.0.
971
972=item B<--dip-distinct>
973
974Append B<Distinct:dIP> to the argument of the B<--values> switch
975unless it is already present.  Deprecated since SiLK 2.0.0.
976
977=item B<--dip-distinct>=I<MIN>
978
979Add B<--threshold=distinct:dip>=I<MIN> to the options.  Deprecated
980since SiLK 3.17.0.
981
982=item B<--dip-distinct>=I<MIN>-I<MAX>
983
984Add B<--threshold=distinct:dip>=I<MIN>-I<MAX> to the options.
985Deprecated since SiLK 3.17.0.
986
987=item B<--stime>
988
989Append B<sTime-Earliest> to the argument of the B<--values> switch
990unless it is already present.  Deprecated since SiLK 2.0.0.
991
992=item B<--etime>
993
994Append B<eTime-Latest> to the argument of the B<--values> switch
995unless it is already present.  Deprecated since SiLK 2.0.0.
996
997=back
998
999=head1 EXAMPLES
1000
1001In these examples, the dollar sign (C<$>) represents the shell prompt
1002and a backslash (C<\>) is used to continue a line for better
1003readability.  Many examples assume previous B<rwfilter(1)> commands
1004have written data files named F<data.rw> and F<data-v6.rw>.
1005
1006=for comment
1007The output for nearly all commands is generated from the "make check"
1008test data.  All commands assume data.rw only contains the incoming
1009data, that is "rwfilter --type=in,inweb".
1010
1011The B<--fields> switch is required to specify which field(s) comprise
1012the key.  By default, B<rwuniq> counts the number of records for each
1013key.  This example uses the source port as the key.
1014
1015 $ rwuniq --fields=sport data.rw | head
1016 sPort|   Records|
1017    53|     62216|
1018    22|     27994|
1019    67|      7807|
1020 29897|        78|
1021 28816|        24|
1022    80|     27044|
1023 28925|        22|
1024     0|      7801|
1025 29246|        63|
1026
1027Notice how the keys are printed in an arbitrary order.  Use the
1028B<--sort-output> switch to arrange the keys from lowest to highest.
1029
1030 $ rwuniq --fields=sport --sort-output data.rw | head
1031 sPort|   Records|
1032     0|      7801|
1033    22|     27994|
1034    25|     15568|
1035    53|     62216|
1036    67|      7807|
1037    80|     27044|
1038   123|      7741|
1039   443|      7917|
1040  8080|      3946|
1041
1042To sort the output by a volume field (such as the number of records),
1043use B<rwstats(1)>.
1044
1045 $ rwstats --fields=sport --count=10 data.rw
1046 INPUT: 250928 Records for 4739 Bins and 250928 Total Records
1047 OUTPUT: Top 10 Bins by Records
1048 sPort|   Records|  %Records|   cumul_%|
1049    53|     62216| 24.794363| 24.794363|
1050    22|     27994| 11.156188| 35.950552|
1051    80|     27044| 10.777594| 46.728145|
1052    25|     15568|  6.204170| 52.932315|
1053   443|      7917|  3.155088| 56.087404|
1054    67|      7807|  3.111251| 59.198655|
1055     0|      7801|  3.108860| 62.307515|
1056   123|      7741|  3.084949| 65.392463|
1057  8080|      3946|  1.572563| 66.965026|
1058 29921|       117|  0.046627| 67.011653|
1059
1060Alternatively, process the textual output of B<rwuniq> with the UNIX
1061B<sort(1)> utility.
1062
1063 $ rwuniq --fields=sport data.rw  \
1064   | sort -r -t '|' -k 2 | head
1065 sPort|   Records|
1066    53|     62216|
1067    22|     27994|
1068    80|     27044|
1069    25|     15568|
1070   443|      7917|
1071    67|      7807|
1072     0|      7801|
1073   123|      7741|
1074  8080|      3946|
1075
1076Use the B<--values> field to change the volume that B<rwuniq> computes
1077for each key.  This example prints the byte-, packet-, and
1078record-counts for each protocol, sorting the results by protocol.
1079
1080 $ rwuniq --fields=proto --values=bytes,packets,records --sort data.rw
1081 pro|               Bytes|        Packets|   Records|
1082   1|             5344836|          73473|      7801|
1083   6|         59945492930|       72127917|    165363|
1084  17|            17553593|          77764|     77764|
1085
1086The B<--threshold> switch limits the output to rows where a value
1087field meets a minimum value or falls within a specific range.  For
1088example, print the number of records and packets seen for each source
1089port for bins having at least 1000 records.
1090
1091 $ rwuniq --fields=sport --values=records,packets \
1092        --threshold=records=1000 data.rw
1093 sPort|   Records|        Packets|
1094    53|     62216|          62216|
1095    22|     27994|       23434615|
1096    67|      7807|           7807|
1097    80|     27044|        8271125|
1098     0|      7801|          73473|
1099   123|      7741|           7741|
1100    25|     15568|         427777|
1101   443|      7917|        2421124|
1102  8080|      3946|        1202528|
1103
1104Multiple thresholds may be specified.
1105
1106 $ rwuniq --fields=sport --values=records,packets                 \
1107        --threshold=records=1000-5000 --threshold=packets=1000000 \
1108        data.rw
1109 sPort|   Records|        Packets|
1110  8080|      3946|        1202528|
1111
1112The B<--bin-time> switch adjusts the times used by the C<sTime> and
1113C<eTime> key fields.  An argument of 86400 moves the starting and
1114ending time to day boundaries.
1115
1116 $ rwuniq --bin-time=86400 --fields=stime,etime data.rw
1117               sTime|              eTime|   Records|
1118 2009/02/12T00:00:00|2009/02/12T00:00:00|     82969|
1119 2009/02/12T00:00:00|2009/02/13T00:00:00|       360|
1120 2009/02/13T00:00:00|2009/02/13T00:00:00|     83594|
1121 2009/02/13T00:00:00|2009/02/14T00:00:00|       332|
1122 2009/02/14T00:00:00|2009/02/14T00:00:00|     83673|
1123
1124The B<--bin-time> switch does not adjust the C<duration> value unless
1125both C<sTime> and C<eTime> are given.
1126
1127 $ rwuniq --bin-time=86400 --fields=stime,dur --sort data.rw | head -6
1128               sTime|durat|   Records|
1129 2009/02/12T00:00:00|    0|     29523|
1130 2009/02/12T00:00:00|    1|      4312|
1131 2009/02/12T00:00:00|    2|      4376|
1132 2009/02/12T00:00:00|    3|      3986|
1133 2009/02/12T00:00:00|    4|       923|
1134
1135 $ rwuniq --bin-time=86400 --fields=stime,dur,etime data.rw
1136               sTime|durat|              eTime|   Records|
1137 2009/02/12T00:00:00|    0|2009/02/12T00:00:00|     82969|
1138 2009/02/12T00:00:00|86400|2009/02/13T00:00:00|       360|
1139 2009/02/13T00:00:00|    0|2009/02/13T00:00:00|     83594|
1140 2009/02/13T00:00:00|86400|2009/02/14T00:00:00|       332|
1141 2009/02/14T00:00:00|    0|2009/02/14T00:00:00|     83673|
1142
1143As of SiLK 3.17.0, the B<--bin-time> switch accepts a floating point
1144value.  When the fractional part is non-zero, B<rwuniq> uses
1145millisecond precision for the times and the duration.
1146
1147 $ rwuniq --bin-time=0.001 --fields=duration data.rw | head -6
1148  duration|   Records|
1149     0.000|     85565|
1150  1791.045|         4|
1151     2.120|        19|
1152    22.263|         5|
1153    19.902|         3|
1154
1155The B<--bin-time> does not adjust the C<sTime-Earliest> and
1156C<eTime-Latest> aggregate value fields, but it does determine whether
1157those fields maintain millisecond precision.
1158
1159 $ rwuniq --bin-time=86400 --fields=stime --value=etime data.rw
1160               sTime|       eTime-Latest|
1161 2009/02/12T00:00:00|2009/02/12T00:29:59|
1162 2009/02/13T00:00:00|2009/02/13T00:29:58|
1163 2009/02/14T00:00:00|2009/02/14T00:29:59|
1164
1165 $ rwuniq --bin-time=0.001 --fields=proto --value=stime,etime data.rw
1166 pro|         sTime-Earliest|           eTime-Latest|
1167  17|2009/02/12T00:00:02.745|1970/01/15T06:57:35.997|
1168   6|2009/02/12T00:00:03.004|1970/01/15T06:57:35.998|
1169   1|2009/02/12T00:00:20.601|1970/01/15T06:57:35.992|
1170
1171With an input of both IPv4 and IPv6 records, B<rwuniq> maps the IPv4
1172records into the ::ffff:0:0/96 netblock.  The data is normally mapped
1173back to IPv4 on output.  Given this input:
1174
1175 $ rwcut --fields=sip,packets /tmp/v4v6.rw
1176                                     sIP|   packets|
1177                                     ::1|        45|
1178                              192.0.2.22|        87|
1179                    ::ffff:203.0.113.113|      2662|
1180                  2001:db8:54:32:ab:cd::|       345|
1181
1182The B<rwuniq> tool produces:
1183
1184 $ rwuniq --fields=sip --values=packets /tmp/v4v6.rw
1185                                     sIP|        Packets|
1186                                     ::1|             45|
1187                              192.0.2.22|             87|
1188                           203.0.113.113|           2662|
1189                  2001:db8:54:32:ab:cd::|            345|
1190
1191Set the B<--ip-format> to map-v4 to leave the values as IPv4-mapped
1192IPv6.  (Using an B<--ipv6-policy> of C<force-ipv6> has the same
1193effect.)
1194
1195 $ rwuniq --fields=sip --values=packets --ip-format=map-v4 /tmp/v4v6.rw
1196                                     sIP|        Packets|
1197                                     ::1|             45|
1198                       ::ffff:192.0.2.22|             87|
1199                    ::ffff:203.0.113.113|           2662|
1200                  2001:db8:54:32:ab:cd::|            345|
1201
1202Print the source addresses that sent more than 10,000,000 bytes, and
1203for each address print the number of unique destination hosts it
1204contacted:
1205
1206 $ rwuniq --fields=sip --values=bytes,distinct:dip \
1207        --threshold=bytes=10000000 data-v6.rw
1208                       sIP|               Bytes|dIP-Distin|
1209      2001:db8:a:fd::90:bd|            14529210|         2|
1210
1211Print the number of bytes that host shared with each destination
1212(first use B<rwfilter> to limit the input to that host):
1213
1214 $ rwfilter --saddr=2001:db8:a:fd::90:bd --pass=- data-v6.rw        \
1215   | rwuniq --fields=dip --values=bytes
1216                       dIP|               Bytes|
1217     2001:db8:c0:a8::fa:5d|             7097847|
1218      2001:db8:c0:a8::dd:6|             7431363|
1219
1220Print the packet and byte counts for each IPv4 source-destination
1221pair, where the prefix length is 16 (use B<rwnetmask(1)> on the input
1222to B<rwuniq>):
1223
1224 $ rwnetmask --4sip-prefix=16 --4dip-prefix=16 data.rw      \
1225   | rwuniq --fields=sip,dip --values=packet,byte | head
1226            sIP|            dIP|  Packets|        Bytes|
1227     10.139.0.0|    192.168.0.0|    33490|     22950353|
1228      10.40.0.0|    192.168.0.0|      258|        18544|
1229     10.204.0.0|    192.168.0.0|   353233|    288736424|
1230     10.106.0.0|    192.168.0.0|    13051|      3843693|
1231      10.71.0.0|    192.168.0.0|     4355|      1391194|
1232      10.98.0.0|    192.168.0.0|     7312|      7328359|
1233     10.114.0.0|    192.168.0.0|     2538|      4137927|
1234     10.168.0.0|    192.168.0.0|    92094|     86883062|
1235     10.176.0.0|    192.168.0.0|   122101|    116555051|
1236
1237Given a file of scan traffic, print the source of TCP traffic with no
1238more than 3 packets and which also appears at least 4 times.  First
1239use B<rwfilter> to limit the traffic to TCP and find the flow records
1240where the packet count in that flow record is no more than 3.
1241
1242 $ rwfilter --proto=6 --packets=1-3 --pass=- scandata.rw          \
1243   | rwuniq --field=sip --values=flow,packets --threshold=flows=4 \
1244   | head -5
1245             sIP|   Records|        Packets|
1246   10.249.216.38|       256|            256|
1247    10.155.55.93|       256|            256|
1248   10.61.255.154|       256|            256|
1249    10.60.122.82|       256|            256|
1250
1251The B<silkpython(3)> manual page provides examples that use PySiLK to
1252create arbitrary fields to use as part of the key for B<rwuniq>.
1253
1254When using B<rwuniq> on input that contains both incoming and outgoing
1255flow records, consider using the B<int-ext-fields(3)> plug-in which
1256defines four additional fields representing the external IP address,
1257the external port, the internal IP address, and the internal port.
1258The plug-in requires the user to specify which class/type pairs are
1259incoming and which are outgoing.  See its manual page for additional
1260information.  As an example, here we run B<rwuniq> on a file
1261containing incoming and outgoing web traffic.
1262
1263 $ rwuniq --fields=sip,sport,dip,dport --values=bytes \
1264        --sort-output data.rw | head -7
1265             sIP|sPort|            dIP|dPort|               Bytes|
1266     10.4.52.235|29631|192.168.233.171|   80|               18260|
1267    10.5.231.251|   80|192.168.226.129|28770|              536169|
1268     10.9.77.117|29906| 192.168.184.65|   80|               55386|
1269     10.11.88.88|   80|192.168.251.222|28902|              433198|
1270   10.14.110.214|29989| 192.168.249.96|   80|               25903|
1271    10.15.224.27|  443| 192.168.231.49|29779|              163759|
1272
1273Here the B<int-ext-fields> plug-in is used:
1274
1275 $ export INCOMING_FLOWTYPES=all/in,all/inweb
1276 $ export OUTGOING_FLOWTYPES=all/out,all/outweb
1277 $ rwuniq --plugin=int-ext-fields.so \
1278        --fields=ext-ip,ext-port,int-ip,int-port --value=bytes \
1279        --sort-output data.rw | head -7
1280          ext-ip|ext-p|         int-ip|int-p|               Bytes|
1281     10.4.52.235|29631|192.168.233.171|   80|              726111|
1282    10.5.231.251|   80|192.168.226.129|28770|              561654|
1283     10.9.77.117|29906| 192.168.184.65|   80|             1811738|
1284     10.11.88.88|   80|192.168.251.222|28902|              444277|
1285   10.14.110.214|29989| 192.168.249.96|   80|              393068|
1286    10.15.224.27|  443| 192.168.231.49|29779|              167696|
1287
1288=head1 ENVIRONMENT
1289
1290=over 4
1291
1292=item SILK_IPV6_POLICY
1293
1294This environment variable is used as the value for B<--ipv6-policy>
1295when that switch is not provided.
1296
1297=item SILK_IP_FORMAT
1298
1299This environment variable is used as the value for B<--ip-format> when
1300that switch is not provided.  I<Since SiLK 3.11.0.>
1301
1302=item SILK_TIMESTAMP_FORMAT
1303
1304This environment variable is used as the value for
1305B<--timestamp-format> when that switch is not provided.  I<Since SiLK
13063.11.0.>
1307
1308=item SILK_PAGER
1309
1310When set to a non-empty string, B<rwuniq> automatically invokes this
1311program to display its output a screen at a time.  If set to an empty
1312string, B<rwuniq> does not automatically page its output.
1313
1314=item PAGER
1315
1316When set and SILK_PAGER is not set, B<rwuniq> automatically invokes
1317this program to display its output a screen at a time.
1318
1319=item SILK_TMPDIR
1320
1321When set and B<--temp-directory> is not specified, B<rwuniq> writes
1322the temporary files it creates to this directory.  SILK_TMPDIR
1323overrides the value of TMPDIR.
1324
1325=item TMPDIR
1326
1327When set and SILK_TMPDIR is not set, B<rwuniq> writes the temporary
1328files it creates to this directory.
1329
1330=item PYTHONPATH
1331
1332This environment variable is used by Python to locate modules.  When
1333B<--python-file> is specified, B<rwuniq> must load the Python files
1334that comprise the PySiLK package, such as F<silk/__init__.py>.  If
1335this F<silk/> directory is located outside Python's normal search path
1336(for example, in the SiLK installation tree), it may be necessary to
1337set or modify the PYTHONPATH environment variable to include the
1338parent directory of F<silk/> so that Python can find the PySiLK
1339module.
1340
1341=item SILK_PYTHON_TRACEBACK
1342
1343When set, Python plug-ins print traceback information on Python
1344errors to the standard error.
1345
1346=item SILK_COUNTRY_CODES
1347
1348This environment variable allows the user to specify the country code
1349mapping file that B<rwuniq> uses when computing the scc and dcc
1350fields.  The value may be a complete path or a file relative to the
1351SILK_PATH.  See the L</FILES> section for standard locations of this
1352file.
1353
1354=item SILK_ADDRESS_TYPES
1355
1356This environment variable allows the user to specify the address type
1357mapping file that B<rwuniq> uses when computing the sType and dType
1358fields.  The value may be a complete path or a file relative to the
1359SILK_PATH.  See the L</FILES> section for standard locations of this
1360file.
1361
1362=item SILK_CLOBBER
1363
1364The SiLK tools normally refuse to overwrite existing files.  Setting
1365SILK_CLOBBER to a non-empty value removes this restriction.
1366
1367=item SILK_CONFIG_FILE
1368
1369This environment variable is used as the value for the
1370B<--site-config-file> when that switch is not provided.
1371
1372=item SILK_DATA_ROOTDIR
1373
1374This environment variable specifies the root directory of data
1375repository.  As described in the L</FILES> section, B<rwuniq> may
1376use this environment variable when searching for the SiLK site
1377configuration file.
1378
1379=item SILK_PATH
1380
1381This environment variable gives the root of the install tree.  When
1382searching for configuration files and plug-ins, B<rwuniq> may use this
1383environment variable.  See the L</FILES> section for details.
1384
1385=item TZ
1386
1387When the argument to the B<--timestamp-format> switch includes
1388C<local> or when a SiLK installation is built to use the local
1389timezone, the value of the TZ environment variable determines the
1390timezone in which B<rwuniq> displays timestamps.  (If both of
1391those are false, the TZ environment variable is ignored.)  If the TZ
1392environment variable is not set, the machine's default timezone is
1393used.  Setting TZ to the empty string or 0 causes timestamps to be
1394displayed in UTC.  For system information on the TZ variable, see
1395B<tzset(3)> or B<environ(7)>.  (To determine if SiLK was built with
1396support for the local timezone, check the C<Timezone support> value in
1397the output of B<rwuniq --version>.)
1398
1399=item SILK_PLUGIN_DEBUG
1400
1401When set to 1, B<rwuniq> prints status messages to the standard error
1402as it attempts to find and open each of its plug-ins.  In addition,
1403when an attempt to register a field fails, B<rwuniq> prints a message
1404specifying the additional function(s) that must be defined to register
1405the field in B<rwuniq>.  Be aware that the output can be rather
1406verbose.
1407
1408=item SILK_TEMPFILE_DEBUG
1409
1410When set to 1, B<rwuniq> prints debugging messages to the standard
1411error as it creates, re-opens, and removes temporary files.
1412
1413=item SILK_UNIQUE_DEBUG
1414
1415When set to 1, the binning engine used by B<rwuniq> prints debugging
1416messages to the standard error.
1417
1418=back
1419
1420=head1 FILES
1421
1422=over 4
1423
1424=item F<${SILK_ADDRESS_TYPES}>
1425
1426=item F<${SILK_PATH}/share/silk/address_types.pmap>
1427
1428=item F<${SILK_PATH}/share/address_types.pmap>
1429
1430=item F<@prefix@/share/silk/address_types.pmap>
1431
1432=item F<@prefix@/share/address_types.pmap>
1433
1434Possible locations for the address types mapping file required by the
1435sType and dType fields.
1436
1437=item F<${SILK_CONFIG_FILE}>
1438
1439=item F<${SILK_DATA_ROOTDIR}/silk.conf>
1440
1441=item F<@SILK_DATA_ROOTDIR@/silk.conf>
1442
1443=item F<${SILK_PATH}/share/silk/silk.conf>
1444
1445=item F<${SILK_PATH}/share/silk.conf>
1446
1447=item F<@prefix@/share/silk/silk.conf>
1448
1449=item F<@prefix@/share/silk.conf>
1450
1451Possible locations for the SiLK site configuration file which are
1452checked when the B<--site-config-file> switch is not provided.
1453
1454=item F<${SILK_COUNTRY_CODES}>
1455
1456=item F<${SILK_PATH}/share/silk/country_codes.pmap>
1457
1458=item F<${SILK_PATH}/share/country_codes.pmap>
1459
1460=item F<@prefix@/share/silk/country_codes.pmap>
1461
1462=item F<@prefix@/share/country_codes.pmap>
1463
1464Possible locations for the country code mapping file required by the
1465scc and dcc fields.
1466
1467=item F<${SILK_PATH}/lib64/silk/>
1468
1469=item F<${SILK_PATH}/lib64/>
1470
1471=item F<${SILK_PATH}/lib/silk/>
1472
1473=item F<${SILK_PATH}/lib/>
1474
1475=item F<@prefix@/lib64/silk/>
1476
1477=item F<@prefix@/lib64/>
1478
1479=item F<@prefix@/lib/silk/>
1480
1481=item F<@prefix@/lib/>
1482
1483Directories that B<rwuniq> checks when attempting to load a plug-in.
1484
1485=item F<${SILK_TMPDIR}/>
1486
1487=item F<${TMPDIR}/>
1488
1489=item F</tmp/>
1490
1491Directory in which to create temporary files.
1492
1493=back
1494
1495=head1 NOTES
1496
1497If multiple thresholds are given (e.g., C<--threshold=bytes=80
1498--threshold=flows=2>), the values must meet all thresholds before the
1499record is printed.  For example, if a given key saw a single 100-byte
1500flow, the entry would not printed given the switches above.
1501
1502B<rwuniq> functionally replaces the combination of
1503
1504 rwcut | sort | uniq -c
1505
1506To get a list of unique IP addresses in a data set without the
1507counting or threshold abilities of B<rwuniq>, consider using the IPset
1508tools B<rwset(1)> and B<rwsetcat(1)> for improved performance:
1509
1510 rwset --sip-set=stdout | rwsetcat --print-ips
1511
1512For situations where the key and value are each a single field, the
1513Bag tools (B<rwbag(1)>, B<rwbagcat(1)>) often provide better
1514performance, especially when the key length is one or two bytes:
1515
1516 rwbag --bag-file=sport,bytes,stdout | rwbagcat
1517
1518To create a binary file that contains B<rwuniq>-like output, use
1519B<rwaggbag(1)> or B<rwaggbagbuild(1)>.  The content of these files may
1520be printed with B<rwaggbagcat(1)>.
1521
1522B<rwgroup(1)> works similarly to B<rwuniq>, except the data remains in
1523the form of SiLK Flow records, and the next-hop-IP field is modified
1524to denote the records that form a bin.
1525
1526B<rwstats(1)> can do the same binning as B<rwuniq>, and then sort the
1527data by an aggregate field.
1528
1529When the B<--bin-time> switch is given and the three time fields
1530(starting-time (C<sTime>), ending-time (C<eTime>), and duration
1531(C<duration>)) are present in the key, the duration field's value will be
1532modified to be the difference between the ending and starting times.
1533
1534When the three time-related key fields (C<sTime>,C<duration>,C<eTime>) are
1535all in use, B<rwuniq> will ignore the final time field when binning
1536the records, but the field will appear in the output.  Due to
1537truncation of the milliseconds values, B<rwuniq> will print a
1538different number of rows depending on the order in which those three
1539values appear in the B<--fields> switch.
1540
1541B<rwuniq> supports counting distinct source and/or destination IPs.
1542To see the number of distinct sources for each 10 minute bin, run:
1543
1544 rwuniq --fields=stime --values=distinct:sip --bin-time=600 --sort-output
1545
1546When computing distinct counts over a field, the field may not be part
1547of the key; that is, you cannot have C<--fields=sip
1548--values=sip-distinct>.
1549
1550Using the B<--presorted-input> switch sometimes introduces more issues
1551than it solves, and B<--presorted-input> is less necessary now that
1552B<rwuniq> can use temporary files while processing input.
1553
1554When computing distinct IP counts, B<rwuniq> will typically run faster
1555if you do I<not> use the B<--presorted-input> switch, even if the data
1556was previously sorted.
1557
1558When using the B<--presorted-input> switch, it is highly recommended
1559that you use no more than one time-related key field (C<sTime>,
1560C<duration>, C<eTime>) in the B<--fields> switch and that the time-related
1561key appear last in B<--fields>.  The issue is caused by B<rwsort>
1562considering the millisecond values on the times when sorting, while
1563B<rwuniq> truncates the millisecond value.  The result may be unsorted
1564output and multiple rows in the output that have the same values for
1565the key fields:
1566
1567 $ rwsort --fields=stime,duration data.rw       \
1568   | rwuniq --fields=stime,dur --presorted
1569               sTime|durat|   Records|
1570 ...
1571 2009/02/12T00:00:57|    0|         2|
1572 2009/02/12T00:00:57|   29|         2|
1573 2009/02/12T00:00:57|    0|         2|
1574 2009/02/12T00:00:57|   13|         2|
1575 ...
1576
1577B<rwuniq>'s strength is its ability to build arbitrary keys and
1578aggregate fields.  For a key of a single IP address, see
1579B<rwaddrcount(1)> and B<rwbag(1)>; for a key made up of a single CIDR
1580block (/8, /16, /24 only), a single port, or a single protocol, use
1581B<rwtotal(1)> or B<rwbag(1)>.
1582
1583As of SiLK 3.17.0, fields that are specified with the legacy
1584thresholding switches (e.g., B<--bytes>) and not with B<--values> are
1585printed in the order in which those switches appear.  Previously, the
1586order was always bytes, packets, flows, stime, etime, sip-distinct,
1587dip-distinct.
1588
1589=head1 SEE ALSO
1590
1591B<rwfilter(1)>, B<rwbag(1)>, B<rwbagcat(1)>, B<rwaggbag(1)>,
1592B<rwaggbagbuild(1)>, B<rwaggbagcat(1)>, B<rwcut(1)>, B<rwset(1)>,
1593B<rwsetcat(1)>, B<rwaddrcount(1)>, B<rwgroup(1)>, B<rwstats(1)>,
1594B<rwnetmask(1)>, B<rwsort(1)>, B<rwtotal(1)>, B<rwcount(1)>,
1595B<rwpmapbuild(1)>, B<addrtype(3)>, B<ccfilter(3)>,
1596B<int-ext-fields(3)>, B<pmapfilter(3)>, B<pysilk(3)>,
1597B<silkpython(3)>, B<silk-plugin(3)>, B<sensor.conf(5)>,
1598B<rwflowpack(8)>, B<silk(7)>, B<yaf(1)>, B<dlopen(3)>, B<tzset(3)>,
1599B<environ(7)>
1600
1601=cut
1602
1603$SiLK: rwuniq.pod 861b66f000c2 2019-09-24 22:01:14Z mthomas $
1604
1605Local Variables:
1606mode:text
1607indent-tabs-mode:nil
1608End:
1609