=pod

=head1 NAME

B<rwuniq> - Bin SiLK Flow records by a key and print each bin's volume

=head1 SYNOPSIS

  rwuniq --fields=KEY [--values=VALUES]
        [{--threshold=VALUE_FIELD=MIN-MAX | --threshold=VALUE_FIELD=MIN}]
        [--presorted-input] [--sort-output]
        [{--bin-time=SECONDS | --bin-time}]
        [--timestamp-format=FORMAT] [--epoch-time]
        [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
        [--integer-sensors] [--integer-tcp-flags]
        [--no-titles] [--no-columns] [--column-separator=CHAR]
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
        [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
        [--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
        [{--legacy-timestamps | --legacy-timestamps={1,0}}]
        [--all-counts] [{--bytes | --bytes=MIN | --bytes=MIN-MAX}]
        [{--packets | --packets=MIN | --packets=MIN-MAX}]
        [{--flows | --flows=MIN | --flows=MIN-MAX}]
        [--stime] [--etime]
        [{--sip-distinct | --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
        [{--dip-distinct | --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
        [--ipv6-policy={ignore,asv4,mix,force,only}]
        [--site-config-file=FILENAME]
        [--plugin=PLUGIN [--plugin=PLUGIN ...]]
        [--python-file=PATH [--python-file=PATH ...]]
        [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--pmap-column-width=NUM]
        {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

  rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

  rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
        [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields

  rwuniq --version

=head1 DESCRIPTION

B<rwuniq> reads SiLK Flow records and groups them by a key composed of
user-specified attributes of the flows.
For each group (or bin), a
collection of user-specified I<aggregate values> is computed; these
values are typically related to the volume of the bin, such as the sum
of the bytes fields for all records that match the key. Once all the
SiLK Flow records are read, the key fields and the aggregate values
are printed. For some of the built-in aggregate values, it is
possible to limit the output to the bins where the aggregate value
meets a user-specified minimum and/or maximum.

There is no need to sort the input to B<rwuniq> since B<rwuniq>
normally rearranges the records as they are read. To have B<rwuniq>
sort its output, use the B<--sort-output> switch.

B<rwuniq> reads SiLK Flow records from the files named on the command
line or from the standard input when no file names are specified and
B<--xargs> is not present. To read the standard input in addition to
the named files, use C<-> or C<stdin> as a file name. If an input
file name ends in C<.gz>, the file is uncompressed as it is read.
When the B<--xargs> switch is provided, B<rwuniq> reads the names of
the files to process from the named text file or from the standard
input if no file name argument is provided to the switch. The input
to B<--xargs> must contain one file name per line.

The user must provide the B<--fields> switch to select the flow
attribute(s) (or field(s)) that comprise the key for each bin. The
available fields are similar to those supported by B<rwcut(1)>; see
the description of the B<--fields> switch in the L</OPTIONS> section
below for the details. The list of fields can be extended by loading
PySiLK files (see B<silkpython(3)>) or plug-ins (see B<silk-plugin(3)>).
The fields are printed in the order in which they occur in the
B<--fields> switch. The size of the key is limited to 256 octets. A
larger key uses the available memory more quickly, leading to
slower performance.
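
As a rough model of the binning described above, the following Python
sketch groups flow records by a key and computes three aggregate values
(Records, Packets, and Bytes) for each bin. It is not how B<rwuniq> is
implemented: the flow records are hypothetical Python dictionaries
rather than SiLK Flow records, the function name C<bin_flows> is
invented for illustration, and the sketch ignores memory limits and
temporary files. Its output loosely corresponds to running
C<rwuniq --fields=protocol,dPort --values=records,packets,bytes --sort-output>.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a SiLK Flow record: field name -> integer value.
Flow = Dict[str, int]


def bin_flows(flows: Iterable[Flow], key_fields: List[str],
              sort_output: bool = False) -> List[Tuple[tuple, Dict[str, int]]]:
    """Group flows by key_fields; count Records and sum Packets, Bytes per bin."""
    bins: Dict[tuple, Dict[str, int]] = defaultdict(
        lambda: {"Records": 0, "Packets": 0, "Bytes": 0})
    for flow in flows:
        key = tuple(flow[name] for name in key_fields)  # the bin's key
        agg = bins[key]  # creates a zeroed bin the first time a key is seen
        agg["Records"] += 1
        agg["Packets"] += flow["packets"]
        agg["Bytes"] += flow["bytes"]
    result = list(bins.items())
    if sort_output:  # assumption: order bins by their key tuples
        result.sort(key=lambda item: item[0])
    return result


flows = [
    {"protocol": 6, "dPort": 80, "packets": 10, "bytes": 4000},
    {"protocol": 17, "dPort": 53, "packets": 1, "bytes": 70},
    {"protocol": 6, "dPort": 80, "packets": 5, "bytes": 1500},
]
for key, agg in bin_flows(flows, ["protocol", "dPort"], sort_output=True):
    print(key, agg)
# (6, 80) {'Records': 2, 'Packets': 15, 'Bytes': 5500}
# (17, 53) {'Records': 1, 'Packets': 1, 'Bytes': 70}
```

The sketch keeps every bin in a dictionary. This mirrors the statement
that B<rwuniq> normally needs no sorted input, at the cost of memory
that grows with the number of distinct keys.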

The aggregate value(s) to compute for each bin are also chosen by the
user. As with the key fields, the user can extend the list of
aggregate fields by using PySiLK or plug-ins. Specify the aggregate
fields with the B<--values> switch; the aggregate fields are printed
in the order they occur in the B<--values> switch. If the user does
not provide B<--values> or a B<--threshold> switch (described next),
B<rwuniq> defaults to computing the number of flow records for each
bin. As with the key fields, requesting more aggregate values slows
performance.

The B<--threshold> switch (added in SiLK 3.17.0) allows the user to
print only bins where a value field is within a certain range. The
switch's argument contains the name of the value field, an equals
sign, the minimum value (start of the range), and optionally a hyphen
and the maximum value (end of the range); e.g.,
C<--threshold=bytes=1000-2000>. The upper bound is unlimited when no
maximum is specified. The B<--threshold> switch may be repeated to
set multiple thresholds, and only those bins that meet all thresholds
are printed. Each field named by B<--threshold> is appended to the
set of aggregate value fields unless that field was named in the
B<--values> switch.

The B<--presorted-input> switch may allow B<rwuniq> to process data
more efficiently by causing B<rwuniq> to assume the input has been
previously sorted with the B<rwsort(1)> command. With this switch,
B<rwuniq> typically does not need large amounts of memory because it
does not bin each flow; instead, it keeps a running summation and
outputs the bin whenever the key changes. For the output to be
meaningful, B<rwsort> and B<rwuniq> I<must> be invoked with the same
B<--fields> value. When multiple input files are specified and
B<--presorted-input> is given, B<rwuniq> merge-sorts the flow
records from the input files.
B<rwuniq> typically runs faster if
you do I<not> include the B<--presorted-input> switch when counting
distinct values, even when reading sorted input. Finally, you
may get unusual results with B<--presorted-input> when the B<--fields>
switch contains multiple time-related key fields (C<sTime>,
C<duration>, C<eTime>), or when the time-related key is not the final
key listed in B<--fields>; see the L</NOTES> section for details.

B<rwuniq> attempts to keep all key and aggregate value data in the
computer's memory. If B<rwuniq> runs out of memory, the current key
and aggregate value data is written to a temporary file. Once all
input has been processed, the data from the temporary files is merged
to produce the final output. By default, these temporary files are
stored in the F</tmp> directory. Because these files can be large, it
is strongly recommended that F</tmp> I<not> be used as the temporary
directory. To modify the temporary directory used by B<rwuniq>,
provide the B<--temp-directory> switch, set the SILK_TMPDIR
environment variable, or set the TMPDIR environment variable.

=head1 OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an
exact match for an option. A parameter to an option may be specified
as B<--arg>=I<param> or B<--arg> I<param>, though the first form is
required for options that take optional parameters.

The B<--fields> switch is required. B<rwuniq> fails when it is
not provided.

=over 4

=item B<--fields>=I<KEY>

I<KEY> contains the list of flow attributes (a.k.a. fields or columns)
that make up the key into which flows are binned. The columns are
displayed in the order the fields are specified. Each field may be
specified once only.
I<KEY> is a comma-separated list of field-names,
field-integers, and ranges of field-integers; a range is specified by
separating the start and end of the range with a hyphen (B<->).
Field-names are case insensitive. Example:

  --fields=stime,10,1-5

There is no default value for the B<--fields> switch; the switch must
be specified.

The complete list of built-in fields that the SiLK tool suite supports
follows, though note that not all fields are present in all SiLK file
formats; when a field is not present, its value is 0.

=over 4

=item sIP,1

source IP address

=item dIP,2

destination IP address

=item sPort,3

source port for TCP and UDP, or equivalent

=item dPort,4

destination port for TCP and UDP, or equivalent. See note at C<iType>.

=item protocol,5

IP protocol

=item packets,pkts,6

packet count

=item bytes,7

byte count

=item flags,8

bit-wise OR of TCP flags over all packets

=item sTime,9

starting time of flow (seconds resolution unless B<--bin-time>
includes fractional seconds). When the time-related fields
C<sTime>, C<duration>, C<eTime> are all in use, B<rwuniq> ignores the
final time field when binning the records.

=item duration,10

duration of flow (seconds resolution unless B<--bin-time> includes
fractional seconds). This field is not adjusted by B<--bin-time>
unless B<--fields> includes both C<sTime> and C<eTime>. See note at
C<sTime,9>.

=item eTime,11

end time of flow (seconds resolution unless B<--bin-time> includes
fractional seconds). See note at C<sTime,9>.

=item sensor,12

name or ID of the sensor where the flow was collected

=item class,20

class assigned to the flow by B<rwflowpack(8)>.
Binning by C<class>
and/or C<type> equates to binning by the integer value used internally
to represent the class/type pair. When B<--fields> contains C<class>
but not C<type>, B<rwuniq>'s output contains multiple rows with the
same value(s) for the key field(s).

=item type,21

type assigned to the flow by B<rwflowpack(8)>. See note on previous
entry.

=item iType

the ICMP type value for ICMP or ICMPv6 flows and empty (numerically
zero) for non-ICMP flows. Internally, SiLK stores the ICMP type and
code in the C<dPort> field. To avoid getting very odd results, either
do not use the C<dPort> field when your key includes ICMP field(s) or
be certain to include the C<protocol> field as part of your key. This
field was introduced in SiLK 3.8.1.

=item iCode

the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP
flows. See note at C<iType>.

=item icmpTypeCode,25

equivalent to C<iType>,C<iCode> when used in B<--fields>. This field
may not be mixed with C<iType> or C<iCode>, and this field is
deprecated as of SiLK 3.8.1. As of SiLK 3.8.1, C<icmpTypeCode> may no
longer be used as the argument to the C<Distinct:> value field; the
C<dPort> field provides an equivalent result as long as the input
is limited to ICMP flow records.

=back

Many SiLK file formats do not store the following fields; their
values are always 0. They are listed here for completeness:

=over 4

=item in,13

router SNMP input interface or vlanId if packing tools were
configured to capture it (see B<sensor.conf(5)>)

=item out,14

router SNMP output interface or postVlanId

=item nhIP,15

router next hop IP

=back

SiLK can store flows generated by enhanced collection software that
provides more information than NetFlow v5.
These flows may support
some or all of these additional fields; for flows without this
additional information, the field's value is always 0.

=over 4

=item initialFlags,26

TCP flags on first packet in the flow

=item sessionFlags,27

bit-wise OR of TCP flags over all packets except the first in the flow

=item attributes,28

flow attributes set by the flow generator:

=over 4

=item C<S>

all the packets in this flow record are exactly the same size

=item C<F>

flow generator saw additional packets in this flow following a packet
with a FIN flag (excluding ACK packets)

=item C<T>

flow generator prematurely created a record for a long-running
connection due to a timeout. (When the flow generator B<yaf(1)> is
run with the B<--silk> switch, it prematurely creates a flow and
marks it with C<T> if the byte count of the flow cannot be stored in a
32-bit value.)

=item C<C>

flow generator created this flow as a continuation of a long-running
connection, where the previous flow for this connection met a timeout
(or a byte threshold in the case of B<yaf>).

=back

Consider a long-running ssh session that exceeds the flow generator's
I<active> timeout. (This is the active timeout since the flow
generator creates a flow for a connection that still has activity.)
The flow generator will create multiple flow records for this ssh
session, each spanning some portion of the total session. The first
flow record will be marked with a C<T> indicating that it hit the
timeout. The second through next-to-last records will be marked with
C<TC> indicating that this flow both timed out and is a continuation
of a flow that timed out. The final flow will be marked with a C<C>,
indicating that it was created as a continuation of an active flow.

=item application,29

guess as to the content of the flow.
Some software that generates flow
records from packet data, such as B<yaf>, will inspect the contents of
the packets that make up a flow and use traffic signatures to label
the content of the flow. SiLK calls this label the I<application>;
B<yaf> refers to it as the I<appLabel>. The application is the port
number that is traditionally used for that type of traffic (see the
F</etc/services> file on most UNIX systems). For example, traffic
that the flow generator recognizes as FTP will have a value of 21,
even if that traffic is being routed through the standard HTTP/web
S<port (80)>.

=back

The following fields provide a way to label the IPs or ports on a
record. These fields require external files to provide the mapping
from the IP or port to the label:

=over 4

=item sType,16

for the source IP address, the value 0 if the address is non-routable,
1 if it is internal, or 2 if it is routable and external. Uses the
mapping file specified by the SILK_ADDRESS_TYPES environment variable,
or the F<address_types.pmap> mapping file, as described in
B<addrtype(3)>.

=item dType,17

as B<sType> for the destination IP address

=item scc,18

for the source IP address, a two-letter country code abbreviation
denoting the country where that IP address is located. Uses the
mapping file specified by the SILK_COUNTRY_CODES environment variable,
or the F<country_codes.pmap> mapping file, as described in
B<ccfilter(3)>. The abbreviations are those defined by ISO 3166-1
(see for example L<https://www.iso.org/iso-3166-country-codes.html>
or L<https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>) or the
following special codes: B<--> N/A (e.g.
private and experimental
reserved addresses); B<a1> anonymous proxy; B<a2> satellite provider;
B<o1> other.

=item dcc,19

as B<scc> for the destination IP

=item src-I<map-name>

label contained in the prefix map file associated with I<map-name>.
If the prefix map is for IP addresses, the label is that associated
with the source IP address. If the prefix map is for protocol/port
pairs, the label is that associated with the protocol and source port.
See also the description of the B<--pmap-file> switch below and the
B<pmapfilter(3)> manual page.

=item dst-I<map-name>

as B<src-I<map-name>> for the destination IP address or the protocol
and destination port.

=item sval

as B<src-I<map-name>> when no map-name is associated with the prefix
map file

=item dval

as B<dst-I<map-name>> when no map-name is associated with the prefix
map file

=back

Finally, the list of built-in fields may be augmented by the run-time
loading of PySiLK code or plug-ins written in C (also called shared
object files or dynamic libraries), as described by the
B<--python-file> and B<--plugin> switches.

=for comment
##########################################################################
# Whew! We've finally reached the end of the --fields help

=item B<--values>=I<VALUES>

Specify the aggregate values to compute for each bin as a
comma-separated list of names. Names are case insensitive. When the
B<--threshold> switch specifies an aggregate value field that does
not appear in I<VALUES>, that field is appended to I<VALUES>. When
neither the B<--values> switch nor any B<--threshold> switch is
specified, B<rwuniq> counts the number of flow records for each bin.
The aggregate fields are printed in the order they occur in I<VALUES>.
The names of the built-in value fields follow.
This list can be
augmented through the use of PySiLK and plug-ins.

=over 4

=item Records

Count the number of flow records that mapped to each bin.

=item Packets

Sum the number of packets across all records that mapped to each bin.

=item Bytes

Sum the number of bytes across all records that mapped to each bin.

=item sTime-Earliest

Keep track of the earliest start time (minimum time) seen across all
records that mapped to each bin, in seconds resolution. The
B<--bin-time> switch does not normally affect this value; however,
this value uses milliseconds resolution when B<--bin-time> includes
fractional seconds.

=item eTime-Latest

Keep track of the latest end time (maximum time) seen across all
records that mapped to each bin, in seconds resolution. The
B<--bin-time> switch does not normally affect this value; however,
this value uses milliseconds resolution when B<--bin-time> includes
fractional seconds.

=item sIP-Distinct

Count the number of distinct source IP addresses that were seen for
each bin, an alias for Distinct:sIP.

=item dIP-Distinct

Count the number of distinct destination IP addresses that were seen
for each bin, an alias for Distinct:dIP.

=item Distinct:I<KEY_FIELD>

Count the number of distinct values for I<KEY_FIELD>, where
I<KEY_FIELD> is any field that can be used as an argument to
B<--fields> except C<icmpTypeCode>. For example, C<Distinct:sPort>
counts the number of distinct source ports for each bin. When this
aggregate value field is used, the specified I<KEY_FIELD> cannot be
present in the argument to B<--fields>.

=item Flows

Count the number of flow records that mapped to each bin; an alias for
Records.

=back

=item B<--plugin>=I<PLUGIN>

Augment the list of key fields and/or aggregate value fields by using
run-time loading of the plug-in (shared object) whose path is
I<PLUGIN>. The switch may be repeated to load multiple plug-ins. The
creation of plug-ins is described in the B<silk-plugin(3)> manual
page. When I<PLUGIN> does not contain a slash (C</>), B<rwuniq>
attempts to find a file named I<PLUGIN> in the directories listed in
the L</FILES> section. If B<rwuniq> finds the file, it uses that
path. If I<PLUGIN> contains a slash or if B<rwuniq> does not find the
file, B<rwuniq> relies on your operating system's B<dlopen(3)> call to
find the file. When the SILK_PLUGIN_DEBUG environment variable is
non-empty, B<rwuniq> prints status messages to the standard error as
it attempts to find and open each of its plug-ins.

=item B<--threshold>=I<VALUE_FIELD>B<=>I<MIN>B<->I<MAX>

=item B<--threshold>=I<VALUE_FIELD>B<=>I<MIN>

Limit the output of B<rwuniq> to the bins where the value of the
aggregate value field I<VALUE_FIELD> is not less than I<MIN> and not
more than I<MAX>. If I<MAX> is not given, limit the output to the
bins where the value of I<VALUE_FIELD> is at least I<MIN>. The
I<VALUE_FIELD> argument is case insensitive and may be abbreviated to
the shortest unique prefix. This switch may be repeated to set
thresholds for multiple fields, and B<rwuniq> only prints bins that
meet all thresholds. A I<MIN> of 0 is treated as 1. If
I<VALUE_FIELD> is not present in the argument to the B<--values>
switch, it is appended to those aggregate values. I<VALUE_FIELD> may
be B<Records> (or B<Flows>), B<Packets>, B<Bytes>, B<sIP-Distinct>,
B<dIP-Distinct>, or B<Distinct:>I<KEY_FIELD>. Setting thresholds for
aggregate value fields defined by plug-ins is not supported.
I<Since SiLK 3.17.0.>

=back

Miscellaneous options:

=over 4

=item B<--presorted-input>

Cause B<rwuniq> to assume that it is reading sorted input; i.e., that
B<rwuniq>'s input file(s) were generated by B<rwsort(1)> using the
I<exact same> value for the B<--fields> switch. When no distinct
counts are being computed, B<rwuniq> can process its input without
needing to write temporary files. When multiple input files are
specified, B<rwuniq> merge-sorts the flow records from the input
files. See the L</NOTES> section for issues that may occur when using
B<--presorted-input>.

=item B<--sort-output>

Cause B<rwuniq> to present the output in sorted numerical order. The
key B<rwuniq> uses for sorting is the same key it uses to index each
bin.

=item B<--bin-time>=I<SECONDS>

=item B<--bin-time>

Adjust the times in the key fields C<sTime> and C<eTime> to appear on
I<SECONDS>-second boundaries (the floor of the time is used). As of
SiLK 3.17.0, I<SECONDS> may be a fractional value of 0.001 or greater,
and B<rwuniq> uses millisecond timestamps when I<SECONDS> includes a
fractional value that is non-zero. When this switch is not specified,
times appear on 1-second boundaries. When the switch is used but no
argument is given, B<rwuniq> uses 60-second time bins. (When the
start-time is the only key field and time binning is desired, consider
using B<rwcount(1)> instead.)

=item B<--timestamp-format>=I<FORMAT>

Specify the format and/or timezone to use when printing timestamps.
When this switch is not specified, the SILK_TIMESTAMP_FORMAT
environment variable is checked for a default format and/or timezone.
If it is empty or contains invalid values, timestamps are printed in
the default format, and the timezone is UTC unless SiLK was compiled
with local timezone support.
I<FORMAT> is a comma-separated list of a
format and/or a timezone. The format is one of:

=over 4

=item default

Print the timestamps as C<I<YYYY>/I<MM>/I<DD>TI<hh>:I<mm>:I<ss>>.

=item iso

Print the timestamps as S<C<I<YYYY>-I<MM>-I<DD> I<hh>:I<mm>:I<ss>>>.

=item m/d/y

Print the timestamps as S<C<I<MM>/I<DD>/I<YYYY> I<hh>:I<mm>:I<ss>>>.

=item epoch

Print the timestamps as the number of seconds since 00:00:00 UTC on
1970-01-01.

=back

When a timezone is specified, it is used regardless of the default
timezone support compiled into SiLK. The timezone is one of:

=over 4

=item utc

Use Coordinated Universal Time to print timestamps.

=item local

Use the TZ environment variable or the local timezone.

=back

=item B<--epoch-time>

Print timestamps as epoch time (number of seconds since midnight GMT
on 1970-01-01). This switch is equivalent to
B<--timestamp-format=epoch>, it is deprecated as of SiLK 3.0.0, and it
will be removed in the SiLK 4.0 release.

=item B<--ip-format>=I<FORMAT>

Specify how IP addresses are printed, where I<FORMAT> is a
comma-separated list of the arguments described below. When this
switch is not specified, the SILK_IP_FORMAT environment variable is
checked for a value and that format is used if it is valid. The
default I<FORMAT> is C<canonical>. I<Since SiLK 3.7.0.>

=over 4

=item canonical

Print IP addresses in the canonical format. If the key only contains
IPv4 addresses, use dot-separated decimal (C<192.0.2.1>). Otherwise,
use colon-separated hexadecimal (C<2001:db8::1>) or a mixed IPv4-IPv6
representation for IPv4-mapped IPv6 addresses (the ::ffff:0:0/96
netblock, e.g., C<::ffff:192.0.2.1>) and IPv4-compatible IPv6
addresses (the ::/96 netblock other than ::/127, e.g.,
C<::192.0.2.1>).

=item no-mixed

Print IP addresses in the canonical format (C<192.0.2.1> or
C<2001:db8::1>) but do not use the mixed IPv4-IPv6 representations.
For example, use C<::ffff:c000:201> instead of C<::ffff:192.0.2.1>.
I<Since SiLK 3.17.0>.

=item decimal

Print IP addresses as integers in decimal format. For example, print
C<192.0.2.1> and C<2001:db8::1> as C<3221225985> and
C<42540766411282592856903984951653826561>, respectively.

=item hexadecimal

Print IP addresses as integers in hexadecimal format. For example,
print C<192.0.2.1> and C<2001:db8::1> as C<c0000201> and
C<20010db8000000000000000000000001>, respectively.

=item zero-padded

Make all IP address strings contain the same number of characters by
padding numbers with leading zeros. For example, print C<192.0.2.1>
and C<2001:db8::1> as C<192.000.002.001> and
C<2001:0db8:0000:0000:0000:0000:0000:0001>, respectively. For IPv6
addresses, this setting implies C<no-mixed>, so that
C<::ffff:192.0.2.1> is printed as
C<0000:0000:0000:0000:0000:ffff:c000:0201>. As of SiLK 3.17.0, this
argument may be combined with any of the above, including C<decimal>
and C<hexadecimal>.

=back

The following arguments modify certain IP addresses prior to printing.
These arguments may be combined with the above formats.

=over 4

=item map-v4

Change IPv4 addresses to IPv4-mapped IPv6 addresses (addresses in the
::ffff:0:0/96 netblock) prior to formatting. I<Since SiLK 3.17.0>.

=item unmap-v6

When the key contains IPv6 addresses, change any IPv4-mapped IPv6
addresses (addresses in the ::ffff:0:0/96 netblock) to IPv4 addresses
prior to formatting. I<Since SiLK 3.17.0>.

=back

The following argument is also available:

=over 4

=item force-ipv6

Set I<FORMAT> to C<map-v4>,C<no-mixed>.

=back

=item B<--integer-ips>

Print IP addresses as integers.
This switch is equivalent to
B<--ip-format=decimal>, it is deprecated as of SiLK 3.7.0, and it will
be removed in the SiLK 4.0 release.

=item B<--zero-pad-ips>

Print IP addresses as fully-expanded, zero-padded values in their
canonical form. This switch is equivalent to
B<--ip-format=zero-padded>, it is deprecated as of SiLK 3.7.0, and it
will be removed in the SiLK 4.0 release.

=item B<--integer-sensors>

Print the integer ID of the sensor rather than its name.

=item B<--integer-tcp-flags>

Print the TCP flag fields (flags, initialFlags, sessionFlags) as an
integer value. Typically, the characters C<F,S,R,P,A,U,E,C> are used
to represent the TCP flags.

=item B<--no-titles>

Turn off column titles. By default, titles are printed.

=item B<--no-columns>

Disable fixed-width columnar output.

=item B<--column-separator>=I<C>

Use the specified character I<C> between columns and after the final
column. When this switch is not specified, the default of 'B<|>' is
used.

=item B<--no-final-delimiter>

Do not print the column separator after the final column. Normally a
delimiter is printed.

=item B<--delimited>

=item B<--delimited>=I<C>

Run as if B<--no-columns> B<--no-final-delimiter> B<--column-sep>=I<C>
had been specified. That is, disable fixed-width columnar output; if
character I<C> is provided, it is used as the delimiter between
columns instead of the default 'B<|>'.

=item B<--print-filenames>

Print to the standard error the names of input files as they are
opened.

=item B<--copy-input>=I<PATH>

Copy all binary SiLK Flow records read as input to the specified file
or named pipe. I<PATH> may be C<stdout> or C<-> to write flows to the
standard output as long as the B<--output-path> switch is specified to
redirect B<rwuniq>'s textual output to a different location.

=item B<--output-path>=I<PATH>

Write the textual output to I<PATH>, where I<PATH> is a filename, a
named pipe, the keyword C<stderr> to write the output to the standard
error, or the keyword C<stdout> or C<-> to write the output to the
standard output (and bypass the paging program). If I<PATH> names an
existing file, B<rwuniq> exits with an error unless the SILK_CLOBBER
environment variable is set, in which case I<PATH> is overwritten. If
this switch is not given, the output is either sent to the pager or
written to the standard output.

=item B<--pager>=I<PAGER_PROG>

When output is to a terminal, invoke the program I<PAGER_PROG> to view
the output one screenful at a time. This switch overrides the
SILK_PAGER environment variable, which in turn overrides the PAGER
variable. If the B<--output-path> switch is given or if the value of
the pager is determined to be the empty string, no paging is performed
and all output is written to the terminal.

=item B<--ipv6-policy>=I<POLICY>

Determine how IPv4 and IPv6 flows are handled when SiLK has been
compiled with IPv6 support. When the switch is not provided, the
SILK_IPV6_POLICY environment variable is checked for a policy. If it
is also unset or contains an invalid policy, the I<POLICY> is
B<mix>. When SiLK has not been compiled with IPv6 support, IPv6
flows are always ignored, regardless of the value passed to this
switch or in the SILK_IPV6_POLICY variable. The supported values for
I<POLICY> are:

=over 4

=item ignore

Ignore any flow record marked as IPv6, regardless of the IP addresses
it contains.

=item asv4

Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96
netblock (that is, IPv4-mapped IPv6 addresses) to IPv4 and ignore all
other IPv6 flow records.

=item mix

Process the input as a mixture of IPv4 and IPv6 flow records.
When an
IP address is used as part of the key or value, this policy is
equivalent to B<force>.

=item force

Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the
::ffff:0:0/96 netblock.

=item only

Process only flow records that are marked as IPv6 and ignore IPv4 flow
records in the input.

=back

=item B<--temp-directory>=I<DIR_PATH>

Specify the name of the directory in which to store data files
temporarily when the memory is not large enough to store all the bins
and their aggregate values. This switch overrides the directory
specified in the SILK_TMPDIR environment variable, which overrides the
directory specified in the TMPDIR variable, which overrides the
default, F</tmp>.

=item B<--site-config-file>=I<FILENAME>

Read the SiLK site configuration from the named file I<FILENAME>.
When this switch is not provided, B<rwuniq> searches for the site
configuration file in the locations specified in the L</FILES>
section.

=item B<--legacy-timestamps>

=item B<--legacy-timestamps>=I<NUM>

When I<NUM> is not specified or is 1, this switch is equivalent to
B<--timestamp-format=m/d/y>. Otherwise, the switch has no effect.
This switch is deprecated as of SiLK 3.0.0, and it will be removed in
the SiLK 4.0 release.

=item B<--xargs>

=item B<--xargs>=I<FILENAME>

Read the names of the input files from I<FILENAME> or from the
standard input if I<FILENAME> is not provided. The input is expected
to have one filename per line. B<rwuniq> opens each named file in
turn and reads records from it as if the filenames had been listed on
the command line.

=item B<--help>

Print the available options and exit. Specifying switches that add
new fields, values, or additional switches before B<--help> allows
the output to include descriptions of those fields or switches.

=item B<--help-fields>

Print the description and alias(es) of each field and value and exit.
Specifying switches that add new fields before B<--help-fields>
allows the output to include descriptions of those fields.

=item B<--version>

Print the version number and information about how SiLK was
configured, then exit the application.

=item B<--pmap-file>=I<PATH>

=item B<--pmap-file>=I<MAPNAME>:I<PATH>

Load the prefix map file located at I<PATH> and create fields named
src-I<map-name> and dst-I<map-name> where I<map-name> is either the
I<MAPNAME> part of the argument or the map-name specified when the
file was created (see B<rwpmapbuild(1)>). If no map-name is
available, B<rwuniq> names the fields C<sval> and C<dval>. Specify
I<PATH> as C<-> or C<stdin> to read from the standard input. The
switch may be repeated to load multiple prefix map files, but each
prefix map must use a unique map-name. The B<--pmap-file> switch(es)
must precede the B<--fields> switch. See also B<pmapfilter(3)>.

=item B<--pmap-column-width>=I<NUM>

When printing a label associated with a prefix map, this switch gives
the maximum number of characters to use when displaying the textual
value of the field.

=item B<--python-file>=I<PATH>

When the SiLK Python plug-in is used, B<rwuniq> reads the Python code
from the file I<PATH> to define additional fields that can be used as
part of the key or as an aggregate value. This file should call
B<register_field()> for each field it wishes to define. For details
and examples, see the B<silkpython(3)> and B<pysilk(3)> manual pages.

=back

=head2 Deprecated volume switches

These options add the named aggregate field(s) to B<--values> if the
field is not present. When an argument is specified, the switch is
equivalent to a B<--threshold> switch. Use of these switches is
deprecated.
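
The translation described by the items below can be sketched in
Python, the language that PySiLK uses. This is an illustration only,
not B<rwuniq>'s implementation. The function C<translate_legacy()> and
its list-of-strings interface are hypothetical. The sketch also
assumes, following the paragraph above, two things: a legacy switch
always adds its value field, and an argument additionally produces a
B<--threshold> switch.

```python
# Hypothetical model of how the deprecated volume switches map onto
# --values and --threshold; it is not rwuniq's own code.
LEGACY = {
    # switch -> (field appended to --values, field used by --threshold)
    "bytes": ("Bytes", "bytes"),
    "packets": ("Packets", "packets"),
    "flows": ("Records", "records"),
    "sip-distinct": ("Distinct:sIP", "distinct:sip"),
    "dip-distinct": ("Distinct:dIP", "distinct:dip"),
    "stime": ("sTime-Earliest", None),
    "etime": ("eTime-Latest", None),
}
ALL_COUNTS = ["Bytes", "Packets", "Records", "sTime-Earliest", "eTime-Latest"]


def translate_legacy(switches, values):
    """Return (values, thresholds) after applying the legacy switches."""
    values = list(values)
    thresholds = []

    def append_value(name):
        # Append the field unless it is already present (case-insensitive).
        if name.lower() not in [v.lower() for v in values]:
            values.append(name)

    for switch in switches:
        name, _, arg = switch.lstrip("-").partition("=")
        if name == "all-counts":
            for field in ALL_COUNTS:
                append_value(field)
            continue
        value_field, threshold_field = LEGACY[name]
        append_value(value_field)
        if arg:
            if threshold_field is None:
                raise ValueError(f"--{name} does not take an argument")
            # An argument such as MIN or MIN-MAX becomes a --threshold switch.
            thresholds.append(f"--threshold={threshold_field}={arg}")
    return values, thresholds


values, thresholds = translate_legacy(
    ["--bytes", "--flows=4", "--packets=10-20"], ["Records"])
print(values)      # ['Records', 'Bytes', 'Packets']
print(thresholds)  # ['--threshold=records=4', '--threshold=packets=10-20']
```

In this sketch C<--flows=4> does not duplicate C<Records>, which is
already present, but it still yields a threshold on the record count.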

=over 4

=item B<--all-counts>

Append the following fields to the argument of the B<--values> switch
unless the field is already present: B<Bytes>, B<Packets>, B<Records>,
B<sTime-Earliest>, and B<eTime-Latest>. Deprecated since SiLK 2.0.0.

=item B<--bytes>

Append B<Bytes> to the argument of the B<--values> switch unless it is
already present. Deprecated since SiLK 2.0.0.

=item B<--bytes>=I<MIN>

Add B<--threshold=bytes>=I<MIN> to the options. Deprecated since SiLK
3.17.0.

=item B<--bytes>=I<MIN>-I<MAX>

Add B<--threshold=bytes>=I<MIN>-I<MAX> to the options. Deprecated
since SiLK 3.17.0.

=item B<--packets>

Append B<Packets> to the argument of the B<--values> switch unless it
is already present. Deprecated since SiLK 2.0.0.

=item B<--packets>=I<MIN>

Add B<--threshold=packets>=I<MIN> to the options. Deprecated since
SiLK 3.17.0.

=item B<--packets>=I<MIN>-I<MAX>

Add B<--threshold=packets>=I<MIN>-I<MAX> to the options. Deprecated
since SiLK 3.17.0.

=item B<--flows>

Append B<Records> to the argument of the B<--values> switch unless it
is already present. Deprecated since SiLK 2.0.0.

=item B<--flows>=I<MIN>

Add B<--threshold=records>=I<MIN> to the options. Deprecated since
SiLK 3.17.0.

=item B<--flows>=I<MIN>-I<MAX>

Add B<--threshold=records>=I<MIN>-I<MAX> to the options. Deprecated
since SiLK 3.17.0.

=item B<--sip-distinct>

Append B<Distinct:sIP> to the argument of the B<--values> switch
unless it is already present. Deprecated since SiLK 2.0.0.

=item B<--sip-distinct>=I<MIN>

Add B<--threshold=distinct:sip>=I<MIN> to the options. Deprecated
since SiLK 3.17.0.

=item B<--sip-distinct>=I<MIN>-I<MAX>

Add B<--threshold=distinct:sip>=I<MIN>-I<MAX> to the options.
Deprecated since SiLK 3.17.0.

=item B<--dip-distinct>

Append B<Distinct:dIP> to the argument of the B<--values> switch
unless it is already present. Deprecated since SiLK 2.0.0.

=item B<--dip-distinct>=I<MIN>

Add B<--threshold=distinct:dip>=I<MIN> to the options. Deprecated
since SiLK 3.17.0.

=item B<--dip-distinct>=I<MIN>-I<MAX>

Add B<--threshold=distinct:dip>=I<MIN>-I<MAX> to the options.
Deprecated since SiLK 3.17.0.

=item B<--stime>

Append B<sTime-Earliest> to the argument of the B<--values> switch
unless it is already present. Deprecated since SiLK 2.0.0.

=item B<--etime>

Append B<eTime-Latest> to the argument of the B<--values> switch
unless it is already present. Deprecated since SiLK 2.0.0.

=back

=head1 EXAMPLES

In these examples, the dollar sign (C<$>) represents the shell prompt
and a backslash (C<\>) is used to continue a line for better
readability. Many examples assume previous B<rwfilter(1)> commands
have written data files named F<data.rw> and F<data-v6.rw>.

=for comment
The output for nearly all commands is generated from the "make check"
test data. All commands assume data.rw only contains the incoming
data, that is "rwfilter --type=in,inweb".

The B<--fields> switch is required to specify which field(s) comprise
the key. By default, B<rwuniq> counts the number of records for each
key. This example uses the source port as the key.

 $ rwuniq --fields=sport data.rw | head
 sPort|   Records|
    53|     62216|
    22|     27994|
    67|      7807|
 29897|        78|
 28816|        24|
    80|     27044|
 28925|        22|
     0|      7801|
 29246|        63|

Notice how the keys are printed in an arbitrary order. Use the
B<--sort-output> switch to arrange the keys from lowest to highest.

 $ rwuniq --fields=sport --sort-output data.rw | head
 sPort|   Records|
     0|      7801|
    22|     27994|
    25|     15568|
    53|     62216|
    67|      7807|
    80|     27044|
   123|      7741|
   443|      7917|
  8080|      3946|

To sort the output by a volume field (such as the number of records),
use B<rwstats(1)>.

 $ rwstats --fields=sport --count=10 data.rw
 INPUT: 250928 Records for 4739 Bins and 250928 Total Records
 OUTPUT: Top 10 Bins by Records
 sPort|   Records|  %Records|   cumul_%|
    53|     62216| 24.794363| 24.794363|
    22|     27994| 11.156188| 35.950552|
    80|     27044| 10.777594| 46.728145|
    25|     15568|  6.204170| 52.932315|
   443|      7917|  3.155088| 56.087404|
    67|      7807|  3.111251| 59.198655|
     0|      7801|  3.108860| 62.307515|
   123|      7741|  3.084949| 65.392463|
  8080|      3946|  1.572563| 66.965026|
 29921|       117|  0.046627| 67.011653|

Alternatively, process the textual output of B<rwuniq> with the UNIX
B<sort(1)> utility.

 $ rwuniq --fields=sport data.rw \
       | sort -r -t '|' -k 2 | head
 sPort|   Records|
    53|     62216|
    22|     27994|
    80|     27044|
    25|     15568|
   443|      7917|
    67|      7807|
     0|      7801|
   123|      7741|
  8080|      3946|

Use the B<--values> switch to change the volume that B<rwuniq> computes
for each key. This example prints the byte-, packet-, and
record-counts for each protocol, sorting the results by protocol.

 $ rwuniq --fields=proto --values=bytes,packets,records --sort data.rw
 pro|               Bytes|        Packets|   Records|
   1|             5344836|          73473|      7801|
   6|         59945492930|       72127917|    165363|
  17|            17553593|          77764|     77764|

The B<--threshold> switch limits the output to rows where a value
field meets a minimum value or falls within a specific range. For
example, print the number of records and packets seen for each source
port for bins having at least 1000 records.

 $ rwuniq --fields=sport --values=records,packets \
       --threshold=records=1000 data.rw
 sPort|   Records|        Packets|
    53|     62216|          62216|
    22|     27994|       23434615|
    67|      7807|           7807|
    80|     27044|        8271125|
     0|      7801|          73473|
   123|      7741|           7741|
    25|     15568|         427777|
   443|      7917|        2421124|
  8080|      3946|        1202528|

Multiple thresholds may be specified.

 $ rwuniq --fields=sport --values=records,packets \
       --threshold=records=1000-5000 --threshold=packets=1000000 \
       data.rw
 sPort|   Records|        Packets|
  8080|      3946|        1202528|

The B<--bin-time> switch adjusts the times used by the C<sTime> and
C<eTime> key fields. An argument of 86400 truncates the starting and
ending times to day boundaries.

 $ rwuniq --bin-time=86400 --fields=stime,etime data.rw
               sTime|              eTime|   Records|
 2009/02/12T00:00:00|2009/02/12T00:00:00|     82969|
 2009/02/12T00:00:00|2009/02/13T00:00:00|       360|
 2009/02/13T00:00:00|2009/02/13T00:00:00|     83594|
 2009/02/13T00:00:00|2009/02/14T00:00:00|       332|
 2009/02/14T00:00:00|2009/02/14T00:00:00|     83673|

The B<--bin-time> switch does not adjust the C<duration> value unless
both C<sTime> and C<eTime> are given.

 $ rwuniq --bin-time=86400 --fields=stime,dur --sort data.rw | head -6
               sTime|durat|   Records|
 2009/02/12T00:00:00|    0|     29523|
 2009/02/12T00:00:00|    1|      4312|
 2009/02/12T00:00:00|    2|      4376|
 2009/02/12T00:00:00|    3|      3986|
 2009/02/12T00:00:00|    4|       923|

 $ rwuniq --bin-time=86400 --fields=stime,dur,etime data.rw
               sTime|durat|              eTime|   Records|
 2009/02/12T00:00:00|    0|2009/02/12T00:00:00|     82969|
 2009/02/12T00:00:00|86400|2009/02/13T00:00:00|       360|
 2009/02/13T00:00:00|    0|2009/02/13T00:00:00|     83594|
 2009/02/13T00:00:00|86400|2009/02/14T00:00:00|       332|
 2009/02/14T00:00:00|    0|2009/02/14T00:00:00|     83673|

As of SiLK 3.17.0, the B<--bin-time> switch accepts a floating point
value. When the fractional part is non-zero, B<rwuniq> uses
millisecond precision for the times and the duration.

 $ rwuniq --bin-time=0.001 --fields=duration data.rw | head -6
   duration|   Records|
      0.000|     85565|
   1791.045|         4|
      2.120|        19|
     22.263|         5|
     19.902|         3|

The B<--bin-time> switch does not adjust the C<sTime-Earliest> and
C<eTime-Latest> aggregate value fields, but it does determine whether
those fields maintain millisecond precision.

 $ rwuniq --bin-time=86400 --fields=stime --value=etime data.rw
               sTime|       eTime-Latest|
 2009/02/12T00:00:00|2009/02/12T00:29:59|
 2009/02/13T00:00:00|2009/02/13T00:29:58|
 2009/02/14T00:00:00|2009/02/14T00:29:59|

 $ rwuniq --bin-time=0.001 --fields=proto --value=stime,etime data.rw
 pro|         sTime-Earliest|           eTime-Latest|
  17|2009/02/12T00:00:02.745|1970/01/15T06:57:35.997|
   6|2009/02/12T00:00:03.004|1970/01/15T06:57:35.998|
   1|2009/02/12T00:00:20.601|1970/01/15T06:57:35.992|

With an input of both IPv4 and IPv6 records, B<rwuniq> maps the IPv4
records into the ::ffff:0:0/96 netblock. The data is normally mapped
back to IPv4 on output. Given this input:

 $ rwcut --fields=sip,packets /tmp/v4v6.rw
                                     sIP|   packets|
                                     ::1|        45|
                              192.0.2.22|        87|
                    ::ffff:203.0.113.113|      2662|
                  2001:db8:54:32:ab:cd::|       345|

The B<rwuniq> tool produces:

 $ rwuniq --fields=sip --values=packets /tmp/v4v6.rw
                                     sIP|        Packets|
                                     ::1|             45|
                              192.0.2.22|             87|
                           203.0.113.113|           2662|
                  2001:db8:54:32:ab:cd::|            345|

Set B<--ip-format> to C<map-v4> to print the IPv4 addresses as
IPv4-mapped IPv6 addresses. (Using an B<--ipv6-policy> of C<force> has
the same effect.)

 $ rwuniq --fields=sip --values=packets --ip-format=map-v4 /tmp/v4v6.rw
                                     sIP|        Packets|
                                     ::1|             45|
                       ::ffff:192.0.2.22|             87|
                    ::ffff:203.0.113.113|           2662|
                  2001:db8:54:32:ab:cd::|            345|

Print the source addresses that sent at least 10,000,000 bytes, and
for each address print the number of distinct destination hosts it
contacted:

 $ rwuniq --fields=sip --values=bytes,distinct:dip \
       --threshold=bytes=10000000 data-v6.rw
                                     sIP|               Bytes|dIP-Distin|
                    2001:db8:a:fd::90:bd|            14529210|         2|

Print the number of bytes that this host sent to each destination
(first use B<rwfilter> to limit the input to records whose source is
that host):

 $ rwfilter --saddr=2001:db8:a:fd::90:bd --pass=- data-v6.rw \
       | rwuniq --fields=dip --values=bytes
                                     dIP|               Bytes|
                   2001:db8:c0:a8::fa:5d|             7097847|
                    2001:db8:c0:a8::dd:6|             7431363|

Print the packet and byte counts for each IPv4 source-destination
pair, where the prefix length is 16 (use B<rwnetmask(1)> on the input
to B<rwuniq>):

 $ rwnetmask --4sip-prefix=16 --4dip-prefix=16 data.rw \
       | rwuniq --fields=sip,dip --values=packet,byte | head
             sIP|            dIP|        Packets|               Bytes|
      10.139.0.0|    192.168.0.0|          33490|            22950353|
       10.40.0.0|    192.168.0.0|            258|               18544|
      10.204.0.0|    192.168.0.0|         353233|           288736424|
      10.106.0.0|    192.168.0.0|          13051|             3843693|
       10.71.0.0|    192.168.0.0|           4355|             1391194|
       10.98.0.0|    192.168.0.0|           7312|             7328359|
      10.114.0.0|    192.168.0.0|           2538|             4137927|
      10.168.0.0|    192.168.0.0|          92094|            86883062|
      10.176.0.0|    192.168.0.0|         122101|           116555051|

Given a file of scan traffic, print each source address that appears
in at least 4 TCP flow records containing no more than 3 packets each.
First use B<rwfilter> to limit the input to TCP flow records having no
more than 3 packets.

 $ rwfilter --proto=6 --packets=1-3 --pass=- scandata.rw \
       | rwuniq --field=sip --values=flow,packets --threshold=flows=4 \
       | head -5
             sIP|   Records|        Packets|
   10.249.216.38|       256|            256|
    10.155.55.93|       256|            256|
   10.61.255.154|       256|            256|
    10.60.122.82|       256|            256|

The B<silkpython(3)> manual page provides examples that use PySiLK to
create arbitrary fields to use as part of the key for B<rwuniq>.

When using B<rwuniq> on input that contains both incoming and outgoing
flow records, consider using the B<int-ext-fields(3)> plug-in which
defines four additional fields representing the external IP address,
the external port, the internal IP address, and the internal port.
The plug-in requires the user to specify which class/type pairs are
incoming and which are outgoing. See its manual page for additional
information. As an example, here we run B<rwuniq> on a file
containing incoming and outgoing web traffic.

 $ rwuniq --fields=sip,sport,dip,dport --values=bytes \
       --sort-output data.rw | head -7
             sIP|sPort|            dIP|dPort|               Bytes|
     10.4.52.235|29631|192.168.233.171|   80|               18260|
    10.5.231.251|   80|192.168.226.129|28770|              536169|
     10.9.77.117|29906| 192.168.184.65|   80|               55386|
     10.11.88.88|   80|192.168.251.222|28902|              433198|
   10.14.110.214|29989| 192.168.249.96|   80|               25903|
    10.15.224.27|  443| 192.168.231.49|29779|              163759|

Here the B<int-ext-fields> plug-in is used:

 $ export INCOMING_FLOWTYPES=all/in,all/inweb
 $ export OUTGOING_FLOWTYPES=all/out,all/outweb
 $ rwuniq --plugin=int-ext-fields.so \
       --fields=ext-ip,ext-port,int-ip,int-port --value=bytes \
       --sort-output data.rw | head -7
          ext-ip|ext-p|         int-ip|int-p|               Bytes|
     10.4.52.235|29631|192.168.233.171|   80|              726111|
    10.5.231.251|   80|192.168.226.129|28770|              561654|
     10.9.77.117|29906| 192.168.184.65|   80|             1811738|
     10.11.88.88|   80|192.168.251.222|28902|              444277|
   10.14.110.214|29989| 192.168.249.96|   80|              393068|
    10.15.224.27|  443| 192.168.231.49|29779|              167696|

=head1 ENVIRONMENT

=over 4

=item SILK_IPV6_POLICY

This environment variable is used as the value for B<--ipv6-policy>
when that switch is not provided.

=item SILK_IP_FORMAT

This environment variable is used as the value for B<--ip-format> when
that switch is not provided. I<Since SiLK 3.11.0.>

=item SILK_TIMESTAMP_FORMAT

This environment variable is used as the value for
B<--timestamp-format> when that switch is not provided. I<Since SiLK
3.11.0.>

=item SILK_PAGER

When set to a non-empty string, B<rwuniq> automatically invokes this
program to display its output a screen at a time. If set to an empty
string, B<rwuniq> does not automatically page its output.

=item PAGER

When set and SILK_PAGER is not set, B<rwuniq> automatically invokes
this program to display its output a screen at a time.

=item SILK_TMPDIR

When set and B<--temp-directory> is not specified, B<rwuniq> writes
the temporary files it creates to this directory. SILK_TMPDIR
overrides the value of TMPDIR.

=item TMPDIR

When set and SILK_TMPDIR is not set, B<rwuniq> writes the temporary
files it creates to this directory.

=item PYTHONPATH

This environment variable is used by Python to locate modules. When
B<--python-file> is specified, B<rwuniq> must load the Python files
that comprise the PySiLK package, such as F<silk/__init__.py>. If
this F<silk/> directory is located outside Python's normal search path
(for example, in the SiLK installation tree), it may be necessary to
set or modify the PYTHONPATH environment variable to include the
parent directory of F<silk/> so that Python can find the PySiLK
module.

=item SILK_PYTHON_TRACEBACK

When set, Python plug-ins print traceback information on Python
errors to the standard error.

=item SILK_COUNTRY_CODES

This environment variable allows the user to specify the country code
mapping file that B<rwuniq> uses when computing the scc and dcc
fields. The value may be a complete path or a file relative to the
SILK_PATH. See the L</FILES> section for standard locations of this
file.

=item SILK_ADDRESS_TYPES

This environment variable allows the user to specify the address type
mapping file that B<rwuniq> uses when computing the sType and dType
fields. The value may be a complete path or a file relative to the
SILK_PATH. See the L</FILES> section for standard locations of this
file.

=item SILK_CLOBBER

The SiLK tools normally refuse to overwrite existing files. Setting
SILK_CLOBBER to a non-empty value removes this restriction.

=item SILK_CONFIG_FILE

This environment variable is used as the value for the
B<--site-config-file> when that switch is not provided.

=item SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data
repository. As described in the L</FILES> section, B<rwuniq> may
use this environment variable when searching for the SiLK site
configuration file.

=item SILK_PATH

This environment variable gives the root of the install tree. When
searching for configuration files and plug-ins, B<rwuniq> may use this
environment variable. See the L</FILES> section for details.

=item TZ

When the argument to the B<--timestamp-format> switch includes
C<local> or when a SiLK installation is built to use the local
timezone, the value of the TZ environment variable determines the
timezone in which B<rwuniq> displays timestamps. (If neither
condition holds, the TZ environment variable is ignored.) If the TZ
environment variable is not set, the machine's default timezone is
used. Setting TZ to the empty string or 0 causes timestamps to be
displayed in UTC. For system information on the TZ variable, see
B<tzset(3)> or B<environ(7)>. (To determine if SiLK was built with
support for the local timezone, check the C<Timezone support> value in
the output of B<rwuniq --version>.)

=item SILK_PLUGIN_DEBUG

When set to 1, B<rwuniq> prints status messages to the standard error
as it attempts to find and open each of its plug-ins. In addition,
when an attempt to register a field fails, B<rwuniq> prints a message
specifying the additional function(s) that must be defined to register
the field in B<rwuniq>. Be aware that the output can be rather
verbose.

=item SILK_TEMPFILE_DEBUG

When set to 1, B<rwuniq> prints debugging messages to the standard
error as it creates, re-opens, and removes temporary files.

=item SILK_UNIQUE_DEBUG

When set to 1, the binning engine used by B<rwuniq> prints debugging
messages to the standard error.

=back

=head1 FILES

=over 4

=item F<${SILK_ADDRESS_TYPES}>

=item F<${SILK_PATH}/share/silk/address_types.pmap>

=item F<${SILK_PATH}/share/address_types.pmap>

=item F<@prefix@/share/silk/address_types.pmap>

=item F<@prefix@/share/address_types.pmap>

Possible locations for the address types mapping file required by the
sType and dType fields.

=item F<${SILK_CONFIG_FILE}>

=item F<${SILK_DATA_ROOTDIR}/silk.conf>

=item F<@SILK_DATA_ROOTDIR@/silk.conf>

=item F<${SILK_PATH}/share/silk/silk.conf>

=item F<${SILK_PATH}/share/silk.conf>

=item F<@prefix@/share/silk/silk.conf>

=item F<@prefix@/share/silk.conf>

Possible locations for the SiLK site configuration file which are
checked when the B<--site-config-file> switch is not provided.

=item F<${SILK_COUNTRY_CODES}>

=item F<${SILK_PATH}/share/silk/country_codes.pmap>

=item F<${SILK_PATH}/share/country_codes.pmap>

=item F<@prefix@/share/silk/country_codes.pmap>

=item F<@prefix@/share/country_codes.pmap>

Possible locations for the country code mapping file required by the
scc and dcc fields.

=item F<${SILK_PATH}/lib64/silk/>

=item F<${SILK_PATH}/lib64/>

=item F<${SILK_PATH}/lib/silk/>

=item F<${SILK_PATH}/lib/>

=item F<@prefix@/lib64/silk/>

=item F<@prefix@/lib64/>

=item F<@prefix@/lib/silk/>

=item F<@prefix@/lib/>

Directories that B<rwuniq> checks when attempting to load a plug-in.

=item F<${SILK_TMPDIR}/>

=item F<${TMPDIR}/>

=item F</tmp/>

Directory in which to create temporary files.

=back

=head1 NOTES

If multiple thresholds are given (e.g., C<--threshold=bytes=80
--threshold=flows=2>), a bin must meet every threshold to be printed.
For example, given the switches above, a key that saw a single
100-byte flow is not printed, because its record count of 1 is below
the minimum of 2.
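
The rule above can be modeled in a few lines of Python. This is a
minimal sketch, not B<rwuniq>'s code. The helpers C<parse_threshold()>
and C<passes()> are hypothetical. The sketch treats both bounds as
inclusive and handles only the C<flows> alias for C<records>. The
sample bins are copied from the source-port threshold example in the
L</EXAMPLES> section.

```python
# Minimal model of rwuniq's multiple-threshold rule (not rwuniq's code).
# Assumptions: bounds are inclusive, and "flows" is the only alias handled.
ALIASES = {"flows": "records"}


def parse_threshold(arg):
    """Parse 'bytes=80' or 'records=1000-5000' into (field, min, max)."""
    field, _, limits = arg.partition("=")
    field = ALIASES.get(field.lower(), field.lower())
    low, dash, high = limits.partition("-")
    return field, int(low), (int(high) if dash else None)


def passes(bin_values, thresholds):
    """Return True only when the bin meets every threshold."""
    for field, low, high in thresholds:
        value = bin_values[field]
        if value < low or (high is not None and value > high):
            return False
    return True


# Records and packets for some source ports, from the EXAMPLES section.
bins = {
    53: {"records": 62216, "packets": 62216},
    22: {"records": 27994, "packets": 23434615},
    0: {"records": 7801, "packets": 73473},
    443: {"records": 7917, "packets": 2421124},
    8080: {"records": 3946, "packets": 1202528},
}
thresholds = [parse_threshold("records=1000-5000"),
              parse_threshold("packets=1000000")]
kept = sorted(port for port, vals in bins.items() if passes(vals, thresholds))
print(kept)  # [8080]
```

As in the example output, only port 8080 has a record count inside the
range and enough packets.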

B<rwuniq> functionally replaces the combination of

 rwcut | sort | uniq -c

To get a list of unique IP addresses in a data set without the
counting or threshold abilities of B<rwuniq>, consider using the IPset
tools B<rwset(1)> and B<rwsetcat(1)> for improved performance:

 rwset --sip-set=stdout | rwsetcat --print-ips

For situations where the key and value are each a single field, the
Bag tools (B<rwbag(1)>, B<rwbagcat(1)>) often provide better
performance, especially when the key length is one or two bytes:

 rwbag --bag-file=sport,bytes,stdout | rwbagcat

To create a binary file that contains B<rwuniq>-like output, use
B<rwaggbag(1)> or B<rwaggbagbuild(1)>. The content of these files may
be printed with B<rwaggbagcat(1)>.

B<rwgroup(1)> works similarly to B<rwuniq>, except the data remains in
the form of SiLK Flow records, and the next-hop-IP field is modified
to denote the records that form a bin.

B<rwstats(1)> can do the same binning as B<rwuniq>, and then sort the
data by an aggregate field.

When the B<--bin-time> switch is given and the three time fields
(starting-time (C<sTime>), ending-time (C<eTime>), and duration
(C<duration>)) are present in the key, the value of the duration field
is set to the difference between the binned ending and starting times.

When the three time-related key fields (C<sTime>, C<duration>,
C<eTime>) are all in use, B<rwuniq> ignores the final time field when
binning the records, but the field still appears in the output.
Because the millisecond values are truncated, B<rwuniq> may print a
different number of rows depending on the order in which those three
fields appear in the B<--fields> switch.

B<rwuniq> supports counting distinct source and/or destination IPs.
To see the number of distinct sources for each 10-minute bin, run:

 rwuniq --fields=stime --values=distinct:sip --bin-time=600 --sort-output

When computing distinct counts over a field, the field may not be part
of the key; that is, you cannot have C<--fields=sip
--values=sip-distinct>.

Using the B<--presorted-input> switch sometimes introduces more issues
than it solves, and B<--presorted-input> is less necessary now that
B<rwuniq> can use temporary files while processing input.

When computing distinct IP counts, B<rwuniq> typically runs faster
if you do I<not> use the B<--presorted-input> switch, even if the data
was previously sorted.

When using the B<--presorted-input> switch, it is highly recommended
that you use no more than one time-related key field (C<sTime>,
C<duration>, C<eTime>) in the B<--fields> switch and that the
time-related key appear last in B<--fields>. The issue is caused by
B<rwsort> considering the millisecond values on the times when
sorting, while B<rwuniq> truncates the millisecond value. The result
may be unsorted output and multiple rows in the output that have the
same values for the key fields:

 $ rwsort --fields=stime,duration data.rw \
       | rwuniq --fields=stime,dur --presorted
               sTime|durat|   Records|
 ...
 2009/02/12T00:00:57|    0|         2|
 2009/02/12T00:00:57|   29|         2|
 2009/02/12T00:00:57|    0|         2|
 2009/02/12T00:00:57|   13|         2|
 ...

B<rwuniq>'s strength is its ability to build arbitrary keys and
aggregate fields. For a key of a single IP address, see
B<rwaddrcount(1)> and B<rwbag(1)>; for a key made up of a single CIDR
block (/8, /16, /24 only), a single port, or a single protocol, use
B<rwtotal(1)> or B<rwbag(1)>.
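
The B<--bin-time> behavior described in these notes can be sketched in
Python. This minimal model assumes that times are milliseconds since
the UNIX epoch and that bins are aligned to the epoch. The helpers
C<bin_ms()>, C<binned_key()>, C<to_ms()>, and C<fmt()> are hypothetical
and model only the key computation. C<sTime> and C<eTime> are
truncated to the start of their bins, and the duration is recomputed
from the binned times only when both of those fields are in the key.

```python
from datetime import datetime, timezone


def bin_ms(t_ms, bin_seconds):
    """Truncate a time in epoch milliseconds to the start of its bin."""
    size = round(bin_seconds * 1000)  # 86400 -> 86400000; 0.001 -> 1
    return t_ms - t_ms % size


def binned_key(stime_ms, etime_ms, bin_seconds, fields):
    """Key values (in milliseconds) for fields drawn from stime, dur, etime."""
    s = bin_ms(stime_ms, bin_seconds)
    e = bin_ms(etime_ms, bin_seconds)
    if "stime" in fields and "etime" in fields:
        dur = e - s                  # recomputed from the binned times
    else:
        dur = etime_ms - stime_ms    # the record's own duration
    parts = {"stime": s, "dur": dur, "etime": e}
    return tuple(parts[f] for f in fields)


def to_ms(*args):
    """Epoch milliseconds for a UTC date and time."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp()) * 1000


def fmt(t_ms):
    """Format a time like the timestamps in the examples (seconds only)."""
    return datetime.fromtimestamp(t_ms // 1000, tz=timezone.utc).strftime(
        "%Y/%m/%dT%H:%M:%S")


# A 20-minute flow that crosses midnight, binned by day.
start, end = to_ms(2009, 2, 12, 23, 50), to_ms(2009, 2, 13, 0, 10)
s, dur, e = binned_key(start, end, 86400, ["stime", "dur", "etime"])
print(fmt(s), dur // 1000, fmt(e))  # 2009/02/12T00:00:00 86400 2009/02/13T00:00:00
print(binned_key(start, end, 86400, ["stime", "dur"])[1] // 1000)  # 1200
```

With both times in the key, the duration becomes 86400 seconds,
matching the C<86400> rows in the L</EXAMPLES> section. With only
C<sTime> and C<dur>, the flow keeps its own 1200-second duration.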

As of SiLK 3.17.0, fields that are specified with the legacy
thresholding switches (e.g., B<--bytes>) and not with B<--values> are
printed in the order in which those switches appear. Previously, the
order was always bytes, packets, flows, stime, etime, sip-distinct,
dip-distinct.

=head1 SEE ALSO

B<rwfilter(1)>, B<rwbag(1)>, B<rwbagcat(1)>, B<rwaggbag(1)>,
B<rwaggbagbuild(1)>, B<rwaggbagcat(1)>, B<rwcut(1)>, B<rwset(1)>,
B<rwsetcat(1)>, B<rwaddrcount(1)>, B<rwgroup(1)>, B<rwstats(1)>,
B<rwnetmask(1)>, B<rwsort(1)>, B<rwtotal(1)>, B<rwcount(1)>,
B<rwpmapbuild(1)>, B<addrtype(3)>, B<ccfilter(3)>,
B<int-ext-fields(3)>, B<pmapfilter(3)>, B<pysilk(3)>,
B<silkpython(3)>, B<silk-plugin(3)>, B<sensor.conf(5)>,
B<rwflowpack(8)>, B<silk(7)>, B<yaf(1)>, B<dlopen(3)>, B<tzset(3)>,
B<environ(7)>

=cut

$SiLK: rwuniq.pod 861b66f000c2 2019-09-24 22:01:14Z mthomas $

Local Variables:
mode:text
indent-tabs-mode:nil
End: