=head1 NAME

perlfaq4 - Data Manipulation

=head1 VERSION

version 5.20190126

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

For the long explanation, see David Goldberg's "What Every Computer
Scientist Should Know About Floating-Point Arithmetic"
(L<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Some real numbers lose precision in the process. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<perlop/"Floating-point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of above as 'three' is really more like
2.9999999999999995559.

=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod(     0644, $filename ); # right, has leading zero
    chmod( oct(644), $filename ); # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0],      $filename ); # wrong, even if "0644"

    chmod( oct($ARGV[0]), $filename ); # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

    printf "0%o %d", $number, $number;

=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535); # prints 3.142

The L<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    my $ceil  = ceil(3.5);  # 4
    my $floor = floor(3.5); # 3

In 5.000 to 5.003 perls, trigonometry was done in the L<Math::Complex>
module. With 5.004, the L<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the L<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system of rounding is
being used by Perl, but instead to implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for (my $i = -5; $i <= 5; $i += 0.5) { printf "%.0f ",$i }

    -5 -4 -4 -4 -3 -2 -2 -2 -1 -0 0 0 1 2 2 2 3 4 4 4 5

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32-bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the L<Bit::Vector>
module from CPAN. The reason you might choose L<Bit::Vector> over the
perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.
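
Before reaching for a module, it may help to see how far the built-ins
alone can go. The following is only a sketch: C<from_base> and
C<to_base> are hypothetical helper names invented for this
illustration, not part of Perl or of any module, and they assume
non-negative integers that fit in a native integer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): parse a digit string in
# base 2, 8, or 16 using only the built-in oct() and hex().
sub from_base {
    my ($digits, $base) = @_;
    return oct("0b$digits") if $base == 2;
    return oct($digits)     if $base == 8;   # oct() reads plain digits as octal
    return hex($digits)     if $base == 16;
    die "unsupported base: $base\n";
}

# Hypothetical helper: format a non-negative integer in base 2, 8, or 16
# with the matching sprintf() conversion.
sub to_base {
    my ($number, $base) = @_;
    my %format = ( 2 => '%b', 8 => '%o', 16 => '%X' );
    die "unsupported base: $base\n" unless exists $format{$base};
    return sprintf( $format{$base}, $number );
}

# Hex -> octal by way of the native number.
print to_base( from_base( "DEADBEEF", 16 ), 8 ), "\n";   # prints 33653337357
```

Going through the native number keeps each direction a single built-in
call; values too large for a native integer are where L<Bit::Vector>
becomes the better choice.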

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built-in conversion of C<0x> notation:

    my $dec = 0xDEADBEEF;

Using the C<hex> function:

    my $dec = hex("DEADBEEF");

Using C<pack>:

    my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    my $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    my $hex = sprintf("%X", 3735928559); # upper case A-F
    my $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    my $hex = unpack("H*", pack("N", 3735928559));

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(32, -559038737);
    my $hex = $vec->to_Hex();

And L<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    my $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    my $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    my $dec = oct("33653337357");

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    my $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    my $oct = sprintf("%o", 3735928559);

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(32, -559038737);
    my $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    my $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    my $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    my $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    my $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    my $dec = sprintf("%d", $int);

    # substr() is used to left-pad a 32-character string with zeros.

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    my $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    my $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    my $bin = unpack("B*", pack("N", 3735928559));

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(32, -559038737);
    my $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings.
When applied to strings, the operators treat each string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). When applied to numbers, they work with the binary form
of the number (the number C<3> is treated as the bit pattern
C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string or vice versa. To avoid this,
stringify the arguments explicitly (using C<""> or C<qq()>) or convert them
to numbers explicitly (using C<0+$arg>). The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

=head2 How do I multiply matrices?

Use the L<Math::Matrix> or L<Math::MatrixReal> modules (available from CPAN)
or the L<PDL> extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    my @results = map { my_func($_) } @array;

For example:

    my @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach my $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    my @results = map { some_func($_) } (5 .. 25);

but you should be aware that in this form, the C<..> operator
creates a list of all integers in the range, which can take a lot of
memory for large ranges.
However, the problem does not occur when
using C<..> within a C<for> loop, because in that case the range
operator is optimized to I<iterate> over the range, without creating
the entire list. So

    my @results = ();
    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

or even

    push(@results, some_func($_)) for 5 .. 500_005;

will not create an intermediate list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the L<http://www.cpan.org/modules/by-module/Roman> module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

Perl relies on the underlying system for the implementation of
C<rand> and C<srand>; on some systems, the generated numbers are
not random enough (especially on Windows; see
L<http://www.perlmonks.org/?node_id=803632>).
Several CPAN modules in the C<Math> namespace implement better
pseudorandom generators; see for example
L<Math::Random::MT> ("Mersenne Twister", fast), or
L<Math::TrulyRandom> (uses the imperfections in the system's
timer to generate random numbers, which is rather slow).
More algorithms for random numbers are described in
"Numerical Recipes in C" at L<http://www.nr.com/>.

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The day of the year is in the list returned
by the C<localtime> function. Without an
argument C<localtime> uses the current time.

    my $day_of_year = (localtime)[7];

The L<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day of year for any date, use L<POSIX>'s C<mktime> to get
a time in epoch seconds for the argument to C<localtime>.

    use POSIX qw/mktime strftime/;
    # 18 December 1987: the month is 0-based, the year counts from 1900
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

You can also use L<Time::Piece>, which comes with Perl and provides a
C<localtime> that returns an object:

    use Time::Piece;
    my $day_of_year  = localtime->yday;
    my $week_of_year = localtime->week;

The L<Date::Calc> module provides two functions to calculate these, too:

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century    {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the L<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to determine reliably the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though.

The L<Time::Piece> module, which comes with Perl, replaces C<localtime>
with a version that returns an object.
It also overloads the comparison
operators so you can compare them directly:

    use Time::Piece;
    my $date1 = localtime( $some_time );
    my $date2 = localtime( $some_other_time );

    if( $date1 < $date2 ) {
        print "The date was in the past\n";
    }

You can also get differences with a subtraction, which returns a
L<Time::Seconds> object:

    my $date_diff = $date1 - $date2;
    print "The difference is ", $date_diff->days, " days\n";

If you want to work with formatted dates, the L<Date::Manip>,
L<Date::Calc>, or L<DateTime> modules can help you.

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
L<Time::Local> module. Otherwise, you should look into the L<Date::Calc>,
L<Date::Parse>, and L<Date::Manip> modules from CPAN.

=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the L<Time::Piece> module, part of the Standard Library,
which can convert a date/time to a Julian Day:

    $ perl -MTime::Piece -le 'print localtime->julian_day'
    2455607.7959375

Or the modified Julian Day:

    $ perl -MTime::Piece -le 'print localtime->mjd'
    55607.2961226851

Or even the day of the year (which is what some people think of as a
Julian day):

    $ perl -MTime::Piece -le 'print localtime->yday'
    45

You can also do the same things with the L<DateTime> module:

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5
    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401
    $ perl -MDateTime -le'print DateTime->today->doy'
    31

You can use the L<Time::JulianDay> module available on CPAN.
Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days (see L<http://www.hermetic.ch/cal_stud/jdn.htm>
for instance):

    $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
    55608

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

To do it correctly, you can use one of the C<Date> modules since they
work with calendars instead of times. The L<DateTime> module makes it
simple, and gives you the same time of day, only the day before,
despite daylight saving time changes:

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the L<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. For example, the rest of the
suggestions will be wrong sometimes:

Starting with Perl 5.10, L<Time::Piece> and L<Time::Seconds> are part
of the standard distribution, so you might think that you could do
something like this:

    use Time::Piece;
    use Time::Seconds;

    my $yesterday = localtime() - ONE_DAY; # WRONG
    print "Yesterday was $yesterday\n";

The L<Time::Piece> module exports a new C<localtime> that returns an
object, and L<Time::Seconds> exports the C<ONE_DAY> constant that is a
set number of seconds.
This means that it always gives the time 24
hours ago, which is not always yesterday. This can cause problems
around the end of daylight saving time when there's one day that is 25
hours long.

You have the same problem with L<Time::Local>, which will give the wrong
answer for those same special cases:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?

(contributed by brian d foy)

Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
C<localtime> for its proper use.

Starting with Perl 5.12, C<localtime> and C<gmtime> can handle dates past
03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
still might get a warning on a 32-bit C<perl>:

    % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
    Integer overflow in hexadecimal number at -e line 1.
    Wed Nov  1 19:42:39 5576711

On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:

    % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
    Thu Nov  2 00:42:39 5576711

You're still out of luck if you need to keep track of decaying protons
though.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as L<Regexp::Common>.
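
For simple cases you may not need a module at all. As a rough sketch
of the idea, the two checks below use an anchored pattern and an
explicit allow-list. The names C<is_integer> and C<is_one_of> are
hypothetical helpers made up for this example, and the definition of
"integer" used here (optional sign, ASCII digits only) is an
assumption you may need to adjust.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the whole value is an optionally
# signed decimal integer. \A and \z anchor the match to the entire
# string, so a trailing newline or stray text causes a rejection.
sub is_integer {
    my ($value) = @_;
    return 0 unless defined $value;
    return $value =~ /\A[+-]?[0-9]+\z/ ? 1 : 0;
}

# Hypothetical helper: accept only values from an explicit allow-list.
sub is_one_of {
    my ($value, @allowed) = @_;
    return 0 unless defined $value;
    my $hits = grep { $_ eq $value } @allowed;
    return $hits ? 1 : 0;
}

for my $input ( '42', '-7', '4 2', "13\n", 'abc' ) {
    (my $shown = $input) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-6s %s\n", "'$shown'", is_integer($input) ? 'ok' : 'rejected';
}
```

Using C<\z> rather than C<$> is deliberate: C<$> also matches just
before a final newline, which would let C<"13\n"> through.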

Some modules have validation for particular types of input, such
as L<Business::ISBN>, L<Business::CreditCard>, L<Email::Valid>,
and L<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\g1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\g1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference.
If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. For nested patterns
and/or balanced expressions, see the so-called
L<< (?PARNO)|perlre/C<(?PARNO)> C<(?-PARNO)> C<(?+PARNO)> C<(?R)> C<(?0)> >>
construct (available since perl 5.10).
The CPAN module L<Regexp::Common> can help to build such
regular expressions (see in particular
L<Regexp::Common::balanced> and L<Regexp::Common::delimited>).

More complex cases will require writing a parser, probably
using a parsing module from CPAN, like
L<Regexp::Grammars>, L<Parse::RecDescent>, L<Parse::Yapp>,
L<Text::Balanced>, or L<Marpa::R2>.

=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    my $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the L<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    my @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use L<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to L<Text::Wrap> should not contain embedded
newlines. L<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module L<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for L<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.

    my $string = "Just another Perl Hacker";
    my $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

    substr( $string, 13, 4 ) =  "Perl 5.8.0";

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> operator like so:

    my $string = "ThisXlineXhasXsomeXx'sXinXit";
    my $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match.
For example, let's count negative
integers:

    my $string = "-9 55 48 -2 23 -76 4 14 -44";
    my $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

    my $count = () = $string =~ /-\d+/g;

=head2 How do I capitalize all the words on one line?
X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>

(contributed by brian d foy)

Damian Conway's L<Text::Autoformat> handles all of the thinking
for you.

    use Text::Autoformat;
    my $x = "Dr. Strangelove or: How I Learned to Stop ".
      "Worrying and Love the Bomb";

    print $x, "\n";
    for my $style (qw( sentence title highlight )) {
        print autoformat($x, { case => $style }), "\n";
    }

How do you want to capitalize those words?

    FRED AND BARNEY'S LODGE        # all uppercase
    Fred And Barney's Lodge        # title case
    Fred and Barney's Lodge        # highlight case

It's not as easy a problem as it looks. How many words do you think
are in there? Wait for it... wait for it.... If you answered 5
you're right. Perl words are groups of C<\w+>, but that's not what
you want to capitalize. How is Perl supposed to know not to capitalize
that C<s> after the apostrophe? You could try a regular expression:

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;

    $string =~ s/([\w']+)/\u\L$1/g;

Now, what if you don't want to capitalize that "and"? Just use
L<Text::Autoformat> and get on with the next problem. :)

=head2 How can I split a [character]-delimited string except when inside [character]?

Several modules can handle this sort of parsing--L<Text::Balanced>,
L<Text::CSV>, L<Text::CSV_XS>, and L<Text::ParseWords>, among others.

Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in C<$text>):

    my @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
        | ([^,]+),?
        | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).

Alternatively, the L<Text::ParseWords> module (part of the standard
Perl distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

For parsing or generating CSV, though, using L<Text::CSV> rather than
implementing it yourself is highly recommended; you'll save yourself odd bugs
popping up later by just using code which has already been tried and tested in
production for years.

=head2 How do I strip blank space from the beginning/end of a string?

(contributed by brian d foy)

A substitution can do this for you. For a single line, you want to
replace all the leading or trailing whitespace with nothing. You
can do that with a pair of substitutions:

    s/^\s+//;
    s/\s+$//;

You can also write that as a single substitution, although it turns
out the combined statement is slower than the separate ones.
That might not matter to you, though:

    s/^\s+|\s+$//g;

In this regular expression, the alternation matches either at the
beginning or the end of the string since the alternation has a lower
precedence than the anchors. With the C</g> flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the C<\s+>, and the C<$> anchor can match to the
absolute end of the string, so the newline disappears too. Just add
the newline to the output, which has the added benefit of preserving
"blank" (consisting entirely of whitespace) lines which the C<^\s+>
would remove all by itself:

    while( <> ) {
        s/^\s+|\s+$//g;
        print "$_\n";
    }

For a multi-line string, you can apply the regular expression to each
logical line in the string by adding the C</m> flag (for
"multi-line"). With the C</m> flag, the C<$> matches I<before> an
embedded newline, so it doesn't remove it. This pattern still removes
the newline at the end of the string:

    $string =~ s/^\s+|\s+$//gm;

Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace:

    $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;

=head2 How do I pad a string with blanks or pad a number with zeroes?

In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character. You can use a single
character string constant instead of the C<$pad_char> variable if you
know what it is in advance.
And in the same way you can use an integer in
place of C<$pad_len> if you know the pad length in advance.

The simplest method uses the C<sprintf> function. It can pad on the left
or right with blanks and on the left with zeroes and it will not
truncate the result. The C<pack> function can only pad strings on the
right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.

    # Left padding a string with blanks (no truncation):
    my $padded = sprintf("%${pad_len}s", $text);
    my $padded = sprintf("%*s", $pad_len, $text);  # same thing

    # Right padding a string with blanks (no truncation):
    my $padded = sprintf("%-${pad_len}s", $text);
    my $padded = sprintf("%-*s", $pad_len, $text); # same thing

    # Left padding a number with 0 (no truncation):
    my $padded = sprintf("%0${pad_len}d", $num);
    my $padded = sprintf("%0*d", $pad_len, $num); # same thing

    # Right padding a string with blanks using pack (will truncate):
    my $padded = pack("A$pad_len",$text);

If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
C<x> operator and combine that with C<$text>. These methods do
not truncate C<$text>.

Left and right padding with any character, creating a new string:

    my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
    my $padded = $text . $pad_char x ( $pad_len - length( $text ) );

Left and right padding with any character, modifying C<$text> directly:

    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
    $text .= $pad_char x ( $pad_len - length( $text ) );

=head2 How do I extract selected columns from a string?

(contributed by brian d foy)

If you know the columns that contain the data, you can
use C<substr> to extract a single column.

    my $column = substr( $line, $start_column, $length );

You can use C<split> if the columns are separated by whitespace or
some other delimiter, as long as whitespace or the delimiter cannot
appear as part of the data.

    my $line    = ' fred barney   betty   ';
    my @columns = split /\s+/, $line;
        # ( '', 'fred', 'barney', 'betty' );

    my $line    = 'fred||barney||betty';
    my @columns = split /\|/, $line;
        # ( 'fred', '', 'barney', '', 'betty' );

If you want to work with comma-separated values, don't do this since
that format is a bit more complicated. Use one of the modules that
handle that format, such as L<Text::CSV>, L<Text::CSV_XS>, or
L<Text::CSV_PP>.

If you want to break apart an entire line of fixed columns, you can use
C<unpack> with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the C<pack> and C<unpack>
entries in L<perlfunc> for more details.

    # the template comes first, then the string to unpack
    my @fields = unpack( "A8 A8 A8 A16 A4", $line );

Note that spaces in the format argument to C<unpack> do not denote literal
spaces. If you have space separated data, you may want C<split> instead.

=head2 How do I find the soundex value of a string?

(contributed by brian d foy)

You can use the L<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the L<String::Approx>,
L<Text::Metaphone>, and L<Text::DoubleMetaphone> modules.

=head2 How can I expand variables in text strings?

(contributed by brian d foy)

If you can avoid it, don't, or if you can use a templating system,
such as L<Text::Template> or L<Template> Toolkit, do that instead.
You might even be able to get the job done with C<sprintf> or C<printf>:

    my $string = sprintf 'Say hello to %s and %s', $foo, $bar;

However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variables' values:

    my $foo = 'Fred';
    my $bar = 'Barney';
    $string = 'Say hello to $foo and $bar';

One way I can do this involves the substitution operator and a double
C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
it with its value. C<$foo>, then, turns into 'Fred', and that's finally
what's left in the string:

    $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'

The C</e> will also silently ignore violations of strict, replacing
undefined variable names with the empty string. Since I'm using the
C</e> flag (twice even!), I have all of the same security problems I
have with C<eval> in its string form. If there's something odd in
C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
I could get myself in trouble.

To get around the security problem, I could also pull the values from
a hash instead of evaluating variable names. Using a single C</e>, I
can check the hash to ensure the value exists, and if it doesn't, I
can replace the missing value with a marker, in this case C<???>, to
signal that I missed something:

    my $string = 'This has $foo and $bar';

    my %Replacements = (
        foo  => 'Fred',
        );

    # $string =~ s/\$(\w+)/$Replacements{$1}/g;
    $string =~ s/\$(\w+)/
        exists $Replacements{$1} ? $Replacements{$1} : '???'
        /eg;

    print $string;

=head2 What's wrong with always quoting "$vars"?

The problem is that those double-quotes force
stringification--coercing numbers and references into strings--even
when you don't want them to be strings. Think of it this way:
double-quote expansion is used to produce new strings. If you already
have a string, why do you need more?

If you get used to writing odd things like these:

    print "$var";       # BAD
    my $new = "$old";       # BAD
    somefunc("$var");    # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:

    print $var;
    my $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
C<syscall()> function.

Stringification also destroys arrays.

    my @lines = `command`;
    print "@lines";     # WRONG - extra blanks
    print @lines;       # right

=head2 Why don't my E<lt>E<lt>HERE documents work?

Here documents are found in L<perlop>. Check for these four things:

=over 4

=item There must be no space after the E<lt>E<lt> part.

=item There (probably) should be a semicolon at the end of the opening token.

=item You can't (easily) have any space in front of the tag.

=item There needs to be at least a line separator after the end token.


=back

If you want to indent the text in the here document, you
can do this:

    # all in one
    (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET

But the HERE_TARGET must still be flush against the margin.
If you want that indented also, you'll have to quote
in the indentation.

    (my $quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts. --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s+--/\n--/;

A nice general-purpose fixer-upper function for indented here documents
follows. It expects to be called with a here document as its argument.
It looks to see whether each line begins with a common substring, and
if so, strips that substring off. Otherwise, it takes the amount of leading
whitespace found on the first line and removes that much off each
subsequent line.

    sub fix {
        local $_ = shift;
        my ($white, $leader);  # common whitespace and common leading string
        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
            ($white, $leader) = ($2, quotemeta($1));
        } else {
            ($white, $leader) = (/^(\s+)/, '');
        }
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    }

This works with leading special strings, dynamically determined:

    my $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() );
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
        MAIN_INTERPRETER_LOOP

Or with a fixed amount of leading whitespace, with remaining
indentation correctly preserved:

    my $poem = fix<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
      And I must follow, if I can,
       Pursuing it with eager feet,
      Until it joins some larger way
       Where many paths and errands meet.
      And whither then? I cannot say.
        --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON

Beginning with Perl version 5.26, a much simpler and cleaner way to
write indented here documents has been added to the language: the
tilde (~) modifier. See L<perlop/"Indented Here-docs"> for details.

=head1 Data: Arrays

=head2 What is the difference between a list and an array?

(contributed by brian d foy)

A list is a fixed collection of scalars. An array is a variable that
holds a variable collection of scalars. An array can supply its collection
for list operations, so list operations also work on arrays:

    # slices
    ( 'dog', 'cat', 'bird' )[2,3];
    @animals[2,3];

    # iteration
    foreach ( qw( dog cat bird ) ) { ... }
    foreach ( @animals ) { ...
    }

    my @three = grep { length == 3 } qw( dog cat bird );
    my @three = grep { length == 3 } @animals;

    # supply an argument list
    wash_animals( qw( dog cat bird ) );
    wash_animals( @animals );

Array operations, which change the scalars, rearrange them, or add
or subtract some scalars, only work on arrays. These can't work on a
list, which is fixed. Array operations include C<shift>, C<unshift>,
C<push>, C<pop>, and C<splice>.

An array can also change its length:

    $#animals = 1;  # truncate to two elements
    $#animals = 10000; # pre-extend to 10,001 elements

You can change an array element, but you can't change a list element:

    $animals[0] = 'Rottweiler';
    qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!

    foreach ( @animals ) {
        s/^d/fr/;  # works fine
    }

    foreach ( qw( dog cat bird ) ) {
        s/^d/fr/;  # Error! Modification of read only value!
    }

However, if the list element is itself a variable, it appears that you
can change a list element. In that case, though, the list element is the
variable, not the data. You're not changing the list element, but
something the list element refers to. The list element itself doesn't
change: it's still the same variable.

You also have to be careful about context. You can assign an array to
a scalar to get the number of elements in the array. This only works
for arrays, though:

    my $count = @animals;  # only works with arrays

If you try to do the same thing with what you think is a list, you
get a quite different result. Although it looks like you have a list
on the righthand side, Perl actually sees a bunch of scalars separated
by commas:

    my $scalar = ( 'dog', 'cat', 'bird' );  # $scalar gets bird

Since you're assigning to a scalar, the righthand side is in scalar
context. The comma operator (yes, it's an operator!)
in scalar
context evaluates its lefthand side, throws away the result, and
evaluates its righthand side and returns the result. In effect,
that list-lookalike assigns to C<$scalar> its rightmost value. Many
people mess this up because they choose a list-lookalike whose
last element is also the count they expect:

    my $scalar = ( 1, 2, 3 );  # $scalar gets 3, accidentally

=head2 What is the difference between $array[1] and @array[1]?

(contributed by brian d foy)

The difference is the sigil, that special character in front of the
array name. The C<$> sigil means "exactly one item", while the C<@>
sigil means "zero or more items". The C<$> gets you a single scalar,
while the C<@> gets you a list.

The confusion arises because people incorrectly assume that the sigil
denotes the variable type.

The C<$array[1]> is a single-element access to the array. It's going
to return the item at index 1 (or undef if there is no item there).
If you intend to get exactly one element from the array, this is the
form you should use.

The C<@array[1]> is an array slice, although it has only one index.
You can pull out multiple elements simultaneously by specifying
additional indices as a list, like C<@array[1,4,3,0]>.

Using a slice on the lefthand side of the assignment supplies list
context to the righthand side. This can lead to unexpected results.
For instance, if you want to read a single line from a filehandle,
assigning to a scalar value is fine:

    $array[1] = <STDIN>;

However, in list context, the line input operator returns all of the
lines as a list.
The first line goes into C<@array[1]> and the rest
of the lines mysteriously disappear:

    @array[1] = <STDIN>; # most likely not what you want

Either the C<use warnings> pragma or the B<-w> flag will warn you when
you use an array slice with a single index.

=head2 How can I remove duplicate elements from a list or array?

(contributed by brian d foy)

Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".

If you don't care about the order of the elements, you could just
create the hash then extract the keys. It's not important how you
create that hash: just that you use C<keys> to get the unique
elements.

    my %hash   = map { $_, 1 } @array;
    # or a hash slice: @hash{ @array } = ();
    # or a foreach: $hash{$_} = 1 foreach ( @array );

    my @unique = keys %hash;

If you want to use a module, try the C<uniq> function from
L<List::MoreUtils>. In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.

    use List::MoreUtils qw(uniq);

    my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
    my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
creates the key and immediately uses its value, which is C<undef>, so
the loop continues to the C<push> and increments the value for that
key. The next time the loop sees that same element, its key exists in
the hash I<and> the value for that key is true (since it's not 0 or
C<undef>), so the C<next> skips that iteration and the loop goes to the
next element.

    my @unique = ();
    my %seen   = ();

    foreach my $elem ( @array ) {
        next if $seen{ $elem }++;
        push @unique, $elem;
    }

You can write this more briefly using a grep, which does the
same thing.

    my %seen = ();
    my @unique = grep { ! $seen{ $_ }++ } @array;

=head2 How can I tell whether a certain element is contained in a list or array?

(portions of this answer contributed by Anno Siegel and brian d foy)

Hearing the word "in" is an I<in>dication that you probably should have
used a hash, not a list or array, to store your data. Hashes are
designed to answer this question quickly and efficiently. Arrays aren't.

That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item is
contained in an array or a hash:

    use 5.010;

    if( $item ~~ @array ) {
        say "The array contains $item"
    }

    if( $item ~~ %hash ) {
        say "The hash contains $item"
    }

With earlier versions of Perl, you have to do a bit more work. If you
are going to make this query many times over arbitrary string values,
the fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values:

    my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
    my %is_blue = ();
    for (@blues) { $is_blue{$_} = 1 }

Now you can check whether C<$is_blue{$some_color}>. It might have
been a good idea to keep the blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed
array.
This kind of array will take up less space:

    my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    my @is_tiny_prime = ();
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you check whether C<$is_tiny_prime[$some_number]>.

If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:

    my @articles = ( 1..10, 150..2000, 2017 );
    my $read = '';
    for (@articles) { vec($read,$_,1) = 1 }

Now check whether C<vec($read,$n,1)> is true for some C<$n>.

These methods guarantee fast individual tests but require a re-organization
of the original list or array. They only pay off if you have to test
multiple values against the same array.

If you are testing only once, the standard module L<List::Util> exports
the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
looks like this subroutine:

    sub first (&@) {
        my $code = shift;
        foreach (@_) {
            return $_ if &{$code}();
        }
        undef;
    }

If speed is of little concern, the common idiom uses grep in scalar context
(which returns the number of items that passed its condition) to traverse the
entire list. This does have the benefit of telling you how many matches it
found, though.

    my $is_there = grep $_ eq $whatever, @array;

If you want to actually extract the matching elements, simply use grep in
list context.

    my @matches = grep $_ eq $whatever, @array;

=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here's code to do both and more.
It assumes that each
element is unique in a given array:

    my (@union, @intersection, @difference);
    my %count = ();
    foreach my $element (@array1, @array2) { $count{$element}++ }
    foreach my $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

Note that this is the I<symmetric difference>, that is, all elements
in either A or in B but not in both. Think of it as an xor operation.

=head2 How do I test whether two arrays or hashes are equal?

With Perl 5.10 and later, the smart match operator can give you the answer
with the least amount of work:

    use 5.010;

    if( @array1 ~~ @array2 ) {
        say "The arrays are the same";
    }

    if( %hash1 ~~ %hash2 ) {    # doesn't check values!
        say "The hash keys are the same";
    }

The following code works for single-level arrays. It uses a
stringwise comparison, and does not distinguish defined versus
undefined empty strings. Modify if you have other needs.

    $are_equal = compare_arrays(\@frogs, \@toads);

    sub compare_arrays {
        my ($first, $second) = @_;
        no warnings;  # silence spurious -w undef complaints
        return 0 unless @$first == @$second;
        for (my $i = 0; $i < @$first; $i++) {
            return 0 if $first->[$i] ne $second->[$i];
        }
        return 1;
    }

For multilevel structures, you may wish to use an approach more
like this one. It uses the CPAN module L<FreezeThaw>:

    use FreezeThaw qw(cmpStr);
    my @a = my @b = ( "this", "that", [ "more", "stuff" ] );

    printf "a and b contain %s arrays\n",
        cmpStr(\@a, \@b) == 0
        ? "the same"
        : "different";

This approach also works for comparing hashes.
Here we'll demonstrate
two different answers:

    use FreezeThaw qw(cmpStr cmpStrHard);

    my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
    $a{EXTRA} = \%b;
    $b{EXTRA} = \%a;

    printf "a and b contain %s hashes\n",
        cmpStr(\%a, \%b) == 0 ? "the same" : "different";

    printf "a and b contain %s hashes\n",
        cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";

The first reports that both of the hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.

=head2 How do I find the first array element for which a condition is true?

To find the first array element which satisfies a condition, you can
use the C<first()> function in the L<List::Util> module, which comes
with Perl 5.8. This example finds the first element that contains
"Perl".

    use List::Util qw(first);

    my $element = first { /Perl/ } @array;

If you cannot use L<List::Util>, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with C<last>.

    my $found;
    foreach ( @array ) {
        if( /Perl/ ) { $found = $_; last }
    }

If you want the array index, use the C<firstidx()> function from
L<List::MoreUtils>:

    use List::MoreUtils qw(firstidx);
    my $index = firstidx { /Perl/ } @array;

Or write it yourself, iterating through the indices
and checking the array element at each index until you find one
that satisfies the condition:

    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
        if( $array[$i] =~ /Perl/ ) {
            $found = $array[$i];
            $index = $i;
            last;
        }
    }

=head2 How do I handle linked lists?

(contributed by brian d foy)

Perl's arrays do not have a fixed size, so you don't need linked lists
if you just want to add or remove items. You can use array operations
such as C<push>, C<pop>, C<shift>, C<unshift>, or C<splice> to do
that.

Sometimes, however, linked lists can be useful in situations where you
want to "shard" an array so you have many small arrays instead of
a single big array. You can keep arrays longer than Perl's largest
array index, lock smaller arrays separately in threaded programs,
reallocate less memory, or quickly insert elements in the middle of
the chain.

Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
Linked Lists" ( L<http://www.slideshare.net/lembark/perly-linked-lists> ),
although you can just use his L<LinkedList::Single> module.

=head2 How do I handle circular lists?
X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
X<cycle> X<modulus>

(contributed by brian d foy)

If you want to cycle through an array endlessly, you can increment the
index modulo the number of elements in the array:

    my @array = qw( a b c );
    my $i = 0;

    while( 1 ) {
        print $array[ $i++ % @array ], "\n";
        last if $i > 20;
    }

You can also use L<Tie::Cycle> to use a scalar that always has the
next element of the circular array:

    use Tie::Cycle;

    tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];

    print $cycle; # FFFFFF
    print $cycle; # 000000
    print $cycle; # FFFF00

The L<Array::Iterator::Circular> module creates an iterator object for
circular arrays:

    use Array::Iterator::Circular;

    my $color_iterator = Array::Iterator::Circular->new(
        qw(red green blue orange)
        );

    foreach ( 1 ..
    20 ) {
        print $color_iterator->next, "\n";
    }

=head2 How do I shuffle an array randomly?

If you have either Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or
later, installed, you can say:

    use List::Util 'shuffle';

    @shuffled = shuffle(@list);

If not, you can use a Fisher-Yates shuffle.

    sub fisher_yates_shuffle {
        my $deck = shift;  # $deck is a reference to an array
        return unless @$deck; # must not be empty!

        my $i = @$deck;
        while (--$i) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }

    # shuffle my mpeg collection
    #
    my @mpeg = <audio/*/*.mp3>;
    fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
    print @mpeg;

Note that the above implementation shuffles an array in place,
unlike the C<List::Util::shuffle()> which takes a list and returns
a new shuffled list.

You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:

    srand;
    @new = ();
    @old = 1 .. 10;  # just a demo
    while (@old) {
        push(@new, splice(@old, rand @old, 1));
    }

This is bad because splice is already O(N), and since you do it N
times, you just invented a quadratic algorithm; that is, O(N**2).
This does not scale, although Perl is so efficient that you probably
won't notice this until you have rather largish arrays.

=head2 How do I process/modify each element of an array?

Use C<for>/C<foreach>:

    for (@lines) {
        s/foo/bar/;    # change that word
        tr/XZ/ZX/;    # swap those letters
    }

Here's another; let's compute spherical volumes:

    my @volumes = @radii;
    for (@volumes) {   # @volumes has changed parts
        $_ **= 3;
        $_ *= (4/3) * 3.14159;  # this will be constant folded
    }

which can also be done with C<map()> which is made to transform
one list into another:

    my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;

If you want to do the same thing to modify the values of the
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.

    for my $orbit ( values %orbits ) {
        ($orbit **= 3) *= (4/3) * 3.14159;
    }

Prior to perl 5.6 C<values> returned copies of the values,
so older perl code often contains constructions such as
C<@orbits{keys %orbits}> instead of C<values %orbits> where
the hash is to be modified.

=head2 How do I select a random element from an array?

Use the C<rand()> function (see L<perlfunc/rand>):

    my $index   = rand @array;
    my $element = $array[$index];

Or, simply:

    my $element = $array[ rand @array ];

=head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
X<The Art of Computer Programming> X<Fischer-Krause>

Use the L<List::Permutor> module on CPAN. If the list is actually an
array, try the L<Algorithm::Permute> module (also on CPAN).
It's
written in XS code and is very efficient:

    use Algorithm::Permute;

    my @array = 'a'..'d';
    my $p_iterator = Algorithm::Permute->new ( \@array );

    while (my @perm = $p_iterator->next) {
       print "next permutation: (@perm)\n";
    }

For even faster execution, you could do:

    use Algorithm::Permute;

    my @array = 'a'..'d';

    Algorithm::Permute::permute {
        print "next permutation: (@array)\n";
    } @array;

Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4 (still unpublished) of
Knuth's I<The Art of Computer Programming> and will work on any list:

    #!/usr/bin/perl -n
    # Fischer-Krause ordered permutation generator

    sub permute (&@) {
        my $code = shift;
        my @idx = 0..$#_;
        while ( $code->(@_[@idx]) ) {
            my $p = $#idx;
            --$p while $idx[$p-1] > $idx[$p];
            my $q = $p or return;
            push @idx, reverse splice @idx, $p;
            ++$q while $idx[$p-1] > $idx[$q];
            @idx[$p-1,$q]=@idx[$q,$p-1];
        }
    }

    permute { print "@_\n" } split;

The L<Algorithm::Loops> module also provides the C<NextPermute> and
C<NextPermuteNum> functions which efficiently find all unique permutations
of an array, even if it contains duplicate values, modifying it in-place:
if its elements are in reverse-sorted order then the array is reversed,
making it sorted, and it returns false; otherwise the next
permutation is returned.

C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
you can enumerate all the permutations of C<0..9> like this:

    use Algorithm::Loops qw(NextPermuteNum);

    my @list= 0..9;
    do { print "@list\n" } while NextPermuteNum @list;

=head2 How do I sort an array by (anything)?

Supply a comparison function to sort() (described in L<perlfunc/sort>):

    @list = sort { $a <=> $b } @list;

The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
the numerical comparison operator.

If you have a complicated function needed to pull out the part you
want to sort on, then don't do it inside the sort function. Pull it
out first, because the sort BLOCK can be called many times for the
same element. Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively.

    my @idx;
    for (@data) {
        my $item;
        ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
    my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:

    my @sorted = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] }
                 map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

    my @sorted = sort {
        field1($a) <=> field1($b) ||
        field2($a) cmp field2($b) ||
        field3($a) cmp field3($b)
    } @data;

This can be conveniently combined with precalculation of keys as given
above.

See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for
more about this approach.

See also the question later in L<perlfaq4> on sorting hashes.

=head2 How do I manipulate arrays of bits?

Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
operations.

For example, you don't have to store individual bits in an array
(which would mean that you're wasting a lot of space).
To convert an
array of bits to a string, use C<vec()> to set the right bits. This
sets C<$vec> to have bit N set only if C<$ints[N]> was set:

    my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
    my $vec = '';
    foreach( 0 .. $#ints ) {
        vec($vec,$_,1) = 1 if $ints[$_];
    }

The string C<$vec> only takes up as many bits as it needs. For
instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
bytes to store them (not counting the scalar variable overhead).

Here's how, given a vector in C<$vec>, you can get those bits into
your C<@ints> array:

    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;

            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        }
        else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }

        return \@ints;
    }

This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)

You can make the while loop a lot shorter with this suggestion
from Benjamin Goldberg:

    while($vec =~ /[^\0]+/g ) {
        push @ints, grep vec($vec, $_, 1), $-[0] * 8 ..
            $+[0] * 8;
    }

Or use the CPAN module L<Bit::Vector>:

    use Bit::Vector;

    my $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    my @ints = $vector->Index_List_Read();

L<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers, and "big int" math.

Here's a more extensive illustration using vec():

    # vec demo
    my $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    my $is_set = vec($vector, 23, 1);
    print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
    pvec($vector);

    set_vec(1,1,1);
    set_vec(3,1,1);
    set_vec(23,1,1);

    set_vec(3,1,3);
    set_vec(3,2,3);
    set_vec(3,4,3);
    set_vec(3,4,7);
    set_vec(3,8,3);
    set_vec(3,8,7);

    set_vec(0,32,17);
    set_vec(1,32,17);

    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);
        my $i = 0;
        my $BASE = 8;

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }

=head2 Why does defined() return true on empty arrays and hashes?

The short story is that you should probably only use defined on scalars or
functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
in the 5.004 release or later of Perl for more detail.

=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

(contributed by brian d foy)

There are a couple of ways that you can process an entire hash.
You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.

To go through all of the keys, use the C<keys> function. This extracts
all of the keys of the hash and gives them back to you as a list. You
can then get the value through the particular key you're processing:

    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Once you have the list of keys, you can process that list before you
process the hash elements. For instance, you can sort the keys so you
can process them in lexical order:

    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Or, you might want to only process some of the items. If you only want
to deal with the keys that start with C<text:>, you can select just
those using C<grep>:

    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }

If the hash is very large, you might not want to create a long list of
keys. To save some memory, you can grab one key-value pair at a time using
C<each()>, which returns a pair you haven't seen yet:

    while( my( $key, $value ) = each( %hash ) ) {
        ...
    }

The C<each> operator returns the pairs in apparently random order, so if
ordering matters to you, you'll have to stick with the C<keys> method.

The C<each()> operator can be a bit tricky though. You can't add or
delete keys of the hash while you're using it without possibly
skipping or re-processing some pairs after Perl internally rehashes
all of the elements. Additionally, a hash has only one iterator, so if
you mix C<keys>, C<values>, or C<each> on the same hash, you risk resetting
the iterator and messing up your processing. See the C<each> entry in
L<perlfunc> for more details.

=head2 How do I merge two hashes?
X<hash> X<merge> X<slice, hash>

(contributed by brian d foy)

Before you decide to merge two hashes, you have to decide what to do
if both hashes contain keys that are the same and if you want to leave
the original hashes as they were.

If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:

    my %new_hash = %hash1; # make a copy; leave %hash1 alone

    foreach my $key2 ( keys %hash2 ) {
        if( exists $new_hash{$key2} ) {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else {
            $new_hash{$key2} = $hash2{$key2};
        }
    }

If you don't want to create a new hash, you can still use this looping
technique; just change the C<%new_hash> to C<%hash1>.

    foreach my $key2 ( keys %hash2 ) {
        if( exists $hash1{$key2} ) {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else {
            $hash1{$key2} = $hash2{$key2};
        }
    }

If you don't care that one hash overwrites keys and values from the other, you
could just use a hash slice to add one hash to another. In this case, values
from C<%hash2> replace values from C<%hash1> when they have keys in common:

    @hash1{ keys %hash2 } = values %hash2;

=head2 What happens if I add or remove keys from a hash while iterating over it?

(contributed by brian d foy)

The easy answer is "Don't do that!"

If you iterate through the hash with each(), you can delete the key
most recently returned without worrying about it.
If you delete or add
other keys, the iterator may skip or double up on them since perl
may rearrange the hash table. See the
entry for C<each()> in L<perlfunc>.

=head2 How do I look up a hash element by value?

Create a reverse hash:

    my %by_value = reverse %by_key;
    my $key = $by_value{$value};

That's not particularly efficient. It would be more space-efficient
to use:

    while (my ($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (my ($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }

=head2 How can I know how many entries are in a hash?

(contributed by brian d foy)

This is very similar to "How do I process an entire hash?", also in
L<perlfaq4>, but a bit simpler in the common cases.

You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:

    my $key_count = keys %hash; # must be scalar context!

If you want to find out how many entries have a defined value, that's
a bit different. You have to check each value. A C<grep> is handy:

    my $defined_value_count = grep { defined } values %hash;

You can use that same structure to count the entries any way that
you like. If you want the count of the keys with vowels in them,
you just test for that instead:

    my $vowel_count = grep { /[aeiou]/ } keys %hash;

The C<grep> in scalar context returns the count.
If you want the list
of matching items, just use it in list context instead:

    my @defined_values = grep { defined } values %hash;

The C<keys()> function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as C<each()>.

=head2 How do I sort a hash (optionally by value instead of key)?

(contributed by brian d foy)

To sort a hash, start with the keys. In this example, we give the list of
keys to the sort function which then compares them ASCIIbetically (which
might be affected by your locale settings). The output list has the keys
in ASCIIbetical order. Once we have the keys, we can go through them to
create a report which lists the keys in ASCIIbetical order.

    my @keys = sort { $a cmp $b } keys %hash;

    foreach my $key ( @keys ) {
        printf "%-20s %6d\n", $key, $hash{$key};
    }

We could get more fancy in the C<sort()> block though. Instead of
comparing the keys, we can compute a value with them and use that
value as the comparison.

For instance, to make our report order case-insensitive, we use
C<lc> to lowercase the keys before comparing them:

    my @keys = sort { lc $a cmp lc $b } keys %hash;

Note: if the computation is expensive or the hash has many elements,
you may want to look at the Schwartzian Transform to cache the
computation results.

If we want to sort by the hash value instead, we use the hash key
to look it up. We still get out a list of keys, but this time they
are ordered by their value.

    my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;

From there we can get more complex. If the hash values are the same,
we can provide a secondary sort on the hash key.

    my @keys = sort {
        $hash{$a} <=> $hash{$b}
            or
        "\L$a" cmp "\L$b"
    } keys %hash;

=head2 How can I always keep my hash sorted?
X<hash tie sort DB_File Tie::IxHash>

You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The L<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
%hash, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.

Pictures help...
Here's the C<%hash> table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold

    $hash{'a'}                       is true
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is true
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

If you now say

    undef $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is FALSE
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is FALSE
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is false
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is false
    exists $hash{'a'}                is FALSE (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is FALSE

See, the whole entry is gone!

=head2 Why don't my tied hashes make the defined/exists distinction?

This depends on the tied hash's implementation of EXISTS().
For example, there isn't the concept of undef with hashes
that are tied to DBM* files. It also means that exists() and
defined() do the same thing with a DBM* file, and what they
end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

(contributed by brian d foy)

You can use the C<keys> or C<values> functions to reset C<each>. To
simply reset the iterator used by C<each> without doing anything else,
use one of them in void context:

    keys %hash;   # resets iterator, nothing else.
    values %hash; # resets iterator, nothing else.

See the documentation for C<each> in L<perlfunc>.

=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    my %seen = ();
    for my $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    my @uniq = keys %seen;

Or more succinctly:

    my @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    my %seen = ();
    while (defined (my $key = each %foo)) {
        $seen{$key}++;
    }
    while (defined (my $key = each %bar)) {
        $seen{$key}++;
    }
    my @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the MLDBM (which uses Data::Dumper) module from CPAN and layer
it on top of either DB_File or GDBM_File. You might also try DBM::Deep, but
it can be a bit slow.

=head2 How can I make my hash remember the order I put elements into it?

Use the L<Tie::IxHash> module from CPAN.

    use Tie::IxHash;

    tie my %myhash, 'Tie::IxHash';

    for (my $i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }

    my @keys = keys %myhash;
    # @keys = (0,1,2,3,...)

=head2 Why does passing a subroutine an undefined element in a hash create it?

(contributed by brian d foy)

Are you using a really old version of Perl?

Normally, accessing a hash key's value for a nonexistent key will
I<not> create the key.

    my %hash  = ();
    my $value = $hash{ 'foo' };
    print "This won't print\n" if exists $hash{ 'foo' };

Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
Since you could assign directly to C<$_[0]>, Perl had to be ready to
make that assignment so it created the hash key ahead of time:

    my_sub( $hash{ 'foo' } );
    print "This will print before 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        # $_[0] = 'bar'; # create hash key in case you do this
        1;
    }

Since Perl 5.004, however, this situation is a special case and Perl
creates the hash key only when you make the assignment:

    my_sub( $hash{ 'foo' } );
    print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        $_[0] = 'bar';
    }

However, if you want the old behavior (and think carefully about that
because it's a weird side effect), you can pass a hash slice instead.
Perl 5.004 didn't make this a special case:

    my_sub( @hash{ qw/foo/ } );

=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perlootut>.

=head2 How can I use a reference as a hash key?

(contributed by brian d foy and Ben Morrow)

Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, C<HASH(0xDEADBEEF)>).
From there you can't get
back the reference from the stringified form, at least without doing
some extra work on your own.

Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This will mean a new variable might accidentally
be associated with the value for an old one.

If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
Hash::Util::Fieldhash module. This will also handle renaming the
keys if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.

If you actually need to be able to get a real reference back from
each hash entry, you can use the Tie::RefHash module, which does the
required work for you.

=head2 How can I check if a key exists in a multilevel hash?

(contributed by brian d foy)

The trick to this problem is avoiding accidental autovivification. If
you want to check three keys deep, you might naE<0xEF>vely try this:

    my %hash;
    if( exists $hash{key1}{key2}{key3} ) {
        ...;
    }

Even though you started with a completely empty hash, after that call to
C<exists> you've created the structure you needed to check for C<key3>:

    %hash = (
        'key1' => {
            'key2' => {}
        }
    );

That's autovivification. You can get around this in a few ways. The
easiest way is to just turn it off. The lexical C<autovivification>
pragma is available on CPAN.
Now you don't add to the hash:

    {
        no autovivification;
        my %hash;
        if( exists $hash{key1}{key2}{key3} ) {
            ...;
        }
    }

The L<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
subroutine can tell you not only if the keys exist but also get the
value:

    use Data::Diver qw(Dive);

    my @exists = Dive( \%hash, qw(key1 key2 key3) );
    if( ! @exists ) {
        ...; # keys do not exist
    }
    elsif( ! defined $exists[0] ) {
        ...; # keys exist but value is undef
    }

You can easily do this yourself too by checking each level of the hash
before you move onto the next level. This is essentially what
L<Data::Diver> does for you:

    if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
        ...;
    }

    sub check_hash {
        my( $hash, @keys ) = @_;

        return unless @keys;

        foreach my $key ( @keys ) {
            return unless eval { exists $hash->{$key} };
            $hash = $hash->{$key};
        }

        return 1;
    }

=head2 How can I prevent addition of unwanted keys into a hash?

Since version 5.8.0, hashes can be I<restricted> to a fixed number
of given keys. Methods for creating and dealing with restricted hashes
are exported by the L<Hash::Util> module.

=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary-clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use C<binmode> for binary
files to avoid conversions for line endings. In general, you should
use C<binmode> any time you want to work with binary data.

Also see L<perlfunc/"binmode"> or L<perlopentut>.

If you're concerned about 8-bit textual data then see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.
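
As a minimal sketch of the C<binmode> advice above, the following round-trips
every possible byte value through a temporary file. The C<$payload> variable
and the use of L<File::Temp> are assumptions made for illustration; any
filehandle you open yourself works the same way.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical payload: all 256 byte values, including CR, LF and 0x1A,
# which a text-mode layer on Windows or DOS could otherwise alter.
my $payload = join '', map { chr } 0 .. 255;

# Write the bytes with binmode so no line-ending translation happens.
my ($out, $path) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} $payload;
close $out or die "Can't close $path: $!";

# Read them back, again with binmode on the handle.
open my $in, '<', $path or die "Can't open $path: $!";
binmode $in;
my $copy = do { local $/; <$in> };   # slurp the whole file
close $in;

print length($copy), " bytes read back\n";
```

On Unix-like systems C<binmode> on a plain file makes no difference to the
data, but calling it anyway keeps the code portable.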

=head2 How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression (see also
L<perlretut> and L<perlre>):

    use 5.010;

    if ( /\D/ )
        { say "\thas nondigits"; }
    if ( /^\d+\z/ )
        { say "\tis a whole number"; }
    if ( /^-?\d+\z/ )
        { say "\tis an integer"; }
    if ( /^[+-]?\d+\z/ )
        { say "\tis a +/- integer"; }
    if ( /^-?(?:\d+\.?|\.\d)\d*\z/ )
        { say "\tis a real number"; }
    if ( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i )
        { say "\tis a C float" }

There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
internal function C<looks_like_number> for determining whether a
variable looks like a number. L<Data::Types> exports functions that
validate data types using both the above and other regular
expressions. Thirdly, there is L<Regexp::Common> which has regular
expressions to match various types of numbers. Those three modules are
available from the CPAN.

If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function for converting strings to doubles (and also C<POSIX::strtol>
for longs). Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!)
        {
            return undef;
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead.

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the L<FreezeThaw>
or L<Storable> modules from CPAN. Starting from Perl 5.8, L<Storable> is part
of the standard distribution. Here's one example using L<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash

=head2 How do I print out or copy a recursive data structure?

The L<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The L<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Where C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };

=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.
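
The common-base-class alternative might look something like the sketch below.
The package names C<My::Base> and C<My::Widget> and the C<describe> method are
hypothetical, chosen only for illustration; the point is that the shared
method lives in one ordinary class instead of in C<UNIVERSAL>.

```perl
use strict;
use warnings;

package My::Base;        # hypothetical common base class

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# A shared method that every subclass inherits, without touching UNIVERSAL.
sub describe {
    my $self = shift;
    return ref($self) . ' with keys: ' . join ', ', sort keys %$self;
}

package My::Widget;      # hypothetical subclass
our @ISA = ('My::Base');

package main;

my $widget = My::Widget->new( color => 'red', size => 3 );
print $widget->describe, "\n";
```

Only classes that opt in by inheriting from the base class see the method, so
unrelated objects elsewhere in the program are left alone.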

=head2 How do I verify a credit card checksum?

Get the L<Business::CreditCard> module from CPAN.

=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the L<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the L<PDL> module from CPAN instead--it makes number-crunching easy.

See L<https://metacpan.org/release/PGPLOT> for the code.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.