1=head1 NAME
2
3perlfaq4 - Data Manipulation
4
5=head1 VERSION
6
7version 5.20190126
8
9=head1 DESCRIPTION
10
11This section of the FAQ answers questions related to manipulating
12numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
13
14=head1 Data: Numbers
15
16=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
17
18For the long explanation, see David Goldberg's "What Every Computer
19Scientist Should Know About Floating-Point Arithmetic"
20(L<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).
21
22Internally, your computer represents floating-point numbers in binary.
23Digital (as in powers of two) computers cannot store all numbers
24exactly. Some real numbers lose precision in the process. This is a
25problem with how computers store numbers and affects all computer
26languages, not just Perl.
27
28L<perlnumber> shows the gory details of number representations and
29conversions.
30
31To limit the number of decimal places in your numbers, you can use the
32C<printf> or C<sprintf> function. See
33L<perlop/"Floating-point Arithmetic"> for more details.
34
35    printf "%.2f", 10/3;
36
37    my $number = sprintf "%.2f", 10/3;
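
Printing more digits than usual shows what is actually stored. In this
small sketch, C<0.1 + 0.2> is not exactly C<0.3> in binary floating
point, so the exact comparison fails. C<sprintf> still gives the two
decimal places you asked for:

    my $sum = 0.1 + 0.2;

    printf "%.20f\n", $sum;               # shows the digits actually stored
    print "not equal\n" if $sum != 0.3;   # the exact comparison fails

    my $shown = sprintf "%.2f", $sum;     # rounded to "0.30" for display
    print "$shown\n";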
38
39=head2 Why is int() broken?
40
41Your C<int()> is most probably working just fine. It's the numbers that
42aren't quite what you think.
43
44First, see the answer to "Why am I getting long decimals
45(eg, 19.9499999999999) instead of the numbers I should be getting
46(eg, 19.95)?".
47
48For example, this
49
50    print int(0.6/0.2-2), "\n";
51
will in most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.
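
If you want the nearest integer rather than the truncated value, round
instead of calling C<int()>. A minimal sketch using C<sprintf>:

    my $value = 0.6/0.2 - 2;                  # slightly less than 1

    my $truncated = int $value;               # 0: int() drops the fraction
    my $rounded   = sprintf "%.0f", $value;   # 1: nearest integer

    print "$truncated $rounded\n";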
56
57=head2 Why isn't my octal data interpreted correctly?
58
59(contributed by brian d foy)
60
61You're probably trying to convert a string to a number, which Perl only
62converts as a decimal number. When Perl converts a string to a number, it
63ignores leading spaces and zeroes, then assumes the rest of the digits
64are in base 10:
65
66    my $string = '0644';
67
68    print $string + 0;  # prints 644
69
70    print $string + 44; # prints 688, certainly not octal!
71
This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:
76
77    %prompt> chmod 644 file
78
79If you want to use the same literal digits (644) in Perl, you have to tell
80Perl to treat them as octal numbers either by prefixing the digits with
81a C<0> or using C<oct>:
82
83    chmod(     0644, $filename );  # right, has leading zero
84    chmod( oct(644), $filename );  # also correct
85
86The problem comes in when you take your numbers from something that Perl
87thinks is a string, such as a command line argument in C<@ARGV>:
88
89    chmod( $ARGV[0],      $filename );  # wrong, even if "0644"
90
91    chmod( oct($ARGV[0]), $filename );  # correct, treat string as octal
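
When the mode comes from the user, it can help to check that the string
really looks like an octal mode before handing it to C<oct>. A minimal
sketch; the name C<parse_mode> and the accepted format (three or four
octal digits, optionally after a leading zero) are assumptions made for
this example:

    # Hypothetical helper: returns the numeric mode, or undef if the
    # string doesn't look like an octal mode.
    sub parse_mode {
        my ($string) = @_;
        return unless defined $string && $string =~ /\A0?[0-7]{3,4}\z/;
        return oct $string;               # "0644" and "644" both give 420
    }

    my $mode = parse_mode('0644');
    printf "%04o %d\n", $mode, $mode;     # prints "0644 420"

In a real script you would pass C<$ARGV[0]> and stop with an error when
C<parse_mode> returns undef.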
92
You can always check the value you're using by printing it in both
octal and decimal notation to ensure it matches what you think it
should be:
96
97    printf "0%o %d", $number, $number;
98
99=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
100
101Remember that C<int()> merely truncates toward 0. For rounding to a
102certain number of digits, C<sprintf()> or C<printf()> is usually the
103easiest route.
104
105    printf("%.3f", 3.1415926535);   # prints 3.142
106
107The L<POSIX> module (part of the standard Perl distribution)
108implements C<ceil()>, C<floor()>, and a number of other mathematical
109and trigonometric functions.
110
111    use POSIX;
112    my $ceil   = ceil(3.5);   # 4
113    my $floor  = floor(3.5);  # 3
114
115In 5.000 to 5.003 perls, trigonometry was done in the L<Math::Complex>
116module. With 5.004, the L<Math::Trig> module (part of the standard Perl
117distribution) implements the trigonometric functions. Internally it
118uses the L<Math::Complex> module and some functions can break out from
119the real axis into the complex plane, for example the inverse sine of
1202.
121
122Rounding in financial applications can have serious implications, and
123the rounding method used should be specified precisely. In these
124cases, it probably pays not to trust whichever system of rounding is
125being used by Perl, but instead to implement the rounding function you
126need yourself.
127
128To see why, notice how you'll still have an issue on half-way-point
129alternation:
130
131    for (my $i = -5; $i <= 5; $i += 0.5) { printf "%.0f ",$i }
132
133    -5 -4 -4 -4 -3 -2 -2 -2 -1 -0 0 0 1 2 2 2 3 4 4 4 5
134
135Don't blame Perl. It's the same as in C. IEEE says we have to do
136this. Perl numbers whose absolute values are integers under 2**31 (on
13732-bit machines) will work pretty much like mathematical integers.
138Other numbers are not guaranteed.
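
As a starting point for such a function, here is a sketch of one common
rule, rounding exact halves away from zero. The name C<round_half_away>
is made up, it only rounds to whole numbers, and it assumes the input is
exactly representable in binary (halves such as C<2.5> are):

    # Round to the nearest integer, sending exact halves away from zero
    # (2.5 -> 3, -2.5 -> -3) instead of to the even neighbor.
    sub round_half_away {
        my ($x) = @_;
        return $x < 0 ? -int( -$x + 0.5 ) : int( $x + 0.5 );
    }

    for ( my $i = -5; $i <= 5; $i += 0.5 ) {
        printf "%d ", round_half_away($i);
    }
    print "\n";

Run through the same loop as before, it prints:

    -5 -5 -4 -4 -3 -3 -2 -2 -1 -1 0 1 1 2 2 3 3 4 4 5 5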
139
140=head2 How do I convert between numeric representations/bases/radixes?
141
142As always with Perl there is more than one way to do it. Below are a
143few examples of approaches to making common conversions between number
144representations. This is intended to be representational rather than
145exhaustive.
146
147Some of the examples later in L<perlfaq4> use the L<Bit::Vector>
148module from CPAN. The reason you might choose L<Bit::Vector> over the
149perl built-in functions is that it works with numbers of ANY size,
150that it is optimized for speed on some operations, and for at least
151some programmers the notation might be familiar.
152
153=over 4
154
155=item How do I convert hexadecimal into decimal
156
Using perl's built-in conversion of C<0x> notation:
158
159    my $dec = 0xDEADBEEF;
160
161Using the C<hex> function:
162
163    my $dec = hex("DEADBEEF");
164
165Using C<pack>:
166
167    my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
168
169Using the CPAN module C<Bit::Vector>:
170
171    use Bit::Vector;
172    my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
173    my $dec = $vec->to_Dec();
174
175=item How do I convert from decimal to hexadecimal
176
177Using C<sprintf>:
178
179    my $hex = sprintf("%X", 3735928559); # upper case A-F
180    my $hex = sprintf("%x", 3735928559); # lower case a-f
181
182Using C<unpack>:
183
184    my $hex = unpack("H*", pack("N", 3735928559));
185
186Using L<Bit::Vector>:
187
188    use Bit::Vector;
189    my $vec = Bit::Vector->new_Dec(32, -559038737);
190    my $hex = $vec->to_Hex();
191
192And L<Bit::Vector> supports odd bit counts:
193
194    use Bit::Vector;
195    my $vec = Bit::Vector->new_Dec(33, 3735928559);
196    $vec->Resize(32); # suppress leading 0 if unwanted
197    my $hex = $vec->to_Hex();
198
199=item How do I convert from octal to decimal
200
Using Perl's built-in conversion of numbers with leading zeros:
202
203    my $dec = 033653337357; # note the leading 0!
204
205Using the C<oct> function:
206
207    my $dec = oct("33653337357");
208
209Using L<Bit::Vector>:
210
211    use Bit::Vector;
212    my $vec = Bit::Vector->new(32);
213    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
214    my $dec = $vec->to_Dec();
215
216=item How do I convert from decimal to octal
217
218Using C<sprintf>:
219
220    my $oct = sprintf("%o", 3735928559);
221
222Using L<Bit::Vector>:
223
224    use Bit::Vector;
225    my $vec = Bit::Vector->new_Dec(32, -559038737);
226    my $oct = reverse join('', $vec->Chunk_List_Read(3));
227
228=item How do I convert from binary to decimal
229
230Perl 5.6 lets you write binary numbers directly with
231the C<0b> notation:
232
233    my $number = 0b10110110;
234
235Using C<oct>:
236
237    my $input = "10110110";
238    my $decimal = oct( "0b$input" );
239
240Using C<pack> and C<ord>:
241
242    my $decimal = ord(pack('B8', '10110110'));
243
244Using C<pack> and C<unpack> for larger strings:
245
246    my $int = unpack("N", pack("B32",
247    substr("0" x 32 . "11110101011011011111011101111", -32)));
248    my $dec = sprintf("%d", $int);
249
250    # substr() is used to left-pad a 32-character string with zeros.
251
252Using L<Bit::Vector>:
253
    use Bit::Vector;
    my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    my $dec = $vec->to_Dec();
256
257=item How do I convert from decimal to binary
258
259Using C<sprintf> (perl 5.6+):
260
261    my $bin = sprintf("%b", 3735928559);
262
263Using C<unpack>:
264
265    my $bin = unpack("B*", pack("N", 3735928559));
266
267Using L<Bit::Vector>:
268
269    use Bit::Vector;
270    my $vec = Bit::Vector->new_Dec(32, -559038737);
271    my $bin = $vec->to_Bin();
272
273The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
274are left as an exercise to the inclined reader.
275
276=back
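
Most of them come down to chaining the built-ins shown above: decode the
input with C<hex> or C<oct>, then print it in the target base with
C<sprintf>. A few examples, using the same value as above:

    my $oct_from_hex = sprintf "%o", hex "DEADBEEF";      # hex -> oct
    my $hex_from_bin =
        sprintf "%X", oct "0b11011110101011011011111011101111";   # bin -> hex
    my $bin_from_oct = sprintf "%b", oct "33653337357";   # oct -> bin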
277
278=head2 Why doesn't & work the way I want it to?
279
The behavior of bitwise operators such as C<&> and C<|> depends on
whether they're used on numbers or strings. When given strings, the
operators treat each string as a series of bits and work with that (the
string C<"3"> is the bit pattern C<00110011>). When given numbers, the
operators work with the binary form of the number (the number C<3> is
treated as the bit pattern C<00000011>).
285
286So, saying C<11 & 3> performs the "and" operation on numbers (yielding
287C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
288(yielding C<"1">).
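
A short sketch of both cases, plus the C<0+> trick for forcing numeric
behavior on values that are strings:

    my $numeric = 11 & 3;                       # 3: 1011 & 0011
    my $string  = "11" & "3";                   # "1": "1" (0x31) & "3" (0x33)
    my $forced  = ( 0 + "11" ) & ( 0 + "3" );   # 3: strings made numeric

    print "$numeric $string $forced\n";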
289
290Most problems with C<&> and C<|> arise because the programmer thinks
291they have a number but really it's a string or vice versa. To avoid this,
292stringify the arguments explicitly (using C<""> or C<qq()>) or convert them
293to numbers explicitly (using C<0+$arg>). The rest arise because
294the programmer says:
295
296    if ("\020\020" & "\101\101") {
297        # ...
298    }
299
300but a string consisting of two null bytes (the result of C<"\020\020"
301& "\101\101">) is not a false value in Perl. You need:
302
303    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
304        # ...
305    }
306
307=head2 How do I multiply matrices?
308
309Use the L<Math::Matrix> or L<Math::MatrixReal> modules (available from CPAN)
310or the L<PDL> extension (also available from CPAN).
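
If you only need a small product and would rather not install anything,
the textbook triple loop is short in plain Perl. In this sketch
(C<mmult> is a made-up name) each matrix is a reference to an array of
row references; for large or numerically demanding work, the modules
above are the better choice:

    # Multiply an m x n matrix by an n x p matrix.
    sub mmult {
        my ( $left, $right ) = @_;
        my $n = @{ $left->[0] };              # columns of the left matrix
        die "dimension mismatch\n" unless $n == @$right;
        my $p = @{ $right->[0] };             # columns of the right matrix

        my @product;
        for my $i ( 0 .. $#$left ) {
            for my $j ( 0 .. $p - 1 ) {
                my $sum = 0;
                $sum += $left->[$i][$_] * $right->[$_][$j] for 0 .. $n - 1;
                $product[$i][$j] = $sum;
            }
        }
        return \@product;
    }

    my $c = mmult( [ [ 1, 2 ], [ 3, 4 ] ], [ [ 5, 6 ], [ 7, 8 ] ] );
    print "@$_\n" for @$c;    # prints "19 22" then "43 50"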
311
312=head2 How do I perform an operation on a series of integers?
313
314To call a function on each element in an array, and collect the
315results, use:
316
317    my @results = map { my_func($_) } @array;
318
319For example:
320
321    my @triple = map { 3 * $_ } @single;
322
323To call a function on each element of an array, but ignore the
324results:
325
326    foreach my $iterator (@array) {
327        some_func($iterator);
328    }
329
330To call a function on each integer in a (small) range, you B<can> use:
331
332    my @results = map { some_func($_) } (5 .. 25);
333
334but you should be aware that in this form, the C<..> operator
335creates a list of all integers in the range, which can take a lot of
336memory for large ranges. However, the problem does not occur when
337using C<..> within a C<for> loop, because in that case the range
338operator is optimized to I<iterate> over the range, without creating
339the entire list. So
340
341    my @results = ();
342    for my $i (5 .. 500_005) {
343        push(@results, some_func($i));
344    }
345
346or even
347
    push(@results, some_func($_)) for 5 .. 500_005;
349
will not create an intermediate list of 500,001 integers.
351
352=head2 How can I output Roman numerals?
353
Get the L<Roman> module from CPAN
(L<http://www.cpan.org/modules/by-module/Roman>).
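
If you can't install a module, the conversion is short enough to sketch
by hand. The function name C<to_roman> is made up for this example and
is not the module's interface; it only handles 1 to 3999, the range of
standard numerals:

    # Greedy conversion using the usual subtractive pairs (CM, IV, ...).
    sub to_roman {
        my ($n) = @_;
        die "out of range\n"
            unless $n =~ /\A[0-9]+\z/ && $n >= 1 && $n <= 3999;
        my @table = (
            [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
            [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
            [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
            [ 1,    'I' ],
        );
        my $roman = '';
        for my $pair (@table) {
            my ( $value, $letters ) = @$pair;
            while ( $n >= $value ) {
                $roman .= $letters;
                $n     -= $value;
            }
        }
        return $roman;
    }

    print to_roman(1994), "\n";    # MCMXCIV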
355
356=head2 Why aren't my random numbers random?
357
358If you're using a version of Perl before 5.004, you must call C<srand>
359once at the start of your program to seed the random number generator.
360
    BEGIN { srand() if $] < 5.004 }
362
5.004 and later automatically call C<srand> the first time you use
C<rand>. Don't call C<srand> more than once--you make your numbers less
random, rather than more.
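
Calling C<srand> yourself with a fixed seed does have one legitimate
use: making a run repeatable, for example in a test. Seeding again with
the same value (42 here is arbitrary) replays the same sequence on the
same perl. This makes the numbers predictable, not more random:

    srand(42);
    my @first = map { rand() } 1 .. 3;

    srand(42);
    my @second = map { rand() } 1 .. 3;

    print "repeatable\n" if "@first" eq "@second";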
366
367Computers are good at being predictable and bad at being random
368(despite appearances caused by bugs in your programs :-). The
369F<random> article in the "Far More Than You Ever Wanted To Know"
370collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
371of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
372who attempts to generate random numbers by deterministic means is, of
373course, living in a state of sin."
374
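Before Perl 5.20, perl relied on the underlying system for the
implementation of C<rand> and C<srand>; on some systems, the generated
numbers were not random enough (especially on Windows; see
L<http://www.perlmonks.org/?node_id=803632>). Since 5.20, perl uses its
own C<drand48> implementation on every platform, but C<rand> is still
not suitable for cryptographic purposes.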
379Several CPAN modules in the C<Math> namespace implement better
380pseudorandom generators; see for example
381L<Math::Random::MT> ("Mersenne Twister", fast), or
382L<Math::TrulyRandom> (uses the imperfections in the system's
383timer to generate random numbers, which is rather slow).
More algorithms for random numbers are described in
"Numerical Recipes in C" at L<http://www.nr.com/>.
386
387=head2 How do I get a random number between X and Y?
388
389To get a random number between two values, you can use the C<rand()>
390built-in to get a random number between 0 and 1. From there, you shift
391that into the range that you want.
392
393C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
394what you want to have perl figure out is a random number in the range
395from 0 to the difference between your I<X> and I<Y>.
396
397That is, to get a number between 10 and 15, inclusive, you want a
398random number between 0 and 5 that you can then add to 10.
399
400    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
401
402Hence you derive the following simple function to abstract
403that. It selects a random integer between the two given
404integers (inclusive). For example: C<random_int_between(50,120)>.
405
406    sub random_int_between {
407        my($min, $max) = @_;
408        # Assumes that the two arguments are integers themselves!
409        return $min if $min == $max;
410        ($min, $max) = ($max, $min)  if  $min > $max;
411        return $min + int rand(1 + $max - $min);
412    }
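
For a real number instead of an integer, drop the C<int> and the C<+1>.
In this sketch (C<random_between> is a made-up name) the result lies in
the half-open range from C<$min> up to, but not including, C<$max>.
Equal bounds are handled separately because C<rand(0)> behaves like
C<rand(1)>:

    sub random_between {
        my ( $min, $max ) = @_;
        return $min if $min == $max;
        ( $min, $max ) = ( $max, $min ) if $min > $max;
        return $min + rand( $max - $min );
    }

    my $temperature = random_between( 18.5, 22.5 );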
413
414=head1 Data: Dates
415
416=head2 How do I find the day or week of the year?
417
The day of the year is in the list returned
by the C<localtime> function, counting from 0 for January 1st. Without
an argument C<localtime> uses the current time.
421
422    my $day_of_year = (localtime)[7];
423
424The L<POSIX> module can also format a date as the day of the year or
425week of the year.
426
427    use POSIX qw/strftime/;
428    my $day_of_year  = strftime "%j", localtime;
429    my $week_of_year = strftime "%W", localtime;
430
431To get the day of year for any date, use L<POSIX>'s C<mktime> to get
432a time in epoch seconds for the argument to C<localtime>.
433
434    use POSIX qw/mktime strftime/;
435    my $week_of_year = strftime "%W",
436        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
437
You can also use L<Time::Piece>, which comes with Perl and provides a
C<localtime> that returns an object (its C<yday>, like C<localtime>'s,
counts from 0):
440
441    use Time::Piece;
442    my $day_of_year  = localtime->yday;
443    my $week_of_year = localtime->week;
444
445The L<Date::Calc> module provides two functions to calculate these, too:
446
    use Date::Calc qw( Day_of_Year Week_of_Year );
448    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
449    my $week_of_year = Week_of_Year( 1987, 12, 18 );
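
To get a 1-based day of the year for an arbitrary date with only core
modules, you can let L<Time::Local> turn the date into seconds and read
C<yday> back. This sketch (C<day_of_year> is a made-up name) uses
C<timegm> and C<gmtime> so that time zones and daylight saving time
can't shift the date, and adds 1 because C<yday> counts from 0:

    use Time::Local qw(timegm);

    # $month is 1..12 here; timegm itself wants 0..11.
    sub day_of_year {
        my ( $year, $month, $day ) = @_;
        my $epoch = timegm( 0, 0, 0, $day, $month - 1, $year );
        return ( gmtime $epoch )[7] + 1;
    }

    print day_of_year( 1987, 12, 18 ), "\n";    # 352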
450
451=head2 How do I find the current century or millennium?
452
453Use the following simple functions:
454
455    sub get_century    {
456        return int((((localtime(shift || time))[5] + 1999))/100);
457    }
458
459    sub get_millennium {
460        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
461    }
462
463On some systems, the L<POSIX> module's C<strftime()> function has been
464extended in a non-standard way to use a C<%C> format, which they
465sometimes claim is the "century". It isn't, because on most such
466systems, this is only the first two digits of the four-digit year, and
467thus cannot be used to determine reliably the current century or
468millennium.
469
470=head2 How can I compare two dates and find the difference?
471
472(contributed by brian d foy)
473
474You could just store all your dates as a number and then subtract.
475Life isn't always that simple though.
476
The L<Time::Piece> module, which comes with Perl, replaces C<localtime>
478with a version that returns an object. It also overloads the comparison
479operators so you can compare them directly:
480
481    use Time::Piece;
482    my $date1 = localtime( $some_time );
483    my $date2 = localtime( $some_other_time );
484
485    if( $date1 < $date2 ) {
486        print "The date was in the past\n";
487    }
488
489You can also get differences with a subtraction, which returns a
490L<Time::Seconds> object:
491
492    my $date_diff = $date1 - $date2;
493    print "The difference is ", $date_diff->days, " days\n";
494
495If you want to work with formatted dates, the L<Date::Manip>,
496L<Date::Calc>, or L<DateTime> modules can help you.
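
If your dates start out as strings, L<Time::Piece> can parse them too.
A minimal sketch, assuming the dates are in C<YYYY-MM-DD> form:

    use Time::Piece;

    # strptime parses each string according to the given format.
    my $start = Time::Piece->strptime( '2019-01-01', '%Y-%m-%d' );
    my $end   = Time::Piece->strptime( '2019-01-26', '%Y-%m-%d' );

    my $diff = $end - $start;    # a Time::Seconds object
    print "The difference is ", $diff->days, " days\n";    # 25 days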
497
498=head2 How can I take a string and turn it into epoch seconds?
499
500If it's a regular enough string that it always has the same format,
501you can split it up and pass the parts to C<timelocal> in the standard
502L<Time::Local> module. Otherwise, you should look into the L<Date::Calc>,
503L<Date::Parse>, and L<Date::Manip> modules from CPAN.
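
For a fixed format, a regular expression can pull out the fields for
C<timelocal>. A sketch, assuming strings like C<2019-01-26 13:45:00>
in local time; C<to_epoch> is a made-up name. Remember that
C<timelocal> wants a 0-based month, and it croaks on out-of-range
fields such as a month of 13:

    use Time::Local qw(timelocal);

    # Hypothetical helper: returns epoch seconds, or undef if the
    # string doesn't match the expected format.
    sub to_epoch {
        my ($string) = @_;
        my ( $y, $mon, $d, $h, $min, $s ) =
            $string =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
            or return;
        return timelocal( $s, $min, $h, $d, $mon - 1, $y );
    }

    my $epoch = to_epoch('2019-01-26 13:45:00');
    print scalar localtime($epoch), "\n";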
504
505=head2 How can I find the Julian Day?
506
507(contributed by brian d foy and Dave Cross)
508
You can use the L<Time::Piece> module, part of the standard distribution,
510which can convert a date/time to a Julian Day:
511
512    $ perl -MTime::Piece -le 'print localtime->julian_day'
513    2455607.7959375
514
515Or the modified Julian Day:
516
517    $ perl -MTime::Piece -le 'print localtime->mjd'
518    55607.2961226851
519
520Or even the day of the year (which is what some people think of as a
521Julian day):
522
523    $ perl -MTime::Piece -le 'print localtime->yday'
524    45
525
526You can also do the same things with the L<DateTime> module:
527
528    $ perl -MDateTime -le'print DateTime->today->jd'
529    2453401.5
530    $ perl -MDateTime -le'print DateTime->today->mjd'
531    53401
532    $ perl -MDateTime -le'print DateTime->today->doy'
533    31
534
535You can use the L<Time::JulianDay> module available on CPAN. Ensure
536that you really want to find a Julian day, though, as many people have
537different ideas about Julian days (see L<http://www.hermetic.ch/cal_stud/jdn.htm>
538for instance):
539
    $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
541    55608
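
The Julian Day of a Unix timestamp is also simple arithmetic: the Unix
epoch (1970-01-01 00:00 UTC) is Julian Day 2440587.5, each day is 86400
seconds, and the Modified Julian Day is the Julian Day minus 2400000.5.
A sketch with made-up function names; like Unix time itself, it ignores
leap seconds:

    sub julian_day          { return $_[0] / 86_400 + 2_440_587.5 }
    sub modified_julian_day { return julian_day( $_[0] ) - 2_400_000.5 }

    print julian_day(time), "\n";
    print modified_julian_day(0), "\n";    # 40587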
542
543=head2 How do I find yesterday's date?
544X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
545X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
546X<timelocal>
547
548(contributed by brian d foy)
549
To do it correctly, you can use one of the C<Date> modules since they
work with calendars instead of times. The L<DateTime> module makes it
simple, and gives you the same time of day, only the day before,
despite daylight saving time changes:
554
555    use DateTime;
556
557    my $yesterday = DateTime->now->subtract( days => 1 );
558
559    print "Yesterday was $yesterday\n";
560
561You can also use the L<Date::Calc> module using its C<Today_and_Now>
562function.
563
564    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
565
566    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
567
568    print "@date_time\n";
569
Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Because of that, the rest of the
suggestions here will be wrong sometimes.
575
576Starting with Perl 5.10, L<Time::Piece> and L<Time::Seconds> are part
577of the standard distribution, so you might think that you could do
578something like this:
579
580    use Time::Piece;
581    use Time::Seconds;
582
583    my $yesterday = localtime() - ONE_DAY; # WRONG
584    print "Yesterday was $yesterday\n";
585
586The L<Time::Piece> module exports a new C<localtime> that returns an
587object, and L<Time::Seconds> exports the C<ONE_DAY> constant that is a
588set number of seconds. This means that it always gives the time 24
589hours ago, which is not always yesterday. This can cause problems
590around the end of daylight saving time when there's one day that is 25
591hours long.
592
593You have the same problem with L<Time::Local>, which will give the wrong
594answer for those same special cases:
595
    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
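
If you want to stay with core modules, one workaround is to do the day
arithmetic in UTC, where days don't change length, and only use
C<localtime> to find out what today's date is. A sketch of that idea;
C<yesterday_ymd> is a made-up name:

    use Time::Local qw(timegm);

    # Step back one day in UTC, where every day is 86400 seconds, so a
    # daylight saving change in the local zone can't skip or repeat a day.
    sub yesterday_ymd {
        my ( $year, $month, $day ) = @_;    # $month is 1..12
        my $midnight = timegm( 0, 0, 0, $day, $month - 1, $year );
        my ( $d, $m, $y ) = ( gmtime( $midnight - 86_400 ) )[ 3 .. 5 ];
        return ( $y + 1900, $m + 1, $d );
    }

    my ( $d, $m, $y ) = ( localtime )[ 3 .. 5 ];    # today's local date
    printf "Yesterday: %d-%02d-%02d\n", yesterday_ymd( $y + 1900, $m + 1, $d );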
601
602=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
603
604(contributed by brian d foy)
605
606Perl itself never had a Y2K problem, although that never stopped people
607from creating Y2K problems on their own. See the documentation for
608C<localtime> for its proper use.
609
610Starting with Perl 5.12, C<localtime> and C<gmtime> can handle dates past
61103:14:08 January 19, 2038, when a 32-bit based time would overflow. You
612still might get a warning on a 32-bit C<perl>:
613
614    % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
615    Integer overflow in hexadecimal number at -e line 1.
616    Wed Nov  1 19:42:39 5576711
617
618On a 64-bit C<perl>, you can get even larger dates for those really long
619running projects:
620
621    % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
622    Thu Nov  2 00:42:39 5576711
623
624You're still out of luck if you need to keep track of decaying protons
625though.
626
627=head1 Data: Strings
628
629=head2 How do I validate input?
630
631(contributed by brian d foy)
632
633There are many ways to ensure that values are what you expect or
634want to accept. Besides the specific examples that we cover in the
635perlfaq, you can also look at the modules with "Assert" and "Validate"
636in their names, along with other modules such as L<Regexp::Common>.
637
638Some modules have validation for particular types of input, such
639as L<Business::ISBN>, L<Business::CreditCard>, L<Email::Valid>,
640and L<Data::Validate::IP>.
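
For simple cases, the core L<Scalar::Util> module's
C<looks_like_number> tells you whether Perl would accept a string as a
number, and an anchored regular expression handles stricter formats. A
small sketch; C<is_integer> is a made-up name and its idea of an integer
(an optional minus sign, then decimal digits) is an assumption:

    use Scalar::Util qw(looks_like_number);

    # \z rather than $, so that "42\n" is rejected.
    sub is_integer {
        my ($value) = @_;
        return defined $value && $value =~ /\A-?[0-9]+\z/;
    }

    for my $input ( '42', '4.2', '-3.5e2', 'forty-two' ) {
        printf "%-10s number:%-4s integer:%s\n", $input,
            ( looks_like_number($input) ? 'yes' : 'no' ),
            ( is_integer($input)        ? 'yes' : 'no' );
    }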
641
642=head2 How do I unescape a string?
643
It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with
647
648    s/\\(.)/$1/g;
649
650This won't expand C<"\n"> or C<"\t"> or any other special escapes.
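
If you do want a few of those escapes turned into the characters they
stand for, one approach is a lookup table inside C<s///e>. A sketch
that handles only C<\n> and C<\t> and otherwise just drops the
backslash, as above; the table name is arbitrary:

    my %escape = ( n => "\n", t => "\t" );     # extend as needed

    my $string = 'one\ttwo\nthree\\\\four';    # single quotes: \t, \n stay literal
    $string =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ge;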
651
652=head2 How do I remove consecutive pairs of characters?
653
654(contributed by brian d foy)
655
656You can use the substitution operator to find pairs of characters (or
657runs of characters) and replace them with a single instance. In this
658substitution, we find a character in C<(.)>. The memory parentheses
659store the matched character in the back-reference C<\g1> and we use
660that to require that the same thing immediately follow it. We replace
661that part of the string with the character in C<$1>.
662
663    s/(.)\g1/$1/g;
664
665We can also use the transliteration operator, C<tr///>. In this
666example, the search list side of our C<tr///> contains nothing, but
667the C<c> option complements that so it contains everything. The
668replacement list also contains nothing, so the transliteration is
669almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:
673
674    my $str = 'Haarlem';   # in the Netherlands
675    $str =~ tr///cs;       # Now Harlem, like in New York
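
The two approaches differ on runs longer than two characters: the
substitution collapses each pair once, while C<tr///cs> squeezes a
whole run down to one character. Adding a C<+> after the
back-reference makes the substitution behave like C<tr///cs>:

    my $pairs  = 'aaab';
    my $runs   = 'aaab';
    my $squash = 'aaab';

    $pairs  =~ s/(.)\g1/$1/g;     # "aab": only one pair collapses
    $runs   =~ s/(.)\g1+/$1/g;    # "ab": the whole run collapses
    $squash =~ tr///cs;           # "ab": same as the run version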
676
677=head2 How do I expand function calls in a string?
678
679(contributed by brian d foy)
680
681This is documented in L<perlref>, and although it's not the easiest
682thing to read, it does work. In each of these examples, we call the
683function inside the braces used to dereference a reference. If we
684have more than one return value, we can construct and dereference an
685anonymous array. In this case, we call the function in list context.
686
687    print "The time values are @{ [localtime] }.\n";
688
If we want to call the function in scalar context, we have to do a bit
more work. We can have any code we like inside the braces, as long as
the last thing it evaluates is a scalar reference. Note that the
parentheses in C<\(...)> create a list context, so we need C<scalar> to
force scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";
697
698    print "The time is ${ my $x = localtime; \$x }.\n";
699
700If your function already returns a reference, you don't need to create
701the reference yourself.
702
703    sub timestamp { my $t = localtime; \$t }
704
705    print "The time is ${ timestamp() }.\n";
706
The L<Interpolation> module can also do a lot of magic for you. You can
708specify a variable name, in this case C<E>, to set up a tied hash that
709does the interpolation for you. It has several other methods to do this
710as well.
711
712    use Interpolation E => 'eval';
713    print "The time values are $E{localtime()}.\n";
714
715In most cases, it is probably easier to simply use string concatenation,
716which also forces scalar context.
717
718    print "The time is " . localtime() . ".\n";
719
720=head2 How do I find matching/nesting anything?
721
To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. For nested patterns
726and/or balanced expressions, see the so-called
727L<< (?PARNO)|perlre/C<(?PARNO)> C<(?-PARNO)> C<(?+PARNO)> C<(?R)> C<(?0)> >>
728construct (available since perl 5.10).
729The CPAN module L<Regexp::Common> can help to build such
730regular expressions (see in particular
731L<Regexp::Common::balanced> and L<Regexp::Common::delimited>).
732
More complex cases will require writing a parser, probably
734using a parsing module from CPAN, like
735L<Regexp::Grammars>, L<Parse::RecDescent>, L<Parse::Yapp>,
736L<Text::Balanced>, or L<Marpa::R2>.
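
As an example of the recursive construct, this sketch matches one
balanced parenthesized chunk, including any nested parentheses inside
it (it needs perl 5.10 or later for C<(?1)> and C<++>):

    my $balanced = qr/
        (                   # group 1: one balanced chunk
            \(
                (?:
                    [^()]++ # anything that isn't a parenthesis
                  |
                    (?1)    # or, recursively, another balanced chunk
                )*
            \)
        )
    /x;

    my $text = 'f(a(b)c)(d';
    if ( $text =~ $balanced ) {
        print "$1\n";       # (a(b)c)
    }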
737
738=head2 How do I reverse a string?
739
740Use C<reverse()> in scalar context, as documented in
741L<perlfunc/reverse>.
742
743    my $reversed = reverse $string;
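
Context matters here: in list context, such as the argument list of
C<print>, C<reverse> reverses the order of the list instead, so a single
string comes back unchanged. Use C<scalar> to force the string reversal:

    my $string = "Perl";

    print reverse($string), "\n";           # list context: prints "Perl"
    print scalar reverse($string), "\n";    # scalar context: prints "lreP"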
744
745=head2 How do I expand tabs in a string?
746
747You can do it yourself:
748
749    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
750
751Or you can just use the L<Text::Tabs> module (part of the standard Perl
752distribution).
753
754    use Text::Tabs;
755    my @expanded_lines = expand(@lines_with_tabs);
756
757=head2 How do I reformat a paragraph?
758
759Use L<Text::Wrap> (part of the standard Perl distribution):
760
761    use Text::Wrap;
762    print wrap("\t", '  ', @paragraphs);
763
764The paragraphs you give to L<Text::Wrap> should not contain embedded
765newlines. L<Text::Wrap> doesn't justify the lines (flush-right).
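
The line width is controlled by the package variable
C<$Text::Wrap::columns> (76 by default), and lines come out at most
C<$columns - 1> characters long. A short sketch setting it explicitly:

    use Text::Wrap qw(wrap $columns);

    $columns = 40;

    my $text = "Text::Wrap breaks lines at whitespace, so words are never split "
             . "unless a single word is longer than the line itself.";

    # First line flush left, continuation lines indented by two spaces.
    print wrap( '', '  ', $text ), "\n";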
766
767Or use the CPAN module L<Text::Autoformat>. Formatting files can be
768easily done by making a shell alias, like so:
769
770    alias fmt="perl -i -MText::Autoformat -n0777 \
771        -e 'print autoformat $_, {all=>1}' $*"
772
773See the documentation for L<Text::Autoformat> to appreciate its many
774capabilities.
775
776=head2 How can I access or change N characters of a string?
777
You can access any part of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.
781
783    my $string = "Just another Perl Hacker";
784    my $first_char = substr( $string, 0, 1 );  #  'J'
785
786To change part of a string, you can use the optional fourth
787argument which is the replacement string.
788
789    substr( $string, 13, 4, "Perl 5.8.0" );
790
791You can also use substr() as an lvalue.
792
793    substr( $string, 13, 4 ) =  "Perl 5.8.0";
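
Negative offsets count back from the end of the string, and the
four-argument form returns the text it replaced. A short sketch:

    my $string = "Just another Perl Hacker";

    my $last_word = substr( $string, -6 );                   # 'Hacker'
    my $replaced  = substr( $string, 13, 4, "Perl 5.8.0" );  # returns 'Perl'
    print "$string\n";    # Just another Perl 5.8.0 Hacker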
794
795=head2 How do I change the Nth occurrence of something?
796
797You have to keep track of N yourself. For example, let's say you want
798to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
799C<"whosoever"> or C<"whomsoever">, case insensitively. These
800all assume that $_ contains the string to be altered.
801
    my $count = 0;
803    s{((whom?)ever)}{
804    ++$count == 5       # is it the 5th?
805        ? "${2}soever"  # yes, swap
806        : $1            # renege and leave it there
807        }ige;
808
809In the more general case, you can use the C</g> modifier in a C<while>
810loop, keeping count of matches.
811
    my $WANT = 3;
    my $count = 0;
814    $_ = "One fish two fish red fish blue fish";
815    while (/(\w+)\s+fish\b/gi) {
816        if (++$count == $WANT) {
817            print "The third fish is a $1 one.\n";
818        }
819    }
820
821That prints out: C<"The third fish is a red one.">  You can also use a
822repetition count and repeated pattern like this:
823
824    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
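
The first approach generalizes into a small function. A sketch;
C<replace_nth> is a made-up name, the replacement is inserted
literally, and the counter starts over on every call:

    # Replace only the $n-th match of $pattern in $string.
    sub replace_nth {
        my ( $string, $pattern, $n, $replacement ) = @_;
        my $count = 0;
        $string =~ s/($pattern)/++$count == $n ? $replacement : $1/ge;
        return $string;
    }

    print replace_nth( 'a-b-c-d', qr/-/, 2, '+' ), "\n";    # a-b+c-d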
825
826=head2 How can I count the number of occurrences of a substring within a string?
827
828There are a number of ways, with varying efficiency. If you want a
829count of a certain single character (X) within a string, you can use the
C<tr///> operator like so:
831
832    my $string = "ThisXlineXhasXsomeXx'sXinXit";
833    my $count = ($string =~ tr/X//);
834    print "There are $count X characters in the string";
835
836This is fine if you are just looking for a single character. However,
837if you are trying to count multiple character substrings within a
838larger string, C<tr///> won't work. What you can do is wrap a while()
839loop around a global pattern match. For example, let's count negative
840integers:
841
842    my $string = "-9 55 48 -2 23 -76 4 14 -44";
843    my $count = 0;
844    while ($string =~ /-\d+/g) { $count++ }
845    print "There are $count negative numbers in the string";
846
847Another version uses a global match in list context, then assigns the
848result to a scalar, producing a count of the number of matches.
849
850    my $count = () = $string =~ /-\d+/g;
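
Both of these count non-overlapping matches. If occurrences may overlap
(how many times does C<"aa"> appear in C<"aaaa">?), put the pattern in a
zero-width lookahead so each match consumes nothing:

    my $string = "aaaa";

    my $separate    = () = $string =~ /aa/g;       # 2: matches don't overlap
    my $overlapping = () = $string =~ /(?=aa)/g;   # 3: lookahead consumes nothing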
851
852=head2 How do I capitalize all the words on one line?
853X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
854
855(contributed by brian d foy)
856
857Damian Conway's L<Text::Autoformat> handles all of the thinking
858for you.
859
860    use Text::Autoformat;
861    my $x = "Dr. Strangelove or: How I Learned to Stop ".
862      "Worrying and Love the Bomb";
863
864    print $x, "\n";
865    for my $style (qw( sentence title highlight )) {
866        print autoformat($x, { case => $style }), "\n";
867    }
868
869How do you want to capitalize those words?
870
871    FRED AND BARNEY'S LODGE        # all uppercase
872    Fred And Barney's Lodge        # title case
873    Fred and Barney's Lodge        # highlight case
874
875It's not as easy a problem as it looks. How many words do you think
876are in there? Wait for it... wait for it.... If you answered 5
877you're right. Perl words are groups of C<\w+>, but that's not what
878you want to capitalize. How is Perl supposed to know not to capitalize
879that C<s> after the apostrophe? You could try a regular expression:
880
    $string =~ s/ (
                 (^\w)    # at the beginning of the line
                   |      # or
                 (\s\w)   # preceded by whitespace
                   )
                /\U$1/xg;
887
888    $string =~ s/([\w']+)/\u\L$1/g;
889
890Now, what if you don't want to capitalize that "and"? Just use
891L<Text::Autoformat> and get on with the next problem. :)
892
893=head2 How can I split a [character]-delimited string except when inside [character]?
894
895Several modules can handle this sort of parsing--L<Text::Balanced>,
896L<Text::CSV>, L<Text::CSV_XS>, and L<Text::ParseWords>, among others.
897
898Take the example case of trying to split a string that is
899comma-separated into its different fields. You can't use C<split(/,/)>
900because you shouldn't split if the comma is inside quotes. For
901example, take a data line like this:
902
903    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
904
905Due to the restriction of the quotes, this is a fairly complex
906problem. Thankfully, we have Jeffrey Friedl, author of
907I<Mastering Regular Expressions>, to handle these for us. He
908suggests (assuming your string is contained in C<$text>):
909
910     my @new = ();
911     push(@new, $+) while $text =~ m{
912         "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
913        | ([^,]+),?
914        | ,
915     }gx;
916     push(@new, undef) if substr($text,-1,1) eq ',';
917
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
921
922Alternatively, the L<Text::ParseWords> module (part of the standard
923Perl distribution) lets you say:
924
925    use Text::ParseWords;
    my @new = quotewords(",", 0, $text);
927
928For parsing or generating CSV, though, using L<Text::CSV> rather than
929implementing it yourself is highly recommended; you'll save yourself odd bugs
930popping up later by just using code which has already been tried and tested in
931production for years.
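
As a quick check of L<Text::ParseWords> on the sample data line above,
the following sketch splits it on commas that are outside quotes. The
second argument, C<0>, asks C<quotewords> to strip the quotation marks
from each field. The field count in the comment applies only to this
sample line:

```perl
    use strict;
    use warnings;
    use Text::ParseWords qw(quotewords);

    my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

    # Split on commas outside quotes; 0 means "don't keep the quotes"
    my @fields = quotewords( ',', 0, $text );

    # @fields has 11 elements: 'SAR001', '', 'Cimetrix, Inc', 'Bob Smith', ...
    print join( '|', @fields ), "\n";
```

The commas inside C<"Cimetrix, Inc"> and C<"Error, Core Dumped"> stay
with their fields, which is exactly what C<split(/,/)> gets wrong.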
932
933=head2 How do I strip blank space from the beginning/end of a string?
934
935(contributed by brian d foy)
936
937A substitution can do this for you. For a single line, you want to
938replace all the leading or trailing whitespace with nothing. You
939can do that with a pair of substitutions:
940
941    s/^\s+//;
942    s/\s+$//;
943
944You can also write that as a single substitution, although it turns
945out the combined statement is slower than the separate ones. That
946might not matter to you, though:
947
948    s/^\s+|\s+$//g;
949
In this regular expression, the alternation matches either at the
beginning or the end of the string because alternation has a lower
precedence than the anchors, so the pattern groups as C<(^\s+)|(\s+$)>.
With the C</g> flag, the substitution makes all possible matches, so it
gets both. Remember, the trailing newline matches the C<\s+>, and the
C<$> anchor can match to the absolute end of the string, so the newline
disappears too. Just add the newline to the output, which has the added
benefit of preserving "blank" (consisting entirely of whitespace) lines
which the C<^\s+> would remove all by itself:
959
960    while( <> ) {
961        s/^\s+|\s+$//g;
962        print "$_\n";
963    }
964
965For a multi-line string, you can apply the regular expression to each
966logical line in the string by adding the C</m> flag (for
967"multi-line"). With the C</m> flag, the C<$> matches I<before> an
968embedded newline, so it doesn't remove it. This pattern still removes
969the newline at the end of the string:
970
971    $string =~ s/^\s+|\s+$//gm;
972
973Remember that lines consisting entirely of whitespace will disappear,
974since the first part of the alternation can match the entire string
975and replace it with nothing. If you need to keep embedded blank lines,
976you have to do a little more work. Instead of matching any whitespace
977(since that includes a newline), just match the other whitespace:
978
979    $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
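
If you trim strings in many places, you might wrap the two separate
substitutions in a small helper. The sketch below uses a hypothetical
subroutine named C<trim> that returns a trimmed copy and leaves its
argument alone, and then applies the per-line pattern from above to a
made-up multi-line string:

```perl
    use strict;
    use warnings;

    # Hypothetical helper: return a copy of the string with leading and
    # trailing whitespace (including a trailing newline) removed
    sub trim {
        my ($string) = @_;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }

    my $name  = trim("  Fred Flintstone \n");   # 'Fred Flintstone'
    my $blank = trim(" \t ");                   # ''

    # Trim each line but keep the embedded blank line
    my $text = "  one  \n\n  two  \n";
    ( my $cleaned = $text ) =~ s/^[\t\f ]+|[\t\f ]+$//mg;   # "one\n\ntwo\n"
```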
980
981=head2 How do I pad a string with blanks or pad a number with zeroes?
982
983In the following examples, C<$pad_len> is the length to which you wish
984to pad the string, C<$text> or C<$num> contains the string to be padded,
985and C<$pad_char> contains the padding character. You can use a single
986character string constant instead of the C<$pad_char> variable if you
987know what it is in advance. And in the same way you can use an integer in
988place of C<$pad_len> if you know the pad length in advance.
989
990The simplest method uses the C<sprintf> function. It can pad on the left
991or right with blanks and on the left with zeroes and it will not
992truncate the result. The C<pack> function can only pad strings on the
993right with blanks and it will truncate the result to a maximum length of
994C<$pad_len>.
995
996    # Left padding a string with blanks (no truncation):
997    my $padded = sprintf("%${pad_len}s", $text);
998    my $padded = sprintf("%*s", $pad_len, $text);  # same thing
999
1000    # Right padding a string with blanks (no truncation):
1001    my $padded = sprintf("%-${pad_len}s", $text);
1002    my $padded = sprintf("%-*s", $pad_len, $text); # same thing
1003
1004    # Left padding a number with 0 (no truncation):
1005    my $padded = sprintf("%0${pad_len}d", $num);
1006    my $padded = sprintf("%0*d", $pad_len, $num); # same thing
1007
1008    # Right padding a string with blanks using pack (will truncate):
1009    my $padded = pack("A$pad_len",$text);
1010
1011If you need to pad with a character other than blank or zero you can use
1012one of the following methods. They all generate a pad string with the
1013C<x> operator and combine that with C<$text>. These methods do
1014not truncate C<$text>.
1015
1016Left and right padding with any character, creating a new string:
1017
1018    my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
1019    my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
1020
1021Left and right padding with any character, modifying C<$text> directly:
1022
1023    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
1024    $text .= $pad_char x ( $pad_len - length( $text ) );
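
Here is a quick check of several of the forms above with concrete
values. The values of C<$text>, C<$num>, C<$pad_len>, and C<$pad_char>
are made up for the demonstration, and the C<pack> line uses a
deliberately short width of 3 to show the truncation:

```perl
    use strict;
    use warnings;

    my $text     = 'Fred';
    my $num      = 42;
    my $pad_len  = 8;
    my $pad_char = '.';

    my $left   = sprintf( "%*s",  $pad_len, $text );    # '    Fred'
    my $right  = sprintf( "%-*s", $pad_len, $text );    # 'Fred    '
    my $zeroes = sprintf( "%0*d", $pad_len, $num );     # '00000042'
    my $packed = pack( "A3", $text );                   # 'Fre' (truncated)

    # Any pad character, built with the x operator
    my $dotted = $pad_char x ( $pad_len - length( $text ) ) . $text;   # '....Fred'
```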
1025
1026=head2 How do I extract selected columns from a string?
1027
1028(contributed by brian d foy)
1029
1030If you know the columns that contain the data, you can
1031use C<substr> to extract a single column.
1032
1033    my $column = substr( $line, $start_column, $length );
1034
1035You can use C<split> if the columns are separated by whitespace or
1036some other delimiter, as long as whitespace or the delimiter cannot
1037appear as part of the data.
1038
1039    my $line    = ' fred barney   betty   ';
1040    my @columns = split /\s+/, $line;
1041        # ( '', 'fred', 'barney', 'betty' );
1042
1043    my $line    = 'fred||barney||betty';
1044    my @columns = split /\|/, $line;
1045        # ( 'fred', '', 'barney', '', 'betty' );
1046
1047If you want to work with comma-separated values, don't do this since
1048that format is a bit more complicated. Use one of the modules that
1049handle that format, such as L<Text::CSV>, L<Text::CSV_XS>, or
1050L<Text::CSV_PP>.
1051
1052If you want to break apart an entire line of fixed columns, you can use
1053C<unpack> with the A (ASCII) format. By using a number after the format
1054specifier, you can denote the column width. See the C<pack> and C<unpack>
1055entries in L<perlfunc> for more details.
1056
    my @fields = unpack( "A8 A8 A8 A16 A4", $line );  # template first, then data
1058
1059Note that spaces in the format argument to C<unpack> do not denote literal
1060spaces. If you have space separated data, you may want C<split> instead.
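
A small round trip shows the idea. The record layout below (name in 8
columns, city in 10, age in 3) is made up for the example; C<sprintf>
builds the fixed-width line, and C<unpack> takes it apart again. When
unpacking, the C<A> format strips trailing spaces from each field:

```perl
    use strict;
    use warnings;

    # Build a fixed-width record: 'Fred    Bedrock   030'
    my $line = sprintf '%-8s%-10s%03d', 'Fred', 'Bedrock', 30;

    # The template comes first, then the string to unpack
    my ( $name, $city, $age ) = unpack 'A8 A10 A3', $line;

    print "$name from $city is $age\n";   # Fred from Bedrock is 030
```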
1061
1062=head2 How do I find the soundex value of a string?
1063
1064(contributed by brian d foy)
1065
You can use the L<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the L<String::Approx>,
L<Text::Metaphone>, and L<Text::DoubleMetaphone> modules.
1069
1070=head2 How can I expand variables in text strings?
1071
1072(contributed by brian d foy)
1073
1074If you can avoid it, don't, or if you can use a templating system,
1075such as L<Text::Template> or L<Template> Toolkit, do that instead. You
1076might even be able to get the job done with C<sprintf> or C<printf>:
1077
1078    my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1079
1080However, for the one-off simple case where I don't want to pull out a
1081full templating system, I'll use a string that has two Perl scalar
1082variables in it. In this example, I want to expand C<$foo> and C<$bar>
1083to their variable's values:
1084
1085    my $foo = 'Fred';
1086    my $bar = 'Barney';
1087    $string = 'Say hello to $foo and $bar';
1088
1089One way I can do this involves the substitution operator and a double
1090C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
1092it with its value. C<$foo>, then, turns into 'Fred', and that's finally
1093what's left in the string:
1094
1095    $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1096
1097The C</e> will also silently ignore violations of strict, replacing
1098undefined variable names with the empty string. Since I'm using the
1099C</e> flag (twice even!), I have all of the same security problems I
1100have with C<eval> in its string form. If there's something odd in
1101C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1102I could get myself in trouble.
1103
1104To get around the security problem, I could also pull the values from
1105a hash instead of evaluating variable names. Using a single C</e>, I
1106can check the hash to ensure the value exists, and if it doesn't, I
1107can replace the missing value with a marker, in this case C<???> to
1108signal that I missed something:
1109
1110    my $string = 'This has $foo and $bar';
1111
1112    my %Replacements = (
1113        foo  => 'Fred',
1114        );
1115
1116    # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1117    $string =~ s/\$(\w+)/
1118        exists $Replacements{$1} ? $Replacements{$1} : '???'
1119        /eg;
1120
1121    print $string;
1122
1123=head2 What's wrong with always quoting "$vars"?
1124
1125The problem is that those double-quotes force
1126stringification--coercing numbers and references into strings--even
1127when you don't want them to be strings. Think of it this way:
1128double-quote expansion is used to produce new strings. If you already
1129have a string, why do you need more?
1130
1131If you get used to writing odd things like these:
1132
    print "$var";        # BAD
    my $new = "$old";    # BAD
    somefunc("$var");    # BAD
1136
1137You'll be in trouble. Those should (in 99.8% of the cases) be
1138the simpler and more direct:
1139
1140    print $var;
1141    my $new = $old;
1142    somefunc($var);
1143
1144Otherwise, besides slowing you down, you're going to break code when
1145the thing in the scalar is actually neither a string nor a number, but
1146a reference:
1147
1148    func(\@array);
1149    sub func {
1150        my $aref = shift;
1151        my $oref = "$aref";  # WRONG
1152    }
1153
1154You can also get into subtle problems on those few operations in Perl
1155that actually do care about the difference between a string and a
1156number, such as the magical C<++> autoincrement operator or the
C<syscall()> function.
1158
1159Stringification also destroys arrays.
1160
1161    my @lines = `command`;
1162    print "@lines";     # WRONG - extra blanks
1163    print @lines;       # right
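
The following sketch makes both problems visible with core Perl only:
quoting a reference turns it into a plain string that can no longer be
dereferenced under C<strict refs>, and interpolating an array of lines
puts an extra space in front of every line after the first. The data is
made up for the demonstration:

```perl
    use strict;
    use warnings;

    my $aref = [ 1, 2, 3 ];
    my $copy = "$aref";             # just a string like "ARRAY(0x...)"

    my $type_before = ref $aref;    # 'ARRAY'
    my $type_after  = ref $copy;    # '' (no longer a reference)

    # Under strict refs, dereferencing the string dies
    my $deref_ok = eval { my @x = @{$copy}; 1 } ? 1 : 0;   # 0

    my @lines  = ( "a\n", "b\n" );
    my $joined = "@lines";          # "a\n b\n": interpolation adds a space
```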
1164
1165=head2 Why don't my E<lt>E<lt>HERE documents work?
1166
Here documents are found in L<perlop>. Check for these four things:
1168
1169=over 4
1170
1171=item There must be no space after the E<lt>E<lt> part.
1172
=item There (probably) should be a semicolon at the end of the opening token.
1174
1175=item You can't (easily) have any space in front of the tag.
1176
1177=item There needs to be at least a line separator after the end token.
1178
1179=back
1180
1181If you want to indent the text in the here document, you
1182can do this:
1183
1184    # all in one
1185    (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1186        your text
1187        goes here
1188    HERE_TARGET
1189
1190But the HERE_TARGET must still be flush against the margin.
1191If you want that indented also, you'll have to quote
1192in the indentation.
1193
1194    (my $quote = <<'    FINIS') =~ s/^\s+//gm;
1195            ...we will have peace, when you and all your works have
1196            perished--and the works of your dark master to whom you
1197            would deliver us. You are a liar, Saruman, and a corrupter
1198            of men's hearts. --Theoden in /usr/src/perl/taint.c
1199        FINIS
1200    $quote =~ s/\s+--/\n--/;
1201
1202A nice general-purpose fixer-upper function for indented here documents
1203follows. It expects to be called with a here document as its argument.
1204It looks to see whether each line begins with a common substring, and
1205if so, strips that substring off. Otherwise, it takes the amount of leading
1206whitespace found on the first line and removes that much off each
1207subsequent line.
1208
1209    sub fix {
1210        local $_ = shift;
1211        my ($white, $leader);  # common whitespace and common leading string
1212        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1213            ($white, $leader) = ($2, quotemeta($1));
1214        } else {
1215            ($white, $leader) = (/^(\s+)/, '');
1216        }
1217        s/^\s*?$leader(?:$white)?//gm;
1218        return $_;
1219    }
1220
1221This works with leading special strings, dynamically determined:
1222
1223    my $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1224    @@@ int
1225    @@@ runops() {
1226    @@@     SAVEI32(runlevel);
1227    @@@     runlevel++;
1228    @@@     while ( op = (*op->op_ppaddr)() );
1229    @@@     TAINT_NOT;
1230    @@@     return 0;
1231    @@@ }
1232    MAIN_INTERPRETER_LOOP
1233
1234Or with a fixed amount of leading whitespace, with remaining
1235indentation correctly preserved:
1236
1237    my $poem = fix<<EVER_ON_AND_ON;
1238       Now far ahead the Road has gone,
1239      And I must follow, if I can,
1240       Pursuing it with eager feet,
1241      Until it joins some larger way
1242       Where many paths and errands meet.
1243      And whither then? I cannot say.
1244        --Bilbo in /usr/src/perl/pp_ctl.c
1245    EVER_ON_AND_ON
1246
1247Beginning with Perl version 5.26, a much simpler and cleaner way to
1248write indented here documents has been added to the language: the
1249tilde (~) modifier. See L<perlop/"Indented Here-docs"> for details.
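
A minimal sketch of the tilde form, assuming Perl 5.26 or later: Perl
removes the closing delimiter's indentation from every line of the body,
and any indentation deeper than that is preserved. The subroutine name
C<poem> is just for illustration:

```perl
    use v5.26;    # indented here-docs need Perl 5.26 or later
    use warnings;

    sub poem {
        # The body and the closing EVER_ON can be indented along with
        # the code; the delimiter's indentation is what gets removed.
        return <<~EVER_ON;
            Now far ahead the Road has gone,
              And I must follow, if I can,
            EVER_ON
    }

    my $poem = poem();
    print $poem;
```

This replaces both the C<s/^\s+//gm> trick and the C<fix> subroutine for
the common case, without any regular expressions.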
1250
1251=head1 Data: Arrays
1252
1253=head2 What is the difference between a list and an array?
1254
1255(contributed by brian d foy)
1256
1257A list is a fixed collection of scalars. An array is a variable that
1258holds a variable collection of scalars. An array can supply its collection
1259for list operations, so list operations also work on arrays:
1260
1261    # slices
1262    ( 'dog', 'cat', 'bird' )[2,3];
1263    @animals[2,3];
1264
1265    # iteration
1266    foreach ( qw( dog cat bird ) ) { ... }
1267    foreach ( @animals ) { ... }
1268
1269    my @three = grep { length == 3 } qw( dog cat bird );
1270    my @three = grep { length == 3 } @animals;
1271
1272    # supply an argument list
1273    wash_animals( qw( dog cat bird ) );
1274    wash_animals( @animals );
1275
1276Array operations, which change the scalars, rearrange them, or add
1277or subtract some scalars, only work on arrays. These can't work on a
1278list, which is fixed. Array operations include C<shift>, C<unshift>,
1279C<push>, C<pop>, and C<splice>.
1280
1281An array can also change its length:
1282
1283    $#animals = 1;  # truncate to two elements
1284    $#animals = 10000; # pre-extend to 10,001 elements
1285
1286You can change an array element, but you can't change a list element:
1287
1288    $animals[0] = 'Rottweiler';
    qw( dog cat bird )[0] = 'Rottweiler'; # compile-time error!
1290
1291    foreach ( @animals ) {
1292        s/^d/fr/;  # works fine
1293    }
1294
1295    foreach ( qw( dog cat bird ) ) {
1296        s/^d/fr/;  # Error! Modification of read only value!
1297    }
1298
If the list element is itself a variable, it may appear that you
can change a list element. However, the list element is the variable, not
the data. You're not changing the list element, but something the list
element refers to. The list element itself doesn't change: it's still
the same variable.
1304
1305You also have to be careful about context. You can assign an array to
1306a scalar to get the number of elements in the array. This only works
1307for arrays, though:
1308
1309    my $count = @animals;  # only works with arrays
1310
1311If you try to do the same thing with what you think is a list, you
1312get a quite different result. Although it looks like you have a list
1313on the righthand side, Perl actually sees a bunch of scalars separated
1314by a comma:
1315
1316    my $scalar = ( 'dog', 'cat', 'bird' );  # $scalar gets bird
1317
1318Since you're assigning to a scalar, the righthand side is in scalar
1319context. The comma operator (yes, it's an operator!) in scalar
1320context evaluates its lefthand side, throws away the result, and
evaluates its righthand side and returns the result. In effect,
that list-lookalike assigns to C<$scalar> its rightmost value. Many
1323people mess this up because they choose a list-lookalike whose
1324last element is also the count they expect:
1325
1326    my $scalar = ( 1, 2, 3 );  # $scalar gets 3, accidentally
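
If you really do want the number of items in a literal list, one option
is the empty-list assignment trick also used for counting matches: a
list assignment in scalar context returns the number of elements on its
righthand side. The sketch below contrasts that with an array count and
a list slice; the animal names are just sample data:

```perl
    use strict;
    use warnings;

    my @animals = qw( dog cat bird );

    my $count = @animals;                            # 3: array in scalar context
    my $last  = ( 'dog', 'cat', 'bird' )[-1];        # 'bird': a list slice
    my $n     = () = ( 'dog', 'cat', 'bird' );       # 3: count via list assignment
```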
1327
1328=head2 What is the difference between $array[1] and @array[1]?
1329
1330(contributed by brian d foy)
1331
1332The difference is the sigil, that special character in front of the
1333array name. The C<$> sigil means "exactly one item", while the C<@>
1334sigil means "zero or more items". The C<$> gets you a single scalar,
1335while the C<@> gets you a list.
1336
1337The confusion arises because people incorrectly assume that the sigil
1338denotes the variable type.
1339
1340The C<$array[1]> is a single-element access to the array. It's going
1341to return the item in index 1 (or undef if there is no item there).
1342If you intend to get exactly one element from the array, this is the
1343form you should use.
1344
1345The C<@array[1]> is an array slice, although it has only one index.
1346You can pull out multiple elements simultaneously by specifying
1347additional indices as a list, like C<@array[1,4,3,0]>.
1348
1349Using a slice on the lefthand side of the assignment supplies list
1350context to the righthand side. This can lead to unexpected results.
1351For instance, if you want to read a single line from a filehandle,
1352assigning to a scalar value is fine:
1353
1354    $array[1] = <STDIN>;
1355
1356However, in list context, the line input operator returns all of the
1357lines as a list. The first line goes into C<@array[1]> and the rest
1358of the lines mysteriously disappear:
1359
1360    @array[1] = <STDIN>;  # most likely not what you want
1361
1362Either the C<use warnings> pragma or the B<-w> flag will warn you when
1363you use an array slice with a single index.
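
You can watch the context change by assigning the result of a
subroutine that reports how it was called. The C<context> subroutine
below is a hypothetical helper built on C<wantarray>, and a literal list
stands in for the lines from C<STDIN>. Under C<use warnings>, Perl emits
the slice warning described above for both slice assignments:

```perl
    use strict;
    use warnings;

    # Hypothetical helper: report the context this sub was called in
    sub context { return wantarray ? 'list' : 'scalar' }

    my @array;
    $array[0] = context();    # 'scalar'
    @array[1] = context();    # 'list': the slice supplies list context

    # A one-element slice keeps only the first value of a longer list
    my @first;
    @first[0] = ( 'line 1', 'line 2', 'line 3' );   # @first is ('line 1')
```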
1364
1365=head2 How can I remove duplicate elements from a list or array?
1366
1367(contributed by brian d foy)
1368
1369Use a hash. When you think the words "unique" or "duplicated", think
1370"hash keys".
1371
1372If you don't care about the order of the elements, you could just
1373create the hash then extract the keys. It's not important how you
1374create that hash: just that you use C<keys> to get the unique
1375elements.
1376
1377    my %hash   = map { $_, 1 } @array;
1378    # or a hash slice: @hash{ @array } = ();
1379    # or a foreach: $hash{$_} = 1 foreach ( @array );
1380
1381    my @unique = keys %hash;
1382
1383If you want to use a module, try the C<uniq> function from
1384L<List::MoreUtils>. In list context it returns the unique elements,
1385preserving their order in the list. In scalar context, it returns the
1386number of unique elements.
1387
1388    use List::MoreUtils qw(uniq);
1389
1390    my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1391    my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1392
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The postincrement in the
C<next> statement creates the key, returns its old value (C<undef>,
which is false), and then increments it to 1, so the loop continues to
the C<push>. The next time the loop sees that same element, its key
exists in the hash I<and> the value for that key is true (since it's
not 0 or C<undef>), so the C<next> skips that iteration and the loop
goes to the next element.
1402
1403    my @unique = ();
1404    my %seen   = ();
1405
1406    foreach my $elem ( @array ) {
1407        next if $seen{ $elem }++;
1408        push @unique, $elem;
1409    }
1410
1411You can write this more briefly using a grep, which does the
1412same thing.
1413
1414    my %seen = ();
1415    my @unique = grep { ! $seen{ $_ }++ } @array;
1416
1417=head2 How can I tell whether a certain element is contained in a list or array?
1418
1419(portions of this answer contributed by Anno Siegel and brian d foy)
1420
1421Hearing the word "in" is an I<in>dication that you probably should have
1422used a hash, not a list or array, to store your data. Hashes are
1423designed to answer this question quickly and efficiently. Arrays aren't.
1424
That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item is
contained in an array or a hash. Be aware that smart match was marked
experimental in Perl 5.18, so newer perls warn when you use it:
1428
1429    use 5.010;
1430
1431    if( $item ~~ @array ) {
1432        say "The array contains $item"
1433    }
1434
1435    if( $item ~~ %hash ) {
1436        say "The hash contains $item"
1437    }
1438
1439With earlier versions of Perl, you have to do a bit more work. If you
1440are going to make this query many times over arbitrary string values,
1441the fastest way is probably to invert the original array and maintain a
1442hash whose keys are the first array's values:
1443
1444    my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1445    my %is_blue = ();
1446    for (@blues) { $is_blue{$_} = 1 }
1447
1448Now you can check whether C<$is_blue{$some_color}>. It might have
1449been a good idea to keep the blues all in a hash in the first place.
1450
1451If the values are all small integers, you could use a simple indexed
1452array. This kind of an array will take up less space:
1453
1454    my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1455    my @is_tiny_prime = ();
1456    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply @is_tiny_prime[@primes] = (1) x @primes;
1458
Now you check whether C<$is_tiny_prime[$some_number]>.
1460
1461If the values in question are integers instead of strings, you can save
1462quite a lot of space by using bit strings instead:
1463
1464    my @articles = ( 1..10, 150..2000, 2017 );
    my $read = '';
1466    for (@articles) { vec($read,$_,1) = 1 }
1467
1468Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1469
1470These methods guarantee fast individual tests but require a re-organization
1471of the original list or array. They only pay off if you have to test
1472multiple values against the same array.
1473
1474If you are testing only once, the standard module L<List::Util> exports
1475the function C<first> for this purpose. It works by stopping once it
1476finds the element. It's written in C for speed, and its Perl equivalent
1477looks like this subroutine:
1478
1479    sub first (&@) {
1480        my $code = shift;
1481        foreach (@_) {
1482            return $_ if &{$code}();
1483        }
1484        undef;
1485    }
1486
1487If speed is of little concern, the common idiom uses grep in scalar context
1488(which returns the number of items that passed its condition) to traverse the
1489entire list. This does have the benefit of telling you how many matches it
1490found, though.
1491
1492    my $is_there = grep $_ eq $whatever, @array;
1493
1494If you want to actually extract the matching elements, simply use grep in
1495list context.
1496
1497    my @matches = grep $_ eq $whatever, @array;
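
Putting a few of these approaches side by side, the sketch below builds
the lookup hash with a hash slice, sets up the C<vec> bit string for the
sample article numbers, and uses C<any> from L<List::Util> for a one-off
test. C<any> is not mentioned above; it assumes List::Util 1.33 or
later, and unlike C<first> it returns a plain true or false, so it isn't
fooled when the matching element is itself false (such as C<0>):

```perl
    use strict;
    use warnings;
    use List::Util qw(any);

    my @blues = qw( azure cerulean teal turquoise lapis-lazuli );

    # Build the lookup table once, then test as often as you like
    my %is_blue;
    @is_blue{ @blues } = (1) x @blues;
    my $teal_is_blue = $is_blue{teal} ? 1 : 0;    # 1
    my $red_is_blue  = $is_blue{red}  ? 1 : 0;    # 0

    # Bit-string lookup for non-negative integers
    my $read = '';
    vec( $read, $_, 1 ) = 1 for 1 .. 10, 150 .. 2000, 2017;
    my $read_150 = vec( $read, 150, 1 );          # 1
    my $read_11  = vec( $read,  11, 1 );          # 0

    # One-off test: any() stops at the first true result
    my $has_teal = ( any { $_ eq 'teal' } @blues ) ? 1 : 0;   # 1
```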
1498
1499=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1500
1501Use a hash. Here's code to do both and more. It assumes that each
1502element is unique in a given array:
1503
1504    my (@union, @intersection, @difference);
1505    my %count = ();
1506    foreach my $element (@array1, @array2) { $count{$element}++ }
1507    foreach my $element (keys %count) {
1508        push @union, $element;
1509        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1510    }
1511
1512Note that this is the I<symmetric difference>, that is, all elements
1513in either A or in B but not in both. Think of it as an xor operation.
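
Here is the same code run on two small made-up arrays. The keys are
sorted only so the results come out in a predictable order. The last
lines add a one-sided difference (elements of the first array that are
not in the second), which is often what people actually want:

```perl
    use strict;
    use warnings;

    my @array1 = qw( a b c d );
    my @array2 = qw( c d e );

    my ( @union, @intersection, @difference );
    my %count;
    $count{$_}++ for @array1, @array2;
    for my $element ( sort keys %count ) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
    # @union is (a b c d e), @intersection is (c d),
    # @difference is (a b e), the symmetric difference

    # Elements only in @array1 (a one-sided difference)
    my %in_array2;
    @in_array2{ @array2 } = ();
    my @only_in_1 = grep { !exists $in_array2{$_} } @array1;   # (a b)
```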
1514
1515=head2 How do I test whether two arrays or hashes are equal?
1516
1517With Perl 5.10 and later, the smart match operator can give you the answer
1518with the least amount of work:
1519
1520    use 5.010;
1521
1522    if( @array1 ~~ @array2 ) {
1523        say "The arrays are the same";
1524    }
1525
    if( %hash1 ~~ %hash2 ) { # doesn't check values!
1527        say "The hash keys are the same";
1528    }
1529
1530The following code works for single-level arrays. It uses a
1531stringwise comparison, and does not distinguish defined versus
1532undefined empty strings. Modify if you have other needs.
1533
1534    $are_equal = compare_arrays(\@frogs, \@toads);
1535
1536    sub compare_arrays {
1537        my ($first, $second) = @_;
1538        no warnings;  # silence spurious -w undef complaints
1539        return 0 unless @$first == @$second;
1540        for (my $i = 0; $i < @$first; $i++) {
1541            return 0 if $first->[$i] ne $second->[$i];
1542        }
1543        return 1;
1544    }
1545
1546For multilevel structures, you may wish to use an approach more
1547like this one. It uses the CPAN module L<FreezeThaw>:
1548
1549    use FreezeThaw qw(cmpStr);
1550    my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1551
1552    printf "a and b contain %s arrays\n",
1553        cmpStr(\@a, \@b) == 0
1554        ? "the same"
1555        : "different";
1556
1557This approach also works for comparing hashes. Here we'll demonstrate
1558two different answers:
1559
1560    use FreezeThaw qw(cmpStr cmpStrHard);
1561
1562    my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1563    $a{EXTRA} = \%b;
1564    $b{EXTRA} = \%a;
1565
1566    printf "a and b contain %s hashes\n",
1567    cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1568
1569    printf "a and b contain %s hashes\n",
1570    cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1571
1572
The first reports that both hashes contain the same data,
1574while the second reports that they do not. Which you prefer is left as
1575an exercise to the reader.
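
If you would rather not pull in a CPAN module, a small recursive
comparison is enough for plain nested arrays and hashes. The
C<deep_equal> subroutine below is a hypothetical sketch, not part of
L<FreezeThaw> or any other module: it compares plain scalars as strings,
treats C<undef> as equal only to C<undef>, requires any other kind of
reference (objects, code refs) to be the very same reference, and would
recurse forever on circular structures like the one above:

```perl
    use strict;
    use warnings;

    # Hypothetical helper: recursively compare scalars, array refs,
    # and hash refs; returns 1 if equal, 0 otherwise
    sub deep_equal {
        my ( $x, $y ) = @_;

        return defined $y ? 0 : 1 unless defined $x;   # undef == undef only
        return 0 unless defined $y;

        my ( $rx, $ry ) = ( ref $x, ref $y );
        return 0 if $rx ne $ry;

        return $x eq $y ? 1 : 0 if $rx eq '';          # plain scalars

        if ( $rx eq 'ARRAY' ) {
            return 0 unless @$x == @$y;
            for my $i ( 0 .. $#$x ) {
                return 0 unless deep_equal( $x->[$i], $y->[$i] );
            }
            return 1;
        }

        if ( $rx eq 'HASH' ) {
            return 0 unless scalar( keys %$x ) == scalar( keys %$y );
            for my $key ( keys %$x ) {
                return 0 unless exists $y->{$key};
                return 0 unless deep_equal( $x->{$key}, $y->{$key} );
            }
            return 1;
        }

        return $x == $y ? 1 : 0;    # anything else: same reference?
    }

    my $same = deep_equal(
        [ "this", "that", { more => ["stuff"] } ],
        [ "this", "that", { more => ["stuff"] } ],
    );                                                      # 1
    my $diff = deep_equal( [ 1, [ 2, 3 ] ], [ 1, [ 2, 4 ] ] );   # 0
```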
1576
1577=head2 How do I find the first array element for which a condition is true?
1578
1579To find the first array element which satisfies a condition, you can
use the C<first()> function in the L<List::Util> module, which has come
with Perl since version 5.8. This example finds the first element that
contains "Perl".
1583
1584    use List::Util qw(first);
1585
1586    my $element = first { /Perl/ } @array;
1587
1588If you cannot use L<List::Util>, you can make your own loop to do the
1589same thing. Once you find the element, you stop the loop with last.
1590
1591    my $found;
1592    foreach ( @array ) {
1593        if( /Perl/ ) { $found = $_; last }
1594    }
1595
If you want the array index, use the C<firstidx()> function from
L<List::MoreUtils>:
1598
1599    use List::MoreUtils qw(firstidx);
1600    my $index = firstidx { /Perl/ } @array;
1601
1602Or write it yourself, iterating through the indices
1603and checking the array element at each index until you find one
1604that satisfies the condition:
1605
1606    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
1608        if( $array[$i] =~ /Perl/ ) {
1609            $found = $array[$i];
1610            $index = $i;
1611            last;
1612        }
1613    }
1614
1615=head2 How do I handle linked lists?
1616
1617(contributed by brian d foy)
1618
1619Perl's arrays do not have a fixed size, so you don't need linked lists
1620if you just want to add or remove items. You can use array operations
1621such as C<push>, C<pop>, C<shift>, C<unshift>, or C<splice> to do
1622that.
1623
1624Sometimes, however, linked lists can be useful in situations where you
1625want to "shard" an array so you have many small arrays instead of
1626a single big array. You can keep arrays longer than Perl's largest
1627array index, lock smaller arrays separately in threaded programs,
1628reallocate less memory, or quickly insert elements in the middle of
1629the chain.
1630
1631Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
1632Linked Lists" ( L<http://www.slideshare.net/lembark/perly-linked-lists> ),
1633although you can just use his L<LinkedList::Single> module.
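
To see the basic idea without a module, here is a minimal sketch of a
singly linked list built from hash references. The helper names
C<node>, C<insert_after>, and C<to_list> are hypothetical and are not
the L<LinkedList::Single> interface. Inserting in the middle only
rewires one link, so nothing else in the chain moves:

```perl
    use strict;
    use warnings;

    # Each node is a hash ref holding a value and a link to the next node
    sub node {
        my ( $value, $next ) = @_;
        return { value => $value, next => $next };
    }

    # Splice a new node in after $node without moving any other data
    sub insert_after {
        my ( $node, $value ) = @_;
        $node->{next} = node( $value, $node->{next} );
        return $node->{next};
    }

    # Walk the chain from $head and collect the values in order
    sub to_list {
        my ($head) = @_;
        my @values;
        for ( my $n = $head; defined $n; $n = $n->{next} ) {
            push @values, $n->{value};
        }
        return @values;
    }

    my $head = node( 'a', node( 'c', undef ) );
    insert_after( $head, 'b' );
    my @values = to_list($head);    # ('a', 'b', 'c')
```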
1634
1635=head2 How do I handle circular lists?
1636X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1637X<cycle> X<modulus>
1638
1639(contributed by brian d foy)
1640
1641If you want to cycle through an array endlessly, you can increment the
1642index modulo the number of elements in the array:
1643
1644    my @array = qw( a b c );
1645    my $i = 0;
1646
1647    while( 1 ) {
1648        print $array[ $i++ % @array ], "\n";
1649        last if $i > 20;
1650    }
1651
1652You can also use L<Tie::Cycle> to use a scalar that always has the
1653next element of the circular array:
1654
1655    use Tie::Cycle;
1656
1657    tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1658
1659    print $cycle; # FFFFFF
1660    print $cycle; # 000000
1661    print $cycle; # FFFF00
1662
The L<Array::Iterator::Circular> module creates an iterator object for
1664circular arrays:
1665
1666    use Array::Iterator::Circular;
1667
1668    my $color_iterator = Array::Iterator::Circular->new(
1669        qw(red green blue orange)
1670        );
1671
1672    foreach ( 1 .. 20 ) {
1673        print $color_iterator->next, "\n";
1674    }
1675
1676=head2 How do I shuffle an array randomly?
1677
If you have either Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or
later installed, you can say:
1680
1681    use List::Util 'shuffle';
1682
1683    @shuffled = shuffle(@list);
1684
1685If not, you can use a Fisher-Yates shuffle.
1686
1687    sub fisher_yates_shuffle {
1688        my $deck = shift;  # $deck is a reference to an array
1689        return unless @$deck; # must not be empty!
1690
1691        my $i = @$deck;
1692        while (--$i) {
1693            my $j = int rand ($i+1);
1694            @$deck[$i,$j] = @$deck[$j,$i];
1695        }
1696    }
1697
1698    # shuffle my mpeg collection
1699    #
1700    my @mpeg = <audio/*/*.mp3>;
1701    fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1702    print @mpeg;
1703
1704Note that the above implementation shuffles an array in place,
1705unlike the C<List::Util::shuffle()> which takes a list and returns
1706a new shuffled list.
1707
You've probably seen shuffling algorithms that work using splice,
repeatedly removing a randomly chosen element from the old array and
appending it to the new one:
1710
1711    srand;
1712    @new = ();
1713    @old = 1 .. 10;  # just a demo
1714    while (@old) {
1715        push(@new, splice(@old, rand @old, 1));
1716    }
1717
This is bad because C<splice> is already O(N), and since you do it N
times, you just invented a quadratic algorithm; that is, O(N**2).
This does not scale, although Perl is so efficient that you probably
won't notice it until you have fairly large arrays.
1722
1723=head2 How do I process/modify each element of an array?
1724
1725Use C<for>/C<foreach>:
1726
1727    for (@lines) {
1728        s/foo/bar/;    # change that word
1729        tr/XZ/ZX/;    # swap those letters
1730    }
1731
1732Here's another; let's compute spherical volumes:
1733
1734    my @volumes = @radii;
    for (@volumes) {   # $_ is an alias to each element of @volumes
1736        $_ **= 3;
1737        $_ *= (4/3) * 3.14159;  # this will be constant folded
1738    }
1739
which can also be done with C<map()>, which is designed to transform
one list into another:
1742
1743    my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1744
To modify the values of a hash in the same way, use the C<values>
function. As of Perl 5.6 the values are not copied, so if you modify
C<$orbit> (in this case), you modify the value stored in the hash.
1749
1750    for my $orbit ( values %orbits ) {
1751        ($orbit **= 3) *= (4/3) * 3.14159;
1752    }
1753
1754Prior to perl 5.6 C<values> returned copies of the values,
1755so older perl code often contains constructions such as
1756C<@orbits{keys %orbits}> instead of C<values %orbits> where
1757the hash is to be modified.
1758
1759=head2 How do I select a random element from an array?
1760
1761Use the C<rand()> function (see L<perlfunc/rand>):
1762
1763    my $index   = rand @array;
1764    my $element = $array[$index];
1765
1766Or, simply:
1767
1768    my $element = $array[ rand @array ];
1769
1770=head2 How do I permute N elements of a list?
1771X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
1772X<The Art of Computer Programming> X<Fischer-Krause>
1773
1774Use the L<List::Permutor> module on CPAN. If the list is actually an
1775array, try the L<Algorithm::Permute> module (also on CPAN). It's
1776written in XS code and is very efficient:
1777
1778    use Algorithm::Permute;
1779
1780    my @array = 'a'..'d';
1781    my $p_iterator = Algorithm::Permute->new ( \@array );
1782
1783    while (my @perm = $p_iterator->next) {
1784       print "next permutation: (@perm)\n";
1785    }
1786
1787For even faster execution, you could do:
1788
1789    use Algorithm::Permute;
1790
1791    my @array = 'a'..'d';
1792
1793    Algorithm::Permute::permute {
1794        print "next permutation: (@array)\n";
1795    } @array;
1796
Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4A of Knuth's
I<The Art of Computer Programming> and will work on any list:
1801
1802    #!/usr/bin/perl -n
1803    # Fischer-Krause ordered permutation generator
1804
1805    sub permute (&@) {
1806        my $code = shift;
1807        my @idx = 0..$#_;
1808        while ( $code->(@_[@idx]) ) {
1809            my $p = $#idx;
1810            --$p while $idx[$p-1] > $idx[$p];
1811            my $q = $p or return;
1812            push @idx, reverse splice @idx, $p;
1813            ++$q while $idx[$p-1] > $idx[$q];
1814            @idx[$p-1,$q]=@idx[$q,$p-1];
1815        }
1816    }
1817
1818    permute { print "@_\n" } split;
1819
The L<Algorithm::Loops> module also provides the C<NextPermute> and
C<NextPermuteNum> functions, which efficiently find all unique permutations
of an array, even if it contains duplicate values. Each call modifies the
array in place: if its elements are in reverse-sorted order, the array is
reversed (making it sorted) and a false value is returned; otherwise the
array is rearranged into the next permutation and a true value is returned.
1826
1827C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1828you can enumerate all the permutations of C<0..9> like this:
1829
1830    use Algorithm::Loops qw(NextPermuteNum);
1831
1832    my @list= 0..9;
1833    do { print "@list\n" } while NextPermuteNum @list;
1834
1835=head2 How do I sort an array by (anything)?
1836
1837Supply a comparison function to sort() (described in L<perlfunc/sort>):
1838
1839    @list = sort { $a <=> $b } @list;
1840
The default sort function is C<cmp>, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
the numerical comparison operator.
1844
If you need a complicated function to pull out the part you want to
sort on, don't call it inside the sort block. Pull it out first,
because the sort BLOCK can be called many times for the same element.
Here's an example of how to pull out the first word after the first
number on each item, and then sort those words case-insensitively.
1851
1852    my @idx;
1853    for (@data) {
1854        my $item;
1855        ($item) = /\d+\s*(\S+)/;
1856        push @idx, uc($item);
1857    }
1858    my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1859
1860which could also be written this way, using a trick
1861that's come to be known as the Schwartzian Transform:
1862
1863    my @sorted = map  { $_->[0] }
1864        sort { $a->[1] cmp $b->[1] }
1865        map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1866
1867If you need to sort on several fields, the following paradigm is useful.
1868
1869    my @sorted = sort {
1870        field1($a) <=> field1($b) ||
1871        field2($a) cmp field2($b) ||
1872        field3($a) cmp field3($b)
1873    } @data;
1874
1875This can be conveniently combined with precalculation of keys as given
1876above.
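
For a concrete case of the multi-field paradigm, here is a minimal
sketch that assumes each record is a hash reference with hypothetical
C<age> and C<name> fields. It sorts by age numerically, then breaks
ties by comparing names case-insensitively:

    my @people = (
        { name => 'bob',   age => 30 },
        { name => 'Alice', age => 25 },
        { name => 'carol', age => 30 },
    );

    my @sorted_people = sort {
        $a->{age} <=> $b->{age}             # primary key: numeric
            ||
        lc $a->{name} cmp lc $b->{name}     # tie-breaker: case-insensitive
    } @people;

    # prints "Alice bob carol"
    print join( ' ', map { $_->{name} } @sorted_people ), "\n";

Because C<< <=> >> and C<cmp> return 0 for equal values, C<||> only
falls through to the next comparison when the earlier keys tie.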
1877
1878See the F<sort> article in the "Far More Than You Ever Wanted
1879To Know" collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for
1880more about this approach.
1881
1882See also the question later in L<perlfaq4> on sorting hashes.
1883
1884=head2 How do I manipulate arrays of bits?
1885
1886Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1887operations.
1888
1889For example, you don't have to store individual bits in an array
1890(which would mean that you're wasting a lot of space). To convert an
1891array of bits to a string, use C<vec()> to set the right bits. This
1892sets C<$vec> to have bit N set only if C<$ints[N]> was set:
1893
1894    my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1895    my $vec = '';
1896    foreach( 0 .. $#ints ) {
1897        vec($vec,$_,1) = 1 if $ints[$_];
1898    }
1899
1900The string C<$vec> only takes up as many bits as it needs. For
1901instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1902bytes to store them (not counting the scalar variable overhead).
1903
1904Here's how, given a vector in C<$vec>, you can get those bits into
1905your C<@ints> array:
1906
1907    sub bitvec_to_list {
1908        my $vec = shift;
1909        my @ints;
1910        # Find null-byte density then select best algorithm
1911        if ($vec =~ tr/\0// / length $vec > 0.95) {
1912            use integer;
1913            my $i;
1914
1915            # This method is faster with mostly null-bytes
1916            while($vec =~ /[^\0]/g ) {
1917                $i = -9 + 8 * pos $vec;
1918                push @ints, $i if vec($vec, ++$i, 1);
1919                push @ints, $i if vec($vec, ++$i, 1);
1920                push @ints, $i if vec($vec, ++$i, 1);
1921                push @ints, $i if vec($vec, ++$i, 1);
1922                push @ints, $i if vec($vec, ++$i, 1);
1923                push @ints, $i if vec($vec, ++$i, 1);
1924                push @ints, $i if vec($vec, ++$i, 1);
1925                push @ints, $i if vec($vec, ++$i, 1);
1926            }
1927        }
1928        else {
1929            # This method is a fast general algorithm
1930            use integer;
1931            my $bits = unpack "b*", $vec;
1932            push @ints, 0 if $bits =~ s/^(\d)// && $1;
1933            push @ints, pos $bits while($bits =~ /1/g);
1934        }
1935
1936        return \@ints;
1937    }
1938
1939This method gets faster the more sparse the bit vector is.
1940(Courtesy of Tim Bunce and Winfried Koenig.)
1941
1942You can make the while loop a lot shorter with this suggestion
1943from Benjamin Goldberg:
1944
1945    while($vec =~ /[^\0]+/g ) {
1946        push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1947    }
1948
1949Or use the CPAN module L<Bit::Vector>:
1950
    use Bit::Vector;

    my $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    my @indices = $vector->Index_List_Read();
1954
L<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers, and "big int" math.
1957
1958Here's a more extensive illustration using vec():
1959
1960    # vec demo
1961    my $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    my $is_set = vec($vector, 23, 1);
    print "Its bit 23 is ", $is_set ? "set" : "clear", ".\n";
1966    pvec($vector);
1967
1968    set_vec(1,1,1);
1969    set_vec(3,1,1);
1970    set_vec(23,1,1);
1971
1972    set_vec(3,1,3);
1973    set_vec(3,2,3);
1974    set_vec(3,4,3);
1975    set_vec(3,4,7);
1976    set_vec(3,8,3);
1977    set_vec(3,8,7);
1978
1979    set_vec(0,32,17);
1980    set_vec(1,32,17);
1981
1982    sub set_vec {
1983        my ($offset, $width, $value) = @_;
1984        my $vector = '';
1985        vec($vector, $offset, $width) = $value;
1986        print "offset=$offset width=$width value=$value\n";
1987        pvec($vector);
1988    }
1989
    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }
2000
2001=head2 Why does defined() return true on empty arrays and hashes?
2002
The short story is that you should only use C<defined> on scalars or
functions, not on aggregates (arrays and hashes). Since Perl 5.22,
calling C<defined> on an array or a hash is a fatal error. See
L<perlfunc/defined> for more detail.
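
To find out whether an array or hash has any elements, test the
aggregate itself in boolean or scalar context instead. A minimal
sketch with made-up data:

    my @array = ();
    my %hash  = ( key => 'value' );

    # An array in boolean context is true if it has any elements
    print "array is empty\n" unless @array;

    # A hash in boolean context is true if it has any keys
    print "hash has entries\n" if %hash;

    # In scalar context an array gives its element count
    my $count = scalar @array;   # 0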
2006
2007=head1 Data: Hashes (Associative Arrays)
2008
2009=head2 How do I process an entire hash?
2010
2011(contributed by brian d foy)
2012
There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.
2016
2017To go through all of the keys, use the C<keys> function. This extracts
2018all of the keys of the hash and gives them back to you as a list. You
2019can then get the value through the particular key you're processing:
2020
2021    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
2023        ...
2024    }
2025
2026Once you have the list of keys, you can process that list before you
2027process the hash elements. For instance, you can sort the keys so you
2028can process them in lexical order:
2029
2030    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
2032        ...
2033    }
2034
2035Or, you might want to only process some of the items. If you only want
2036to deal with the keys that start with C<text:>, you can select just
2037those using C<grep>:
2038
2039    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
2041        ...
2042    }
2043
2044If the hash is very large, you might not want to create a long list of
2045keys. To save some memory, you can grab one key-value pair at a time using
2046C<each()>, which returns a pair you haven't seen yet:
2047
2048    while( my( $key, $value ) = each( %hash ) ) {
2049        ...
2050    }
2051
2052The C<each> operator returns the pairs in apparently random order, so if
2053ordering matters to you, you'll have to stick with the C<keys> method.
2054
2055The C<each()> operator can be a bit tricky though. You can't add or
2056delete keys of the hash while you're using it without possibly
2057skipping or re-processing some pairs after Perl internally rehashes
2058all of the elements. Additionally, a hash has only one iterator, so if
2059you mix C<keys>, C<values>, or C<each> on the same hash, you risk resetting
2060the iterator and messing up your processing. See the C<each> entry in
2061L<perlfunc> for more details.
2062
2063=head2 How do I merge two hashes?
2064X<hash> X<merge> X<slice, hash>
2065
2066(contributed by brian d foy)
2067
2068Before you decide to merge two hashes, you have to decide what to do
2069if both hashes contain keys that are the same and if you want to leave
2070the original hashes as they were.
2071
2072If you want to preserve the original hashes, copy one hash (C<%hash1>)
2073to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
2075C<%new_hash> gives you a chance to decide what to do with the
2076duplicates:
2077
2078    my %new_hash = %hash1; # make a copy; leave %hash1 alone
2079
2080    foreach my $key2 ( keys %hash2 ) {
2081        if( exists $new_hash{$key2} ) {
2082            warn "Key [$key2] is in both hashes!";
2083            # handle the duplicate (perhaps only warning)
2084            ...
2085            next;
2086        }
2087        else {
2088            $new_hash{$key2} = $hash2{$key2};
2089        }
2090    }
2091
2092If you don't want to create a new hash, you can still use this looping
2093technique; just change the C<%new_hash> to C<%hash1>.
2094
2095    foreach my $key2 ( keys %hash2 ) {
2096        if( exists $hash1{$key2} ) {
2097            warn "Key [$key2] is in both hashes!";
2098            # handle the duplicate (perhaps only warning)
2099            ...
2100            next;
2101        }
2102        else {
2103            $hash1{$key2} = $hash2{$key2};
2104        }
2105      }
2106
2107If you don't care that one hash overwrites keys and values from the other, you
2108could just use a hash slice to add one hash to another. In this case, values
2109from C<%hash2> replace values from C<%hash1> when they have keys in common:
2110
2111    @hash1{ keys %hash2 } = values %hash2;
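
If you want that same "last one wins" behavior but also want to leave
both original hashes alone, a list assignment merges them into a new
hash in one statement, because later pairs in the list overwrite
earlier pairs with the same key. A minimal sketch with made-up data:

    my %hash1 = ( apple => 1,   banana => 2 );
    my %hash2 = ( banana => 20, cherry => 30 );

    # Pairs from %hash2 come later in the list, so they win
    my %merged = ( %hash1, %hash2 );

    # %merged is ( apple => 1, banana => 20, cherry => 30 )
    # %hash1 and %hash2 are unchanged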
2112
2113=head2 What happens if I add or remove keys from a hash while iterating over it?
2114
2115(contributed by brian d foy)
2116
2117The easy answer is "Don't do that!"
2118
2119If you iterate through the hash with each(), you can delete the key
2120most recently returned without worrying about it. If you delete or add
2121other keys, the iterator may skip or double up on them since perl
2122may rearrange the hash table. See the
2123entry for C<each()> in L<perlfunc>.
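
The one safe case, deleting the key that C<each()> just returned, looks
like this minimal sketch (the hash contents are made up):

    my %inventory = ( apples => 3, pears => 0, plums => 5, figs => 0 );

    while ( my( $fruit, $count ) = each %inventory ) {
        # Safe: this deletes only the key most recently returned by each()
        delete $inventory{$fruit} if $count == 0;
    }

    # %inventory now holds only apples and plums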
2124
2125=head2 How do I look up a hash element by value?
2126
2127Create a reverse hash:
2128
2129    my %by_value = reverse %by_key;
2130    my $key = $by_value{$value};
2131
2132That's not particularly efficient. It would be more space-efficient
2133to use:
2134
2135    while (my ($key, $value) = each %by_key) {
2136        $by_value{$value} = $key;
2137    }
2138
2139If your hash could have repeated values, the methods above will only find
2140one of the associated keys.  This may or may not worry you. If it does
2141worry you, you can always reverse the hash into a hash of arrays instead:
2142
2143    while (my ($key, $value) = each %by_key) {
2144         push @{$key_list_by_value{$value}}, $key;
2145    }
2146
2147=head2 How can I know how many entries are in a hash?
2148
2149(contributed by brian d foy)
2150
2151This is very similar to "How do I process an entire hash?", also in
2152L<perlfaq4>, but a bit simpler in the common cases.
2153
You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:
2156
2157    my $key_count = keys %hash; # must be scalar context!
2158
2159If you want to find out how many entries have a defined value, that's
2160a bit different. You have to check each value. A C<grep> is handy:
2161
2162    my $defined_value_count = grep { defined } values %hash;
2163
2164You can use that same structure to count the entries any way that
2165you like. If you want the count of the keys with vowels in them,
2166you just test for that instead:
2167
2168    my $vowel_count = grep { /[aeiou]/ } keys %hash;
2169
2170The C<grep> in scalar context returns the count. If you want the list
2171of matching items, just use it in list context instead:
2172
2173    my @defined_values = grep { defined } values %hash;
2174
2175The C<keys()> function also resets the iterator, which means that you may
2176see strange results if you use this between uses of other hash operators
2177such as C<each()>.
2178
2179=head2 How do I sort a hash (optionally by value instead of key)?
2180
2181(contributed by brian d foy)
2182
2183To sort a hash, start with the keys. In this example, we give the list of
2184keys to the sort function which then compares them ASCIIbetically (which
2185might be affected by your locale settings). The output list has the keys
2186in ASCIIbetical order. Once we have the keys, we can go through them to
2187create a report which lists the keys in ASCIIbetical order.
2188
2189    my @keys = sort { $a cmp $b } keys %hash;
2190
2191    foreach my $key ( @keys ) {
2192        printf "%-20s %6d\n", $key, $hash{$key};
2193    }
2194
2195We could get more fancy in the C<sort()> block though. Instead of
2196comparing the keys, we can compute a value with them and use that
2197value as the comparison.
2198
2199For instance, to make our report order case-insensitive, we use
2200C<lc> to lowercase the keys before comparing them:
2201
2202    my @keys = sort { lc $a cmp lc $b } keys %hash;
2203
2204Note: if the computation is expensive or the hash has many elements,
2205you may want to look at the Schwartzian Transform to cache the
2206computation results.
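
Here is a minimal sketch of that idea applied to hash keys, with
made-up data. It computes C<lc> once per key instead of once per
comparison; C<lc> itself is cheap, so treat it as a stand-in for
whatever expensive computation you actually need:

    my %hash = ( Banana => 3, apple => 1, Cherry => 2 );

    my @keys =
        map  { $_->[0] }                # 3. extract the original key
        sort { $a->[1] cmp $b->[1] }    # 2. sort on the cached value
        map  { [ $_, lc $_ ] }          # 1. pair each key with its computed value
        keys %hash;

    # @keys is ( 'apple', 'Banana', 'Cherry' )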
2207
2208If we want to sort by the hash value instead, we use the hash key
2209to look it up. We still get out a list of keys, but this time they
2210are ordered by their value.
2211
2212    my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2213
2214From there we can get more complex. If the hash values are the same,
2215we can provide a secondary sort on the hash key.
2216
2217    my @keys = sort {
2218        $hash{$a} <=> $hash{$b}
2219            or
2220        "\L$a" cmp "\L$b"
2221    } keys %hash;
2222
2223=head2 How can I always keep my hash sorted?
2224X<hash tie sort DB_File Tie::IxHash>
2225
2226You can look into using the C<DB_File> module and C<tie()> using the
2227C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
2228Databases">. The L<Tie::IxHash> module from CPAN might also be
2229instructive. Although this does keep your hash sorted, you might not
2230like the slowdown you suffer from the tie interface. Are you sure you
2231need to do this? :)
2232
2233=head2 What's the difference between "delete" and "undef" with hashes?
2234
2235Hashes contain pairs of scalars: the first is the key, the
2236second is the value. The key will be coerced to a string,
2237although the value can be any kind of scalar: string,
2238number, or reference. If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true. The value
2240for a given key can be C<undef>, in which case
2241C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
2242will return true. This corresponds to (C<$key>, C<undef>)
2243being in the hash.
2244
2245Pictures help... Here's the C<%hash> table:
2246
2247      keys  values
2248    +------+------+
2249    |  a   |  3   |
2250    |  x   |  7   |
2251    |  d   |  0   |
2252    |  e   |  2   |
2253    +------+------+
2254
2255And these conditions hold
2256
2257    $hash{'a'}                       is true
2258    $hash{'d'}                       is false
2259    defined $hash{'d'}               is true
2260    defined $hash{'a'}               is true
2261    exists $hash{'a'}                is true (Perl 5 only)
2262    grep ($_ eq 'a', keys %hash)     is true
2263
2264If you now say
2265
2266    undef $hash{'a'}
2267
2268your table now reads:
2269
2271      keys  values
2272    +------+------+
2273    |  a   | undef|
2274    |  x   |  7   |
2275    |  d   |  0   |
2276    |  e   |  2   |
2277    +------+------+
2278
2279and these conditions now hold; changes in caps:
2280
2281    $hash{'a'}                       is FALSE
2282    $hash{'d'}                       is false
2283    defined $hash{'d'}               is true
2284    defined $hash{'a'}               is FALSE
2285    exists $hash{'a'}                is true (Perl 5 only)
2286    grep ($_ eq 'a', keys %hash)     is true
2287
2288Notice the last two: you have an undef value, but a defined key!
2289
2290Now, consider this:
2291
2292    delete $hash{'a'}
2293
2294your table now reads:
2295
2296      keys  values
2297    +------+------+
2298    |  x   |  7   |
2299    |  d   |  0   |
2300    |  e   |  2   |
2301    +------+------+
2302
2303and these conditions now hold; changes in caps:
2304
2305    $hash{'a'}                       is false
2306    $hash{'d'}                       is false
2307    defined $hash{'d'}               is true
2308    defined $hash{'a'}               is false
2309    exists $hash{'a'}                is FALSE (Perl 5 only)
2310    grep ($_ eq 'a', keys %hash)     is FALSE
2311
2312See, the whole entry is gone!
2313
2314=head2 Why don't my tied hashes make the defined/exists distinction?
2315
2316This depends on the tied hash's implementation of EXISTS().
2317For example, there isn't the concept of undef with hashes
2318that are tied to DBM* files. It also means that exists() and
2319defined() do the same thing with a DBM* file, and what they
2320end up doing is not what they do with ordinary hashes.
2321
2322=head2 How do I reset an each() operation part-way through?
2323
2324(contributed by brian d foy)
2325
2326You can use the C<keys> or C<values> functions to reset C<each>. To
2327simply reset the iterator used by C<each> without doing anything else,
2328use one of them in void context:
2329
2330    keys %hash; # resets iterator, nothing else.
2331    values %hash; # resets iterator, nothing else.
2332
2333See the documentation for C<each> in L<perlfunc>.
2334
2335=head2 How can I get the unique keys from two hashes?
2336
2337First you extract the keys from the hashes into lists, then solve
2338the "removing duplicates" problem described above. For example:
2339
2340    my %seen = ();
2341    for my $element (keys(%foo), keys(%bar)) {
2342        $seen{$element}++;
2343    }
2344    my @uniq = keys %seen;
2345
2346Or more succinctly:
2347
2348    my @uniq = keys %{{%foo,%bar}};
2349
2350Or if you really want to save space:
2351
2352    my %seen = ();
    while (defined (my $key = each %foo)) {
        $seen{$key}++;
    }
    while (defined (my $key = each %bar)) {
        $seen{$key}++;
    }
2359    my @uniq = keys %seen;
2360
2361=head2 How can I store a multidimensional array in a DBM file?
2362
Either stringify the structure yourself (no fun), or else get the
L<MLDBM> module (which uses L<Data::Dumper>) from CPAN and layer it on
top of either L<DB_File> or L<GDBM_File>. You might also try
L<DBM::Deep>, but it can be a bit slow.
2367
2368=head2 How can I make my hash remember the order I put elements into it?
2369
Use the L<Tie::IxHash> module from CPAN.
2371
2372    use Tie::IxHash;
2373
2374    tie my %myhash, 'Tie::IxHash';
2375
    for my $i ( 0 .. 19 ) {
        $myhash{$i} = 2*$i;
    }
2379
2380    my @keys = keys %myhash;
2381    # @keys = (0,1,2,3,...)
2382
2383=head2 Why does passing a subroutine an undefined element in a hash create it?
2384
2385(contributed by brian d foy)
2386
2387Are you using a really old version of Perl?
2388
2389Normally, accessing a hash key's value for a nonexistent key will
2390I<not> create the key.
2391
2392    my %hash  = ();
2393    my $value = $hash{ 'foo' };
2394    print "This won't print\n" if exists $hash{ 'foo' };
2395
2396Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2397Since you could assign directly to C<$_[0]>, Perl had to be ready to
2398make that assignment so it created the hash key ahead of time:
2399
2400    my_sub( $hash{ 'foo' } );
2401    print "This will print before 5.004\n" if exists $hash{ 'foo' };
2402
2403    sub my_sub {
2404        # $_[0] = 'bar'; # create hash key in case you do this
2405        1;
2406    }
2407
2408Since Perl 5.004, however, this situation is a special case and Perl
2409creates the hash key only when you make the assignment:
2410
2411    my_sub( $hash{ 'foo' } );
2412    print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2413
2414    sub my_sub {
2415        $_[0] = 'bar';
2416    }
2417
2418However, if you want the old behavior (and think carefully about that
2419because it's a weird side effect), you can pass a hash slice instead.
2420Perl 5.004 didn't make this a special case:
2421
2422    my_sub( @hash{ qw/foo/ } );
2423
2424=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
2425
2426Usually a hash ref, perhaps like this:
2427
    my $record = {
2429        NAME   => "Jason",
2430        EMPNO  => 132,
2431        TITLE  => "deputy peon",
2432        AGE    => 23,
2433        SALARY => 37_000,
2434        PALS   => [ "Norbert", "Rhys", "Phineas"],
2435    };
2436
2437References are documented in L<perlref> and L<perlreftut>.
2438Examples of complex data structures are given in L<perldsc> and
2439L<perllol>. Examples of structures and object-oriented classes are
2440in L<perlootut>.
2441
2442=head2 How can I use a reference as a hash key?
2443
2444(contributed by brian d foy and Ben Morrow)
2445
2446Hash keys are strings, so you can't really use a reference as the key.
2447When you try to do that, perl turns the reference into its stringified
2448form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2449back the reference from the stringified form, at least without doing
2450some extra work on your own.
2451
Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This means a new variable might accidentally be
associated with the value for an old one.
2457
2458If you have Perl 5.10 or later, and you just want to store a value
2459against the reference for lookup later, you can use the core
L<Hash::Util::FieldHash> module. This will also handle renaming the
2461keys if you use multiple threads (which causes all variables to be
2462reallocated at new addresses, changing their stringification), and
2463garbage-collecting the entries when the referenced variable goes out
2464of scope.
2465
2466If you actually need to be able to get a real reference back from
each hash entry, you can use the L<Tie::RefHash> module, which does the
2468required work for you.
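
A minimal sketch with L<Tie::RefHash> (the array contents are made
up). Because the tied hash keeps the references themselves, the keys
you get back can still be dereferenced:

    use Tie::RefHash;

    tie my %seen, 'Tie::RefHash';

    my $aref = [ 1, 2, 3 ];
    $seen{$aref} = 'my array';

    foreach my $key ( keys %seen ) {
        # $key is a real ARRAY reference, not the string "ARRAY(0x...)"
        print ref($key), ": @$key => $seen{$key}\n";   # ARRAY: 1 2 3 => my array
    }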
2469
2470=head2 How can I check if a key exists in a multilevel hash?
2471
2472(contributed by brian d foy)
2473
2474The trick to this problem is avoiding accidental autovivification. If
2475you want to check three keys deep, you might naE<0xEF>vely try this:
2476
2477    my %hash;
2478    if( exists $hash{key1}{key2}{key3} ) {
2479        ...;
2480    }
2481
2482Even though you started with a completely empty hash, after that call to
2483C<exists> you've created the structure you needed to check for C<key3>:
2484
2485    %hash = (
2486              'key1' => {
2487                          'key2' => {}
2488                        }
2489            );
2490
2491That's autovivification. You can get around this in a few ways. The
2492easiest way is to just turn it off. The lexical C<autovivification>
2493pragma is available on CPAN. Now you don't add to the hash:
2494
2495    {
2496        no autovivification;
2497        my %hash;
2498        if( exists $hash{key1}{key2}{key3} ) {
2499            ...;
2500        }
2501    }
2502
2503The L<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
2504subroutine can tell you not only if the keys exist but also get the
2505value:
2506
2507    use Data::Diver qw(Dive);
2508
2509    my @exists = Dive( \%hash, qw(key1 key2 key3) );
2510    if(  ! @exists  ) {
2511        ...; # keys do not exist
2512    }
2513    elsif(  ! defined $exists[0]  ) {
2514        ...; # keys exist but value is undef
2515    }
2516
2517You can easily do this yourself too by checking each level of the hash
2518before you move onto the next level. This is essentially what
2519L<Data::Diver> does for you:
2520
2521    if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
2522        ...;
2523    }
2524
    sub check_hash {
        my( $hash, @keys ) = @_;

        return unless @keys;

        foreach my $key ( @keys ) {
            return unless eval { exists $hash->{$key} };
            $hash = $hash->{$key};
        }

        return 1;
    }
2537
2538=head2 How can I prevent addition of unwanted keys into a hash?
2539
2540Since version 5.8.0, hashes can be I<restricted> to a fixed number
2541of given keys. Methods for creating and dealing with restricted hashes
2542are exported by the L<Hash::Util> module.
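
For example, C<lock_keys> fixes the set of allowed keys to those
already present. Assigning to an existing key still works, but
touching any other key dies. A minimal sketch with made-up keys:

    use Hash::Util qw(lock_keys unlock_keys);

    my %config = ( host => 'localhost', port => 80 );
    lock_keys(%config);

    $config{port} = 8080;      # fine: 'port' is an allowed key

    # A typo such as 'prot' now dies with an
    # "Attempt to access disallowed key" error instead of
    # silently creating a new key
    my $ok = eval { $config{prot} = 8080; 1 };
    print "Caught: $@" unless $ok;

    unlock_keys(%config);      # back to an ordinary hash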
2543
2544=head1 Data: Misc
2545
2546=head2 How do I handle binary data correctly?
2547
2548Perl is binary-clean, so it can handle binary data just fine.
2549On Windows or DOS, however, you have to use C<binmode> for binary
2550files to avoid conversions for line endings. In general, you should
2551use C<binmode> any time you want to work with binary data.
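
A minimal sketch that writes some bytes to a file and reads them back
unchanged, assuming the hypothetical file name F<data.bin>. The
C<binmode> calls make sure that no line-ending translation happens on
either handle, whatever the platform:

    my $filename = 'data.bin';   # hypothetical file name

    # The "\r\n" pair is what a text-mode handle on Windows would translate
    open my $out, '>', $filename or die "Can't write $filename: $!";
    binmode $out;
    print {$out} "\x00\x01\r\n\xff";
    close $out or die "Can't close $filename: $!";

    # Read the whole file back, byte for byte
    open my $in, '<', $filename or die "Can't read $filename: $!";
    binmode $in;
    my $data = do { local $/; <$in> };   # undef $/ means slurp mode
    close $in;

    printf "read %d bytes\n", length $data;   # read 5 bytes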
2552
2553Also see L<perlfunc/"binmode"> or L<perlopentut>.
2554
2555If you're concerned about 8-bit textual data then see L<perllocale>.
2556If you want to deal with multibyte characters, however, there are
2557some gotchas. See the section on Regular Expressions.
2558
2559=head2 How do I determine whether a scalar is a number/whole/integer/float?
2560
2561Assuming that you don't care about IEEE notations like "NaN" or
2562"Infinity", you probably just want to use a regular expression (see also
2563L<perlretut> and L<perlre>):
2564
2565    use 5.010;
2566
2567    if ( /\D/ )
2568        { say "\thas nondigits"; }
2569    if ( /^\d+\z/ )
2570        { say "\tis a whole number"; }
2571    if ( /^-?\d+\z/ )
2572        { say "\tis an integer"; }
2573    if ( /^[+-]?\d+\z/ )
2574        { say "\tis a +/- integer"; }
2575    if ( /^-?(?:\d+\.?|\.\d)\d*\z/ )
2576        { say "\tis a real number"; }
2577    if ( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i )
2578        { say "\tis a C float" }
2579
2580There are also some commonly used modules for the task.
2581L<Scalar::Util> (distributed with 5.8) provides access to perl's
2582internal function C<looks_like_number> for determining whether a
2583variable looks like a number. L<Data::Types> exports functions that
2584validate data types using both the above and other regular
2585expressions. Thirdly, there is L<Regexp::Common> which has regular
2586expressions to match various types of numbers. Those three modules are
2587available from the CPAN.
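
As a minimal sketch, here is C<looks_like_number> applied to a few
made-up strings. Note that hexadecimal strings such as C<'0x1F'> do
not look like numbers to Perl, since Perl does not convert them
automatically:

    use Scalar::Util qw(looks_like_number);

    foreach my $candidate ( '42', '-3.5', '1e10', '0x1F', 'abc', '' ) {
        printf "%-6s %s\n", "'$candidate'",
            looks_like_number($candidate) ? 'looks like a number' : 'does not';
    }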
2588
2589If you're on a POSIX system, Perl supports the C<POSIX::strtod>
2590function for converting strings to doubles (and also C<POSIX::strtol>
2591for longs). Its semantics are somewhat cumbersome, so here's a
2592C<getnum> wrapper function for more convenient access. This function
2593takes a string and returns the number it found, or C<undef> for input
2594that isn't a C float. The C<is_numeric> function is a front end to
2595C<getnum> if you just want to say, "Is this a float?"

    use POSIX qw(strtod);

    sub getnum {
        my $str = shift;
        $str =~ s/^\s+//;        # strip leading whitespace
        $str =~ s/\s+$//;        # strip trailing whitespace
        $! = 0;                  # clear errno so range errors can be detected
        my ($num, $unparsed) = strtod($str);
        if ( $str eq '' || $unparsed != 0 || $! ) {
            return undef;        # empty, trailing junk, or out of range
        }
        return $num;
    }

    sub is_numeric { defined getnum($_[0]) }
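
The same approach works for integers with C<POSIX::strtol>. This is a
sketch only: C<getint> is a hypothetical name, and passing an explicit
base of 10 is a choice made here so that "0x1A" is rejected and "017"
is read as decimal 17 rather than as octal.

    use strict;
    use warnings;
    use POSIX qw(strtol);

    # Hypothetical integer counterpart to getnum(): returns the base-10
    # integer in $str, or undef if $str holds anything else.
    sub getint {
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        return undef if $str eq '';
        $! = 0;                                  # clear errno before the call
        my ($num, $unparsed) = strtol($str, 10); # force decimal parsing
        return undef if $unparsed != 0 || $!;    # trailing junk or overflow
        return $num;
    }

    my @ints = grep { defined getint($_) }
        ( '42', ' -7 ', '017', '3.5', '0x1A', '99abc' );

With this sample list, C<@ints> ends up holding "42", " -7 ", and
"017"; the other three are rejected because C<strtol> stops before the
end of the string.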

Or you could check out the L<String::Scanf> module on the CPAN
instead.

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generally, you should consult the L<FreezeThaw>
or L<Storable> modules from CPAN. Since Perl 5.8, L<Storable> has been part
of the standard distribution. Here's one example using L<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;    # exports store() and retrieve() by default

    my %hash = ( name => 'Perl', version => 5 );
    store(\%hash, "filename") or die "Can't store data in filename\n";

    # later on...
    my $href = retrieve("filename");     # by ref
    %hash = %{ retrieve("filename") };   # direct to hash
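
For the DBM route mentioned above, a tied hash keeps its contents on
disk between runs. Here is a minimal sketch using L<AnyDBM_File>; the
file name F<run_counter> and the C<runs> key are made up for
illustration, and the actual files created on disk depend on which DBM
implementation L<AnyDBM_File> picks on your system.

    use strict;
    use warnings;
    use Fcntl;          # for O_RDWR and O_CREAT
    use AnyDBM_File;

    # Tie %db to an on-disk DBM file, creating it if necessary.
    tie my %db, 'AnyDBM_File', 'run_counter', O_RDWR|O_CREAT, 0640
        or die "Can't tie run_counter: $!";

    # The count survives from one run of the program to the next.
    my $runs = ++$db{runs};
    print "This program has run $runs time(s)\n";

    untie %db;

Run it twice and the count goes up, because the value lives in the DBM
file rather than in the program.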

=head2 How do I print out or copy a recursive data structure?

The L<Data::Dumper> module on CPAN (included with Perl since 5.005) is
great for printing out data structures. The L<Storable> module on CPAN
(included with Perl since 5.8) provides a function called C<dclone> that
recursively copies its argument.

    use Storable qw(dclone);
    my $r2 = dclone($r1);

Here C<$r1> can be a reference to any kind of data structure you'd like,
and it will be deeply copied. Because C<dclone> takes and returns
references, you have to add extra punctuation to copy a hash of arrays
straight into another hash:

    my %newhash = %{ dclone(\%oldhash) };
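
Since the question is about recursive structures, here is a small
sketch showing that C<dclone> copes with a structure that refers to
itself, and that L<Data::Dumper> can print one. The C<$tree> layout is
made up for illustration; setting C<$Data::Dumper::Sortkeys> just makes
the output order predictable.

    use strict;
    use warnings;
    use Data::Dumper;
    use Storable qw(dclone);

    # A structure that contains a reference back to itself.
    my $tree = { name => 'root', kids => [ 'a', 'b' ] };
    $tree->{self} = $tree;

    # dclone copies the whole thing, reproducing the cycle inside the copy.
    my $copy = dclone($tree);
    push @{ $copy->{kids} }, 'c';    # changes only the copy

    $Data::Dumper::Sortkeys = 1;     # print hash keys in sorted order
    print Dumper($copy);

In the copy, C<< $copy->{self} >> points at C<$copy> rather than back
at C<$tree>, and C<$tree> still has only two kids.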

=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
L<Moose> that supports roles.
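
A minimal sketch of the base-class approach, using hypothetical package
names (C<My::Base>, C<My::Dog>, C<My::Cat>) and a hypothetical
C<describe> method. Each class opts in to the shared method through
L<parent>, so classes that don't inherit from C<My::Base> are left
alone, unlike with a method defined in C<UNIVERSAL>.

    use strict;
    use warnings;

    package My::Base;
    # Shared behaviour lives here instead of in UNIVERSAL.
    sub describe {
        my $self = shift;
        return 'I am a ' . ref($self);
    }

    package My::Dog;
    use parent -norequire, 'My::Base';    # inherit from the in-file base class
    sub new { my $class = shift; return bless {}, $class }

    package My::Cat;
    use parent -norequire, 'My::Base';
    sub new { my $class = shift; return bless {}, $class }

    package main;
    for my $pet ( My::Dog->new, My::Cat->new ) {
        print $pet->describe, "\n";
    }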

=head2 How do I verify a credit card checksum?

Get the L<Business::CreditCard> module from CPAN.
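
If you'd like to see what the check involves, most payment card numbers
end in a Luhn (mod 10) check digit. The following is a minimal sketch
of that checksum with a hypothetical C<luhn_ok> helper; it checks only
the check digit, not card type or length, so prefer the module for
real-world validation.

    use strict;
    use warnings;

    # Hypothetical helper: true if $number passes the Luhn (mod 10) check.
    sub luhn_ok {
        my $number = shift;
        $number =~ s/[\s-]//g;                  # allow spaces and dashes
        return 0 unless $number =~ /^[0-9]+\z/; # digits only from here on
        my ( $sum, $double ) = ( 0, 0 );
        # Walk the digits right to left, doubling every second one.
        for my $digit ( reverse split //, $number ) {
            my $d = $digit;
            if ($double) {
                $d *= 2;
                $d -= 9 if $d > 9;              # same as adding its two digits
            }
            $sum += $d;
            $double = !$double;
        }
        return $sum % 10 == 0;
    }

    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";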

=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the L<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the L<PDL> module from CPAN instead; it makes number-crunching easy.

See L<https://metacpan.org/release/PGPLOT> for the code.
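
On the Perl side, the core C<pack> function can already produce a
buffer laid out like a C array: the C<d> template packs native doubles
and C<f> packs native floats. This is a minimal sketch of that step
only. Handing the buffer to your XS function (one common approach is
to take its bytes with C<SvPV> and treat them as a C<double *>) is not
shown here.

    use strict;
    use warnings;

    my @values = ( 1.5, 2.25, -3.0 );

    # Native-format C doubles, packed back to back with no padding.
    my $doubles = pack 'd*', @values;

    # Native-format C floats; values that don't fit exactly lose precision.
    my $floats = pack 'f*', @values;

    printf "%d doubles in %d bytes, %d floats in %d bytes\n",
        scalar @values, length $doubles, scalar @values, length $floats;

    # unpack reverses the process, e.g. for data coming back from XS.
    my @round_trip = unpack 'd*', $doubles;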

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.
