=head1 NAME

perlreftut - Mark's very short tutorial about references

=head1 DESCRIPTION

One of the most important new features in Perl 5 was the capability to
manage complicated data structures like multidimensional arrays and
nested hashes. To enable these, Perl 5 introduced a feature called
I<references>, and using references is the key to managing complicated,
structured data in Perl. Unfortunately, there's a lot of funny syntax
to learn, and the main manual page can be hard to follow. The manual
is quite complete, and sometimes people find that a problem, because
it can be hard to tell what is important and what isn't.

Fortunately, you only need to know 10% of what's in the main page to get
90% of the benefit. This page will show you that 10%.

=head1 Who Needs Complicated Data Structures?

One problem that comes up all the time is needing a hash whose values are
lists. Perl has hashes, of course, but the values have to be scalars;
they can't be lists.

Why would you want a hash of lists? Let's take a simple example: You
have a file of city and country names, like this:

    Chicago, USA
    Frankfurt, Germany
    Berlin, Germany
    Washington, USA
    Helsinki, Finland
    New York, USA

and you want to produce an output like this, with each country mentioned
once, and then an alphabetical list of the cities in that country:

    Finland: Helsinki.
    Germany: Berlin, Frankfurt.
    USA: Chicago, New York, Washington.

The natural way to do this is to have a hash whose keys are country
names. Associated with each country name key is a list of the cities in
that country. Each time you read a line of input, split it into a country
and a city, look up the list of cities already known to be in that
country, and append the new city to the list. When you're done reading
the input, iterate over the hash as usual, sorting each list of cities
before you print it out.

If hash values couldn't be lists, you lose. You'd probably have to
combine all the cities into a single string somehow, and then when
time came to write the output, you'd have to break the string into a
list, sort the list, and turn it back into a string. This is messy
and error-prone. And it's frustrating, because Perl already has
perfectly good lists that would solve the problem if only you could
use them.

=head1 The Solution

By the time Perl 5 rolled around, we were already stuck with this
design: Hash values must be scalars. The solution to this is
references.

A reference is a scalar value that I<refers to> an entire array or an
entire hash (or to just about anything else). Names are one kind of
reference that you're already familiar with. Each human being is a
messy, inconvenient collection of cells. But to refer to a particular
human, for instance the first computer programmer, it isn't necessary to
describe each of their cells; all you need is the easy, convenient
scalar string "Ada Lovelace".

References in Perl are like names for arrays and hashes. They're
Perl's private, internal names, so you can be sure they're
unambiguous. Unlike a human name, a reference only refers to one
thing, and you always know what it refers to. If you have a reference
to an array, you can recover the entire array from it. If you have a
reference to a hash, you can recover the entire hash. But the
reference is still an easy, compact scalar value.

You can't have a hash whose values are arrays; hash values can only be
scalars. We're stuck with that. But a single reference can refer to
an entire array, and references are scalars, so you can have a hash of
references to arrays, and it'll act a lot like a hash of arrays, and
it'll be just as useful as a hash of arrays.

We'll come back to this city-country problem later, after we've seen
some syntax for managing references.


=head1 Syntax

There are just two ways to make a reference, and just two ways to use
it once you have it.

=head2 Making References

=head3 B<Make Rule 1>

If you put a C<\> in front of a variable, you get a
reference to that variable.

    $aref = \@array;         # $aref now holds a reference to @array
    $href = \%hash;          # $href now holds a reference to %hash
    $sref = \$scalar;        # $sref now holds a reference to $scalar

Once the reference is stored in a variable like $aref or $href, you
can copy it or store it just the same as any other scalar value:

    $xy = $aref;             # $xy now holds a reference to @array
    $p[3] = $href;           # $p[3] now holds a reference to %hash
    $z = $p[3];              # $z now holds a reference to %hash


These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name. This is analogous to the way you like to be able to use the
string C<"\n"> or the number 80 without having to store it in a named
variable first.

=head3 B<Make Rule 2>

C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
reference to that hash.

    $aref = [ 1, "foo", undef, 13 ];
    # $aref now holds a reference to an array

    $href = { APR => 4, AUG => 8 };
    # $href now holds a reference to a hash


The references you get from rule 2 are the same kind of
references that you get from rule 1:

    # This:
    $aref = [ 1, 2, 3 ];

    # Does the same as this:
    @array = (1, 2, 3);
    $aref = \@array;


The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable C<@array>.

If you write just C<[]>, you get a new, empty anonymous array.
If you write just C<{}>, you get a new, empty anonymous hash.
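
Here is a minimal sketch that puts both Make Rules side by side. The
variable names (C<$aref_named>, C<$aref_anon>, and so on) are made up for
this illustration, and it borrows the C<ref> function and the C<==>
comparison, both of which are only mentioned near the end of this page,
to show that the two rules produce the same kind of reference.

```perl
use strict;
use warnings;

# Make Rule 1: a backslash in front of a named variable.
my @array = (1, 2, 3);
my %hash  = (APR => 4, AUG => 8);
my $aref_named = \@array;
my $href_named = \%hash;

# Make Rule 2: brackets and braces build anonymous data and
# hand back a reference to it.
my $aref_anon = [ 1, 2, 3 ];
my $href_anon = { APR => 4, AUG => 8 };

# ref() reports what kind of thing a reference refers to.
print ref($aref_named), " ", ref($aref_anon), "\n";   # ARRAY ARRAY
print ref($href_named), " ", ref($href_anon), "\n";   # HASH HASH

# Copying a reference copies the scalar, not the array behind it,
# so both scalars refer to the same array.
my $copy = $aref_named;
print "same array\n" if $copy == $aref_named;
```

Whichever rule you use, what you end up holding is an ordinary scalar.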


=head2 Using References

What can you do with a reference once you have it? It's a scalar
value, and we've seen that you can store it as a scalar and get it back
again just like any scalar. There are just two more ways to use it:

=head3 B<Use Rule 1>

You can always use an array reference, in curly braces, in place of
the name of an array. For example, C<@{$aref}> instead of C<@array>.

Here are some examples of that:

Arrays:


    @a              @{$aref}                An array
    reverse @a      reverse @{$aref}        Reverse the array
    $a[3]           ${$aref}[3]             An element of the array
    $a[3] = 17      ${$aref}[3] = 17        Assigning an element


On each line are two expressions that do the same thing. The
left-hand versions operate on the array C<@a>. The right-hand
versions operate on the array that is referred to by C<$aref>. Once
they find the array they're operating on, both versions do the same
things to the arrays.

Using a hash reference is I<exactly> the same:

    %h              %{$href}                A hash
    keys %h         keys %{$href}           Get the keys from the hash
    $h{'red'}       ${$href}{'red'}         An element of the hash
    $h{'red'} = 17  ${$href}{'red'} = 17    Assigning an element

Whatever you want to do with a reference, B<Use Rule 1> tells you how
to do it. You just write the Perl code that you would have written
for doing the same thing to a regular array or hash, and then replace
the array or hash name with C<{$reference}>. "How do I loop over an
array when all I have is a reference?" Well, to loop over an array, you
would write

    for my $element (@array) {
        ...
    }

so replace the array name, C<@array>, with the reference:

    for my $element (@{$aref}) {
        ...
    }

"How do I print out the contents of a hash when all I have is a
reference?"
First write the code for printing out a hash:

    for my $key (keys %hash) {
        print "$key => $hash{$key}\n";
    }

And then replace the hash name with the reference:

    for my $key (keys %{$href}) {
        print "$key => ${$href}{$key}\n";
    }

=head3 B<Use Rule 2>

L<B<Use Rule 1>|/B<Use Rule 1>> is all you really need, because it tells
you how to do absolutely everything you ever need to do with references.
But the most common thing to do with an array or a hash is to extract a
single element, and the L<B<Use Rule 1>|/B<Use Rule 1>> notation is
cumbersome. So there is an abbreviation.

C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
instead.

C<${$href}{red}> is too hard to read, so you can write
C<< $href->{red} >> instead.

If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
the fourth element of the array. Don't confuse this with C<$aref[3]>,
which is the fourth element of a totally different array, one
deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
same way that C<$item> and C<@item> are.

Similarly, C<< $href->{'red'} >> is part of the hash referred to by
the scalar variable C<$href>, perhaps even one with no name.
C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
easy to forget the C<< -> >>, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.


=head2 An Example

Let's see a quick example of how all this is useful.

First, remember that C<[1, 2, 3]> makes an anonymous array containing
C<(1, 2, 3)>, and gives you a reference to that array.

Now think about

    @a = ( [1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]
         );

C<@a> is an array with three elements, and each one is a reference to
another array.
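
As a quick sanity check, here is a small sketch that walks over C<@a>
and applies B<Use Rule 1> to each element to recover the whole row. The
C<@lines> array and the C<$row> loop variable are names invented for
this illustration.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]
        );

# Each element of @a is a reference to a row, so @{$row}
# (Use Rule 1) gives back the entire row array.
my @lines;   # hypothetical helper: one formatted string per row
for my $row (@a) {
    push @lines, join(' ', @{$row});
}

print "$_\n" for @lines;
# 1 2 3
# 4 5 6
# 7 8 9
```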

C<$a[1]> is one of these references. It refers to an array, the array
containing C<(4, 5, 6)>, and because it is a reference to an array,
L<B<Use Rule 2>|/B<Use Rule 2>> says that we can write C<< $a[1]->[2] >>
to get the third element from that array. C<< $a[1]->[2] >> is the 6.
Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a
two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get or
set the element in any row and any column of the array.

The notation still looks a little cumbersome, so there's one more
abbreviation:

=head2 Arrow Rule

In between two B<subscripts>, the arrow is optional.

Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write
C<$a[0][1] = 23>; it means the same thing.

Now it really looks like two-dimensional arrays!

You can see why the arrows are important. Without them, we would have
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
the unreadable C<${${$x[2]}[3]}[5]>.

=head1 Solution

Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.

    1   my %table;

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

    8   for my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }


The program has two pieces: Lines 2-7 read the input and build a data
structure, and lines 8-13 analyze the data and print out the report.
We're going to have a hash, C<%table>, whose keys are country names,
and whose values are references to arrays of city names.
The data
structure will look like this:


           %table
        +-------+---+
        |       |   |   +-----------+--------+
        |Germany| *---->| Frankfurt | Berlin |
        |       |   |   +-----------+--------+
        +-------+---+
        |       |   |   +----------+
        |Finland| *---->| Helsinki |
        |       |   |   +----------+
        +-------+---+
        |       |   |   +---------+------------+----------+
        |  USA  | *---->| Chicago | Washington | New York |
        |       |   |   +---------+------------+----------+
        +-------+---+

We'll look at output first. Supposing we already have this structure,
how do we print it out?

    8   for my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }

C<%table> is an ordinary hash, and we get a list of keys from it, sort
the keys, and loop over the keys as usual. The only use of references
is in line 10. C<$table{$country}> looks up the key C<$country> in the
hash and gets the value, which is a reference to an array of cities in
that country. L<B<Use Rule 1>|/B<Use Rule 1>> says that we can recover
the array by saying C<@{$table{$country}}>. Line 10 is just like

    @cities = @array;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>. The C<@> tells Perl to get the entire array.
Having gotten the list of cities, we sort it, join it, and print it
out as usual.

Lines 2-7 are responsible for building the structure in the first
place. Here they are again:

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

Lines 2-4 acquire a city and country name. Line 5 looks to see if the
country is already present as a key in the hash.
If it's not, the
program uses the C<[]> notation (L<B<Make Rule 2>|/B<Make Rule 2>>) to
manufacture a new, empty anonymous array of cities, and installs a
reference to it into the hash under the appropriate key.

Line 6 installs the city name into the appropriate array.
C<$table{$country}> now holds a reference to the array of cities seen
in that country so far. Line 6 is exactly like

    push @array, $city;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>. The L<C<push>|perlfunc/push ARRAY,LIST> adds a
city name to the end of the referred-to array.

There's one fine point I skipped. Line 5 is unnecessary, and we can
get rid of it.

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5   ####  $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

If there's already an entry in C<%table> for the current C<$country>,
then nothing is different. Line 6 will locate the value in
C<$table{$country}>, which is a reference to an array, and push C<$city>
into the array. But what does it do when C<$country> holds a key, say
C<Greece>, that is not yet in C<%table>?

This is Perl, so it does the exact right thing. It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it into
C<%table>, and then pushes C<Athens> onto it. This is called
I<autovivification>--bringing things to life automatically. Perl saw
that the key wasn't in the hash, so it created a new hash entry
automatically. Perl saw that you wanted to use the hash value as an
array, so it created a new empty array and installed a reference to it
in the hash automatically. And as usual, Perl made the array one
element longer to hold the new city name.
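
If you'd like to watch autovivification happen, the following sketch
isolates the C<Greece> case from the paragraph above. The
C<$was_present> variable is a hypothetical name added only so the
before-and-after state can be compared; C<ref> is described later on
this page.

```perl
use strict;
use warnings;

my %table;

# Nothing has been stored under this key yet.
my $was_present = exists $table{Greece};   # hypothetical helper
print "before: ", ($was_present ? "present" : "absent"), "\n";

# Treating the missing value as an array makes Perl create an
# empty anonymous array, store a reference to it, then push.
push @{ $table{Greece} }, 'Athens';

print "after: ", ref($table{Greece}), " holding @{ $table{Greece} }\n";
# before: absent
# after: ARRAY holding Athens
```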

=head1 The Rest

I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details. Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.

Some of the highlights of L<perlref>:

=over 4

=item *

You can make references to anything, including scalars, functions, and
other references.

=item *

In L<B<Use Rule 1>|/B<Use Rule 1>>, you can omit the curly brackets
whenever the thing inside them is an atomic scalar variable like
C<$aref>. For example, C<@$aref> is the same as C<@{$aref}>, and
C<$$aref[1]> is the same as C<${$aref}[1]>. If you're just starting
out, you may want to adopt the habit of always including the curly
brackets.

=item *

This doesn't copy the underlying array:

    $aref2 = $aref1;

You get two references to the same array. If you modify
C<< $aref1->[23] >> and then look at
C<< $aref2->[23] >> you'll see the change.

To copy the array, use

    $aref2 = [@{$aref1}];

This uses C<[...]> notation to create a new anonymous array, and
C<$aref2> is assigned a reference to the new array. The new array is
initialized with the contents of the array referred to by C<$aref1>.

Similarly, to copy an anonymous hash, you can use

    $href2 = {%{$href1}};

=item *

To see if a variable contains a reference, use the
L<C<ref>|perlfunc/ref EXPR> function. It returns true if its argument
is a reference. Actually it's a little better than that: It returns
C<HASH> for hash references and C<ARRAY> for array references.

=item *

If you try to use a reference like a string, you get strings like

    ARRAY(0x80f5dec)   or   HASH(0x826afc0)

If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.

A side effect of this representation is that you can use
L<C<eq>|perlop/Equality Operators> to see if two references refer to the
same thing. (But you should usually use
L<C<==>|perlop/Equality Operators> instead because it's much faster.)

=item *

You can use a string as if it were a reference. If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>. This is called a I<symbolic reference>. The declaration
L<C<use strict 'refs'>|strict> disables this feature, which can cause
all sorts of trouble if you use it by accident.

=back

You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail. After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.

=head1 Summary

Everyone needs compound data structures, and in Perl the way you get
them is with references. There are four important rules for managing
references: Two for making references and two for using them. Once
you know these rules you can do most of the important things you need
to do with references.

=head1 Credits

Author: Mark Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)

This article originally appeared in I<The Perl Journal>
( L<http://www.tpj.com/> ) volume 3, #2. Reprinted with permission.

The original title was I<Understand References Today>.

=head2 Distribution Conditions

Copyright 1998 The Perl Journal.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.
You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit. A simple comment in the code giving credit would be
courteous but is not required.

=cut