package bytes;

our $VERSION = '1.04';

$bytes::hint_bits = 0x00000008;

sub import {
    $^H |= $bytes::hint_bits;
}

sub unimport {
    $^H &= ~$bytes::hint_bits;
}

sub AUTOLOAD {
    require "bytes_heavy.pl";
    goto &$AUTOLOAD if defined &$AUTOLOAD;
    require Carp;
    Carp::croak("Undefined subroutine $AUTOLOAD called");
}

sub length (_);
sub chr (_);
sub ord (_);
sub substr ($$;$$);
sub index ($$;$);
sub rindex ($$;$);

1;
__END__

=head1 NAME

bytes - Perl pragma to force byte semantics rather than character semantics

=head1 NOTICE

This pragma reflects early attempts to incorporate Unicode into perl and
has since been superseded. It breaks encapsulation (i.e. it exposes the
innards of how the perl executable currently happens to store a string),
and use of this module for anything other than debugging purposes is
strongly discouraged. If you feel that the functions here within might be
useful for your application, this possibly indicates a mismatch between
your mental model of Perl Unicode and the current reality. In that case,
you may wish to read some of the perl Unicode documentation:
L<perluniintro>, L<perlunitut>, L<perlunifaq> and L<perlunicode>.

=head1 SYNOPSIS

    use bytes;
    ... chr(...);       # or bytes::chr
    ... index(...);     # or bytes::index
    ... length(...);    # or bytes::length
    ... ord(...);       # or bytes::ord
    ... rindex(...);    # or bytes::rindex
    ... substr(...);    # or bytes::substr
    no bytes;

=head1 DESCRIPTION

The C<use bytes> pragma disables character semantics for the rest of the
lexical scope in which it appears. C<no bytes> can be used to reverse
the effect of C<use bytes> within the current lexical scope.

Perl normally assumes character semantics in the presence of character
data (i.e. data that has come from a source that has been marked as
being of a particular character encoding).
When C<use bytes> is in effect, the encoding is temporarily ignored, and
each string is treated as a series of bytes.

As an example, when Perl sees C<$x = chr(400)>, it encodes the character
in UTF-8 and stores it in C<$x>. The string is then marked as character
data, so, for instance, C<length $x> returns C<1>. However, in the scope
of the C<bytes> pragma, C<$x> is treated as a series of bytes (the bytes
that make up the UTF-8 encoding) and C<length $x> returns C<2>:

    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes; # or "require bytes; bytes::length()"
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }

chr(), ord(), substr(), index() and rindex() behave similarly.

For more on the implications and differences between character
semantics and byte semantics, see L<perluniintro> and L<perlunicode>.

=head1 LIMITATIONS

bytes::substr() does not work as an lvalue.

=head1 SEE ALSO

L<perluniintro>, L<perlunicode>, L<utf8>

=cut