# PODNAME: Unicode.pod
# ABSTRACT: Working with unicode

__END__

=pod

=encoding UTF-8

=head1 NAME

Unicode.pod - Working with unicode

=head1 VERSION

version 2.07

=head1 DESCRIPTION

Working with Unicode.

For a practical example, see the Catalyst application in the
C<examples/unicode> directory in this distribution.

=head1 ASSUMPTIONS

In this tutorial, we're assuming that all encodings are UTF-8. It's
relatively simple to combine different encodings from different sources,
but that's beyond the scope of this tutorial.

For simplicity, we're also going to assume that you're using L<Catalyst>
for your web framework, L<DBIx::Class> for your database ORM,
L<TT|Template> for your templating system, and YAML-format C<HTML::FormFu>
configuration files, with L<YAML::XS> installed. However, the principles
we'll cover should translate to whatever technologies you choose to work with.

=head1 BASICS

To make it short and sweet: you must decode all data going into your
program, and encode all data coming from your program.

Skip to L</CHANGES REQUIRED> if you want to see what you need to do without
any other explanation.

=head1 INPUT

=head2 Input parameters from the browser

If you're using C<Catalyst>, L<Catalyst::Plugin::Unicode> will decode all
input parameters sent from the browser to your application - see
L</Catalyst Configuration>.

If you're using some other framework or, in any case, you need to decode
the input parameters yourself, please take a look at
L<HTML::FormFu::Filter::Encode>.

=head2 Data from the database

If you're using L<DBIx::Class>, L<DBIx::Class::UTF8Columns> is likely the
best option, as it will decode all input retrieved from the database -
see L</DBIx::Class Configuration>.

In other cases (e.g. plain DBI), you still need to decode the string data
coming from the database. This varies depending on the database server.
For MySQL, for instance, you can use the C<mysql_enable_utf8> attribute:
see the L<DBD::mysql> documentation for details.

=head2 Your template files

Set TT to decode all template files - see L</TT Configuration>.

=head2 HTML::FormFu's own template files

Set C<HTML::FormFu> to decode all template files - see
L</HTML::FormFu Template Configuration>.

=head2 HTML::FormFu form configuration files

If you're using C<YAML> config files, your files will automatically be
decoded by L<load_config_file|HTML::FormFu/load_config_file> and
L<load_config_filestem|HTML::FormFu/load_config_filestem>.

If you have L<Config::General> config files, your files will automatically
be decoded by L<load_config_file|HTML::FormFu/load_config_file> and
L<load_config_filestem|HTML::FormFu/load_config_filestem>, which
automatically set L<Config::General's|Config::General> C<-UTF8> setting.

=head2 Your perl source code

Any perl source files which contain Unicode characters must use the
L<utf8> module.

=head1 OUTPUT

=head2 Data saved to the database

With C<DBIx::Class>, L<DBIx::Class::UTF8Columns> will encode all data sent
to the database - see L</DBIx::Class Configuration>.

=head2 HTML sent to the browser

With C<Catalyst>, L<Catalyst::Plugin::Unicode> will encode all output sent
from your application to the browser - see L</Catalyst Configuration>.

In other circumstances you need to be sure to output your Unicode (decoded)
strings in UTF-8. To do this you can encode your output before it's sent
to the browser with something like:

    use utf8;
    if ( $output && utf8::is_utf8($output) ) {
        utf8::encode( $output ); # Encodes in-place
    }

Another option is to set the C<binmode> for C<STDOUT>:

    binmode STDOUT, ':utf8';

However, be sure to do this B<only> when sending UTF-8 data: if you're
serving images, PDF files, etc., C<binmode> should remain set to C<:raw>.
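
The decode-on-input, encode-on-output rule described in this document can
also be followed by hand with the core L<Encode> module. The following is
only a minimal sketch, not part of C<HTML::FormFu>: the variable names and
the sample byte string are hypothetical, and it assumes that every source
and destination uses UTF-8.

    use strict;
    use warnings;
    use Encode qw( decode encode );

    # Hypothetical input: the UTF-8 bytes for "cafe" with an acute accent
    # on the "e", as they might arrive from a browser or a database.
    my $bytes_in = "caf\xC3\xA9";

    # Decode on the way in: bytes become a string of characters.
    my $text = decode( 'UTF-8', $bytes_in );

    # Inside the program, length() now counts characters rather than bytes.
    printf "%d bytes in, %d characters\n", length $bytes_in, length $text;

    # Encode on the way out: characters become UTF-8 bytes again.
    my $bytes_out = encode( 'UTF-8', $text );

    # The filehandle stays in raw mode because the data is already encoded.
    binmode STDOUT, ':raw';
    print $bytes_out, "\n";

Encoding explicitly and leaving C<STDOUT> in C<:raw> mode avoids encoding
the same data twice, which would happen if an already encoded string were
printed through a C<:utf8> layer.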

=head1 CHANGES REQUIRED

=head2 Catalyst Configuration

Add L<Catalyst::Plugin::Unicode> to the list of Catalyst plugins:

    use Catalyst qw( ConfigLoader Static::Simple Unicode );

=head2 DBIx::Class Configuration

Add L<DBIx::Class::UTF8Columns> to the list of components loaded, for each
table that has columns storing Unicode:

    __PACKAGE__->load_components( qw( UTF8Columns HTML::FormFu PK::Auto Core ) );

Pass each column name that will store Unicode to C<utf8_columns()>:

    __PACKAGE__->utf8_columns( qw( lastname firstname ) );

=head2 TT Configuration

Tell TT to decode all template files, by adding the following to your
application config in C<MyApp.pm>:

    package MyApp;
    use parent 'Catalyst';
    use Catalyst qw( ConfigLoader );

    MyApp->config({
        'View::TT' => {
            ENCODING => 'UTF-8',
        },
    });

    1;

=head2 HTML::FormFu Template Configuration

Make C<HTML::FormFu> tell TT to decode all template files, by adding the
following to your application config in C<MyApp.pm>:

    package MyApp;
    use parent 'Catalyst';
    use Catalyst qw( ConfigLoader );

    MyApp->config({
        'Controller::HTML::FormFu' => {
            constructor => {
                tt_args => {
                    ENCODING => 'UTF-8',
                },
            },
        },
    });

    1;

The two examples above should be combined, like so:

    package MyApp;
    use parent 'Catalyst';
    use Catalyst qw( ConfigLoader );

    MyApp->config({
        'Controller::HTML::FormFu' => {
            constructor => {
                tt_args => {
                    ENCODING => 'UTF-8',
                },
            },
        },
        'View::TT' => {
            ENCODING => 'UTF-8',
        },
    });

    1;

=head1 AUTHORS

Carl Franks C<cfranks@cpan.org>

Michele Beltrame C<arthas@cpan.org> (contributions)

=head1 COPYRIGHT

This document is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 AUTHOR

Carl Franks <cpan@fireartist.com>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Carl Franks.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut