1#!/usr/bin/perl -w 2 3# Po4a::Sgml.pm 4# 5# extract and translate translatable strings from an sgml based document. 6# 7# This code is an adapted version of sgmlspl (SGML postprocessor for the 8# SGMLS and NSGMLS parsers) which was: 9# 10# Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca> 11# 12# The adaptation for po4a was done by Denis Barbier <barbier@linuxfr.org>, 13# Martin Quinson (mquinson#debian.org) and others. 14# 15# This program is free software; you can redistribute it and/or modify 16# it under the terms of the GNU General Public License as published by 17# the Free Software Foundation; either version 2 of the License, or 18# (at your option) any later version. 19# 20# This program is distributed in the hope that it will be useful, 21# but WITHOUT ANY WARRANTY; without even the implied warranty of 22# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 23# GNU General Public License for more details. 24# 25# You should have received a copy of the GNU General Public License 26# along with this program; if not, write to the Free Software 27# Foundation, Inc., 28# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 29# 30######################################################################## 31 32=encoding UTF-8 33 34=head1 NAME 35 36Locale::Po4a::Sgml - convert SGML documents from/to PO files 37 38=head1 DESCRIPTION 39 40The po4a (PO for anything) project goal is to ease translations (and more 41interestingly, the maintenance of translations) using gettext tools on 42areas where they were not expected like documentation. 43 44Locale::Po4a::Sgml is a module to help the translation of documentation in 45the SGML format into other [human] languages. 46 47This module uses B<onsgmls>(1) to parse the SGML files. Make sure it is 48installed. 49Also make sure that the DTD of the SGML files are installed in the system. 50 51=head1 OPTIONS ACCEPTED BY THIS MODULE 52 53=over 4 54 55=item B<debug> 56 57Space separated list of keywords indicating which part you want to debug. Possible values are: tag, generic, entities and refs. 58 59=item B<verbose> 60 61Give more information about what's going on. 62 63=item B<translate> 64 65Space separated list of extra tags (beside the DTD provided ones) whose 66content should form an extra msgid. 67 68=item B<section> 69 70Space separated list of extra tags (beside the DTD provided ones) 71containing other tags, some of them being of category B<translate>. 72 73=item B<indent> 74 75Space separated list of tags which increase the indentation level. 76 77=item B<verbatim> 78 79The layout within those tags should not be changed. The paragraph won't get 80wrapped, and no extra indentation space or new line will be added for 81cosmetic purpose. 82 83=item B<empty> 84 85Tags not needing to be closed. 86 87=item B<ignore> 88 89Tags ignored and considered as plain char data by po4a. That is to say that 90they can be part of an msgid. For example, E<lt>bE<gt> is a good candidate 91for this category since putting it in the translate section would create 92msgids not being whole sentences, which is bad. 93 94=item B<attributes> 95 96A space separated list of attributes that need to be translated. You can 97specify the attributes by their name (for example, "lang"), but you can also 98prefix it with a tag hierarchy, to specify that this attribute will only be 99translated when it is into the specified tag. For example: 100E<lt>bbbE<gt>E<lt>aaaE<gt>lang specifies that the lang attribute will only be 101translated if it is in an E<lt>aaaE<gt> tag, which is in a E<lt>bbbE<gt> tag. 102The tag names are actually regular expressions so you can also write things 103like E<lt>aaa|bbbbE<gt>lang to only translate lang attributes that are in 104an E<lt>aaaE<gt> or a E<lt>bbbE<gt> tag. 105 106=item B<qualify> 107 108A space separated list of attributes for which the translation must be 109qualified by the attribute name. Note that this setting automatically adds the 110given attribute into the 'attributes' list too. 111 112=item B<force> 113 114Proceed even if the DTD is unknown or if onsgmls finds errors in the input 115file. 116 117=item B<include-all> 118 119By default, msgids containing only one entity (like '&version;') are skipped 120for the translator comfort. Activating this option prevents this 121optimisation. It can be useful if the document contains a construction like 122"<title>Á</title>", even if I doubt such things to ever happen... 123 124=item B<ignore-inclusion> 125 126Space separated list of entities that won't be inlined. 127Use this option with caution: it may cause onsgmls (used internally) to add 128tags and render the output document invalid. 129 130=back 131 132=head1 STATUS OF THIS MODULE 133 134The result is perfect. I.e., the generated documents are exactly the 135same. But there are still some problems: 136 137=over 2 138 139=item * 140 141The error output of onsgmls is redirected to /dev/null by default, which is clearly 142bad. I don't know how to avoid that. 143 144The problem is that I have to "protect" the conditional inclusions (i.e. the 145C<E<lt>! [ %foo [> and C<]]E<gt>> stuff) from onsgmls. Otherwise 146onsgmls eats them, and I don't know how to restore them in the final 147document. To prevent that, I rewrite them to C<{PO4A-beg-foo}> and 148C<{PO4A-end}>. 149 150The problem with this is that the C<{PO4A-end}> and such I add are invalid in 151the document (not in a E<lt>pE<gt> tag or so). 152 153If you want to view the onsgmls output, just add the following to your command line (or po4a configuration line): 154 155 -o debug=onsgmls 156 157=item * 158 159It does work only with the DebianDoc and DocBook DTD. Adding support for a 160new DTD should be very easy. The mechanism is the same for every DTD, you just 161have to give a list of the existing tags and some of their characteristics. 162 163I agree, this needs some more documentation, but it is still considered as 164beta, and I hate to document stuff which may/will change. 165 166=item * 167 168Warning, support for DTDs is quite experimental. I did not read any 169reference manual to find the definition of every tag. I did add tag 170definition to the module 'till it works for some documents I found on the 171net. If your document use more tags than mine, it won't work. But as I said 172above, fixing that should be quite easy. 173 174I did test DocBook against the SAG (System Administrator Guide) only, but 175this document is quite big, and should use most of the DocBook 176specificities. 177 178For DebianDoc, I tested some of the manuals from the DDP, but not all yet. 179 180=item * 181 182In case of file inclusion, string reference of messages in PO files 183(i.e. lines like C<#: en/titletoc.sgml:9460>) will be wrong. 184 185This is because I preprocess the file to protect the conditional inclusion 186(i.e. the C<E<lt>! [ %foo [> and C<]]E<gt>> stuff) and some entities (like 187&version;) from onsgmls because I want them verbatim to the generated 188document. For that, I make a temp copy of the input file and do all the 189changes I want to this before passing it to onsgmls for parsing. 190 191So that it works, I replace the entities asking for a file inclusion by the 192content of the given file (so that I can protect what needs to be in a subfile 193also). But nothing is done so far to correct the references (i.e., filename 194and line number) afterward. I'm not sure what the best thing to do is. 195 196=back 197 198=cut 199 200package Locale::Po4a::Sgml; 201 202use 5.006; 203use strict; 204use warnings; 205 206require Exporter; 207use vars qw(@ISA @EXPORT); 208@ISA = qw(Locale::Po4a::TransTractor); 209@EXPORT = qw(); 210 211use Locale::Po4a::TransTractor; 212use Locale::Po4a::Common; 213 214eval qq{use SGMLS}; 215if ($@) { 216 die wrap_mod( 217 "po4a::sgml", 218 dgettext( 219 "po4a", 220 "The needed module SGMLS.pm was not found and needs to be installed. It can be found on the CPAN, in package libsgmls-perl on debian, etc." 221 ) 222 ); 223} 224 225use File::Temp; 226 227my %debug = ( 228 'tag' => 0, 229 'generic' => 0, 230 'entities' => 0, 231 'refs' => 0, 232 'onsgmls' => 0 233); 234 235my $xmlprolog = undef; # the '<?xml ... ?>' line if existing 236 237sub initialize { 238 my $self = shift; 239 my %options = @_; 240 241 $self->{options}{'translate'} = ''; 242 $self->{options}{'section'} = ''; 243 $self->{options}{'indent'} = ''; 244 $self->{options}{'empty'} = ''; 245 $self->{options}{'verbatim'} = ''; 246 $self->{options}{'ignore'} = ''; 247 $self->{options}{'ignore-inclusion'} = ''; 248 249 $self->{options}{'include-all'} = ''; 250 251 $self->{options}{'force'} = ''; 252 253 $self->{options}{'verbose'} = ''; 254 $self->{options}{'debug'} = ''; 255 256 foreach my $opt ( keys %options ) { 257 if ( $options{$opt} ) { 258 die wrap_mod( "po4a::sgml", dgettext( "po4a", "Unknown option: %s" ), $opt ) 259 unless exists $self->{options}{$opt}; 260 $self->{options}{$opt} = $options{$opt}; 261 } 262 } 263 if ( $options{'debug'} ) { 264 foreach ( split /\s+/, $options{'debug'} ) { 265 die wrap_mod( "po4a::sgml", dgettext( "po4a", "Unknown debug category: %s. Known categories:\n%s" ), 266 $_, join( " ", keys %debug ) ) 267 unless exists $debug{$_}; 268 $debug{$_} = 1; 269 } 270 } 271} 272 273sub read { 274 my ( $self, $filename, $refname ) = @_; 275 276 push @{ $self->{DOCPOD}{infile} }, $filename; 277 $self->Locale::Po4a::TransTractor::read( $filename, $refname ); 278} 279 280sub parse { 281 my $self = shift; 282 map { $self->parse_file($_) } @{ $self->{DOCPOD}{infile} }; 283} 284 285# 286# Filter out some uninteresting strings for translation 287# 288sub translate { 289 my ($self) = (shift); 290 my ( $string, $ref, $type ) = ( shift, shift, shift ); 291 my (%options) = @_; 292 293 # don't translate entries composed of one entity 294 if ( ( ( $string =~ /^&[^;]*;$/ ) || ( $options{'wrap'} && $string =~ /^\s*&[^;]*;\s*$/ ) ) 295 && !( $self->{options}{'include-all'} ) ) 296 { 297 warn wrap_mod( "po4a::sgml", dgettext( "po4a", "msgid skipped to help translators (contains only an entity)" ) ) 298 unless $self->verbose() <= 0; 299 return $string . ( $options{'wrap'} ? "\n" : "" ); 300 } 301 302 # don't translate entries composed of tags only 303 if ( $string =~ /^(((<[^>]*>)|\s)*)$/ 304 && !( $self->{options}{'include-all'} ) ) 305 { 306 warn wrap_mod( "po4a::sgml", dgettext( "po4a", "msgid skipped to help translators (contains only tags)" ) ) 307 unless $self->verbose() <= 0; 308 return $string . ( $options{'wrap'} ? "\n" : "" ); 309 } 310 311 # don't translate entries composed of marked section tags only 312 if ( ( $string =~ /^(?:<!\s*\[\s*[^\[]+\s*\[|\]\s*]\s*>|\s)*$/ ) 313 && !( $self->{options}{'include-all'} ) ) 314 { 315 warn wrap_mod( 316 "po4a::sgml", 317 dgettext( 318 "po4a", 319 "msgid skipped to " 320 . "help translators (contains only opening or closing " 321 . "tags of marked sections)" 322 ), 323 $string 324 ) unless $self->verbose() <= 0; 325 return $string . ( $options{'wrap'} ? "\n" : "" ); 326 } 327 328 $string = $self->SUPER::translate( $string, $ref, $type, %options ); 329 330 $string = $self->post_trans( $string, $ref, $type ); 331 332 return $string; 333} 334 335sub post_trans { 336 my ( $self, $str, $ref, $type ) = @_; 337 338 # Change ascii non-breaking space to an 339 my $nbs_out = "\xA0"; 340 my $enc_length = Encode::from_to( $nbs_out, "latin1", $self->get_out_charset ); 341 $str =~ s/\Q$nbs_out/ /g if defined $enc_length; 342 343 return $str; 344} 345 346# 347# Make sure our cruft is removed from the file 348# 349sub pushline { 350 my ( $self, $line ) = @_; 351 $line =~ s/{PO4A-amp}/&/g; 352 $self->SUPER::pushline($line); 353} 354 355sub set_tags_kind { 356 my $self = shift; 357 my (%kinds) = @_; 358 359 foreach (qw(translate empty section verbatim ignore attributes qualify)) { 360 $self->{SGML}->{k}{$_} = $self->{options}{$_} ? $self->{options}{$_} . ' ' : ''; 361 362 # Remove the default behavior for the tags defined with the 363 # options. 364 foreach my $k ( keys %kinds ) { 365 foreach my $t ( split( " ", $self->{SGML}->{k}{$_} ) ) { 366 $kinds{$k} =~ s/\b$t\b//; 367 } 368 } 369 } 370 371 foreach ( keys %kinds ) { 372 die "po4a::sgml: internal error: set_tags_kind called with unrecognized arg $_" 373 if ( $_ !~ /^(translate|empty|verbatim|ignore|indent|attributes|qualify)$/ ); 374 375 $self->{SGML}->{k}{$_} .= $kinds{$_}; 376 } 377} 378 379# 380# Do the actual work, using the SGMLS package and settings done elsewhere. 381# 382sub parse_file { 383 my ( $self, $mastername ) = @_; 384 my ($prolog); 385 386 # Rewrite the file to: 387 # - protect optional inclusion marker (i.e. "<![ %str [" and "]]>") 388 # - protect entities from expansion (ie "&release;") 389 my $origfile = ""; 390 my $i = 0; 391 while ( $i < @{ $self->{TT}{doc_in} } ) { 392 $origfile .= ${ $self->{TT}{doc_in} }[$i]; 393 $i += 2; 394 } 395 396 unless ( $self->{options}{'force'} ) { 397 398 # Detect if we can find the DTD 399 my ( $tmpfh, $tmpfile ) = File::Temp::tempfile( 400 "po4a-XXXX", 401 SUFFIX => ".sgml", 402 DIR => $ENV{TMPDIR} || "/tmp", 403 UNLINK => 0 404 ); 405 print $tmpfh $origfile; 406 close $tmpfh 407 or die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot close tempfile: %s" ), $! ); 408 if ( system("onsgmls -p < $tmpfile") ) { 409 unlink($tmpfile); 410 die wrap_mod( 411 "po4a::sgml", 412 dgettext( 413 "po4a", "Error while running onsgmls -p. Please check if onsgmls and the DTD are installed." 414 ) 415 ); 416 } 417 unlink($tmpfile); 418 } 419 420 # Detect the XML pre-prolog 421 if ( $origfile =~ s/^(\s*<\?xml[^?]*\?>)// ) { 422 warn wrap_mod( 423 "po4a::sgml", 424 dgettext( 425 "po4a", 426 "Trying to handle a XML document as a SGML one. " 427 . "Feel lucky if it works, help us implementing a proper XML backend if it does not." 428 ), 429 $mastername 430 ) unless $self->verbose() <= 0; 431 $xmlprolog = $1; 432 } 433 434 # Get the prolog 435 { 436 $prolog = $origfile; 437 my $lvl; # number of '<' seen without matching '>' 438 my $pos = 0; # where in the document (in chars) while detecting prolog boundaries 439 440 unless ( $prolog =~ s/^(.*<!DOCTYPE).*$/$1/is ) { 441 die wrap_mod( 442 "po4a::sgml", 443 dgettext( 444 "po4a", 445 "This file is not a master SGML document (no DOCTYPE). " 446 . "It may be a file to be included by another one, in which case it should not be passed to po4a directly. Text from included files is extracted/translated when handling the master file including them." 447 ) 448 ); 449 } 450 $pos += length($prolog); 451 $lvl = 1; 452 while ( $lvl != 0 ) { 453 454 # Eat comments in the prolog, since there may be some '>' or '<' in them. 455 if ( $origfile =~ m/^.{$pos}(<!--.*?-->)/s ) { 456 print "Found a comment in the prolog: $1\n" if ( $debug{'generic'} ); 457 $pos += length($1); 458 459 # take care of the line numbers 460 my @a = split( /\n/, $1 ); 461 shift @a; # nb line - 1 462 while ( defined( shift @a ) ) { 463 $prolog .= "\n"; 464 } 465 next; 466 } 467 468 # Search the closing '>' 469 my ($c) = substr( $origfile, $pos, 1 ); 470 $lvl++ if ( $c eq '<' ); 471 $lvl-- if ( $c eq '>' ); 472 $prolog = "$prolog$c"; 473 $pos++; 474 } 475 } 476 477 # Add the definition of new tags that will be used for the 478 # conditionnal inclusions 479 if ( $origfile =~ /^.*<!DOCTYPE[^[>]*\[/is ) { 480 $origfile =~ 481 s/^(.*<!DOCTYPE[^[>]*\[)/$1 <!ELEMENT PO4ABEG - o empty> <!ATTLIST PO4ABEG name CDATA #REQUIRED> <!ELEMENT PO4AEND - o empty>/is; 482 } 483 484 print STDERR "PROLOG=$prolog\n------------\n" if ( $debug{'generic'} ); 485 486 # Configure the tags for this dtd 487 if ( $prolog =~ /debiandoc/i ) { 488 $self->set_tags_kind( 489 "translate" => "author version abstract title" . "date copyrightsummary heading p " . "example tag title", 490 "empty" => "date ref manref url toc", 491 "verbatim" => "example", 492 "ignore" => "package prgn file tt em var " 493 . "name email footnote po4aend po4abeg " 494 . "strong ftpsite ftppath qref", 495 "indent" => "appendix " . "book " 496 . "chapt copyright " 497 . "debiandoc " 498 . "enumlist " . "item " . "list " 499 . "sect sect1 sect2 sect3 sect4 " 500 . "tag taglist titlepag toc" 501 ); 502 503 } elsif ( $prolog =~ /docbook/i ) { 504 $self->set_tags_kind( 505 "translate" => "abbrev appendixinfo artheader attribution " 506 . "biblioentry biblioset " 507 . "chapterinfo collab collabname confdates confgroup conftitle " . "date " 508 . "edition editor entry example " 509 . "figure " 510 . "glosssee glossseealso glossterm " 511 . "holder " 512 . "member msgaud msglevel msgorig " 513 . "orgdiv orgname othername " 514 . "pagenums para phrase pubdate publishername primary " 515 . "refclass refdescriptor refentrytitle refmiscinfo refname refpurpose releaseinfo remark revnumber revremark " 516 . "screeninfo seg secondary see seealso segtitle simpara substeps subtitle synopfragmentref synopsis " 517 . "term tertiary title titleabbrev " 518 . "contrib epigraph", 519 "empty" => "audiodata colspec graphic imagedata textdata sbr spanspec videodata xref", 520 "indent" => "abstract answer appendix article articleinfo audioobject author authorgroup " 521 . "bibliodiv bibliography blockquote blockinfo book bookinfo bridgehead " 522 . "callout calloutlist caption caution chapter copyright " 523 . "dedication docinfo " 524 . "entry " 525 . "formalpara " 526 . "glossary glossdef glossdiv glossentry glosslist group " 527 . "imageobject important index indexterm informaltable itemizedlist " 528 . "keyword keywordset " 529 . "legalnotice listitem lot " 530 . "mediaobject msg msgentry msginfo msgexplan msgmain msgrel msgsub msgtext " . "note " 531 . "objectinfo orderedlist " 532 . "part partintro preface procedure publisher " 533 . "qandadiv qandaentry qandaset question " 534 . "reference refentry refentryinfo refmeta refnamediv refsect1 refsect1info refsect2 refsect2info refsect3 refsect3info refsection refsectioninfo refsynopsisdiv refsynopsisdivinfo revision revdescription row " 535 . "screenshot sect1 sect1info sect2 sect2info sect3 sect3info sect4 sect4info sect5 sect5info section sectioninfo seglistitem segmentedlist set setindex setinfo shortcut simplelist simplemsgentry simplesect step synopfragment " 536 . "table tbody textobject tgroup thead tip toc " 537 . "variablelist varlistentry videoobject " 538 . "warning", 539 "verbatim" => "address cmdsynopsis holder literallayout programlisting " 540 . "refentrytitle refname refpurpose screen term title", 541 "ignore" => "acronym action affiliation anchor application arg author authorinitials " 542 . "city citation citerefentry citetitle classname co command computeroutput constant corpauthor country " 543 . "database po4abeg po4aend " 544 . "email emphasis envar errorcode errorname errortext errortype exceptionname " 545 . "filename firstname firstterm footnote footnoteref foreignphrase function " 546 . "glossterm guibutton guiicon guilabel guimenu guimenuitem guisubmenu " 547 . "hardware " 548 . "indexterm informalexample inlineequation inlinegraphic inlinemediaobject interface interfacename isbn " 549 . "keycap keycode keycombo keysym " 550 . "link lineannotation literal " 551 . "manvolnum markup medialabel menuchoice methodname modespec mousebutton " 552 . "nonterminal " 553 . "olink ooclass ooexception oointerface option optional othercredit " 554 . "parameter personname phrase productname productnumber prompt property pubsnumber " 555 . "quote " 556 . "remark replaceable returnvalue revhistory " 557 . "sgmltag sidebar structfield structname subscript superscript surname symbol systemitem " 558 . "token trademark type " 559 . "ulink userinput " 560 . "varname volumenum " 561 . "wordasword " . "xref " . "year", 562 "attributes" => "<(article|book)>lang" 563 ); 564 565 } else { 566 if ( $self->{options}{'force'} ) { 567 warn wrap_mod( "po4a::sgml", 568 dgettext( "po4a", "DTD of this file is unknown, but proceeding as requested." ) ); 569 $self->set_tags_kind(); 570 } else { 571 die wrap_mod( "po4a::sgml", 572 dgettext( "po4a", "DTD of this file is unknown. (supported: DebianDoc, DocBook). The prolog follows:" ) 573 . "\n$prolog" ); 574 } 575 } 576 577 # Hash of the file entities that won't be included 578 my %ignored_inclusion = (); 579 foreach ( split / /, $self->{options}{'ignore-inclusion'} ) { 580 $ignored_inclusion{$_} = 1; 581 } 582 583 # Prepare the reference indirection stuff 584 my @refs; 585 my $length = ( $origfile =~ tr/\n/\n/ ); 586 print "XX Prepare reference indirection stuff\n" if $debug{'refs'}; 587 for ( my $i = 1 ; $i <= $length ; $i++ ) { 588 push @refs, "$mastername:$i"; 589 print "$mastername:$i\n" if $debug{'refs'}; 590 } 591 592 # protect the conditional inclusions in the file 593 $origfile =~ s/<!\[\s*IGNORE\s*\[/{PO4A-beg-IGNORE}/g; # cond. incl. starts 594 $origfile =~ s/<!\[\s*CDATA\s*\[/{PO4A-beg-CDATA}/g; # cond. incl. starts 595 $origfile =~ s/<!\[\s*RCDATA\s*\[/{PO4A-beg-RCDATA}/g; # cond. incl. starts 596 $origfile =~ s/<!\[\s*([^\[\s]+)\s*\[/<po4abeg name="$1">/g; # cond. incl. starts 597 $origfile =~ s/\]\]>/<po4aend>/g; # cond. incl. end 598 599 # Remove <![ IGNORE [ sections 600 # FIXME: we don't support included PO4A-beg- 601 my $tmp1 = $origfile; 602 while ( $tmp1 =~ m/^(.*?)(\{PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s ) { 603 my ( $begin, $ignored, $end ) = ( $1, $2, $3 ); 604 my @begin = split( /\n/, $begin ); 605 my @ignored = split( /\n/, $ignored ); 606 my $pre = scalar @begin; 607 my $len = ( scalar @ignored ) - 1; 608 $pre++ if ( $begin =~ /\n$/s ); 609 $len++ if ( $end =~ /^\n/s ); 610 611 # remove the references of the ignored lines 612 splice @refs, $pre + 1, $len - 1; 613 614 # remove the lines 615 $tmp1 = $begin . $end; 616 } 617 $origfile = $tmp1; 618 619 # The <, >, and & in a CDATA must be escaped because they do not 620 # correspond to tags or entities delimiters. 621 $tmp1 = $origfile; 622 $origfile = ""; 623 while ( $tmp1 =~ m/^(.*?{PO4A-beg-\s*(?:CDATA|RCDATA)\s*})(.+?)(<po4aend>.*)$/s ) { 624 my ( $begin, $tmp ) = ( $1, $2 ); 625 $tmp1 = $3; 626 $tmp =~ s/</{PO4A-lt}/gs; 627 $tmp =~ s/>/{PO4A-gt}/gs; 628 $tmp =~ s/&/{PO4A-amp}/gs; 629 $origfile .= $begin . $tmp; 630 } 631 $origfile .= $tmp1; 632 633 # Deal with the %entities; in the prolog. God damn it, this code is gross! 634 # Try hard not to change the number of lines to not fuck up the references 635 my %prologentincl; 636 my $moretodo = 1; 637 PROLOGENTITY: while ($moretodo) { # non trivial loop to deal with recursive inclusion 638 $moretodo = 0; 639 640 # Unprotect not yet defined inclusions 641 $prolog =~ s/{PO4A-percent}/%/sg; 642 print STDERR "prolog=>>>>$prolog<<<<\n" 643 if ( $debug{'entities'} ); 644 while ( $prolog =~ /(.*?)<!ENTITY\s*%\s*(\S*)\s+SYSTEM\s*"([^>"]*)"\s*>(.*)$/is ) { #})"{ (Stupid editor) 645 print STDERR "Seen the definition entity of prolog inclusion '$2' (=$3)\n" 646 if ( $debug{'entities'} ); 647 648 # Preload the content of the entity. 649 my $key = $2; 650 my $filename = $3; 651 my $origfilename = $filename; 652 my ( $begin, $end ) = ( $1, $4 ); 653 if ( $filename !~ m%^/% && $mastername =~ m%/% ) { 654 my $dir = $mastername; 655 $dir =~ s%/[^/]*$%%; 656 $filename = "$dir/$filename"; 657 658 # origfile also needs to be fixed otherwise onsgmls won't 659 # find the file. 660 $origfile =~ s/(<!ENTITY\s*%\s*\Q$key\E\s+SYSTEM\s*")\Q$origfilename\E("\s*>)/$1$filename$2/gsi; 661 } 662 if ( defined $ignored_inclusion{$key} or !-e $filename ) { 663 664 # We won't expand this entity. 665 # And we avoid onsgmls to do so. 666 $prolog = "$begin<!--{PO4A-ent-beg-$key}$filename" . "{PO4A-ent-end}-->$end"; 667 } else { 668 $prolog = $begin . $end; 669 ( -e $filename && open IN, "<$filename" ) 670 || die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot open %s (content of entity %s%s;): %s" ), 671 $filename, '%', $key, $! ); 672 local $/ = undef; 673 $prologentincl{$key} = <IN>; 674 close IN; 675 print STDERR "Content of \%$key; is $filename (" 676 . ( $prologentincl{$key} =~ tr/\n/\n/ ) 677 . " lines long)\n" 678 if ( $debug{'entities'} ); 679 print STDERR "content: " . $prologentincl{$key} . "\n" 680 if ( $debug{'entities'} ); 681 $moretodo = 1; 682 next PROLOGENTITY; 683 } 684 } 685 while ( $prolog =~ /(.*?)<!ENTITY\s*%\s*(\S*)\s*"([^>"]*)"\s*>(.*)$/is ) { #})"{ (Stupid editor) 686 print STDERR "Seen the definition entity of prolog definition '$2' (=$3)\n" 687 if ( $debug{'entities'} ); 688 689 # Preload the content of the entity. 690 my $key = $2; 691 $prolog = $1 . $4; 692 $prologentincl{$key} = $3; 693 print STDERR "content: " . $prologentincl{$key} . "\n" 694 if ( $debug{'entities'} ); 695 $moretodo = 1; 696 next PROLOGENTITY; 697 } 698 while ( $prolog =~ /^(.*?)%([^;\s]*);(.*)$/s ) { 699 my ( $pre, $ent, $post ) = ( $1, $2, $3 ); 700 701 # Yeah, right, the content of the entity can be defined in a not yet loaded entity 702 # It's easy to build a weird case where all that shit collapses poorly. But why the 703 # hell are you using those strange constructs in your document, damn it? 704 print STDERR "Seen prolog inclusion $ent\n" if ( $debug{'entities'} ); 705 if ( defined( $prologentincl{$ent} ) ) { 706 $prolog = $pre . $prologentincl{$ent} . $post; 707 print STDERR "Change \%$ent; to its content in the prolog\n" 708 if $debug{'entities'}; 709 $moretodo = 1; 710 } else { 711 712 # AAAARGH stupid document using %bla; and having then defined in another inclusion! 713 # Protect it for this pass, and unprotect it on next one 714 print STDERR "entity $ent not defined yet ?!\n" 715 if $debug{'entities'}; 716 $prolog = "$pre" . '{PO4A-percent}' . "$ent;$post"; 717 } 718 } 719 } 720 $prolog =~ s/<!--\{PO4A-ent-beg-(.*?)\}(.*?)\{PO4A-ent-end\}-->/<!ENTITY % $1 SYSTEM "$2">/g; 721 722 # Unprotect undefined inclusions, and die of them 723 $prolog =~ s/\{PO4A-percent\}/%/sg; 724 if ( $prolog =~ /%([^;\s]*);/ ) { 725 die wrap_mod( "po4a::sgml", dgettext( "po4a", "unrecognized prolog inclusion entity: %%%s;" ), $1 ) 726 unless ( $ignored_inclusion{$1} ); 727 } 728 729 # Protect &entities; (all but the ones asking for a file inclusion) 730 # search the file inclusion entities 731 my %entincl; 732 my $searchprolog = $prolog; 733 while ( $searchprolog =~ /(.*?)<!ENTITY\s+(\S*)\s+SYSTEM\s*"([^>"]*)"\s*>(.*)$/is ) { #})"{ 734 print STDERR "Seen the entity of inclusion $2 (=$3)\n" 735 if ( $debug{'entities'} ); 736 my $key = $2; 737 my $filename = $3; 738 my $origfilename = $filename; 739 $searchprolog = $4; 740 if ( $filename !~ m%^/% && $mastername =~ m%/% ) { 741 my $dir = $mastername; 742 $dir =~ s%/[^/]*$%%; 743 $filename = "$dir/$filename"; 744 745 # origfile also needs to be fixed otherwise onsgmls won't find 746 # the file. 747 $origfile =~ s/(<!ENTITY\s+$key\s+SYSTEM\s*")\Q$origfilename\E("\s*>)/$1$filename$2/gsi; 748 } 749 if ( ( not defined $ignored_inclusion{$2} ) and ( -e $filename ) ) { 750 $entincl{$key}{'filename'} = $filename; 751 752 # Preload the content of the entity 753 ( -e $filename && open IN, "<$filename" ) 754 || die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot open %s (content of entity %s%s;): %s" ), 755 $filename, '&', $key, $! ); 756 local $/ = undef; 757 $entincl{$key}{'content'} = <IN>; 758 close IN; 759 $entincl{$key}{'length'} = ( $entincl{$key}{'content'} =~ tr/\n/\n/ ); 760 print STDERR "read $filename (content of \&$key;, $entincl{$key}{'length'} lines long)\n" 761 if ( $debug{'entities'} ); 762 } 763 } 764 765 # Change the entities including files in the document 766 my $dosubstitution = 1; 767 while ($dosubstitution) { 768 $dosubstitution = 0; 769 foreach my $key ( keys %entincl ) { 770 771 # The external entity can be referenced as &key; or &key 772 # In the second case, we must differentiate &key and &key2 773 while ( $origfile =~ /^(.*?)&$key(;.*$|[^-_:.A-Za-z0-9].*$|$)/s ) { 774 775 # Since we will include a new file, we 776 # must do a new round of substitutions. 777 $dosubstitution = 1; 778 my ( $begin, $end ) = ( $1, $2 ); 779 $end = "" unless ( defined $end ); 780 $end =~ s/^;//s; 781 782 if ( $begin =~ m/.*<!--(.*?)$/s and $1 !~ m/-->/s ) { 783 784 # This entity is commented. Just remove it. 785 $origfile = $begin . $end; 786 next; 787 } 788 789 # add the refs 790 my $len = $entincl{$key}{'length'}; # number added by the inclusion 791 my $pre = ( $begin =~ tr/\n/\n/ ); # number of \n 792 my $post = ( $end =~ tr/\n/\n/ ); 793 print "XX Add a ref. pre=$pre; len=$len; post=$post\n" 794 if $debug{'refs'}; 795 796 # Keep a reference of inclusion position in main file 797 my $main = $refs[$pre]; 798 799 # Remove the references for the lines after the inclusion 800 # point. 801 my @endrefs = splice @refs, $pre + 1; 802 803 # Add the references of the added lines 804 my $i; 805 for ( $i = 0 ; $i < $len ; $i++ ) { 806 $refs[ $i + $pre ] = "$main $entincl{$key}{'filename'}:" . ( $i + 1 ); 807 } 808 809 if ( $begin !~ m/\n[ \t]*$/s ) { 810 if ( $entincl{$key}{'content'} =~ m/^[ \t]*\n/s ) { 811 812 # There is nothing in the first line of the 813 # included file, and something on the line before 814 # the inclusion The line reference will be more 815 # informative like this: 816 $refs[$pre] = $main; 817 } 818 } 819 if ( $end !~ s/^[ \t]*\n//s ) { 820 if ( $entincl{$key}{'content'} =~ m/\n[ \t]*$/s ) { 821 822 # There is something on the line after the 823 # inclusion, and there is an end of line at the 824 # end of the included file. We must add the line 825 # reference of the remainder on the line: 826 push @refs, $main; 827 } 828 } 829 830 # Append the references removed earlier (lines after the 831 # inclusion point). 832 push @refs, @endrefs; 833 834 # Do the substitution 835 $origfile = "$begin" . $entincl{$key}{'content'} . "$end"; 836 print STDERR "substitute $key\n" if ( $debug{'entities'} ); 837 } 838 } 839 } 840 $origfile =~ s/\G(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*|#[0-9]+|#x[0-9a-fA-F]+)\b/$1\{PO4A-amp\}$2/gs; 841 if ( defined($xmlprolog) && length($xmlprolog) ) { 842 $origfile =~ s/\/>/\{PO4A-close\}>/gs; 843 } 844 845 if ( $debug{'refs'} ) { 846 print "XX Resulting shifts\n"; 847 for ( my $i = 0 ; $i < scalar @refs ; $i++ ) { 848 print "$mastername:" . ( $i + 1 ) . " -> $refs[$i]\n"; 849 } 850 } 851 852 my ( $tmpfh, $tmpfile ) = File::Temp::tempfile( 853 "po4a-XXXX", 854 SUFFIX => ".sgml", 855 DIR => $ENV{TMPDIR} || "/tmp", 856 UNLINK => 0 857 ); 858 print $tmpfh $origfile; 859 close $tmpfh or die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot close tempfile: %s" ), $! ); 860 861 my $cmd = "onsgmls -l -E 0 -wno-valid < $tmpfile" . ( $debug{'onsgmls'} ? "" : " 2>/dev/null" ) . " |"; 862 print STDERR "CMD=$cmd\n" if ( $debug{'generic'} or $debug{'onsgmls'} ); 863 864 open( IN, $cmd ) || die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot run onsgmls: %s" ), $! ); 865 866 # The kind of tags 867 my ( %translate, %empty, %verbatim, %indent, %exist, %attribute, %qualify ); 868 foreach ( split( / /, ( $self->{SGML}->{k}{'translate'} || '' ) ) ) { 869 $translate{ uc $_ } = 1; 870 $indent{ uc $_ } = 1; 871 $exist{ uc $_ } = 1; 872 } 873 foreach ( split( / /, ( $self->{SGML}->{k}{'empty'} || '' ) ) ) { 874 $empty{ uc $_ } = 1; 875 $exist{ uc $_ } = 1; 876 } 877 foreach ( split( / /, ( $self->{SGML}->{k}{'verbatim'} || '' ) ) ) { 878 $translate{ uc $_ } = 1; 879 $verbatim{ uc $_ } = 1; 880 $exist{ uc $_ } = 1; 881 } 882 foreach ( split( / /, ( $self->{SGML}->{k}{'indent'} || '' ) ) ) { 883 $translate{ uc $_ } = 1; 884 $indent{ uc $_ } = 1; 885 $exist{ uc $_ } = 1; 886 } 887 foreach ( split( / /, ( $self->{SGML}->{k}{'ignore'} ) || '' ) ) { 888 $exist{ uc $_ } = 1; 889 } 890 foreach ( split( / /, ( $self->{SGML}->{k}{'attributes'} || '' ) ) ) { 891 my ( $attr, $tags ); 892 if (m/(^.*>)(\w+)/) { 893 $attr = uc $2; 894 $tags = $1; 895 } else { 896 $attr = uc $_; 897 $tags = ".*"; 898 } 899 if ( exists $attribute{$attr} ) { 900 $attribute{$attr} .= "|$tags"; 901 } else { 902 $attribute{$attr} = $tags; 903 } 904 } 905 foreach ( split( / /, ( $self->{SGML}->{k}{'qualify'} ) || '' ) ) { 906 $qualify{ uc $_ } = 1; 907 $attribute{ uc $_ } = '.*' unless exists $attribute{ uc $_ }; 908 } 909 910 # What to do before parsing 911 912 # push the XML prolog if existing 913 $self->pushline( $xmlprolog . "\n" ) if ( defined($xmlprolog) && length($xmlprolog) ); 914 915 # Put the prolog into the file, allowing for entity definition translation 916 # <!ENTITY myentity "definition_of_my_entity"> 917 # and push("<!ENTITY myentity \"".$self->translate("definition_of_my_entity") 918 if ( $prolog =~ m/(.*?\[)(.*)(\]>)/s ) { 919 warn "Pre=~~$1~~;Post=~~$3~~\n" if ( $debug{'entities'} ); 920 $self->pushline( $1 . "\n" ) if ( length($1) ); 921 $prolog = $2; 922 my ($post) = $3; 923 while ( $prolog =~ m/^(.*?)<!ENTITY\s+(\S*)\s+"([^"]*)"\s*>(.*)$/is ) { #" ){ 924 $self->pushline($1) if length($1); 925 $self->pushline( "<!ENTITY $2 \"" . $self->translate( $3, "", "definition of entity \&$2;" ) . "\">" ); 926 warn "Seen text entity $2\n" if ( $debug{'entities'} ); 927 $prolog = $4; 928 } 929 $prolog .= $post; 930 $self->pushline( $prolog . "\n" ) if ( length($prolog) ); 931 } else { 932 warn "No entity declaration detected in ~~$prolog~~...\n" if ( $debug{'entities'} ); 933 $self->pushline($prolog) if length($prolog); 934 } 935 936 # The parse object. 937 # Damn SGMLS. It makes me do crude things. 938 no strict "subs"; 939 my $parse = new SGMLS(IN); 940 use strict; 941 942 # Some values for the parsing 943 my @open = (); # opened translation container tags 944 my $verb = 0; # can we wrap or not 945 my $verb_last_ref; 946 my $seenfootnote = 0; 947 my $indent = 0; # indent level 948 my $lastchar = ''; # 949 my $buffer = ""; # what we will soon handle 950 951 # Keep a reference to the last line indicated by onsgmls 952 my $line = 0; 953 954 # Unfortunately, onsgmls do not mention all the line changes. We have 955 # to keep track of the number of lines seen in the "record ends". 956 my $adds = 0; 957 958 # If the last line received contains only spaces, do not take it into 959 # account for the line reference of the paragraph. 960 my $empty_last_cdata = 0; 961 962 # run the appropriate handler for each event 963 EVENT: while ( my $event = $parse->next_event ) { 964 965 # get the line reference to build po entries 966 if ( $line != $parse->line ) { 967 968 # onsgmls informs us of that the line changed. Reset $adds and 969 # $empty_last_cdata 970 $adds = 0; 971 $empty_last_cdata = 0; 972 $line = $parse->line; 973 } 974 my $ref = $refs[ $parse->line - 1 + $adds - $empty_last_cdata ]; 975 976 # In verbatim mode, keep the current line reference. 977 if ($verb) { 978 $ref = $refs[ $parse->line - 1 ]; 979 } 980 my $type; 981 982 if ( $event->type eq 'start_element' ) { 983 die wrap_ref_mod( $ref, "po4a::sgml", dgettext( "po4a", "Unknown tag %s" ), $event->data->name ) 984 unless $exist{ $event->data->name }; 985 986 $lastchar = ">"; 987 988 # Which tag did we see? 989 my $tag = ''; 990 $tag .= '<' . lc( $event->data->name() ); 991 foreach my $attr ( sort $event->data->attribute_names() ) { 992 993 my $val = ${ $event->data->attributes() }{$attr}; 994 my $value = $val->value(); 995 996 # if ($val->type() eq 'IMPLIED') { 997 # $tag .= ' '.lc($attr).'="'.lc($attr).'"'; 998 # } els 999 if ( $val->type() eq 'CDATA' 1000 || $val->type() eq 'IMPLIED' ) 1001 { 1002 if ( defined $value && length($value) ) { 1003 my $lattr = lc $attr; 1004 my $uattr = uc $attr; 1005 if ( exists $attribute{$uattr} ) { 1006 my $context = ""; 1007 foreach my $o (@open) { 1008 next if ( !defined $o or $o =~ m%^</% ); 1009 $o =~ s/ .*/>/; 1010 $context .= $o; 1011 } 1012 $context = join( "", $context, "<", lc( $event->data->name() ), ">" ); 1013 if ( $context =~ /^($attribute{$uattr})$/ ) { 1014 if ( $qualify{$uattr} ) { 1015 my $translated = 1016 $self->translate( "$lattr=$value", $ref, "attribute $context$lattr" ); 1017 if ( $translated =~ s/^$lattr=// ) { 1018 $value = $translated; 1019 } else { 1020 die wrap_mod( "po4a::sgml", 1021 dgettext( "po4a", "bad translation '%s' for '%s' in '%s'" ), 1022 $translated, $context . $lattr, $ref ); 1023 } 1024 } else { 1025 $value = $self->translate( $value, $ref, "attribute $context$lattr" ); 1026 } 1027 } 1028 } 1029 if ( $value =~ m/\"/ ) { 1030 $value = "'" . $value . "'"; 1031 } else { 1032 $value = '"' . $value . '"'; 1033 } 1034 $tag .= " $lattr=$value"; 1035 } 1036 } elsif ( $val->type() eq 'NOTATION' ) { 1037 } else { 1038 $tag .= ' ' . lc($attr) . '="' . lc($value) . '"' 1039 if ( defined $value && length($value) ); 1040 } 1041 } 1042 $tag .= '>'; 1043 1044 # debug 1045 print STDERR "Seen $tag, open level=" . ( scalar @open ) . ", verb=$verb\n" 1046 if ( $debug{'tag'} ); 1047 1048 if ( $event->data->name() eq 'FOOTNOTE' ) { 1049 1050 # we want to put the <para> inside the <footnote> in the same msgid 1051 $seenfootnote = 1; 1052 } 1053 1054 if ($seenfootnote) { 1055 $buffer .= $tag; 1056 next EVENT; 1057 } 1058 if ( $translate{ $event->data->name() } ) { 1059 1060 # Build the type 1061 if ( scalar @open > 0 ) { 1062 $type = $open[$#open] . $tag; 1063 } else { 1064 $type = $tag; 1065 } 1066 1067 # do the job 1068 if ( @open > 0 ) { 1069 $self->end_paragraph( $buffer, $ref, $type, $verb, $indent, @open ); 1070 } else { 1071 $self->pushline($buffer) if $buffer; 1072 } 1073 $buffer = ""; 1074 push @open, $tag; 1075 } elsif ( $indent{ $event->data->name() } ) { 1076 die wrap_ref_mod( $ref, "po4a::sgml", 1077 dgettext( "po4a", "Closing tag for a translation container missing before %s" ), $tag ) 1078 if ( scalar @open ); 1079 } 1080 1081 if ( $verbatim{ $event->data->name() } ) { 1082 $verb++; 1083 1084 # Keep a reference to the line that openned the verbatim 1085 # section. This is needed to check if its data starts on 1086 # the same line. 1087 $verb_last_ref = $ref; 1088 } 1089 if ($verb) { 1090 1091 # Tag in a verbatim section. Check if it appeared at 1092 # the same line than the previous data. If not, it 1093 # means that an end of line must be added to the 1094 # buffer. 1095 if ( $ref ne $verb_last_ref ) { 1096 1097 # FIXME: Does it work if $verb > 1 1098 $buffer .= "\n"; 1099 $verb_last_ref = $ref; 1100 } 1101 } 1102 1103 if ( $indent{ $event->data->name() } ) { 1104 1105 # push the indenting space only if not in verb before that tag 1106 # push trailing "\n" only if not in verbose afterward 1107 $self->pushline( ( $verb > 1 ? "" : ( " " x $indent ) ) . $tag . ( $verb ? "" : "\n" ) ); 1108 $indent++ unless $empty{ $event->data->name() }; 1109 } else { 1110 $tag =~ s/<po4abeg name="([^"]+)">/<![ $1 [/; #"; Stupid emacs 1111 $tag =~ s/<po4aend>/]]>/; 1112 $buffer .= $tag; 1113 } 1114 } # end of type eq 'start_element' 1115 1116 elsif ( $event->type eq 'end_element' ) { 1117 my $tag = ( 1118 $empty{ $event->data->name() } 1119 ? '' 1120 : '</' . lc( $event->data->name() ) . '>' 1121 ); 1122 1123 if ($verb) { 1124 1125 # Tag in a verbatim section. Check if it appeared at 1126 # the same line than the previous data. If not, it 1127 # means that an end of line must be added to the 1128 # buffer. 1129 if ( $ref ne $verb_last_ref ) { 1130 1131 # FIXME: Does it work if $verb > 1 1132 $buffer .= "\n"; 1133 $verb_last_ref = $ref; 1134 } 1135 } 1136 print STDERR "Seen $tag, level=" . ( scalar @open ) . ", verb=$verb\n" 1137 if ( $debug{'tag'} ); 1138 1139 $lastchar = ">"; 1140 1141 if ( $event->data->name() eq 'FOOTNOTE' ) { 1142 1143 # we want to put the <para> inside the <footnote> in the same msgid 1144 $seenfootnote = 0; 1145 } 1146 1147 if ($seenfootnote) { 1148 $buffer .= $tag; 1149 next EVENT; 1150 } 1151 if ( $translate{ $event->data->name() } ) { 1152 $type = $open[$#open] . $tag; 1153 $self->end_paragraph( $buffer, $ref, $type, $verb, $indent, @open ); 1154 $buffer = ""; 1155 pop @open; 1156 if ( @open > 0 ) { 1157 pop @open; 1158 push @open, $tag; 1159 } 1160 } elsif ( $indent{ $event->data->name() } ) { 1161 die wrap_ref_mod( $ref, "po4a::sgml", 1162 dgettext( "po4a", "Closing tag for a translation container missing before %s" ), $tag ) 1163 if ( scalar @open ); 1164 } 1165 1166 unless ( $event->data->name() =~ m/^(PO4ABEG|PO4AEND)$/si ) { 1167 if ( $indent{ $event->data->name() } ) { 1168 $indent--; 1169 1170 # add indenting space only when not in verbatim 1171 # add the tailing \n only if out of verbatim after that tag 1172 $self->pushline( ( $verb ? "" : ( " " x $indent ) ) . $tag . ( $verb > 1 ? "" : "\n" ) ); 1173 } else { 1174 $buffer .= $tag; 1175 } 1176 $verb-- if $verbatim{ $event->data->name() }; 1177 } 1178 } # end of type eq 'end_element' 1179 1180 elsif ( $event->type eq 'cdata' ) { 1181 my $cdata = $event->data; 1182 $empty_last_cdata = ( $cdata =~ m/^\s*$/ ); 1183 $cdata =~ s/{PO4A-lt}/</g; 1184 $cdata =~ s/{PO4A-gt}/>/g; 1185 $cdata =~ s/{PO4A-amp}/&/g; 1186 $cdata =~ s/{PO4A-end}/\]\]>/g; 1187 $cdata =~ s/{PO4A-beg-([^\}]+)}/<!\[$1\[/g; 1188 if ($verb) { 1189 1190 # Check if this line of data appear on the same line 1191 # than the previous tag. If not, append an end of line 1192 # to the buffer. 1193 if ( $ref ne $verb_last_ref ) { 1194 $buffer .= "\n"; 1195 $verb_last_ref = $ref; 1196 } 1197 } else { 1198 $cdata =~ s/\\t/ /g; 1199 $cdata =~ s/\s+/ /g; 1200 $cdata =~ s/^\s//s if $lastchar eq ' '; 1201 } 1202 $lastchar = substr( $cdata, -1, 1 ); 1203 $buffer .= $cdata; 1204 if ( defined($xmlprolog) && length($xmlprolog) ) { 1205 $buffer =~ s/>PO4A-close\}>/\/>/sg; 1206 $buffer =~ s/PO4A-close\}>//sg; # This should not be necessary 1207 } 1208 } # end of type eq 'cdata' 1209 1210 elsif ( $event->type eq 'sdata' ) { 1211 my $sdata = $event->data; 1212 $sdata =~ s/^\[//; 1213 $sdata =~ s/\s*\]$//; 1214 $lastchar = substr( $sdata, -1, 1 ); 1215 $buffer .= '&' . $sdata . ';'; 1216 } # end of type eq 'sdata' 1217 1218 elsif ( $event->type eq 're' ) { 1219 1220 # End of record, the line reference shall be incremented. 1221 $adds += 1; 1222 if ($verb) { 1223 1224 # Check if this line of data appear on the same line 1225 # than the previous tag. If not, append an end of line 1226 # to the buffer. 1227 if ( $ref ne $verb_last_ref ) { 1228 $buffer .= "\n"; 1229 $verb_last_ref = $ref; 1230 } 1231 $buffer .= "\n"; 1232 } elsif ( $lastchar ne ' ' ) { 1233 $buffer .= " "; 1234 } 1235 $lastchar = ' '; 1236 } #end of type eq 're' 1237 1238 elsif ( $event->type eq 'conforming' ) { 1239 1240 } elsif ( $event->type eq 'pi' ) { 1241 my $pi = $event->data; 1242 $buffer .= "<?$pi>"; 1243 1244 } else { 1245 die wrap_ref_mod( 1246 $refs[ $parse->line ], 1247 "po4a::sgml", dgettext( "po4a", "Unknown SGML event type: %s" ), 1248 $event->type 1249 ); 1250 } 1251 } 1252 1253 # What to do after parsing 1254 $self->pushline($buffer); 1255 close(IN); 1256 if ( $? != 0 and $self->verbose() > 0 ) { 1257 warn wrap_mod( 1258 "po4a::sgml", 1259 dgettext( 1260 "po4a", 1261 "Warning: onsgmls produced some errors. " 1262 . "This is usually caused by po4a, which modifies the input " 1263 . "and restores it afterwards, causing the input of onsgmls " 1264 . "to be invalid. This is usually safe, but you may wish " 1265 . "to verify the generated document with onsgmls -wno-valid." 1266 ) 1267 ); 1268 unless ( $debug{'onsgmls'} ) { 1269 warn wrap_mod( 1270 "po4a::sgml", 1271 dgettext( 1272 "po4a", 1273 "To see the error message, " 1274 . "rerun po4a with this additional argument:\n" 1275 . " -o debug=onsgmls" 1276 ) 1277 ); 1278 } 1279 } 1280 unlink($tmpfile) unless ( $debug{'refs'} or $debug{'onsgmls'} ); 1281} 1282 1283sub end_paragraph { 1284 my ( $self, $para, $ref, $type, $verb, $indent ) = ( shift, shift, shift, shift, shift, shift ); 1285 my (@open) = @_; 1286 die "Internal error: no paragraph to end here!!" 1287 unless scalar @open; 1288 1289 return unless defined($para) && length($para); 1290 1291 if ( ( $para =~ m/^\s*$/s ) and ( not $verb ) ) { 1292 1293 # In non-verbatim environments, a paragraph with only spaces is 1294 # like an empty paragraph 1295 return; 1296 } 1297 1298 # unprotect &entities; 1299 $para =~ s/{PO4A-amp}/&/g; 1300 1301 # remove the name"\|\|" onsgmls added as attributes 1302 $para =~ s/ name=\"\\\|\\\|\"//g; 1303 $para =~ s/ moreinfo=\"none\"//g; 1304 1305 # Extract the leading and trailing spaces. They will be restored only 1306 # in verbatim environments. 1307 my ( $leading_spaces, $trailing_spaces ) = ( "", "" ); 1308 if ($verb) { 1309 1310 # In the verbatim mode, we can ignore empty lines, but not the 1311 # leading spaces or tabulations. Otherwise, the PO will look 1312 # weird. 1313 if ( $para =~ m/^(\s*\n)(.*?)(\s*)$/s ) { 1314 $leading_spaces = $1; 1315 $para = $2; 1316 $trailing_spaces = $3; 1317 } 1318 } else { 1319 if ( $para =~ m/^(\s*)(.*?)(\s*)$/s ) { 1320 $leading_spaces = $1; 1321 $para = $2; 1322 $trailing_spaces = $3; 1323 } 1324 } 1325 1326 $para = $self->translate( 1327 $para, $ref, $type, 1328 'wrap' => !$verb, 1329 'wrapcol' => ( 75 - $indent ) 1330 ); 1331 1332 if ($verb) { 1333 $para = $leading_spaces . $para . $trailing_spaces; 1334 } else { 1335 $para =~ s/^\s+//s; 1336 my $toadd = " " x ( $indent + 1 ); 1337 $para =~ s/^/$toadd/mg; 1338 $para .= "\n"; 1339 } 1340 1341 $self->pushline($para); 1342} 1343 13441; 1345 1346=head1 AUTHORS 1347 1348This module is an adapted version of sgmlspl (SGML postprocessor for the 1349ONSGMLS parser) which was: 1350 1351 Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca> 1352 1353The adaptation for po4a was done by: 1354 1355 Denis Barbier <barbier@linuxfr.org> 1356 Martin Quinson (mquinson#debian.org) 1357 1358=head1 COPYRIGHT AND LICENSE 1359 1360 Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca>. 1361 Copyright © 2002-2005 SPI, Inc. 1362 1363This program is free software; you may redistribute it and/or modify it 1364under the terms of GPL (see the COPYING file). 1365