1#!/usr/bin/perl -w
2
3# Po4a::Sgml.pm
4#
5# extract and translate translatable strings from an sgml based document.
6#
7# This code is an adapted version of sgmlspl (SGML postprocessor for the
8#   SGMLS and NSGMLS parsers) which was:
9#
10# Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca>
11#
12# The adaptation for po4a was done by Denis Barbier <barbier@linuxfr.org>,
13# Martin Quinson (mquinson#debian.org) and others.
14#
15# This program is free software; you can redistribute it and/or modify
16# it under the terms of the GNU General Public License as published by
17# the Free Software Foundation; either version 2 of the License, or
18# (at your option) any later version.
19#
20# This program is distributed in the hope that it will be useful,
21# but WITHOUT ANY WARRANTY; without even the implied warranty of
22# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
23# GNU General Public License for more details.
24#
25# You should have received a copy of the GNU General Public License
26# along with this program; if not, write to the Free Software
27# Foundation, Inc.,
28# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
29#
30########################################################################
31
32=encoding UTF-8
33
34=head1 NAME
35
36Locale::Po4a::Sgml - convert SGML documents from/to PO files
37
38=head1 DESCRIPTION
39
40The po4a (PO for anything) project goal is to ease translations (and more
41interestingly, the maintenance of translations) using gettext tools on
42areas where they were not expected like documentation.
43
44Locale::Po4a::Sgml is a module to help the translation of documentation in
45the SGML format into other [human] languages.
46
47This module uses B<onsgmls>(1) to parse the SGML files. Make sure it is
48installed.
49Also make sure that the DTD of the SGML files are installed in the system.
50
51=head1 OPTIONS ACCEPTED BY THIS MODULE
52
53=over 4
54
55=item B<debug>
56
57Space separated list of keywords indicating which part you want to debug. Possible values are: tag, generic, entities and refs.
58
59=item B<verbose>
60
61Give more information about what's going on.
62
63=item B<translate>
64
65Space separated list of extra tags (beside the DTD provided ones) whose
66content should form an extra msgid.
67
68=item B<section>
69
70Space separated list of extra tags (beside the DTD provided ones)
71containing other tags, some of them being of category B<translate>.
72
73=item B<indent>
74
75Space separated list of tags which increase the indentation level.
76
77=item B<verbatim>
78
79The layout within those tags should not be changed. The paragraph won't get
80wrapped, and no extra indentation space or new line will be added for
81cosmetic purpose.
82
83=item B<empty>
84
85Tags not needing to be closed.
86
87=item B<ignore>
88
89Tags ignored and considered as plain char data by po4a. That is to say that
90they can be part of an msgid. For example, E<lt>bE<gt> is a good candidate
91for this category since putting it in the translate section would create
92msgids not being whole sentences, which is bad.
93
94=item B<attributes>
95
96A space separated list of attributes that need to be translated. You can
97specify the attributes by their name (for example, "lang"), but you can also
98prefix it with a tag hierarchy, to specify that this attribute will only be
99translated when it is into the specified tag. For example:
100E<lt>bbbE<gt>E<lt>aaaE<gt>lang specifies that the lang attribute will only be
101translated if it is in an E<lt>aaaE<gt> tag, which is in a E<lt>bbbE<gt> tag.
102The tag names are actually regular expressions so you can also write things
103like E<lt>aaa|bbbbE<gt>lang to only translate lang attributes that are in
104an E<lt>aaaE<gt> or a E<lt>bbbE<gt> tag.
105
106=item B<qualify>
107
108A space separated list of attributes for which the translation must be
109qualified by the attribute name. Note that this setting automatically adds the
110given attribute into the 'attributes' list too.
111
112=item B<force>
113
114Proceed even if the DTD is unknown or if onsgmls finds errors in the input
115file.
116
117=item B<include-all>
118
119By default, msgids containing only one entity (like '&version;') are skipped
120for the translator comfort. Activating this option prevents this
121optimisation. It can be useful if the document contains a construction like
122"<title>&Aacute;</title>", even if I doubt such things to ever happen...
123
124=item B<ignore-inclusion>
125
126Space separated list of entities that won't be inlined.
127Use this option with caution: it may cause onsgmls (used internally) to add
128tags and render the output document invalid.
129
130=back
131
132=head1 STATUS OF THIS MODULE
133
134The result is perfect. I.e., the generated documents are exactly the
135same. But there are still some problems:
136
137=over 2
138
139=item *
140
141The error output of onsgmls is redirected to /dev/null by default, which is clearly
142bad. I don't know how to avoid that.
143
144The problem is that I have to "protect" the conditional inclusions (i.e. the
145C<E<lt>! [ %foo [> and C<]]E<gt>> stuff) from onsgmls. Otherwise
146onsgmls eats them, and I don't know how to restore them in the final
147document. To prevent that, I rewrite them to C<{PO4A-beg-foo}> and
148C<{PO4A-end}>.
149
150The problem with this is that the C<{PO4A-end}> and such I add are invalid in
151the document (not in a E<lt>pE<gt> tag or so).
152
153If you want to view the onsgmls output, just add the following to your command line (or po4a configuration line):
154
155  -o debug=onsgmls
156
157=item *
158
159It does work only with the DebianDoc and DocBook DTD. Adding support for a
160new DTD should be very easy. The mechanism is the same for every DTD, you just
161have to give a list of the existing tags and some of their characteristics.
162
163I agree, this needs some more documentation, but it is still considered as
164beta, and I hate to document stuff which may/will change.
165
166=item *
167
168Warning, support for DTDs is quite experimental. I did not read any
169reference manual to find the definition of every tag. I did add tag
170definition to the module 'till it works for some documents I found on the
171net. If your document use more tags than mine, it won't work. But as I said
172above, fixing that should be quite easy.
173
174I did test DocBook against the SAG (System Administrator Guide) only, but
175this document is quite big, and should use most of the DocBook
176specificities.
177
178For DebianDoc, I tested some of the manuals from the DDP, but not all yet.
179
180=item *
181
182In case of file inclusion, string reference of messages in PO files
183(i.e. lines like C<#: en/titletoc.sgml:9460>) will be wrong.
184
185This is because I preprocess the file to protect the conditional inclusion
186(i.e. the C<E<lt>! [ %foo [> and C<]]E<gt>> stuff) and some entities (like
187&version;) from onsgmls because I want them verbatim to the generated
188document. For that, I make a temp copy of the input file and do all the
189changes I want to this before passing it to onsgmls for parsing.
190
191So that it works, I replace the entities asking for a file inclusion by the
192content of the given file (so that I can protect what needs to be in a subfile
193also). But nothing is done so far to correct the references (i.e., filename
194and line number) afterward. I'm not sure what the best thing to do is.
195
196=back
197
198=cut
199
200package Locale::Po4a::Sgml;
201
202use 5.006;
203use strict;
204use warnings;
205
206require Exporter;
207use vars qw(@ISA @EXPORT);
208@ISA    = qw(Locale::Po4a::TransTractor);
209@EXPORT = qw();
210
211use Locale::Po4a::TransTractor;
212use Locale::Po4a::Common;
213
214eval qq{use SGMLS};
215if ($@) {
216    die wrap_mod(
217        "po4a::sgml",
218        dgettext(
219            "po4a",
220            "The needed module SGMLS.pm was not found and needs to be installed. It can be found on the CPAN, in package libsgmls-perl on debian, etc."
221        )
222    );
223}
224
225use File::Temp;
226
227my %debug = (
228    'tag'      => 0,
229    'generic'  => 0,
230    'entities' => 0,
231    'refs'     => 0,
232    'onsgmls'  => 0
233);
234
235my $xmlprolog = undef;    # the '<?xml ... ?>' line if existing
236
237sub initialize {
238    my $self    = shift;
239    my %options = @_;
240
241    $self->{options}{'translate'}        = '';
242    $self->{options}{'section'}          = '';
243    $self->{options}{'indent'}           = '';
244    $self->{options}{'empty'}            = '';
245    $self->{options}{'verbatim'}         = '';
246    $self->{options}{'ignore'}           = '';
247    $self->{options}{'ignore-inclusion'} = '';
248
249    $self->{options}{'include-all'} = '';
250
251    $self->{options}{'force'} = '';
252
253    $self->{options}{'verbose'} = '';
254    $self->{options}{'debug'}   = '';
255
256    foreach my $opt ( keys %options ) {
257        if ( $options{$opt} ) {
258            die wrap_mod( "po4a::sgml", dgettext( "po4a", "Unknown option: %s" ), $opt )
259              unless exists $self->{options}{$opt};
260            $self->{options}{$opt} = $options{$opt};
261        }
262    }
263    if ( $options{'debug'} ) {
264        foreach ( split /\s+/, $options{'debug'} ) {
265            die wrap_mod( "po4a::sgml", dgettext( "po4a", "Unknown debug category: %s. Known categories:\n%s" ),
266                $_, join( " ", keys %debug ) )
267              unless exists $debug{$_};
268            $debug{$_} = 1;
269        }
270    }
271}
272
273sub read {
274    my ( $self, $filename, $refname ) = @_;
275
276    push @{ $self->{DOCPOD}{infile} }, $filename;
277    $self->Locale::Po4a::TransTractor::read( $filename, $refname );
278}
279
280sub parse {
281    my $self = shift;
282    map { $self->parse_file($_) } @{ $self->{DOCPOD}{infile} };
283}
284
285#
286# Filter out some uninteresting strings for translation
287#
288sub translate {
289    my ($self) = (shift);
290    my ( $string, $ref, $type ) = ( shift, shift, shift );
291    my (%options) = @_;
292
293    # don't translate entries composed of one entity
294    if ( ( ( $string =~ /^&[^;]*;$/ ) || ( $options{'wrap'} && $string =~ /^\s*&[^;]*;\s*$/ ) )
295        && !( $self->{options}{'include-all'} ) )
296    {
297        warn wrap_mod( "po4a::sgml", dgettext( "po4a", "msgid skipped to help translators (contains only an entity)" ) )
298          unless $self->verbose() <= 0;
299        return $string . ( $options{'wrap'} ? "\n" : "" );
300    }
301
302    # don't translate entries composed of tags only
303    if ( $string =~ /^(((<[^>]*>)|\s)*)$/
304        && !( $self->{options}{'include-all'} ) )
305    {
306        warn wrap_mod( "po4a::sgml", dgettext( "po4a", "msgid skipped to help translators (contains only tags)" ) )
307          unless $self->verbose() <= 0;
308        return $string . ( $options{'wrap'} ? "\n" : "" );
309    }
310
311    # don't translate entries composed of marked section tags only
312    if ( ( $string =~ /^(?:<!\s*\[\s*[^\[]+\s*\[|\]\s*]\s*>|\s)*$/ )
313        && !( $self->{options}{'include-all'} ) )
314    {
315        warn wrap_mod(
316            "po4a::sgml",
317            dgettext(
318                "po4a",
319                "msgid skipped to "
320                  . "help translators (contains only opening or closing "
321                  . "tags of marked sections)"
322            ),
323            $string
324        ) unless $self->verbose() <= 0;
325        return $string . ( $options{'wrap'} ? "\n" : "" );
326    }
327
328    $string = $self->SUPER::translate( $string, $ref, $type, %options );
329
330    $string = $self->post_trans( $string, $ref, $type );
331
332    return $string;
333}
334
335sub post_trans {
336    my ( $self, $str, $ref, $type ) = @_;
337
338    # Change ascii non-breaking space to an &nbsp;
339    my $nbs_out    = "\xA0";
340    my $enc_length = Encode::from_to( $nbs_out, "latin1", $self->get_out_charset );
341    $str =~ s/\Q$nbs_out/&nbsp;/g if defined $enc_length;
342
343    return $str;
344}
345
346#
347# Make sure our cruft is removed from the file
348#
349sub pushline {
350    my ( $self, $line ) = @_;
351    $line =~ s/{PO4A-amp}/&/g;
352    $self->SUPER::pushline($line);
353}
354
355sub set_tags_kind {
356    my $self = shift;
357    my (%kinds) = @_;
358
359    foreach (qw(translate empty section verbatim ignore attributes qualify)) {
360        $self->{SGML}->{k}{$_} = $self->{options}{$_} ? $self->{options}{$_} . ' ' : '';
361
362        # Remove the default behavior for the tags defined with the
363        # options.
364        foreach my $k ( keys %kinds ) {
365            foreach my $t ( split( " ", $self->{SGML}->{k}{$_} ) ) {
366                $kinds{$k} =~ s/\b$t\b//;
367            }
368        }
369    }
370
371    foreach ( keys %kinds ) {
372        die "po4a::sgml: internal error: set_tags_kind called with unrecognized arg $_"
373          if ( $_ !~ /^(translate|empty|verbatim|ignore|indent|attributes|qualify)$/ );
374
375        $self->{SGML}->{k}{$_} .= $kinds{$_};
376    }
377}
378
379#
380# Do the actual work, using the SGMLS package and settings done elsewhere.
381#
382sub parse_file {
383    my ( $self, $mastername ) = @_;
384    my ($prolog);
385
386    # Rewrite the file to:
387    #   - protect optional inclusion marker (i.e. "<![ %str [" and "]]>")
388    #   - protect entities from expansion (ie "&release;")
389    my $origfile = "";
390    my $i        = 0;
391    while ( $i < @{ $self->{TT}{doc_in} } ) {
392        $origfile .= ${ $self->{TT}{doc_in} }[$i];
393        $i += 2;
394    }
395
396    unless ( $self->{options}{'force'} ) {
397
398        # Detect if we can find the DTD
399        my ( $tmpfh, $tmpfile ) = File::Temp::tempfile(
400            "po4a-XXXX",
401            SUFFIX => ".sgml",
402            DIR    => $ENV{TMPDIR} || "/tmp",
403            UNLINK => 0
404        );
405        print $tmpfh $origfile;
406        close $tmpfh
407          or die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot close tempfile: %s" ), $! );
408        if ( system("onsgmls -p < $tmpfile") ) {
409            unlink($tmpfile);
410            die wrap_mod(
411                "po4a::sgml",
412                dgettext(
413                    "po4a", "Error while running onsgmls -p.  Please check if onsgmls and the DTD are installed."
414                )
415            );
416        }
417        unlink($tmpfile);
418    }
419
420    # Detect the XML pre-prolog
421    if ( $origfile =~ s/^(\s*<\?xml[^?]*\?>)// ) {
422        warn wrap_mod(
423            "po4a::sgml",
424            dgettext(
425                "po4a",
426                "Trying to handle a XML document as a SGML one. "
427                  . "Feel lucky if it works, help us implementing a proper XML backend if it does not."
428            ),
429            $mastername
430        ) unless $self->verbose() <= 0;
431        $xmlprolog = $1;
432    }
433
434    # Get the prolog
435    {
436        $prolog = $origfile;
437        my $lvl;    # number of '<' seen without matching '>'
438        my $pos = 0;    # where in the document (in chars) while detecting prolog boundaries
439
440        unless ( $prolog =~ s/^(.*<!DOCTYPE).*$/$1/is ) {
441            die wrap_mod(
442                "po4a::sgml",
443                dgettext(
444                    "po4a",
445                    "This file is not a master SGML document (no DOCTYPE). "
446                      . "It may be a file to be included by another one, in which case it should not be passed to po4a directly. Text from included files is extracted/translated when handling the master file including them."
447                )
448            );
449        }
450        $pos += length($prolog);
451        $lvl = 1;
452        while ( $lvl != 0 ) {
453
454            # Eat comments in the prolog, since there may be some '>' or '<' in them.
455            if ( $origfile =~ m/^.{$pos}(<!--.*?-->)/s ) {
456                print "Found a comment in the prolog: $1\n" if ( $debug{'generic'} );
457                $pos += length($1);
458
459                # take care of the line numbers
460                my @a = split( /\n/, $1 );
461                shift @a;    # nb line - 1
462                while ( defined( shift @a ) ) {
463                    $prolog .= "\n";
464                }
465                next;
466            }
467
468            # Search the closing '>'
469            my ($c) = substr( $origfile, $pos, 1 );
470            $lvl++ if ( $c eq '<' );
471            $lvl-- if ( $c eq '>' );
472            $prolog = "$prolog$c";
473            $pos++;
474        }
475    }
476
477    # Add the definition of new tags that will be used for the
478    # conditionnal inclusions
479    if ( $origfile =~ /^.*<!DOCTYPE[^[>]*\[/is ) {
480        $origfile =~
481          s/^(.*<!DOCTYPE[^[>]*\[)/$1 <!ELEMENT PO4ABEG - o empty> <!ATTLIST PO4ABEG name CDATA #REQUIRED> <!ELEMENT PO4AEND - o empty>/is;
482    }
483
484    print STDERR "PROLOG=$prolog\n------------\n" if ( $debug{'generic'} );
485
486    # Configure the tags for this dtd
487    if ( $prolog =~ /debiandoc/i ) {
488        $self->set_tags_kind(
489            "translate" => "author version abstract title" . "date copyrightsummary heading p " . "example tag title",
490            "empty"     => "date ref manref url toc",
491            "verbatim"  => "example",
492            "ignore"    => "package prgn file tt em var "
493              . "name email footnote po4aend po4abeg "
494              . "strong ftpsite ftppath qref",
495            "indent" => "appendix " . "book "
496              . "chapt copyright "
497              . "debiandoc "
498              . "enumlist " . "item " . "list "
499              . "sect sect1 sect2 sect3 sect4 "
500              . "tag taglist titlepag toc"
501        );
502
503    } elsif ( $prolog =~ /docbook/i ) {
504        $self->set_tags_kind(
505            "translate" => "abbrev appendixinfo artheader attribution "
506              . "biblioentry biblioset "
507              . "chapterinfo collab collabname confdates confgroup conftitle " . "date "
508              . "edition editor entry example "
509              . "figure "
510              . "glosssee glossseealso glossterm "
511              . "holder "
512              . "member msgaud msglevel msgorig "
513              . "orgdiv orgname othername "
514              . "pagenums para phrase pubdate publishername primary "
515              . "refclass refdescriptor refentrytitle refmiscinfo refname refpurpose releaseinfo remark revnumber revremark "
516              . "screeninfo seg secondary see seealso segtitle simpara substeps subtitle synopfragmentref synopsis "
517              . "term tertiary title titleabbrev "
518              . "contrib epigraph",
519            "empty"  => "audiodata colspec graphic imagedata textdata sbr spanspec videodata xref",
520            "indent" => "abstract answer appendix article articleinfo audioobject author authorgroup "
521              . "bibliodiv bibliography blockquote blockinfo book bookinfo bridgehead "
522              . "callout calloutlist caption caution chapter copyright "
523              . "dedication docinfo "
524              . "entry "
525              . "formalpara "
526              . "glossary glossdef glossdiv glossentry glosslist group "
527              . "imageobject important index indexterm informaltable itemizedlist "
528              . "keyword keywordset "
529              . "legalnotice listitem lot "
530              . "mediaobject msg msgentry msginfo msgexplan msgmain msgrel msgsub msgtext " . "note "
531              . "objectinfo orderedlist "
532              . "part partintro preface procedure publisher "
533              . "qandadiv qandaentry qandaset question "
534              . "reference refentry refentryinfo refmeta refnamediv refsect1 refsect1info refsect2 refsect2info refsect3 refsect3info refsection refsectioninfo refsynopsisdiv refsynopsisdivinfo revision revdescription row "
535              . "screenshot sect1 sect1info sect2 sect2info sect3 sect3info sect4 sect4info sect5 sect5info section sectioninfo seglistitem segmentedlist set setindex setinfo shortcut simplelist simplemsgentry simplesect step synopfragment "
536              . "table tbody textobject tgroup thead tip toc "
537              . "variablelist varlistentry videoobject "
538              . "warning",
539            "verbatim" => "address cmdsynopsis holder literallayout programlisting "
540              . "refentrytitle refname refpurpose screen term title",
541            "ignore" => "acronym action affiliation anchor application arg author authorinitials "
542              . "city citation citerefentry citetitle classname co command computeroutput constant corpauthor country "
543              . "database po4abeg po4aend "
544              . "email emphasis envar errorcode errorname errortext errortype exceptionname "
545              . "filename firstname firstterm footnote footnoteref foreignphrase function "
546              . "glossterm guibutton guiicon guilabel guimenu guimenuitem guisubmenu "
547              . "hardware "
548              . "indexterm informalexample inlineequation inlinegraphic inlinemediaobject interface interfacename isbn "
549              . "keycap keycode keycombo keysym "
550              . "link lineannotation literal "
551              . "manvolnum markup medialabel menuchoice methodname modespec mousebutton "
552              . "nonterminal "
553              . "olink ooclass ooexception oointerface option optional othercredit "
554              . "parameter personname phrase productname productnumber prompt property pubsnumber "
555              . "quote "
556              . "remark replaceable returnvalue revhistory "
557              . "sgmltag sidebar structfield structname subscript superscript surname symbol systemitem "
558              . "token trademark type "
559              . "ulink userinput "
560              . "varname volumenum "
561              . "wordasword " . "xref " . "year",
562            "attributes" => "<(article|book)>lang"
563        );
564
565    } else {
566        if ( $self->{options}{'force'} ) {
567            warn wrap_mod( "po4a::sgml",
568                dgettext( "po4a", "DTD of this file is unknown, but proceeding as requested." ) );
569            $self->set_tags_kind();
570        } else {
571            die wrap_mod( "po4a::sgml",
572                dgettext( "po4a", "DTD of this file is unknown. (supported: DebianDoc, DocBook). The prolog follows:" )
573                  . "\n$prolog" );
574        }
575    }
576
577    # Hash of the file entities that won't be included
578    my %ignored_inclusion = ();
579    foreach ( split / /, $self->{options}{'ignore-inclusion'} ) {
580        $ignored_inclusion{$_} = 1;
581    }
582
583    # Prepare the reference indirection stuff
584    my @refs;
585    my $length = ( $origfile =~ tr/\n/\n/ );
586    print "XX Prepare reference indirection stuff\n" if $debug{'refs'};
587    for ( my $i = 1 ; $i <= $length ; $i++ ) {
588        push @refs, "$mastername:$i";
589        print "$mastername:$i\n" if $debug{'refs'};
590    }
591
592    # protect the conditional inclusions in the file
593    $origfile =~ s/<!\[\s*IGNORE\s*\[/{PO4A-beg-IGNORE}/g;          # cond. incl. starts
594    $origfile =~ s/<!\[\s*CDATA\s*\[/{PO4A-beg-CDATA}/g;            # cond. incl. starts
595    $origfile =~ s/<!\[\s*RCDATA\s*\[/{PO4A-beg-RCDATA}/g;          # cond. incl. starts
596    $origfile =~ s/<!\[\s*([^\[\s]+)\s*\[/<po4abeg name="$1">/g;    # cond. incl. starts
597    $origfile =~ s/\]\]>/<po4aend>/g;                               # cond. incl. end
598
599    # Remove <![ IGNORE [ sections
600    # FIXME: we don't support included PO4A-beg-
601    my $tmp1 = $origfile;
602    while ( $tmp1 =~ m/^(.*?)(\{PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s ) {
603        my ( $begin, $ignored, $end ) = ( $1, $2, $3 );
604        my @begin   = split( /\n/, $begin );
605        my @ignored = split( /\n/, $ignored );
606        my $pre     = scalar @begin;
607        my $len     = ( scalar @ignored ) - 1;
608        $pre++ if ( $begin =~ /\n$/s );
609        $len++ if ( $end   =~ /^\n/s );
610
611        # remove the references of the ignored lines
612        splice @refs, $pre + 1, $len - 1;
613
614        # remove the lines
615        $tmp1 = $begin . $end;
616    }
617    $origfile = $tmp1;
618
619    # The <, >, and & in a CDATA must be escaped because they do not
620    # correspond to tags or entities delimiters.
621    $tmp1     = $origfile;
622    $origfile = "";
623    while ( $tmp1 =~ m/^(.*?{PO4A-beg-\s*(?:CDATA|RCDATA)\s*})(.+?)(<po4aend>.*)$/s ) {
624        my ( $begin, $tmp ) = ( $1, $2 );
625        $tmp1 = $3;
626        $tmp =~ s/</{PO4A-lt}/gs;
627        $tmp =~ s/>/{PO4A-gt}/gs;
628        $tmp =~ s/&/{PO4A-amp}/gs;
629        $origfile .= $begin . $tmp;
630    }
631    $origfile .= $tmp1;
632
633    # Deal with the %entities; in the prolog. God damn it, this code is gross!
634    # Try hard not to change the number of lines to not fuck up the references
635    my %prologentincl;
636    my $moretodo = 1;
637  PROLOGENTITY: while ($moretodo) {    # non trivial loop to deal with recursive inclusion
638        $moretodo = 0;
639
640        # Unprotect not yet defined inclusions
641        $prolog =~ s/{PO4A-percent}/%/sg;
642        print STDERR "prolog=>>>>$prolog<<<<\n"
643          if ( $debug{'entities'} );
644        while ( $prolog =~ /(.*?)<!ENTITY\s*%\s*(\S*)\s+SYSTEM\s*"([^>"]*)"\s*>(.*)$/is ) {    #})"{ (Stupid editor)
645            print STDERR "Seen the definition entity of prolog inclusion '$2' (=$3)\n"
646              if ( $debug{'entities'} );
647
648            # Preload the content of the entity.
649            my $key          = $2;
650            my $filename     = $3;
651            my $origfilename = $filename;
652            my ( $begin, $end ) = ( $1, $4 );
653            if ( $filename !~ m%^/% && $mastername =~ m%/% ) {
654                my $dir = $mastername;
655                $dir =~ s%/[^/]*$%%;
656                $filename = "$dir/$filename";
657
658                # origfile also needs to be fixed otherwise onsgmls won't
659                # find the file.
660                $origfile =~ s/(<!ENTITY\s*%\s*\Q$key\E\s+SYSTEM\s*")\Q$origfilename\E("\s*>)/$1$filename$2/gsi;
661            }
662            if ( defined $ignored_inclusion{$key} or !-e $filename ) {
663
664                # We won't expand this entity.
665                # And we avoid onsgmls to do so.
666                $prolog = "$begin<!--{PO4A-ent-beg-$key}$filename" . "{PO4A-ent-end}-->$end";
667            } else {
668                $prolog = $begin . $end;
669                ( -e $filename && open IN, "<$filename" )
670                  || die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot open %s (content of entity %s%s;): %s" ),
671                    $filename, '%', $key, $! );
672                local $/ = undef;
673                $prologentincl{$key} = <IN>;
674                close IN;
675                print STDERR "Content of \%$key; is $filename ("
676                  . ( $prologentincl{$key} =~ tr/\n/\n/ )
677                  . " lines long)\n"
678                  if ( $debug{'entities'} );
679                print STDERR "content: " . $prologentincl{$key} . "\n"
680                  if ( $debug{'entities'} );
681                $moretodo = 1;
682                next PROLOGENTITY;
683            }
684        }
685        while ( $prolog =~ /(.*?)<!ENTITY\s*%\s*(\S*)\s*"([^>"]*)"\s*>(.*)$/is ) {    #})"{ (Stupid editor)
686            print STDERR "Seen the definition entity of prolog definition '$2' (=$3)\n"
687              if ( $debug{'entities'} );
688
689            # Preload the content of the entity.
690            my $key = $2;
691            $prolog = $1 . $4;
692            $prologentincl{$key} = $3;
693            print STDERR "content: " . $prologentincl{$key} . "\n"
694              if ( $debug{'entities'} );
695            $moretodo = 1;
696            next PROLOGENTITY;
697        }
698        while ( $prolog =~ /^(.*?)%([^;\s]*);(.*)$/s ) {
699            my ( $pre, $ent, $post ) = ( $1, $2, $3 );
700
701            # Yeah, right, the content of the entity can be defined in a not yet loaded entity
702            # It's easy to build a weird case where all that shit collapses poorly. But why the
703            # hell are you using those strange constructs in your document, damn it?
704            print STDERR "Seen prolog inclusion $ent\n" if ( $debug{'entities'} );
705            if ( defined( $prologentincl{$ent} ) ) {
706                $prolog = $pre . $prologentincl{$ent} . $post;
707                print STDERR "Change \%$ent; to its content in the prolog\n"
708                  if $debug{'entities'};
709                $moretodo = 1;
710            } else {
711
712                # AAAARGH stupid document using %bla; and having then defined in another inclusion!
713                # Protect it for this pass, and unprotect it on next one
714                print STDERR "entity $ent not defined yet ?!\n"
715                  if $debug{'entities'};
716                $prolog = "$pre" . '{PO4A-percent}' . "$ent;$post";
717            }
718        }
719    }
720    $prolog =~ s/<!--\{PO4A-ent-beg-(.*?)\}(.*?)\{PO4A-ent-end\}-->/<!ENTITY % $1 SYSTEM "$2">/g;
721
722    # Unprotect undefined inclusions, and die of them
723    $prolog =~ s/\{PO4A-percent\}/%/sg;
724    if ( $prolog =~ /%([^;\s]*);/ ) {
725        die wrap_mod( "po4a::sgml", dgettext( "po4a", "unrecognized prolog inclusion entity: %%%s;" ), $1 )
726          unless ( $ignored_inclusion{$1} );
727    }
728
729    # Protect &entities; (all but the ones asking for a file inclusion)
730    #   search the file inclusion entities
731    my %entincl;
732    my $searchprolog = $prolog;
733    while ( $searchprolog =~ /(.*?)<!ENTITY\s+(\S*)\s+SYSTEM\s*"([^>"]*)"\s*>(.*)$/is ) {    #})"{
734        print STDERR "Seen the entity of inclusion $2 (=$3)\n"
735          if ( $debug{'entities'} );
736        my $key          = $2;
737        my $filename     = $3;
738        my $origfilename = $filename;
739        $searchprolog = $4;
740        if ( $filename !~ m%^/% && $mastername =~ m%/% ) {
741            my $dir = $mastername;
742            $dir =~ s%/[^/]*$%%;
743            $filename = "$dir/$filename";
744
745            # origfile also needs to be fixed otherwise onsgmls won't find
746            # the file.
747            $origfile =~ s/(<!ENTITY\s+$key\s+SYSTEM\s*")\Q$origfilename\E("\s*>)/$1$filename$2/gsi;
748        }
749        if ( ( not defined $ignored_inclusion{$2} ) and ( -e $filename ) ) {
750            $entincl{$key}{'filename'} = $filename;
751
752            # Preload the content of the entity
753            ( -e $filename && open IN, "<$filename" )
754              || die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot open %s (content of entity %s%s;): %s" ),
755                $filename, '&', $key, $! );
756            local $/ = undef;
757            $entincl{$key}{'content'} = <IN>;
758            close IN;
759            $entincl{$key}{'length'} = ( $entincl{$key}{'content'} =~ tr/\n/\n/ );
760            print STDERR "read $filename (content of \&$key;, $entincl{$key}{'length'} lines long)\n"
761              if ( $debug{'entities'} );
762        }
763    }
764
765    #   Change the entities including files in the document
766    my $dosubstitution = 1;
767    while ($dosubstitution) {
768        $dosubstitution = 0;
769        foreach my $key ( keys %entincl ) {
770
771            # The external entity can be referenced as &key; or &key
772            # In the second case, we must differentiate &key and &key2
773            while ( $origfile =~ /^(.*?)&$key(;.*$|[^-_:.A-Za-z0-9].*$|$)/s ) {
774
775                # Since we will include a new file, we
776                # must do a new round of substitutions.
777                $dosubstitution = 1;
778                my ( $begin, $end ) = ( $1, $2 );
779                $end = "" unless ( defined $end );
780                $end =~ s/^;//s;
781
782                if ( $begin =~ m/.*<!--(.*?)$/s and $1 !~ m/-->/s ) {
783
784                    # This entity is commented. Just remove it.
785                    $origfile = $begin . $end;
786                    next;
787                }
788
789                # add the refs
790                my $len  = $entincl{$key}{'length'};    # number added by the inclusion
791                my $pre  = ( $begin =~ tr/\n/\n/ );     # number of \n
792                my $post = ( $end =~ tr/\n/\n/ );
793                print "XX Add a ref. pre=$pre; len=$len; post=$post\n"
794                  if $debug{'refs'};
795
796                # Keep a reference of inclusion position in main file
797                my $main = $refs[$pre];
798
799                # Remove the references for the lines after the inclusion
800                # point.
801                my @endrefs = splice @refs, $pre + 1;
802
803                # Add the references of the added lines
804                my $i;
805                for ( $i = 0 ; $i < $len ; $i++ ) {
806                    $refs[ $i + $pre ] = "$main $entincl{$key}{'filename'}:" . ( $i + 1 );
807                }
808
809                if ( $begin !~ m/\n[ \t]*$/s ) {
810                    if ( $entincl{$key}{'content'} =~ m/^[ \t]*\n/s ) {
811
812                        # There is nothing in the first line of the
813                        # included file, and something on the line before
814                        # the inclusion The line reference will be more
815                        # informative like this:
816                        $refs[$pre] = $main;
817                    }
818                }
819                if ( $end !~ s/^[ \t]*\n//s ) {
820                    if ( $entincl{$key}{'content'} =~ m/\n[ \t]*$/s ) {
821
822                        # There is something on the line after the
823                        # inclusion, and there is an end of line at the
824                        # end of the included file. We must add the line
825                        # reference of the remainder on the line:
826                        push @refs, $main;
827                    }
828                }
829
830                # Append the references removed earlier (lines after the
831                # inclusion point).
832                push @refs, @endrefs;
833
834                # Do the substitution
835                $origfile = "$begin" . $entincl{$key}{'content'} . "$end";
836                print STDERR "substitute $key\n" if ( $debug{'entities'} );
837            }
838        }
839    }
840    $origfile =~ s/\G(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*|#[0-9]+|#x[0-9a-fA-F]+)\b/$1\{PO4A-amp\}$2/gs;
841    if ( defined($xmlprolog) && length($xmlprolog) ) {
842        $origfile =~ s/\/>/\{PO4A-close\}>/gs;
843    }
844
845    if ( $debug{'refs'} ) {
846        print "XX Resulting shifts\n";
847        for ( my $i = 0 ; $i < scalar @refs ; $i++ ) {
848            print "$mastername:" . ( $i + 1 ) . " -> $refs[$i]\n";
849        }
850    }
851
852    my ( $tmpfh, $tmpfile ) = File::Temp::tempfile(
853        "po4a-XXXX",
854        SUFFIX => ".sgml",
855        DIR    => $ENV{TMPDIR} || "/tmp",
856        UNLINK => 0
857    );
858    print $tmpfh $origfile;
859    close $tmpfh or die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot close tempfile: %s" ), $! );
860
861    my $cmd = "onsgmls -l -E 0 -wno-valid < $tmpfile" . ( $debug{'onsgmls'} ? "" : " 2>/dev/null" ) . " |";
862    print STDERR "CMD=$cmd\n" if ( $debug{'generic'} or $debug{'onsgmls'} );
863
864    open( IN, $cmd ) || die wrap_mod( "po4a::sgml", dgettext( "po4a", "Cannot run onsgmls: %s" ), $! );
865
866    # The kind of tags
867    my ( %translate, %empty, %verbatim, %indent, %exist, %attribute, %qualify );
868    foreach ( split( / /, ( $self->{SGML}->{k}{'translate'} || '' ) ) ) {
869        $translate{ uc $_ } = 1;
870        $indent{ uc $_ }    = 1;
871        $exist{ uc $_ }     = 1;
872    }
873    foreach ( split( / /, ( $self->{SGML}->{k}{'empty'} || '' ) ) ) {
874        $empty{ uc $_ } = 1;
875        $exist{ uc $_ } = 1;
876    }
877    foreach ( split( / /, ( $self->{SGML}->{k}{'verbatim'} || '' ) ) ) {
878        $translate{ uc $_ } = 1;
879        $verbatim{ uc $_ }  = 1;
880        $exist{ uc $_ }     = 1;
881    }
882    foreach ( split( / /, ( $self->{SGML}->{k}{'indent'} || '' ) ) ) {
883        $translate{ uc $_ } = 1;
884        $indent{ uc $_ }    = 1;
885        $exist{ uc $_ }     = 1;
886    }
887    foreach ( split( / /, ( $self->{SGML}->{k}{'ignore'} ) || '' ) ) {
888        $exist{ uc $_ } = 1;
889    }
890    foreach ( split( / /, ( $self->{SGML}->{k}{'attributes'} || '' ) ) ) {
891        my ( $attr, $tags );
892        if (m/(^.*>)(\w+)/) {
893            $attr = uc $2;
894            $tags = $1;
895        } else {
896            $attr = uc $_;
897            $tags = ".*";
898        }
899        if ( exists $attribute{$attr} ) {
900            $attribute{$attr} .= "|$tags";
901        } else {
902            $attribute{$attr} = $tags;
903        }
904    }
905    foreach ( split( / /, ( $self->{SGML}->{k}{'qualify'} ) || '' ) ) {
906        $qualify{ uc $_ }   = 1;
907        $attribute{ uc $_ } = '.*' unless exists $attribute{ uc $_ };
908    }
909
910    # What to do before parsing
911
912    # push the XML prolog if existing
913    $self->pushline( $xmlprolog . "\n" ) if ( defined($xmlprolog) && length($xmlprolog) );
914
915    # Put the prolog into the file, allowing for entity definition translation
916    #  <!ENTITY myentity "definition_of_my_entity">
917    # and push("<!ENTITY myentity \"".$self->translate("definition_of_my_entity")
918    if ( $prolog =~ m/(.*?\[)(.*)(\]>)/s ) {
919        warn "Pre=~~$1~~;Post=~~$3~~\n" if ( $debug{'entities'} );
920        $self->pushline( $1 . "\n" )    if ( length($1) );
921        $prolog = $2;
922        my ($post) = $3;
923        while ( $prolog =~ m/^(.*?)<!ENTITY\s+(\S*)\s+"([^"]*)"\s*>(.*)$/is ) {    #" ){
924            $self->pushline($1) if length($1);
925            $self->pushline( "<!ENTITY $2 \"" . $self->translate( $3, "", "definition of entity \&$2;" ) . "\">" );
926            warn "Seen text entity $2\n" if ( $debug{'entities'} );
927            $prolog = $4;
928        }
929        $prolog .= $post;
930        $self->pushline( $prolog . "\n" ) if ( length($prolog) );
931    } else {
932        warn "No entity declaration detected in ~~$prolog~~...\n" if ( $debug{'entities'} );
933        $self->pushline($prolog)                                  if length($prolog);
934    }
935
936    # The parse object.
937    # Damn SGMLS. It makes me do crude things.
938    no strict "subs";
939    my $parse = new SGMLS(IN);
940    use strict;
941
942    # Some values for the parsing
943    my @open = ();    # opened translation container tags
944    my $verb = 0;     # can we wrap or not
945    my $verb_last_ref;
946    my $seenfootnote = 0;
947    my $indent       = 0;     # indent level
948    my $lastchar     = '';    #
949    my $buffer       = "";    # what we will soon handle
950
951    # Keep a reference to the last line indicated by onsgmls
952    my $line = 0;
953
954    # Unfortunately, onsgmls do not mention all the line changes.  We have
955    # to keep track of the number of lines seen in the "record ends".
956    my $adds = 0;
957
958    # If the last line received contains only spaces, do not take it into
959    # account for the line reference of the paragraph.
960    my $empty_last_cdata = 0;
961
962    # run the appropriate handler for each event
963  EVENT: while ( my $event = $parse->next_event ) {
964
965        # get the line reference to build po entries
966        if ( $line != $parse->line ) {
967
968            # onsgmls informs us of that the line changed. Reset $adds and
969            # $empty_last_cdata
970            $adds             = 0;
971            $empty_last_cdata = 0;
972            $line             = $parse->line;
973        }
974        my $ref = $refs[ $parse->line - 1 + $adds - $empty_last_cdata ];
975
976        # In verbatim mode, keep the current line reference.
977        if ($verb) {
978            $ref = $refs[ $parse->line - 1 ];
979        }
980        my $type;
981
982        if ( $event->type eq 'start_element' ) {
983            die wrap_ref_mod( $ref, "po4a::sgml", dgettext( "po4a", "Unknown tag %s" ), $event->data->name )
984              unless $exist{ $event->data->name };
985
986            $lastchar = ">";
987
988            # Which tag did we see?
989            my $tag = '';
990            $tag .= '<' . lc( $event->data->name() );
991            foreach my $attr ( sort $event->data->attribute_names() ) {
992
993                my $val   = ${ $event->data->attributes() }{$attr};
994                my $value = $val->value();
995
996                #                if ($val->type() eq 'IMPLIED') {
997                #                    $tag .= ' '.lc($attr).'="'.lc($attr).'"';
998                #                } els
999                if (   $val->type() eq 'CDATA'
1000                    || $val->type() eq 'IMPLIED' )
1001                {
1002                    if ( defined $value && length($value) ) {
1003                        my $lattr = lc $attr;
1004                        my $uattr = uc $attr;
1005                        if ( exists $attribute{$uattr} ) {
1006                            my $context = "";
1007                            foreach my $o (@open) {
1008                                next if ( !defined $o or $o =~ m%^</% );
1009                                $o =~ s/ .*/>/;
1010                                $context .= $o;
1011                            }
1012                            $context = join( "", $context, "<", lc( $event->data->name() ), ">" );
1013                            if ( $context =~ /^($attribute{$uattr})$/ ) {
1014                                if ( $qualify{$uattr} ) {
1015                                    my $translated =
1016                                      $self->translate( "$lattr=$value", $ref, "attribute $context$lattr" );
1017                                    if ( $translated =~ s/^$lattr=// ) {
1018                                        $value = $translated;
1019                                    } else {
1020                                        die wrap_mod( "po4a::sgml",
1021                                            dgettext( "po4a", "bad translation '%s' for '%s' in '%s'" ),
1022                                            $translated, $context . $lattr, $ref );
1023                                    }
1024                                } else {
1025                                    $value = $self->translate( $value, $ref, "attribute $context$lattr" );
1026                                }
1027                            }
1028                        }
1029                        if ( $value =~ m/\"/ ) {
1030                            $value = "'" . $value . "'";
1031                        } else {
1032                            $value = '"' . $value . '"';
1033                        }
1034                        $tag .= " $lattr=$value";
1035                    }
1036                } elsif ( $val->type() eq 'NOTATION' ) {
1037                } else {
1038                    $tag .= ' ' . lc($attr) . '="' . lc($value) . '"'
1039                      if ( defined $value && length($value) );
1040                }
1041            }
1042            $tag .= '>';
1043
1044            # debug
1045            print STDERR "Seen $tag, open level=" . ( scalar @open ) . ", verb=$verb\n"
1046              if ( $debug{'tag'} );
1047
1048            if ( $event->data->name() eq 'FOOTNOTE' ) {
1049
1050                # we want to put the <para> inside the <footnote> in the same msgid
1051                $seenfootnote = 1;
1052            }
1053
1054            if ($seenfootnote) {
1055                $buffer .= $tag;
1056                next EVENT;
1057            }
1058            if ( $translate{ $event->data->name() } ) {
1059
1060                # Build the type
1061                if ( scalar @open > 0 ) {
1062                    $type = $open[$#open] . $tag;
1063                } else {
1064                    $type = $tag;
1065                }
1066
1067                # do the job
1068                if ( @open > 0 ) {
1069                    $self->end_paragraph( $buffer, $ref, $type, $verb, $indent, @open );
1070                } else {
1071                    $self->pushline($buffer) if $buffer;
1072                }
1073                $buffer = "";
1074                push @open, $tag;
1075            } elsif ( $indent{ $event->data->name() } ) {
1076                die wrap_ref_mod( $ref, "po4a::sgml",
1077                    dgettext( "po4a", "Closing tag for a translation container missing before %s" ), $tag )
1078                  if ( scalar @open );
1079            }
1080
1081            if ( $verbatim{ $event->data->name() } ) {
1082                $verb++;
1083
1084                # Keep a reference to the line that openned the verbatim
1085                # section. This is needed to check if its data starts on
1086                # the same line.
1087                $verb_last_ref = $ref;
1088            }
1089            if ($verb) {
1090
1091                # Tag in a verbatim section. Check if it appeared at
1092                # the same line than the previous data. If not, it
1093                # means that an end of line must be added to the
1094                # buffer.
1095                if ( $ref ne $verb_last_ref ) {
1096
1097                    # FIXME: Does it work if $verb > 1
1098                    $buffer .= "\n";
1099                    $verb_last_ref = $ref;
1100                }
1101            }
1102
1103            if ( $indent{ $event->data->name() } ) {
1104
1105                # push the indenting space only if not in verb before that tag
1106                # push trailing "\n" only if not in verbose afterward
1107                $self->pushline( ( $verb > 1 ? "" : ( " " x $indent ) ) . $tag . ( $verb ? "" : "\n" ) );
1108                $indent++ unless $empty{ $event->data->name() };
1109            } else {
1110                $tag =~ s/<po4abeg name="([^"]+)">/<![ $1 [/;    #"; Stupid emacs
1111                $tag =~ s/<po4aend>/]]>/;
1112                $buffer .= $tag;
1113            }
1114        }    # end of type eq 'start_element'
1115
1116        elsif ( $event->type eq 'end_element' ) {
1117            my $tag = (
1118                $empty{ $event->data->name() }
1119                ? ''
1120                : '</' . lc( $event->data->name() ) . '>'
1121            );
1122
1123            if ($verb) {
1124
1125                # Tag in a verbatim section. Check if it appeared at
1126                # the same line than the previous data. If not, it
1127                # means that an end of line must be added to the
1128                # buffer.
1129                if ( $ref ne $verb_last_ref ) {
1130
1131                    # FIXME: Does it work if $verb > 1
1132                    $buffer .= "\n";
1133                    $verb_last_ref = $ref;
1134                }
1135            }
1136            print STDERR "Seen $tag, level=" . ( scalar @open ) . ", verb=$verb\n"
1137              if ( $debug{'tag'} );
1138
1139            $lastchar = ">";
1140
1141            if ( $event->data->name() eq 'FOOTNOTE' ) {
1142
1143                # we want to put the <para> inside the <footnote> in the same msgid
1144                $seenfootnote = 0;
1145            }
1146
1147            if ($seenfootnote) {
1148                $buffer .= $tag;
1149                next EVENT;
1150            }
1151            if ( $translate{ $event->data->name() } ) {
1152                $type = $open[$#open] . $tag;
1153                $self->end_paragraph( $buffer, $ref, $type, $verb, $indent, @open );
1154                $buffer = "";
1155                pop @open;
1156                if ( @open > 0 ) {
1157                    pop @open;
1158                    push @open, $tag;
1159                }
1160            } elsif ( $indent{ $event->data->name() } ) {
1161                die wrap_ref_mod( $ref, "po4a::sgml",
1162                    dgettext( "po4a", "Closing tag for a translation container missing before %s" ), $tag )
1163                  if ( scalar @open );
1164            }
1165
1166            unless ( $event->data->name() =~ m/^(PO4ABEG|PO4AEND)$/si ) {
1167                if ( $indent{ $event->data->name() } ) {
1168                    $indent--;
1169
1170                    # add indenting space only when not in verbatim
1171                    # add the tailing \n only if out of verbatim after that tag
1172                    $self->pushline( ( $verb ? "" : ( " " x $indent ) ) . $tag . ( $verb > 1 ? "" : "\n" ) );
1173                } else {
1174                    $buffer .= $tag;
1175                }
1176                $verb-- if $verbatim{ $event->data->name() };
1177            }
1178        }    # end of type eq 'end_element'
1179
1180        elsif ( $event->type eq 'cdata' ) {
1181            my $cdata = $event->data;
1182            $empty_last_cdata = ( $cdata =~ m/^\s*$/ );
1183            $cdata =~ s/{PO4A-lt}/</g;
1184            $cdata =~ s/{PO4A-gt}/>/g;
1185            $cdata =~ s/{PO4A-amp}/&/g;
1186            $cdata =~ s/{PO4A-end}/\]\]>/g;
1187            $cdata =~ s/{PO4A-beg-([^\}]+)}/<!\[$1\[/g;
1188            if ($verb) {
1189
1190                # Check if this line of data appear on the same line
1191                # than the previous tag. If not, append an end of line
1192                # to the buffer.
1193                if ( $ref ne $verb_last_ref ) {
1194                    $buffer .= "\n";
1195                    $verb_last_ref = $ref;
1196                }
1197            } else {
1198                $cdata =~ s/\\t/ /g;
1199                $cdata =~ s/\s+/ /g;
1200                $cdata =~ s/^\s//s if $lastchar eq ' ';
1201            }
1202            $lastchar = substr( $cdata, -1, 1 );
1203            $buffer .= $cdata;
1204            if ( defined($xmlprolog) && length($xmlprolog) ) {
1205                $buffer =~ s/>PO4A-close\}>/\/>/sg;
1206                $buffer =~ s/PO4A-close\}>//sg;       # This should not be necessary
1207            }
1208        }    # end of type eq 'cdata'
1209
1210        elsif ( $event->type eq 'sdata' ) {
1211            my $sdata = $event->data;
1212            $sdata =~ s/^\[//;
1213            $sdata =~ s/\s*\]$//;
1214            $lastchar = substr( $sdata, -1, 1 );
1215            $buffer .= '&' . $sdata . ';';
1216        }    # end of type eq 'sdata'
1217
1218        elsif ( $event->type eq 're' ) {
1219
1220            # End of record, the line reference shall be incremented.
1221            $adds += 1;
1222            if ($verb) {
1223
1224                # Check if this line of data appear on the same line
1225                # than the previous tag. If not, append an end of line
1226                # to the buffer.
1227                if ( $ref ne $verb_last_ref ) {
1228                    $buffer .= "\n";
1229                    $verb_last_ref = $ref;
1230                }
1231                $buffer .= "\n";
1232            } elsif ( $lastchar ne ' ' ) {
1233                $buffer .= " ";
1234            }
1235            $lastchar = ' ';
1236        }    #end of type eq 're'
1237
1238        elsif ( $event->type eq 'conforming' ) {
1239
1240        } elsif ( $event->type eq 'pi' ) {
1241            my $pi = $event->data;
1242            $buffer .= "<?$pi>";
1243
1244        } else {
1245            die wrap_ref_mod(
1246                $refs[ $parse->line ],
1247                "po4a::sgml", dgettext( "po4a", "Unknown SGML event type: %s" ),
1248                $event->type
1249            );
1250        }
1251    }
1252
1253    # What to do after parsing
1254    $self->pushline($buffer);
1255    close(IN);
1256    if ( $? != 0 and $self->verbose() > 0 ) {
1257        warn wrap_mod(
1258            "po4a::sgml",
1259            dgettext(
1260                "po4a",
1261                "Warning: onsgmls produced some errors.  "
1262                  . "This is usually caused by po4a, which modifies the input "
1263                  . "and restores it afterwards, causing the input of onsgmls "
1264                  . "to be invalid.  This is usually safe, but you may wish "
1265                  . "to verify the generated document with onsgmls -wno-valid."
1266            )
1267        );
1268        unless ( $debug{'onsgmls'} ) {
1269            warn wrap_mod(
1270                "po4a::sgml",
1271                dgettext(
1272                    "po4a",
1273                    "To see the error message, "
1274                      . "rerun po4a with this additional argument:\n"
1275                      . "   -o debug=onsgmls"
1276                )
1277            );
1278        }
1279    }
1280    unlink($tmpfile) unless ( $debug{'refs'} or $debug{'onsgmls'} );
1281}
1282
1283sub end_paragraph {
1284    my ( $self, $para, $ref, $type, $verb, $indent ) = ( shift, shift, shift, shift, shift, shift );
1285    my (@open) = @_;
1286    die "Internal error: no paragraph to end here!!"
1287      unless scalar @open;
1288
1289    return unless defined($para) && length($para);
1290
1291    if ( ( $para =~ m/^\s*$/s ) and ( not $verb ) ) {
1292
1293        # In non-verbatim environments, a paragraph with only spaces is
1294        # like an empty paragraph
1295        return;
1296    }
1297
1298    # unprotect &entities;
1299    $para =~ s/{PO4A-amp}/&/g;
1300
1301    # remove the name"\|\|" onsgmls added as attributes
1302    $para =~ s/ name=\"\\\|\\\|\"//g;
1303    $para =~ s/ moreinfo=\"none\"//g;
1304
1305    # Extract the leading and trailing spaces. They will be restored only
1306    # in verbatim environments.
1307    my ( $leading_spaces, $trailing_spaces ) = ( "", "" );
1308    if ($verb) {
1309
1310        # In the verbatim mode, we can ignore empty lines, but not the
1311        # leading spaces or tabulations. Otherwise, the PO will look
1312        # weird.
1313        if ( $para =~ m/^(\s*\n)(.*?)(\s*)$/s ) {
1314            $leading_spaces  = $1;
1315            $para            = $2;
1316            $trailing_spaces = $3;
1317        }
1318    } else {
1319        if ( $para =~ m/^(\s*)(.*?)(\s*)$/s ) {
1320            $leading_spaces  = $1;
1321            $para            = $2;
1322            $trailing_spaces = $3;
1323        }
1324    }
1325
1326    $para = $self->translate(
1327        $para, $ref, $type,
1328        'wrap'    => !$verb,
1329        'wrapcol' => ( 75 - $indent )
1330    );
1331
1332    if ($verb) {
1333        $para = $leading_spaces . $para . $trailing_spaces;
1334    } else {
1335        $para =~ s/^\s+//s;
1336        my $toadd = " " x ( $indent + 1 );
1337        $para =~ s/^/$toadd/mg;
1338        $para .= "\n";
1339    }
1340
1341    $self->pushline($para);
1342}
1343
13441;
1345
1346=head1 AUTHORS
1347
1348This module is an adapted version of sgmlspl (SGML postprocessor for the
1349ONSGMLS parser) which was:
1350
1351 Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca>
1352
1353The adaptation for po4a was done by:
1354
1355 Denis Barbier <barbier@linuxfr.org>
1356 Martin Quinson (mquinson#debian.org)
1357
1358=head1 COPYRIGHT AND LICENSE
1359
1360 Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca>.
1361 Copyright © 2002-2005 SPI, Inc.
1362
1363This program is free software; you may redistribute it and/or modify it
1364under the terms of GPL (see the COPYING file).
1365