1package Geo::BUFR;
2
3# Copyright (C) 2010-2020 MET Norway
4#
5# This module is free software; you can redistribute it and/or
6# modify it under the same terms as Perl itself.
7
8=begin General_remarks
9
10Some general remarks on variables
11---------------------------------
12
13@data = data array
14@desc = descriptor array
15
16These 2 arrays are in one to one correspondence, but note that some C
17descriptors (2.....) are included in @desc even though there is no
18associated data value in message (the corresponding element in @data
19is set to ''). These descriptors without value are printed in
20dumpsection4 without line number, to distinguish them from 'real' data
21descriptors.
22
23$idesc = index of descriptor in @desc (and @data)
24$bm_idesc = index of bit mapped descriptor in @data (and @desc, see below)
25
26Variables related to bit maps:
27
28$self->{BUILD_BITMAP}
29$self->{BITMAP_INDEX}
30$self->{NUM_BITMAPS}
31$self->{BACKWARD_DATA_REFERENCE}
32
33These are explained in sub new
34
35$self->{BITMAP_OPERATORS}
36
37Reference to an array containing operators in BUFR table C which are
38associated with bit maps, i.e. one of 22[2-5]000 and 232000; the
39operator being added when it is met in section 3 in message. Note that
40an operator may occur multiple times, which is why we have to use an
41array, not a hash.
42
43$self->{CURRENT_BITMAP}
44
45Reference to an array which contains the indexes of data values for
46which data is marked as present in 031031 in the current used bit map.
47E.g. [2,3,6] if bitmap = 1100110.
48
49$self->{BITMAP_START}
50
51Array containing for each bit map the index of the first element
52descriptor for which the bit map relates.
53
54$self->{BITMAPS}
55
56Reference to an array, one element added for each bit map operator in
57$self->{BITMAP_OPERATORS} and each subset (although for compression we
58assume all subset have identical bitmaps and operate with subset 0
59only, i.e. $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] instead of
60...->[$isub]), the element being a reference to an array containing
61consecutive pairs of indexes ($idesc, $bm_idesc), used to look up in
62@data and @desc arrays for the value/descriptor and corresponding bit
63mapped value/descriptor.
64
65$self->{REUSE_BITMAP}
66
67Gets defined when 237000 is met, undefined if 237255 or 235000 is met.
68Originally for each subset (but defined for subset 0 only if
69compression) set to reference an array of the indexes of data values
70to which the last used bitmap relates (fetched from $self->{BITMAPS}),
71then shifted as the new element in $self->{BITMAPS} is built up.
72
73For operator 222000 ('Quality information follows') the bit mapped
74descriptor should be a 033-descriptor. For 22[3-5]/232 the bit mapped
75value should be the data value of the 22[3-5]255/232255 descriptors
76following the operator in BUFR section 3, with bit mapped descriptor
77$desc[bm_idesc] equal to $desc[$idesc] (with data width and reference
78value changed for 225255)
79
80=end General_remarks
81
82=cut
83
84require 5.006;
85use strict;
86use warnings;
87use Carp;
88use Cwd qw(getcwd);
89use FileHandle;
90use File::Spec::Functions qw(catfile);
91use Scalar::Util qw(looks_like_number);
92use Time::Local qw(timegm);
93# Also requires Storable if sub copy_from() is called
94
95require DynaLoader;
96our @ISA = qw(DynaLoader);
97our $VERSION = '1.38';
98
99# This loads BUFR.so, the compiled version of BUFR.xs, which
100# contains bitstream2dec, bitstream2ascii, dec2bitstream,
101# ascii2bitstream and null2bitstream
102bootstrap Geo::BUFR $VERSION;
103
104
105# Some package globals
106our $Verbose = 0;
107
108# $Verbose or $self->{VERBOSE} > 0 leads to the following output, all
109# except for level 6 on lines starting with 'BUFR.pm: ':
110# 1 -> B,C,D tables used (full path)
111# 2 -> Identifying stages of processing, displaying length of sections
112#      and some additional data from section 1 and 3
113# 3 -> All descriptors and values extracted
114# 4 -> Operator specific information, including delayed replication
115#      and repetition
116# 5 -> BUFR compression specific information
117# 6 -> Calling dumpsection0,1,3
118
119our $Spew = 0; # To avoid the overhead of subroutine calls to _spew
120               # (which is called a lot), $Spew is set to 1 if global
121               # $Verbose or at least one object VERBOSE is set > 1.
122               # This should speed up execution a bit in the common
123               # situation when no verbose output (except possibly
124               # the BUFR tables used) is requested
125our $Nodata = 0; # If set to true will prevent decoding of section 4
126our $Noqc = 0; # If set to true will prevent decoding (or encoding) of
127               # any descriptors after 222000 is met
128our $Reuse_current_ahl = 0;
129               # If set to true will cause cet_current_ahl() to return
130               # last AHL extracted and not undef if currently
131               # processed BUFR message has no (immediately preceding)
132               # AHL
133our $Strict_checking = 0; # Ignore recoverable errors in BUFR format
134                          # met during decoding. User might set
135                          # $Strict_checking to 1: Issue warning
136                          # (carp) but continue decoding, or to 2:
137                          # Croak instead of carp
138
139# The next 2 operators are separated for readability. Public interface should
140# provide only set_show_all_operators() to set both of these (to the same value)
141our $Show_all_operators = 0; # = 0: show just the most informative C operators in dumpsection4
142                             # = 1: show all operators (as far as possible)
143our $Show_replication = 0; # = 0: don't include replication descriptors (F=1) in dumpsection4
144                           # = 1: include replication descriptors(F=1) in dumpsection4,
145                           #  with X in FXY replaced with actual number X' of replicated descriptors.
146                           #  X' is replaced by 0 if X' > 99
147
148our %BUFR_table;
149# Keys: PATH      -> full path to the chosen directory of BUFR tables
150#       FORMAT    -> supported formats are BUFRDC and ECCODES
151#       B$version -> hash containing the B table $BUFR_table/B$version
152#                    key: element descriptor (6 digits)
153#                    value: a \0 separated string containing the B table fields
154#                            $name, $unit, $scale, $refval, $bits
155#       C$version -> hash containing the C table $BUFR_table/C$version
156#                    key: table B descriptor (6 digits) of the code/flag table
157#                    value: a new hash, with keys the possible values listed in
158#                           the code table, the value the corresponding text
159#       D$version -> hash containing the D table $BUFR_table/D$version
160#                    key: sequence descriptor
161#                    value: a space separated string containing the element
162#                    descriptors (6 digits) the sequence descriptor expands to
163$BUFR_table{FORMAT} = 'BUFRDC'; # Default. Might in the future be changed to ECCODES
164
165our %Descriptors_already_expanded;
166# Keys: Text string "$table_version $unexpanded_descriptors"
167# Values: Space separated string of expanded descriptors
168
169sub _croak {
170    my $msg = shift;
171    croak "BUFR.pm ERROR: $msg";
172}
173
174## Carp or croak (or ignore) according to value of $Strict_checking
175sub _complain {
176    my $msg = shift;
177    if ($Strict_checking == 1) {
178        carp "BUFR.pm WARNING: $msg";
179    } elsif ($Strict_checking > 1) {
180        croak "BUFR.pm ERROR: $msg";
181    }
182    return;
183}
184
185sub _spew {
186    my $self = shift;
187    my $level = shift;
188    if (ref($self)) {
189        # Global $Verbose overrides object VERBOSE
190        return if $level > $self->{VERBOSE} && $level > $Verbose;
191    } else {
192        return if $level > $Verbose;
193    }
194    my $format = shift;
195    if (@_) {
196        printf "BUFR.pm: $format\n", @_;
197    } else {
198        print "BUFR.pm: $format\n";
199    }
200    return;
201}
202
203## Object constructor
204sub new {
205    my $class = shift;
206    my $self = {};
207    $self->{VERBOSE} = 0;
208    $self->{CURRENT_MESSAGE} = 0;
209    $self->{CURRENT_SUBSET} = 0;
210    $self->{BUILD_BITMAP} = 0; # Will be set to 1 if a bit map needs to
211                               # be built
212    $self->{BITMAP_INDEX} = 0; # Used for building up bit maps; will
213                               # be incremented for each 031031
214                               # encountered, then reset to 0 when bit
215                               # map is finished built
216    $self->{NUM_BITMAPS} = 0;  # Will be incremented each time an
217                               # operator descriptor which uses a bit
218                               # map is encountered in section 3
219    $self->{BACKWARD_DATA_REFERENCE} = 1; # Number the first bitmap in
220                               # a possible sequence of bitmaps which
221                               # relate to the same scope of data
222                               # descriptors. Starts as 1 when (or
223                               # rather before) the first bitmap is
224                               # constructed, will then be reset to
225                               # the number of the next bitmap to be
226                               # constructed each time 235000 is met
227    $self->{NUM_CHANGE_OPERATORS} = 0; # Will be incremented for
228                               # each of the operators CHANGE_WIDTH,
229                               # CHANGE_CCITTIA5_WIDTH, CHANGE_SCALE,
230                               # CHANGE_REFERENCE_VALUE (actually
231                               # NEW_REFVAL_OF), CHANGE_SRW and
232                               # DIFFERENCE_STATISTICAL_VALUE in effect
233
234    # If number of arguments is odd, first argument is expected to be
235    # a string containing the BUFR message(s)
236    if (@_ % 2) {
237        $self->{IN_BUFFER} = shift;
238    }
239
240    # This part is not documented in the POD. Better to remove it?
241    while (@_) {
242        my $parameter = shift;
243        my $value = shift;
244        $self->{$parameter} = $value;
245    }
246    bless $self, ref($class) || $class;
247    return $self;
248}
249
250## Copy contents of the bufr object in first argument. With no extra
251## arguments, will copy (clone) everything. With 'metadata' as second
252## argument, will copy just the metadata in section 0, 1 and 3 (and
253## all of section 2 if present)
254sub copy_from {
255    my $self = shift;
256    my $bufr = shift;
257    _croak("First argument to copy_from must be a Geo::BUFR object")
258        unless ref($bufr) eq 'Geo::BUFR';
259    my $what = shift || 'all';
260    if ($what eq 'metadata') {
261        for (qw(
262            BUFR_EDITION
263            MASTER_TABLE CENTRE SUBCENTRE UPDATE_NUMBER OPTIONAL_SECTION
264            DATA_CATEGORY INT_DATA_SUBCATEGORY LOC_DATA_SUBCATEGORY
265            MASTER_TABLE_VERSION LOCAL_TABLE_VERSION YEAR MONTH DAY
266            HOUR MINUTE SECOND LOCAL_USE DATA_SUBCATEGORY YEAR_OF_CENTURY
267            NUM_SUBSETS OBSERVED_DATA COMPRESSED_DATA DESCRIPTORS_UNEXPANDED
268            SEC2_STREAM
269            )) {
270            if (exists $bufr->{$_}) {
271                $self->{$_} = $bufr->{$_};
272            } else {
273                # This cleanup might be necessary if BUFR edition changes
274                delete $self->{$_} if exists $self->{$_};
275            }
276        }
277    } elsif ($what eq 'all') {
278        %$self = ();
279        while (my ($key, $value) = each %{$bufr}) {
280            if ($key eq 'FILEHANDLE') {
281                # If a file has been associated with the copied
282                # object, make a new filehandle rather than just
283                # copying the reference
284                $self->fopen($bufr->{FILENAME});
285            } elsif (ref($value) and $key !~ /[BCD]_TABLE/) {
286                # Copy the whole structure, not merely the reference.
287                # Using Clone would be cheaper, but unfortunately
288                # Clone is not a core module, while Storable is
289                require Storable;
290                import Storable qw(dclone);
291                $self->{$key} = dclone($value);
292            } else {
293                $self->{$key} = $value;
294            }
295        }
296    } else {
297        _croak("Don't recognize second argument '$what' to copy_from()");
298    }
299    return 1;
300}
301
302
303##  Set debug level. Also set $Spew to true if debug level > 1 is set
304##  (we don't bother to reset $Spew to 0 if all debug levels later are
305##  reset to 0 or 1)
306sub set_verbose {
307    my $self = shift;
308    my $verbose = shift;
309    if (ref($self)) {
310        # Just myself
311        $self->{VERBOSE} = $verbose;
312        $self->_spew(2, "Verbosity level for object set to %d", $verbose);
313    } else {
314        # Whole class
315        $Verbose = $verbose;
316        Geo::BUFR->_spew(2, "Verbosity level for class set to %d", $verbose);
317    }
318    $Spew = $verbose if $verbose > 1;
319    return 1;
320}
321
322##  Turn off (or on) decoding of section 4
323sub set_nodata {
324    my $self = shift;
325    my $n = shift;
326    $Nodata = defined $n ? $n : 1; # Default is 1
327    Geo::BUFR->_spew(2, "Nodata set to %d", $Nodata);
328    return 1;
329}
330
331##  Turn off (or on) decoding of quality information
332sub set_noqc {
333    my $self = shift;
334    my $n = shift;
335    $Noqc = defined $n ? $n : 1; # Default is 1
336    Geo::BUFR->_spew(2, "Noqc set to %d", $Noqc);
337    return 1;
338}
339
340##  Require strict checking of BUFR format
341sub set_strict_checking {
342    my $self = shift;
343    my $n = shift;
344    _croak "Value for strict checking not provided"
345        unless defined $n;
346    $Strict_checking = $n;
347    Geo::BUFR->_spew(2, "Strict_checking set to %d", $Strict_checking);
348    return 1;
349}
350
351## Show replication descriptors (with X in FXY replaced by actual
352## number of descriptors replicated, adjusted to 0 if > 99) and all
353## data description operators when calling dumpsection4
354sub set_show_all_operators {
355    my $self = shift;
356    my $n = shift;
357    $Show_all_operators = defined $n ? $n : 1; # Default is 1
358    $Show_replication = $Show_all_operators;
359    Geo::BUFR->_spew(2, "Show_all_operators set to %d", $Show_all_operators);
360    return 1;
361}
362
363## Accessor methods for BUFR sec0-3 ##
364sub get_bufr_length {
365    my $self = shift;
366    return defined $self->{BUFR_LENGTH} ? $self->{BUFR_LENGTH} : undef;
367}
368sub set_bufr_edition {
369    my ($self, $bufr_edition) = @_;
370    _croak "BUFR edition number not provided in set_bufr_edition"
371        unless defined $bufr_edition;
372    _croak "BUFR edition number must be an integer, is '$bufr_edition'"
373        unless $bufr_edition =~ /^\d+$/;
374    _croak "Not an allowed value for BUFR edition number: $bufr_edition"
375        unless $bufr_edition >= 0 and $bufr_edition < 5;
376        # BUFR edition 0 is in fact in use in ECMWF MARS archive
377    $self->{BUFR_EDITION} = $bufr_edition;
378    return 1;
379}
380sub get_bufr_edition {
381    my $self = shift;
382    return defined $self->{BUFR_EDITION} ? $self->{BUFR_EDITION} : undef;
383}
384sub set_master_table {
385    my ($self, $master_table) = @_;
386    _croak "BUFR master table not provided in set_master_table"
387        unless defined $master_table;
388    _croak "BUFR master table must be an integer, is '$master_table'"
389        unless $master_table =~ /^\d+$/;
390    # Max value that can be stored in 1 byte is 255
391    _croak "BUFR master table exceeds limit 255, is '$master_table'"
392        if $master_table > 255;
393    $self->{MASTER_TABLE} = $master_table;
394    return 1;
395}
396sub get_master_table {
397    my $self = shift;
398    return defined $self->{MASTER_TABLE} ? $self->{MASTER_TABLE} : undef;
399}
400sub set_centre {
401    my ($self, $centre) = @_;
402    _croak "Originating/generating centre not provided in set_centre"
403        unless defined $centre;
404    _croak "Originating/generating centre must be an integer, is '$centre'"
405        unless $centre =~ /^\d+$/;
406    # Max value that can be stored in 2 bytes is 65535
407    _croak "Originating/generating centre exceeds limit 65535, is '$centre'"
408        if $centre > 65535;
409    $self->{CENTRE} = $centre;
410    return 1;
411}
412sub get_centre {
413    my $self = shift;
414    return defined $self->{CENTRE} ? $self->{CENTRE} : undef;
415}
416sub set_subcentre {
417    my ($self, $subcentre) = @_;
418    _croak "Originating/generating subcentre not provided in set_subcentre"
419        unless defined $subcentre;
420    _croak "Originating/generating subcentre must be an integer, is '$subcentre'"
421        unless $subcentre =~ /^\d+$/;
422    _croak "Originating/generating subcentre exceeds limit 65535, is '$subcentre'"
423        if $subcentre > 65535;
424    $self->{SUBCENTRE} = $subcentre;
425    return 1;
426}
427sub get_subcentre {
428    my $self = shift;
429    return defined $self->{SUBCENTRE} ? $self->{SUBCENTRE} : undef;
430}
431sub set_update_sequence_number {
432    my ($self, $update_number) = @_;
433    _croak "Update sequence number not provided in set_update_sequence_number"
434        unless defined $update_number;
435    _croak "Update sequence number must be a nonnegative integer, is '$update_number'"
436        unless $update_number =~ /^\d+$/;
437    _croak "Update sequence number exceeds limit 255, is '$update_number'"
438        if $update_number > 255;
439    $self->{UPDATE_NUMBER} = $update_number;
440    return 1;
441}
442sub get_update_sequence_number {
443    my $self = shift;
444    return defined $self->{UPDATE_NUMBER} ? $self->{UPDATE_NUMBER} : undef;
445}
446sub set_optional_section {
447    my ($self, $optional_section) = @_;
448    _croak "Optional section (0 or 1) not provided in set_optional_section"
449        unless defined $optional_section;
450    _croak "Optional section must be 0 or 1, is '$optional_section'"
451        unless $optional_section eq '0' or $optional_section eq '1';
452    $self->{OPTIONAL_SECTION} = $optional_section;
453    return 1;
454}
455sub get_optional_section {
456    my $self = shift;
457    return defined $self->{OPTIONAL_SECTION} ? $self->{OPTIONAL_SECTION} : undef;
458}
459sub set_data_category {
460    my ($self, $data_category) = @_;
461    _croak "Data category not provided in set_data_category"
462        unless defined $data_category;
463    _croak "Data category must be an integer, is '$data_category'"
464        unless $data_category =~ /^\d+$/;
465    _croak "Data category exceeds limit 255, is '$data_category'"
466        if $data_category > 255;
467    $self->{DATA_CATEGORY} = $data_category;
468    return 1;
469}
470sub get_data_category {
471    my $self = shift;
472    return defined $self->{DATA_CATEGORY} ? $self->{DATA_CATEGORY} : undef;
473}
474sub set_int_data_subcategory {
475    my ($self, $int_data_subcategory) = @_;
476    _croak "International data subcategory not provided in set_int_data_subcategory"
477        unless defined $int_data_subcategory;
478    _croak "International data subcategory must be an integer, is '$int_data_subcategory'"
479        unless $int_data_subcategory =~ /^\d+$/;
480    _croak "International data subcategory exceeds limit 255, is '$int_data_subcategory'"
481        if $int_data_subcategory > 255;
482    $self->{INT_DATA_SUBCATEGORY} = $int_data_subcategory;
483    return 1;
484}
485sub get_int_data_subcategory {
486    my $self = shift;
487    return defined $self->{INT_DATA_SUBCATEGORY} ? $self->{INT_DATA_SUBCATEGORY} : undef;
488}
489sub set_loc_data_subcategory {
490    my ($self, $loc_data_subcategory) = @_;
491    _croak "Local subcategory not provided in set_loc_data_subcategory"
492        unless defined $loc_data_subcategory;
493    _croak "Local data subcategory must be an integer, is '$loc_data_subcategory'"
494        unless $loc_data_subcategory =~ /^\d+$/;
495    _croak "Local data subcategory exceeds limit 255, is '$loc_data_subcategory'"
496        if $loc_data_subcategory > 255;
497    $self->{LOC_DATA_SUBCATEGORY} = $loc_data_subcategory;
498    return 1;
499}
500sub get_loc_data_subcategory {
501    my $self = shift;
502    return defined $self->{LOC_DATA_SUBCATEGORY} ? $self->{LOC_DATA_SUBCATEGORY} : undef;
503}
504sub set_data_subcategory {
505    my ($self, $data_subcategory) = @_;
506    _croak "Data subcategory not provided in set_data_subcategory"
507        unless defined $data_subcategory;
508    _croak "Data subcategory must be an integer, is '$data_subcategory'"
509        unless $data_subcategory =~ /^\d+$/;
510    _croak "Data subcategory exceeds limit 255, is '$data_subcategory'"
511        if $data_subcategory > 255;
512    $self->{DATA_SUBCATEGORY} = $data_subcategory;
513    return 1;
514}
515sub get_data_subcategory {
516    my $self = shift;
517    return defined $self->{DATA_SUBCATEGORY} ? $self->{DATA_SUBCATEGORY} : undef;
518}
519sub set_master_table_version {
520    my ($self, $master_table_version) = @_;
521    _croak "Master table version not provided in set_master_table_version"
522        unless defined $master_table_version;
523    _croak "BUFR master table version must be an integer, is '$master_table_version'"
524        unless $master_table_version =~ /^\d+$/;
525    _croak "BUFR master table version exceeds limit 255, is '$master_table_version'"
526        if $master_table_version > 255;
527    $self->{MASTER_TABLE_VERSION} = $master_table_version;
528    return 1;
529}
530sub get_master_table_version {
531    my $self = shift;
532    return defined $self->{MASTER_TABLE_VERSION}
533        ? $self->{MASTER_TABLE_VERSION} : undef;
534}
535sub set_local_table_version {
536    my ($self, $local_table_version) = @_;
537    _croak "Local table version not provided in set_local_table_version"
538        unless defined $local_table_version;
539    _croak "Local table version must be an integer, is '$local_table_version'"
540        unless $local_table_version =~ /^\d+$/;
541    _croak "Local table version exceeds limit 255, is '$local_table_version'"
542        if $local_table_version > 255;
543    $self->{LOCAL_TABLE_VERSION} = $local_table_version;
544    return 1;
545}
546sub get_local_table_version {
547    my $self = shift;
548    return defined $self->{LOCAL_TABLE_VERSION}
549        ? $self->{LOCAL_TABLE_VERSION} : undef;
550}
551sub set_year_of_century {
552    my ($self, $year_of_century) = @_;
553    _croak "Year of century not provided in set_year_of_century"
554        unless defined $year_of_century;
555    _croak "Year of century must be an integer, is '$year_of_century'"
556        unless $year_of_century =~ /^\d+$/;
557    _complain "year_of_century > 100 in set_year_of_century: $year_of_century"
558        if $year_of_century > 100;
559    # A common mistake is to set year_of_century for year 2000 to 0, should be 100
560    $self->{YEAR_OF_CENTURY} = $year_of_century == 0 ? 100 : $year_of_century;
561    return 1;
562}
563sub get_year_of_century {
564    my $self = shift;
565    if (defined $self->{YEAR_OF_CENTURY}) {
566        return $self->{YEAR_OF_CENTURY};
567    } elsif (defined $self->{YEAR}) {
568        my $yy = $self->{YEAR} % 100;
569        return $yy == 0 ? 100 : $yy;
570    } else {
571        return undef;
572    }
573}
574sub set_year {
575    my ($self, $year) = @_;
576    _croak "Year not provided in set_year"
577        unless defined $year;
578    _croak "Year must be an integer, is '$year'"
579        unless $year =~ /^\d+$/;
580    _croak "Year exceeds limit 65535, is '$year'"
581        if $year > 65535;
582    $self->{YEAR} = $year;
583    return 1;
584}
585sub get_year {
586    my $self = shift;
587    return defined $self->{YEAR} ? $self->{YEAR} : undef;
588}
589sub set_month {
590    my ($self, $month) = @_;
591    _croak "Month not provided in set_month"
592        unless defined $month;
593    _croak "Month must be an integer, is '$month'"
594        unless $month =~ /^\d+$/;
595    _complain "Month must be 1-12 in set_month, is '$month'"
596        if $month == 0 || $month > 12;
597    $self->{MONTH} = $month;
598    return 1;
599}
600sub get_month {
601    my $self = shift;
602    return defined $self->{MONTH} ? $self->{MONTH} : undef;
603}
604sub set_day {
605    my ($self, $day) = @_;
606    _croak "Day not provided in set_day"
607        unless defined $day;
608    _croak "Day must be an integer, is '$day'"
609        unless $day =~ /^\d+$/;
610    _complain "Day must be 1-31 in set_day, is '$day'"
611        if $day == 0 || $day > 31;
612    $self->{DAY} = $day;
613    return 1;
614}
615sub get_day {
616    my $self = shift;
617    return defined $self->{DAY} ? $self->{DAY} : undef;
618}
619sub set_hour {
620    my ($self, $hour) = @_;
621    _croak "Hour not provided in set_hour"
622        unless defined $hour;
623    _croak "Hour must be an integer, is '$hour'"
624        unless $hour =~ /^\d+$/;
625    _complain "Hour must be 0-23 in set_hour, is '$hour'"
626        if $hour > 23;
627    $self->{HOUR} = $hour;
628    return 1;
629}
630sub get_hour {
631    my $self = shift;
632    return defined $self->{HOUR} ? $self->{HOUR} : undef;
633}
634sub set_minute {
635    my ($self, $minute) = @_;
636    _croak "Minute not provided in set_minute"
637        unless defined $minute;
638    _croak "Minute must be an integer, is '$minute'"
639        unless $minute =~ /^\d+$/;
640    _complain "Minute must be 0-59 in set_minute, is '$minute'"
641        if $minute > 59;
642    $self->{MINUTE} = $minute;
643    return 1;
644}
645sub get_minute {
646    my $self = shift;
647    return defined $self->{MINUTE} ? $self->{MINUTE} : undef;
648}
649sub set_second {
650    my ($self, $second) = @_;
651    _croak "Second not provided in set_second"
652        unless defined $second;
653    _croak "Second must be an integer, is '$second'"
654        unless $second =~ /^\d+$/;
655    _complain "Second must be 0-59 in set_second, is '$second'"
656        if $second > 59;
657    $self->{SECOND} = $second;
658    return 1;
659}
660sub get_second {
661    my $self = shift;
662    return defined $self->{SECOND} ? $self->{SECOND} : undef;
663}
664sub set_local_use {
665    my ($self, $local_use) = @_;
666    _croak "Local use not provided in set_local use"
667        unless defined $local_use;
668    $self->{LOCAL_USE} = $local_use;
669    return 1;
670}
671sub get_local_use {
672    my $self = shift;
673    return defined $self->{LOCAL_USE} ? $self->{LOCAL_USE} : undef;
674}
675sub set_number_of_subsets {
676    my ($self, $number_of_subsets) = @_;
677    _croak "Number of subsets not provided in set_number_of_subsets"
678        unless defined $number_of_subsets;
679    _croak "Number of subsets must be an integer, is '$number_of_subsets'"
680        unless $number_of_subsets =~ /^\d+$/;
681    _croak "Number of subsets exceeds limit 65535, is '$number_of_subsets'"
682        if $number_of_subsets > 65535;
683    $self->{NUM_SUBSETS} = $number_of_subsets;
684    return 1;
685}
686sub get_number_of_subsets {
687    my $self = shift;
688    return defined $self->{NUM_SUBSETS} ? $self->{NUM_SUBSETS} : undef;
689}
690sub set_observed_data {
691    my ($self, $observed_data) = @_;
692    _croak "Observed data (0 or 1) not provided in set_observed_data"
693        unless defined $observed_data;
694    _croak "Observed data must be 0 or 1, is '$observed_data'"
695        unless $observed_data eq '0' or $observed_data eq '1';
696    $self->{OBSERVED_DATA} = $observed_data;
697    return 1;
698}
699sub get_observed_data {
700    my $self = shift;
701    return defined $self->{OBSERVED_DATA} ? $self->{OBSERVED_DATA} : undef;
702}
703sub set_compressed_data {
704    my ($self, $compressed_data) = @_;
705    _croak "Compressed data (0 or 1) not provided in set_compressed_data"
706        unless defined $compressed_data;
707    _croak "Compressed data must be 0 or 1, is '$compressed_data'"
708        unless $compressed_data eq '0' or $compressed_data eq '1';
709    _complain "Not allowed to use compression for one subset messages!"
710        if $compressed_data
711            and defined $self->{NUM_SUBSETS} and $self->{NUM_SUBSETS} == 1;
712    $self->{COMPRESSED_DATA} = $compressed_data;
713    return 1;
714}
715sub get_compressed_data {
716    my $self = shift;
717    return defined $self->{COMPRESSED_DATA} ? $self->{COMPRESSED_DATA} : undef;
718}
719sub set_descriptors_unexpanded {
720    my ($self, $descriptors_unexpanded) = @_;
721    _croak "Unexpanded descriptors not provided in set_descriptors_unexpanded"
722        unless defined $descriptors_unexpanded;
723    $self->{DESCRIPTORS_UNEXPANDED} = $descriptors_unexpanded;
724    return 1;
725}
726sub get_descriptors_unexpanded {
727    my $self = shift;
728    return defined $self->{DESCRIPTORS_UNEXPANDED}
729        ? $self->{DESCRIPTORS_UNEXPANDED} : undef;
730}
731#############################################
732## End of accessor methods for BUFR sec0-3 ##
733#############################################
734
735sub get_current_subset_number {
736    my $self = shift;
737    return defined $self->{CURRENT_SUBSET} ? $self->{CURRENT_SUBSET} : undef;
738}
739
740sub get_current_message_number {
741    my $self = shift;
742    return defined $self->{CURRENT_MESSAGE} ? $self->{CURRENT_MESSAGE} : undef;
743}
744
745sub get_current_ahl {
746    my $self = shift;
747    return defined $self->{CURRENT_AHL} ? $self->{CURRENT_AHL} : undef;
748}
749
750sub reuse_current_ahl {
751    my $self = shift;
752    my $n = shift;
753    $Reuse_current_ahl = defined $n ? $n : 1; # Default is 1
754    Geo::BUFR->_spew(2, "Reuse_current_ahl set to %d", $Reuse_current_ahl);
755    return 1;
756}
757
758sub ahl_is_reused {
759    my $self = shift;
760    return defined $self->{REUSED_CURRENT_AHL} ? $self->{REUSED_CURRENT_AHL} : undef;
761}
762
763sub set_filter_cb {
764    my $self = shift;
765    my $cb   = shift;
766
767    if (ref $cb eq 'CODE') {
768        $self->{FILTER_CB} = $cb;
769        @{$self->{FILTER_ARGS}} = ($self, @_);
770    } else {
771        $self->{FILTER_CB} = undef;
772        delete $self->{FILTER_ARGS};
773    }
774    return 1;
775}
776
777sub is_filtered {
778    my $self = shift;
779    return defined $self->{IS_FILTERED} ? $self->{IS_FILTERED} : undef;
780}
781
782sub bad_bufrlength {
783    my $self = shift;
784    return defined $self->{BAD_LENGTH} ? $self->{BAD_LENGTH} : undef;
785}
786
787sub set_tableformat {
788    my $self = shift;
789
790    my $format = shift;
791    _croak "Table format not provided. Possible values are BUFRDC and ECCODES"
792        unless defined $format;
793    _croak "Supported table formats are BUFRDC and ECCODES"
794        unless uc($format) eq 'BUFRDC' || uc($format) eq 'ECCODES';
795    $BUFR_table{FORMAT} = uc($format);
796    Geo::BUFR->_spew(2, "BUFR table format set to %s", $BUFR_table{FORMAT});
797    return 1;
798}
799
800sub get_tableformat {
801    my $self = shift;
802    return exists $BUFR_table{FORMAT} ? $BUFR_table{FORMAT} : '';
803}
804
805##  Set the path for BUFR table files
806##  Usage: Geo::BUFR->set_tablepath(directory_list)
807##         where directory_list is a list of colon-separated strings.
808##  Example: Geo::BUFR->set_tablepath("/foo/bar:/foo/baz", "/some/where/else")
809sub set_tablepath {
810    my $self = shift;
811
812    $BUFR_table{PATH} = join ":", map {split /:/} @_;
813    Geo::BUFR->_spew(2, "BUFR table path set to %s", $BUFR_table{PATH});
814    return 1;
815}
816
817sub get_tablepath {
818    my $self = shift;
819
820    if (exists $BUFR_table{PATH}) {
821        return wantarray ? split(/:/, $BUFR_table{PATH}) : $BUFR_table{PATH};
822    } else {
823        return '';
824    }
825}
826
827## Return table version from table if provided, or else from section 1
828## information in BUFR message. For BUFRDC, this is a stripped down
829## version of table name. For ECCODES, this is last path of table
830## location (e.g. '0/wmo/29'), and a stringified list of two such
831## paths (master and local) if local tables are used
832## (e.g. '0/wmo/29,0/local/8/78/236'). Returns undef/empty list if
833## impossible to determine table version.
834sub get_table_version {
835    my $self = shift;
836    my $table = shift;
837
838    if ($table) {
839        if ($BUFR_table{FORMAT} eq 'BUFRDC') {
840            # First check if this actually is an attempt to load an ECCODES table
841            if ($table =~ /wmo/ || $table =~ /local/) {
842                _croak("$table cannot be a BUFRDC table. "
843                       . "Did you forget to set tableformat to ECCODES?");
844            }
845            (my $version = $table) =~ s/^(?:[BCD]?)(.*?)(?:\.TXT)?$/$1/;
846            return $version;
847        } elsif ($BUFR_table{FORMAT} eq 'ECCODES')  {
848            # Mainly meant to catch attempts to load a BUFRDC table
849            # with tableformat mistakingly set to ECCODES
850            _croak("$table cannot be an ecCodes table")
851                unless ($table =~ /wmo/ || $table =~ /local/);
852            return $table;
853        }
854    }
855
856    # No table provided. Decide version from section 1 information.
857    # First check that the necessary metadata exist
858    foreach my $metadata (qw(MASTER_TABLE LOCAL_TABLE_VERSION
859                             CENTRE SUBCENTRE)) {
860        return undef if ! defined $self->{$metadata};
861    }
862
863    # If master table version, use centre 0 and subcentre 0 (in ECMWF
864    # BUFRDC this is the convention from version 320 onwards)
865    my $centre = $self->{CENTRE};
866    my $subcentre = $self->{SUBCENTRE};
867    my $local_table_version = $self->{LOCAL_TABLE_VERSION};
868    if ($local_table_version == 0 || $local_table_version == 255) {
869        $centre = 0;
870        $subcentre = 0;
871        $local_table_version = 0;
872    }
873
874    my $master_table = $self->{MASTER_TABLE};
875    my $master_table_version = $self->{MASTER_TABLE_VERSION};
876    if ($BUFR_table{FORMAT} eq 'BUFRDC') {
877        # naming convention used in BUFRDC version >= 000270
878        return sprintf "%03d%05d%05d%03d%03d",
879               $master_table,$subcentre,$centre,$master_table_version,$local_table_version;
880    } elsif ($BUFR_table{FORMAT} eq 'ECCODES')  {
881        if ($local_table_version == 0) {
882            return catfile($master_table,'wmo',$master_table_version);
883        } else {
884            return catfile($master_table,'wmo',$master_table_version) . ',' .
885                   catfile($master_table,'local',$local_table_version,$centre,$subcentre);
886        }
887    }
888}
889
890# Search through $BUFR_table{PATH} to find first path for which $fname
891# exists, or (for BUFRDC) if no such path exists, first path for which the
892# corresponding master file exists, in which case
893# $self->{LOCAL_TABLES_NOT_FOUND} is set to the local table initially
894# searched for (this variable should be undefined as soon as the
895# message is finished processing). Returns empty list if no such path
896# could be found, else returns the path and the table name for which
897# path was found.
898sub _locate_table {
899    my ($self,$fname) = @_;
900
901    _croak "BUFR table path not set, did you forget to call set_tablepath()?"
902        unless $BUFR_table{PATH};
903
904    my $path;
905    foreach (split /:/, $BUFR_table{PATH}) {
906        if (-e catfile($_, $fname)) {
907            $path = $_;
908            $path =~ s|/$||;
909            return ($path,$fname);
910        }
911    }
912
913    if ($BUFR_table{FORMAT} eq 'BUFRDC') {
914        # Path couldn't be found for $fname. Then try again for master table
915        my $master_table;
916        ($master_table,$path) = $self->_locate_master_table($fname);
917        if ($path) {
918            $self->{LOCAL_TABLES_NOT_FOUND} = $fname;
919            return ($path,$master_table);
920        }
921    }
922
923    # No table found
924    return;
925}
926
927# Return master table and path corresponding to local table $fname, or
928# empty list if $fname actually is a master table or if no path for the
929# master table could be found.
930sub _locate_master_table {
931    my ($self,$fname) = @_;
932
933    my $master_table;
934    if ($BUFR_table{FORMAT} eq 'BUFRDC') {
935        _croak("$fname is not a valid name for BUFRDC tables")
936            if length($fname) < 20;
937        $master_table = substr($fname,0,4) . '00000' . '00000'
938            . substr($fname,14,3) . '000.TXT';
939    } elsif ($BUFR_table{FORMAT} eq 'ECCODES')  {
940        foreach my $metadata (qw(MASTER_TABLE MASTER_TABLE_VERSION)) {
941            return if ! defined $self->{$metadata};
942        }
943        $master_table = catfile($self->{MASTER_TABLE},'wmo',$self->{MASTER_TABLE_VERSION});
944    }
945    return if ($master_table eq $fname); # Already tried
946
947    my $path;
948    foreach (split /:/, $BUFR_table{PATH}) {
949        if (-e catfile($_, $master_table)) {
950            $path = $_;
951            $path =~ s|/$||;
952            return ($master_table,$path);
953        }
954    }
955    return;
956}
957
958## Read in a B table file into a hash, e.g.
959##  $B_table{'001001'} = "WMO BLOCK NUMBER\0NUMERIC\0  0\0           0\0  7"
960## where the B table values for 001001 are \0 (NUL) separated
961sub _read_B_table_bufrdc {
962    my ($self,$version) = @_;
963
964    my $fname = "B$version.TXT";
965    my ($path,$tname) = $self->_locate_table($fname)
966        or _croak "Couldn't find BUFR table $fname in $BUFR_table{PATH}."
967        . " Wrong tablepath?";
968
969    # If we are forced to try master table because local table
970    # couldn't be found, check if this might already have been loaded
971    if ($tname ne $fname) {
972        my $master_version = substr($tname,1,-4);
973        return $BUFR_table{"B$master_version"} if exists $BUFR_table{"B$master_version"};
974    }
975
976    my $tablefile = catfile($path,$tname);
977    open(my $TABLE, '<', $tablefile)
978        or _croak "Couldn't open BUFR table B $tablefile: $!";
979    my $txt = "Reading table $tablefile";
980    $txt .= " (since local table " . $self->{LOCAL_TABLES_NOT_FOUND}
981    . " couldn't be found)" if $self->{LOCAL_TABLES_NOT_FOUND};
982    $self->_spew(1, "%s", $txt);
983
984    my %B_table;
985    while (<$TABLE>) {
986        my ($s1,$fxy,$s2,$name,$s3,$unit,$s4,$scale,$s5,$refval,$s6,$bits)
987            = unpack('AA6AA64AA24AA3AA12AA3', $_);
988        next unless defined $bits;
989        $name =~ s/\s+$//;
990        $refval =~ s/-\s+(\d+)/-$1/; # Remove blanks between minus sign and value
991        $B_table{$fxy} = join "\0", $name, $unit, $scale, $refval, $bits;
992    }
993    # When installing Geo::BUFR on Windows Vista with Strawberry Perl,
994    # close sometimes returned an empty string. Therefore removed
995    # check on return value for close.
996    close $TABLE; # or _croak "Closing $tablefile failed: $!";
997
998    $BUFR_table{"B$version"} = \%B_table;
999    return \%B_table;
1000}
1001
1002sub _read_B_table_eccodes {
1003    my ($self,$version) = @_;
1004
1005    my ($path,$tname) = $self->_locate_table(catfile($version,'element.table'));
1006
1007    if (! $path) {
1008        if ($version =~ /wmo/) {
1009            _croak "Couldn't find BUFR table " . catfile($version,'element.table')
1010                . " in $BUFR_table{PATH}. Wrong tablepath?";
1011        } else {
1012            # This might actually not be an error, since local table
1013            # might be provided for D only. But if later a local
1014            # element descriptor is requested, we should complain
1015            $self->{LOCAL_TABLES_NOT_FOUND} = $version;
1016            return;
1017        }
1018    }
1019    my $tablefile = catfile($path,$tname);
1020
1021    open(my $TABLE, '<', $tablefile)
1022        or _croak "Couldn't open BUFR table B $tablefile: $!";
1023    $self->_spew(1, "Reading table %s", $tablefile);
1024
1025    my %B_table;
1026    while (<$TABLE>) {
1027        # Skip comments (expexted to be in first line only)
1028        next if /^#/;
1029
1030        # $rest is crex_unit|crex_scale|crex_width
1031        my ($code,$abbreviation,$type,$name,$unit,$scale,$reference,$width,$rest)
1032            = split /[|]/;
1033        next unless defined $width; # shouldn't happen
1034        $unit = 'CCITTIA5' if $unit eq 'CCITT IA5';
1035        $B_table{$code} = join "\0", $name, $unit, $scale, $reference, $width;
1036    }
1037    close $TABLE;
1038
1039    $BUFR_table{"B$version"} = \%B_table;
1040    return \%B_table;
1041}
1042
1043## Reads a D table file into a hash, e.g.
1044##  $D_table->{307080} = '301090 302031 ...'
1045## There are two different types of lines in D*.TXT, e.g.
1046##  307080 13 301090 BUFR template for synoptic reports
1047##            302031
1048## We choose to ignore the number of lines in expansion (here 13)
1049## because this number is sometimes in error. Instead we consider a
1050## line starting with 5 spaces to be of the second type above, else of
1051## the first type
1052sub _read_D_table_bufrdc {
1053    my ($self,$version) = @_;
1054
1055    my $fname = "D$version.TXT";
1056    my ($path,$tname) = $self->_locate_table($fname)
1057        or _croak "Couldn't find BUFR table $fname in $BUFR_table{PATH}."
1058            . "Wrong tablepath?";
1059
1060    # If we are forced to try master table because local table
1061    # couldn't be found, check if this might already have been loaded
1062    if ($tname ne $fname) {
1063        my $master_version = substr($tname,1,-4);
1064        return $BUFR_table{"D$master_version"} if exists $BUFR_table{"D$master_version"};
1065    }
1066
1067    my $tablefile = catfile($path,$tname);
1068    open(my $TABLE, '<', $tablefile)
1069        or _croak "Couldn't open BUFR table D $tablefile: $!";
1070    my $txt = "Reading table $tablefile";
1071    $txt .= " (since local table " . $self->{LOCAL_TABLES_NOT_FOUND}
1072    . " couldn't be found)" if $self->{LOCAL_TABLES_NOT_FOUND};
1073    $self->_spew(1, "%s", $txt);
1074
1075    my (%D_table, $alias);
1076    while (my $line = <$TABLE>) {
1077        $line =~ s/\s+$//;
1078        next if $line =~ /^\s*$/; # Blank line
1079
1080        if (substr($line,0,5) eq ' ' x 5) {
1081            $line =~ s/^\s+//;
1082            $D_table{$alias} .= " $line";
1083        } else {
1084            $line =~ s/^\s+//;
1085            # In table version 17 a descriptor with more than 100
1086            # entries occurs, causing no space between alias and
1087            # number of entries (so split /\s+/ doesn't work)
1088            my ($ali, $skip, $desc) = unpack('A6A4A6', $line);
1089            $alias = $ali;
1090            $D_table{$alias} = $desc;
1091        }
1092    }
1093    close $TABLE; # or _croak "Closing $tablefile failed: $!";
1094
1095    $BUFR_table{"D$version"} = \%D_table;
1096    return \%D_table;
1097}
1098
1099sub _read_D_table_eccodes {
1100    my ($self,$version) = @_;
1101
1102    my ($path,$tname) = $self->_locate_table(catfile($version,'sequence.def'));
1103
1104    if (! $path) {
1105        if ($version =~ /wmo/) {
1106            _croak "Couldn't find BUFR table " . catfile($version,'sequence.def')
1107                . " in $BUFR_table{PATH}. Wrong tablepath?";
1108        } else {
1109            # This might actually not be an error, since local table
1110            # might be provided for B only. But if later a local
1111            # sequence descriptor is requested, we should complain
1112            $self->{LOCAL_TABLES_NOT_FOUND} = $version;
1113        }
1114        return;
1115    }
1116    my $tablefile = catfile($path,$tname);
1117
1118    open(my $TABLE, '<', $tablefile)
1119        or _croak "Couldn't open BUFR table B $tablefile: $!";
1120    $self->_spew(1, "Reading table %s", $tablefile);
1121
1122## sequence.def is expected to contain lines like
1123#"301196" = [  301011, 301013, 301021 ]
1124## which should be converted to
1125# 301196  3 301011
1126#           301013
1127#           301021
1128## Must also handle descriptors spanning more than one line, like
1129#"301046" = [  001007, 001012, 002048, 021119, 025060, 202124, 002026, 002027, 202000, 005040
1130#               ]
1131## and
1132#"301058" = [  301011, 301012, 201152, 202135, 004006, 202000, 201000, 301021, 020111, 020112,
1133#               020113, 020114, 020115, 020116, 020117, 020118, 020119, 025035, 020121, 020122,
1134#               020123, 020124, 025175, 020023, 025063, 202136, 201136, 002121, 201000, 202000,
1135#               025061, 002184, 002189, 025036, 101000, 031002, 301059 ]
1136    my %D_table;
1137    my $txt;
1138    while (<$TABLE>) {
1139        if (substr($_,0,1) eq '"') {
1140            # New sequence descriptor, parse and store the previous
1141            _parse_sequence(\%D_table,$txt) if $txt;
1142            chomp;
1143            $txt = $_;
1144        } else {
1145            chomp;
1146            $txt .= $_;
1147        }
1148    }
1149    _parse_sequence(\%D_table,$txt) if $txt;
1150
1151    close $TABLE; # or _croak "Closing $tablefile failed: $!";
1152
1153    $BUFR_table{"D$version"} = \%D_table;
1154    return \%D_table;
1155}
1156
1157sub _parse_sequence {
1158    my ($Dtable, $txt) = @_;
1159
1160    my ($seq, $rest) = ($txt =~ /^"(\d{6})" = \[(.*)\]/);
1161    my @list = split(/,/, $rest);
1162    foreach (@list) {
1163        s/^ +//;
1164        s/ +$//;
1165    }
1166    $Dtable->{$seq} = join(' ', @list);
1167}
1168
1169## Read the flag and code tables, which in ECMWF BUFRDC tables are
1170## put in tables C$version.TXT (not to be confused with BUFR C tables,
1171## which contain the operator descriptors). Note that even though
1172## number of code values and number of lines are included in the
1173## tables, we choose to ignore them, because these values are often
1174## found to be in error. Instead we trust that the text starts at
1175## fixed positions in file. Returns reference to the C table, or undef
1176## if failing to open table file.
1177sub _read_C_table {
1178    my ($self,$version) = @_;
1179
1180    # For ECCODES loading 2 different codetables directories might be necessary
1181    if ($BUFR_table{FORMAT} eq 'ECCODES') {
1182        if ($version =~ /,/) {
1183            my ($master, $local) = (split /,/, $version);
1184            $self->_read_C_table_eccodes($master);
1185            return $self->_read_C_table_eccodes($local);
1186        } else {
1187            return $self->_read_C_table_eccodes($version);
1188        }
1189    }
1190
1191    # Rest of code is for BUFRDC
1192    my $fname = "C$version.TXT";
1193    my ($path,$tname) = $self->_locate_table($fname);
1194    return undef unless $path;
1195
1196    # If we are forced to try master table because local table
1197    # couldn't be found, check if this might already have been loaded
1198    if ($tname ne $fname) {
1199        my $master_version = substr($tname,1,-4);
1200        return $BUFR_table{"C$master_version"} if exists $BUFR_table{"C$master_version"};
1201    }
1202
1203    my $tablefile = catfile($path,$tname);
1204    open(my $TABLE, '<', $tablefile)
1205        or _croak "Couldn't open BUFR table C $tablefile: $!";
1206    my $txt = "Reading table $tablefile";
1207    $txt .= " (since local table " . $self->{LOCAL_TABLES_NOT_FOUND}
1208    . " couldn't be found)" if $self->{LOCAL_TABLES_NOT_FOUND};
1209    $self->_spew(1, "%s", $txt);
1210
1211    my (%C_table, $table, $value);
1212    while (my $line = <$TABLE>) {
1213        $line =~ s/\s+$//;
1214        next if $line =~ /^\s*$/; # Blank line
1215
1216        if (substr($line,0,15) eq ' ' x 15) {
1217            $line =~ s/^\s+//;
1218            next if $line eq 'NOT DEFINED' || $line eq 'RESERVED';
1219            $C_table{$table}{$value} .= $line . "\n";
1220        } elsif (substr($line,0,10) eq ' ' x 10) {
1221            $line =~ s/^\s+//;
1222            my ($val, $nlines, $txt) = split /\s+/, $line, 3;
1223            $value = $val+0;
1224            next if !defined $txt || $txt eq 'NOT DEFINED' || $txt eq 'RESERVED';
1225            $C_table{$table}{$value} .= $txt . "\n";
1226        } else {
1227            my ($tbl, $nval, $val, $nlines, $txt) = split /\s+/, $line, 5;
1228            $table = sprintf "%06d", $tbl;
1229            # For tables listed 2 or more times, use last instance only.
1230            # This prevents $txt to be duplicated in $C_table{$table}{$value}
1231            undef $C_table{$table} if defined $C_table{$table};
1232            $value = $val+0;
1233            next if !defined $txt || $txt eq 'NOT DEFINED' || $txt eq 'RESERVED';
1234            $C_table{$table}{$value} = $txt . "\n";
1235        }
1236    }
1237    close $TABLE; # or _croak "Closing $tablefile failed: $!";
1238
1239    $BUFR_table{"C$version"} = \%C_table;
1240    return \%C_table;
1241}
1242
1243sub _read_C_table_eccodes {
1244    my ($self,$version) = @_;
1245
1246    my ($path,$tname) = $self->_locate_table(catfile($version,'codetables'));
1247
1248    if (! $path) {
1249        if ($version =~ /wmo/) {
1250            _croak "Couldn't find BUFR table " . catfile($version,'element.table')
1251                . " in $BUFR_table{PATH}. Wrong tablepath?"
1252                if (! $path && $version =~ /wmo/);
1253        } else {
1254            # This might actually not be an error, if none of the
1255            # local descriptors are of type code or flag table. So
1256            # prefer to keep silent in this case.
1257            return;
1258        }
1259    }
1260
1261    my $tabledir = catfile($path,$tname);
1262    my $cwd = getcwd();
1263    chdir $tabledir || croak "Couldn't chdir to $tabledir: $!";
1264
1265    my @table_files = map { $_->[1] }
1266                      sort { $a->[0] <=> $b->[0] }
1267                      map { [_get_tableid_eccodes($_), $_] }
1268                      glob("*.table");
1269    $self->_spew(1, "Reading tables in %s", $tabledir) if @table_files;
1270
1271    my %C_table;
1272    foreach my $table_file (@table_files) {
1273        my ($table) = ($table_file =~ /(\d+)\.table$/);
1274        die "Unexpected name of table file: $table_file" unless $table;
1275        $table =  sprintf "%06d", $table;
1276
1277        open my $IN, '<', $table_file
1278            or croak "Couldn't open $table_file: $!";
1279        while (<$IN>) {
1280            chomp;
1281            my ($num, $val, $txt) = split(/ /, $_, 3);
1282            _complain("Unexpected: first 2 fields in $table_file in $tabledir are unequal: $num $val")
1283                if ($Strict_checking and $num ne $val);
1284
1285            # Fix a common problem in ecCodes codetables with long
1286            # lines, hopefully not changing valid use of '"' in local
1287            # tables (e.g. 8/78/0/codetables/8198.table:  ""Nebenamtliche"" measurement
1288            $txt =~ s/(?<!")" +//;
1289##          $txt =~ s/" +//;
1290
1291            $C_table{$table}{$val} = $txt . "\n";
1292        }
1293
1294        _complain("$table_file in $tabledir is empty!")
1295            if ($Strict_checking and not $C_table{$table});
1296        close $IN;
1297    }
1298    chdir $cwd;
1299
1300    $BUFR_table{"C$version"} = \%C_table;
1301    return \%C_table;
1302}
1303
1304sub _get_tableid_eccodes {
1305    my $table_file = shift;
1306    my ($id) = ($table_file =~ /(\d+)\.table$/);
1307    return $id;
1308}
1309
1310
1311sub load_BDtables {
1312    my $self = shift;
1313    my $table = shift || '';
1314
1315    my $version = $self->{TABLE_VERSION} = $self->get_table_version($table)
1316        or _croak "Not enough info to decide which tables to load";
1317
1318    if ($BUFR_table{FORMAT} eq 'BUFRDC') {
1319        $self->{B_TABLE} = $BUFR_table{"B$version"} || $self->_read_B_table_bufrdc($version);
1320        $self->{D_TABLE} = $BUFR_table{"D$version"} || $self->_read_D_table_bufrdc($version);
1321    } elsif ($BUFR_table{FORMAT} eq 'ECCODES') {
1322        if ($version =~ /,/) {
1323            my ($master, $local) = (split /,/, $version);
1324            $self->{B_TABLE} = $BUFR_table{"B$master"} || $self->_read_B_table_eccodes($master);
1325            $self->{D_TABLE} = $BUFR_table{"D$master"} || $self->_read_D_table_eccodes($master);
1326
1327            # Append local table to the master table (should work even if empty)
1328            my $local_Btable = (exists($BUFR_table{"B$local"})) ? $BUFR_table{"B$local"}
1329            : $self->_read_B_table_eccodes($local);
1330            @{$self->{B_TABLE}}{ keys %$local_Btable } = values %$local_Btable;
1331            my $local_Dtable = (exists($BUFR_table{"D$local"})) ? $BUFR_table{"D$local"}
1332            : $self->_read_D_table_eccodes($local);
1333            @{$self->{D_TABLE}}{ keys %$local_Dtable } = values %$local_Dtable;;
1334
1335        } else {
1336            $self->{B_TABLE} = $BUFR_table{"B$version"} || $self->_read_B_table_eccodes($version);
1337            $self->{D_TABLE} = $BUFR_table{"D$version"} || $self->_read_D_table_eccodes($version);
1338        }
1339    }
1340    return $version;
1341}
1342
1343sub load_Ctable {
1344    my $self = shift;
1345    my $table = shift || '';
1346    my $default_table = shift || '';
1347
1348    my $version = $self->get_table_version($table) || '';
1349    _croak "Not enough info to decide which C table to load"
1350        if not $version and not $default_table;
1351
1352    if ($BUFR_table{FORMAT} eq 'BUFRDC') {
1353        $self->{C_TABLE} = $BUFR_table{"C$version"} || $self->_read_C_table($version);
1354    } elsif ($BUFR_table{FORMAT} eq 'ECCODES') {
1355        if ($version =~ /,/) {
1356            my ($master, $local) = (split /,/, $version);
1357            $self->{C_TABLE} = $BUFR_table{"$master"} || $self->_read_C_table($master);
1358
1359            # Append local table to the master table (should work even if empty)
1360            my $local_Ctable = (exists($BUFR_table{"C$local"})) ? $BUFR_table{"C$local"}
1361                : $self->_read_C_table_eccodes($local);
1362            @{$self->{C_TABLE}}{ keys %$local_Ctable } = values %$local_Ctable;
1363
1364        } else {
1365            $self->{C_TABLE} = $BUFR_table{"C$version"} || $self->_read_C_table_eccodes($version);
1366        }
1367    }
1368
1369    if ($default_table and not $self->{C_TABLE}) {
1370        # Was not able to load $table. Try $default_table instead.
1371        $version = $self->get_table_version($default_table);
1372        _croak "Not enough info to decide which C table to load"
1373            if not $version;
1374        if ($BUFR_table{FORMAT} eq 'BUFRDC') {
1375            $self->{C_TABLE} = $BUFR_table{"C$version"} || $self->_read_C_table($version);
1376        } else {
1377            $self->{C_TABLE} = $BUFR_table{"C$version"} || $self->_read_C_table_eccodes($version);
1378        }
1379    }
1380    if (not $self->{C_TABLE}) {
1381        if ($BUFR_table{FORMAT} eq 'BUFRDC') {
1382            _croak "Unable to load C table (C$version.TXT)";
1383        } else {
1384            _croak "Unable to load codetables for $version";
1385        }
1386    }
1387
1388    return $version;
1389}
1390
1391
1392##  Specify BUFR file to read
1393sub fopen {
1394    my $self = shift;
1395    my $filename = shift
1396        or _croak "fopen() called without an argument";
1397    _croak "File $filename doesn't exist!" unless -e $filename;
1398    _croak "$filename is not a plain file" unless -f $filename;
1399
1400    # Open file for reading
1401    $self->{FILEHANDLE} = new FileHandle;
1402    open $self->{FILEHANDLE}, '<', $filename
1403        or _croak "Couldn't open file $filename for reading";
1404
1405    $self->_spew(2, "File %s opened for reading", $filename);
1406
1407    # For some OS this is necessary
1408    binmode $self->{FILEHANDLE};
1409
1410    $self->{FILENAME} = $filename;
1411    return 1;
1412}
1413
1414sub fclose {
1415    my $self = shift;
1416    if ($self->{FILEHANDLE}) {
1417        close $self->{FILEHANDLE}
1418            or _croak "Couldn't close BUFR file opened by fopen()";
1419        $self->_spew(2, "Closed file %s", $self->{FILENAME});
1420    }
1421    delete $self->{FILEHANDLE};
1422    delete $self->{FILENAME};
1423    # Much more might be considered deleted here, but usually the bufr
1424    # object goes out of scope immediately after a fclose anyway
1425    return 1;
1426}
1427
1428sub eof {
1429    my $self = shift;
1430    return ($self->{EOF} || 0);
1431}
1432
1433# Go to start of input buffer or start of file associated with the object
1434sub rewind {
1435    my $self = shift;
1436    if (exists $self->{FILEHANDLE}) {
1437        seek $self->{FILEHANDLE}, 0, 0 or _croak "Cannot seek: $!";
1438    } elsif (! $self->{IN_BUFFER}) {
1439        _croak "Cannot rewind: no file or input buffer associated with this object";
1440    }
1441    $self->{CURRENT_MESSAGE} = 0;
1442    $self->{CURRENT_SUBSET} = 0;
1443    delete $self->{START_POS};
1444    delete $self->{POS};
1445    delete $self->{EOF};
1446    return 1;
1447}
1448
1449## Read in next BUFR message from file if $self->{FILEHANDLE} is set,
1450## else from $self->{IN_BUFFER} (string argument to
1451## constructor). Decodes section 0 and sets $self->{START_POS} to
1452## start of message and $self->{POS} to end of BUFR message (or after
1453## first 8 bytes of truncated/corrupt BUFR message for which we still
1454## want to attempt decoding). $self->{CURRENT_AHL} is updated if a
1455## GTS ahl is found (implemented for file reading only), and
1456## $self->{EOF} is set if no more 'BUFR' in file/buffer. Croaks if an
1457## error occurs when reading BUFR message.
1458
1459## Returns BUFR message from section 1 on, or undef if no BUFR message
1460## is found.
1461sub _read_message {
1462    my $self = shift;
1463
1464    my $filehandle = $self->{FILEHANDLE} ? $self->{FILEHANDLE} : undef;
1465    my $in_buffer = $self->{IN_BUFFER} ? $self->{IN_BUFFER} : undef;
1466    _croak "_read_message: Neither BUFR file nor BUFR text is given"
1467        unless $filehandle or $in_buffer;
1468
1469    # Locate next 'BUFR' and set $pos to this position in file/string,
1470    # also finding corresponding GTS ahl if exists (for file
1471    # only). Possibly sets $self->{EOF}
1472    my $pos = defined $self->{POS} ? $self->{POS} : 0;
1473    my $ahl;
1474    ($pos, $ahl) = $self->_find_next_BUFR($filehandle,$in_buffer,$pos,'');
1475    return if $pos < 0;
1476    $self->{REUSED_CURRENT_AHL} = 0;
1477    if ($ahl) {
1478        $self->{CURRENT_AHL} = $ahl;
1479    } elsif (! $Reuse_current_ahl) {
1480        $self->{CURRENT_AHL} = undef;
1481    } elsif (defined $self->{CURRENT_AHL}) {
1482        $self->{REUSED_CURRENT_AHL} = 1;
1483    }
1484
1485    # Remember start position of BUFR message in case we need to
1486    # rewind later because length of BUFR cannot be trusted
1487    $self->{START_POS} = $pos;
1488
1489    # Report (if verbose setting) where we found the BUFR message
1490    $self->_spew(2, "BUFR message at position %d", $pos) if $Spew;
1491
1492    # Read (rest) of Section 0 (length of BUFR message and edition number)
1493    my $sec0;                   # Section 0 is BUFR$sec0
1494    if ($filehandle) {
1495        if ((read $filehandle, $sec0, 8) != 8) {
1496            $self->{EOF} = 1;
1497            _croak "Error reading section 0 in file '$self->{FILENAME}', position "
1498                . tell($filehandle);
1499        }
1500        $sec0 = substr $sec0, 4;
1501    } else {
1502        if (length($in_buffer) < $pos+8) {
1503            $self->{EOF} = 1;
1504            _croak "Error reading section 0: this is not a BUFR message?"
1505        }
1506        $sec0 = substr $in_buffer, $pos+4, 4;
1507    }
1508    $self->{SEC0_STREAM}  = "BUFR$sec0";
1509
1510    # Extract length and edition number
1511    my ($length, $edition) = unpack 'NC', "\0$sec0";
1512    $self->{BUFR_LENGTH}  = $length;
1513    $self->{BUFR_EDITION} = $edition;
1514    $self->_spew(2, "Message length: %d, Edition: %d", $length, $edition) if $Spew;
1515    _croak "Cannot handle BUFR edition $edition" if $edition < 2 || $edition > 4;
1516
1517    # Read rest of BUFR message (section 1-5)
1518    my $msg;
1519    my $msgisOK = 1;
1520    if ($filehandle) {
1521        if ((read $filehandle, $msg, $length-8) != $length-8) {
1522            # Probably a corrupt or truncated BUFR message. We choose
1523            # to decode as much as possible (maybe the length in
1524            # section 0 is all that is wrong), but obviously we cannot
1525            # trust the stated length of BUFR message, so reset
1526            # position of filehandle to just after section 0
1527            $self->{BAD_LENGTH} = 1;
1528            $msgisOK = 0;
1529            seek $filehandle, $pos+8, 0;
1530            $self->_spew(2, "Danger: file %s not big enough to contain the stated"
1531                         . " length of BUFR message", $self->{FILENAME});
1532            $pos += 8;
1533        } else {
1534            $pos = tell($filehandle);
1535            if (substr($msg, -4) ne '7777') {
1536                $self->{BAD_LENGTH} = 1;
1537                $self->_spew(2, "Danger: BUFR length in sec 0 can't be correct, "
1538                             . "last 4 bytes are not '7777'");
1539            }
1540        }
1541    } else {
1542        if (length($in_buffer) < $pos+$length) {
1543            $self->{BAD_LENGTH} = 1;
1544            $msgisOK = 0;
1545            $self->_spew(2, "Danger: buffer not big enough "
1546                         . "to contain the stated length of BUFR message");
1547            $msg = substr $in_buffer, $pos+8, $length-8;
1548            $pos += 8;
1549        } else {
1550            $msg = substr $in_buffer, $pos+8, $length-8;
1551            $pos += $length;
1552            if (substr($msg, -4) ne '7777') {
1553                $self->{BAD_LENGTH} = 1;
1554                $self->_spew(2, "Danger: BUFR length in sec 0 can't be correct, "
1555                             . "last 4 bytes are not '7777'");
1556            }
1557        }
1558    }
1559    if ($Spew) {
1560        if ($msgisOK) {
1561            $self->_spew(2, "Successfully read BUFR message; position now %d", $pos);
1562        } else {
1563            $self->_spew(2, "Resetting position to %d", $pos);
1564        }
1565    }
1566
1567    # Reset $self->{POS} to end of BUFR message (or after first 8
1568    # bytes of truncated/corrupt BUFR message)
1569    $self->{POS} = $pos;
1570
1571    return $msg;
1572}
1573
1574my $ahl_regex = qr{[A-Z]{4}\d\d [A-Z]{4} \d{6}(?: (?:(?:RR|CC|AA|PA)[A-Z])| COR| RTD)?};
1575# BBB=Pxx (segmentation) was allowed until 2007, but at least one
1576# centre still uses PAA as of 2014.  COR and RTD shouldn't be
1577# allowed (from ?), but are still used
1578
1579## Advance to first occurrence of 'BUFR', or to the possibly preceding
1580## GTS ahl if this is requested in $at. Returns the new position and
1581## (if called in array context) the possibly preceding ahl. If no
1582## 'BUFR' is found, sets $self->{EOF} and returns -1 for the new
1583## position.
1584sub _find_next_BUFR {
1585    my $self = shift;
1586    my ($filehandle, $in_buffer, $pos, $at) = @_;
1587
1588    my ($new_pos, $ahl);
1589    if ($filehandle) {
1590        my $oldeol = $/;
1591        $/ = "BUFR";
1592        my $slurp = <$filehandle> || '    ';
1593        $/ = $oldeol;
1594        if (CORE::eof($filehandle) or substr($slurp,-4) ne 'BUFR') {
1595            $self->{EOF} = 1;
1596        } else {
1597            # Get the GTS ahl (TTAAii CCCC DTG [BBB]) before 'BUFR',
1598            # if present. Use '\n+' not '\n' since adding an extra
1599            # '\n' in bulletin has been seen. Allow also for not
1600            # including \r\r (which might be how the bulletin file was
1601            # prepared originally, or might catch cases where ahl is
1602            # mistakingly included twice)
1603            my $reset = 4;
1604            if ($slurp =~ /(${ahl_regex})((?:\r\r)?\n+BUFR)$/) {
1605                $ahl = $1;
1606                # Don't use lenght($&), since this slows down execution for
1607                # Perl 5.16 or earlier. See the WARNING at the end of
1608                # the Capture Buffers section of the perlre documentation
1609                $reset = length($1) + length($2) if $at eq 'at_ahl';
1610
1611                $self->_spew(2,"GTS ahl found: %s",$ahl) if $Spew;
1612            }
1613            # Reset position of filehandle to just before 'BUFR', or
1614            # if requested, before possible preceding AHL
1615            seek($filehandle, -$reset, 1);
1616            $new_pos = tell $filehandle;
1617        }
1618    } else {
1619        $new_pos = index($in_buffer, 'BUFR', $pos);
1620        if ($new_pos < 0) {
1621            $self->{EOF} = 1;
1622        } else {
1623            if (substr($in_buffer,$pos,$new_pos-$pos) =~ /(${ahl_regex})((?:\r\r)?\n+)$/) {
1624                $ahl = $1;
1625                $self->_spew(2,"GTS ahl found: %s",$ahl) if $Spew;
1626                if ($at eq 'at_ahl') {
1627                    $new_pos -= length($1) + length($2);
1628                }
1629            }
1630        }
1631    }
1632
1633    if ($self->{EOF}) {
1634        if ($pos == 0) {
1635            if ($filehandle) {
1636                $self->_spew(2,"No BUFR message in file %s",$self->{FILENAME})
1637                    if $Spew;
1638            } else {
1639                $self->_spew(2, "No BUFR message found") if $Spew;
1640            }
1641        }
1642        return -1;
1643    }
1644
1645    return wantarray ? ($new_pos,$ahl) : $new_pos;
1646}
1647
1648## Returns the BUFR message in raw (binary) form, '' if errors encountered
1649sub get_bufr_message {
1650    my $self = shift;
1651
1652    if ($self->{BAD_LENGTH} || $self->{ERROR_IN_MESSAGE}) {
1653        $self->_spew(2, "Skipping erroneous BUFR message");
1654        return '';
1655    }
1656    if (!$self->{FILEHANDLE} && !$self->{IN_BUFFER}) {
1657        $self->_spew(2, "No file or input buffer associated with this object");
1658        return '';
1659    }
1660    if (!exists $self->{START_POS} || !$self->{BUFR_LENGTH}) {
1661        $self->_spew(2, "No bufr message to return");
1662        return '';
1663    }
1664
1665    my $msg;
1666    if (exists $self->{FILEHANDLE}) {
1667        my $fh = $self->{FILEHANDLE};
1668        my $old_pos = tell($fh);
1669        seek($fh, $self->{START_POS}, 0);
1670        read($fh, $msg, $self->{BUFR_LENGTH});
1671        seek($fh, $old_pos, 0);
1672        $self->_spew(2, "BUFR message extracted from file");
1673    } elsif (exists $self->{IN_BUFFER}) {
1674        $msg = substr $self->{IN_BUFFER}, $self->{START_POS}, $self->{BUFR_LENGTH};
1675        $self->_spew(2, "BUFR message extracted");
1676    }
1677
1678    return $msg;
1679}
1680
1681## Decode section 1 to 5. Section 0 is already decoded in _read_message.
1682sub _decode_sections {
1683    my $self = shift;
1684    my $msg = shift;
1685
1686    $self->{BUFR_STREAM}  = $msg;
1687    $self->{SEC1_STREAM}  = undef;
1688    $self->{SEC2_STREAM}  = undef;
1689    $self->{SEC3_STREAM}  = undef;
1690    $self->{SEC4_STREAM}  = undef;
1691    $self->{SEC5_STREAM}  = undef;
1692
1693    # Breaking the rule that all debugging should be on lines starting
1694    # with 'BUFR.pm:', therefore using $verbose=6
1695    $self->_spew(6, "%s", $self->dumpsection0()) if $Spew;
1696
1697    ##  Decode Section 1 (Identification Section)  ##
1698
1699    $self->_spew(2, "Decoding section 1") if $Spew;
1700
1701    # Extract Section 1 information
1702    if ($self->{BUFR_EDITION} < 4) {
1703        # N means 4 byte integer, so put an extra null byte ('\0') in
1704        # front of string to get first 3 bytes as integer
1705        my @sec1 =  unpack 'NC14', "\0" . $self->{BUFR_STREAM};
1706
1707        # Check that stated length of section 1 makes sense
1708        _croak "Length of section 1 too small (< 17): $sec1[0]"
1709            if $sec1[0] < 17;
1710        _croak "Rest of BUFR message shorter (" . length($self->{BUFR_STREAM})
1711            . " bytes) than stated length of section 1 ($sec1[0] bytes)"
1712                if $sec1[0] > length($self->{BUFR_STREAM});
1713
1714        push @sec1, (unpack 'a*', substr $self->{BUFR_STREAM},17,$sec1[0]-17);
1715        $self->{SEC1_STREAM} = substr $self->{BUFR_STREAM}, 0, $sec1[0];
1716        $self->{BUFR_STREAM} = substr $self->{BUFR_STREAM}, $sec1[0];
1717        $self->{SEC1}                 = \@sec1;
1718        $self->{MASTER_TABLE}         = $sec1[1];
1719        $self->{SUBCENTRE}            = $sec1[2];
1720        $self->{CENTRE}               = $sec1[3];
1721        $self->{UPDATE_NUMBER}        = $sec1[4];
1722        $self->{OPTIONAL_SECTION}     = vec($sec1[5] & 0x80,0,1); # 1. bit
1723        $self->{DATA_CATEGORY}        = $sec1[6];
1724        $self->{DATA_SUBCATEGORY}     = $sec1[7];
1725        $self->{MASTER_TABLE_VERSION} = $sec1[8];
1726        $self->{LOCAL_TABLE_VERSION}  = $sec1[9];
1727        $self->{YEAR_OF_CENTURY}      = $sec1[10];
1728        $self->{MONTH}                = $sec1[11];
1729        $self->{DAY}                  = $sec1[12];
1730        $self->{HOUR}                 = $sec1[13];
1731        $self->{MINUTE}               = $sec1[14];
1732        $self->{LOCAL_USE}            = $sec1[15];
1733        # In case previous message was edition 4
1734        foreach my $key (qw(INT_DATA_SUBCATEGORY LOC_DATA_SUBCATEGORY
1735                            YEAR SECOND)) {
1736            undef $self->{$key};
1737        }
1738    } elsif ($self->{BUFR_EDITION} == 4) {
1739        my @sec1 =  unpack 'NCnnC7nC5', "\0" . $self->{BUFR_STREAM};
1740
1741        # Check that stated length of section 1 makes sense
1742        _croak "Length of section 1 too small (< 22): $sec1[0]"
1743            if $sec1[0] < 22;
1744        _croak "Rest of BUFR message shorter (" . length($self->{BUFR_STREAM})
1745            . " bytes) than stated length of section 1 ($sec1[0] bytes)"
1746                if $sec1[0] > length($self->{BUFR_STREAM});
1747
1748        push @sec1, (unpack 'a*', substr $self->{BUFR_STREAM},22,$sec1[0]-22);
1749        $self->{SEC1_STREAM} = substr $self->{BUFR_STREAM}, 0, $sec1[0];
1750        $self->{BUFR_STREAM} = substr $self->{BUFR_STREAM}, $sec1[0];
1751        $self->{SEC1}                 = \@sec1;
1752        $self->{MASTER_TABLE}         = $sec1[1];
1753        $self->{CENTRE}               = $sec1[2];
1754        $self->{SUBCENTRE}            = $sec1[3];
1755        $self->{UPDATE_NUMBER}        = $sec1[4];
1756        $self->{OPTIONAL_SECTION}     = vec($sec1[5] & 0x80,0,1); # 1. bit
1757        $self->{DATA_CATEGORY}        = $sec1[6];
1758        $self->{INT_DATA_SUBCATEGORY} = $sec1[7];
1759        $self->{LOC_DATA_SUBCATEGORY} = $sec1[8];
1760        $self->{MASTER_TABLE_VERSION} = $sec1[9];
1761        $self->{LOCAL_TABLE_VERSION}  = $sec1[10];
1762        $self->{YEAR}                 = $sec1[11];
1763        $self->{MONTH}                = $sec1[12];
1764        $self->{DAY}                  = $sec1[13];
1765        $self->{HOUR}                 = $sec1[14];
1766        $self->{MINUTE}               = $sec1[15];
1767        $self->{SECOND}               = $sec1[16];
1768        $self->{LOCAL_USE} = ($sec1[0] > 22) ? $sec1[17] : undef;
1769        # In case previous message was edition 3 or lower
1770        foreach my $key (qw(DATA_SUBCATEGORY YEAR_OF_CENTURY)) {
1771            undef $self->{$key};
1772        }
1773    }
1774    $self->_spew(2, "BUFR edition: %d Optional section: %d Update sequence number: %d",
1775                $self->{BUFR_EDITION}, $self->{OPTIONAL_SECTION}, $self->{UPDATE_NUMBER}) if $Spew;
1776    $self->_spew(6, "%s", $self->dumpsection1()) if $Spew;
1777
1778    $self->_validate_datetime() if ($Strict_checking);
1779
1780    ##  Decode Section 2 (Optional Section) if present  ##
1781
1782    $self->_spew(2, "Decoding section 2") if $Spew;
1783
1784    if ($self->{OPTIONAL_SECTION}) {
1785        my @sec2 = unpack 'N', "\0" . $self->{BUFR_STREAM};
1786
1787        # Check that stated length of section 2 makes sense
1788        _croak "Length of section 2 too small (< 4): $sec2[0]"
1789            if $sec2[0] < 4;
1790        _croak "Rest of BUFR message shorter (" . length($self->{BUFR_STREAM})
1791            . " bytes) than stated length of section 2 ($sec2[0] bytes)"
1792                if $sec2[0] > length($self->{BUFR_STREAM});
1793
1794        push @sec2, substr $self->{BUFR_STREAM}, 4, $sec2[0]-4;
1795        $self->{SEC2_STREAM} = substr $self->{BUFR_STREAM}, 0, $sec2[0];
1796        $self->{BUFR_STREAM} = substr $self->{BUFR_STREAM}, $sec2[0];
1797        $self->{SEC2} = \@sec2;
1798        $self->_spew(2, "Length of section 2: %d", $sec2[0]) if $Spew;
1799    } else {
1800        $self->{SEC2} = undef;
1801        $self->{SEC2_STREAM} = undef;
1802    }
1803
1804    ##  Decode Section 3 (Data Description Section)  ##
1805
1806    $self->_spew(2, "Decoding section 3") if $Spew;
1807
1808    my @sec3 = unpack 'NCnC', "\0".$self->{BUFR_STREAM};
1809
1810    # Check that stated length of section 3 makes sense
1811    _croak "Length of section 3 too small (< 8): $sec3[0]"
1812        if $sec3[0] < 8;
1813    _croak "Rest of BUFR message shorter (" . length($self->{BUFR_STREAM})
1814        . " bytes) than stated length of section 3 ($sec3[0] bytes)"
1815            if $sec3[0] > length($self->{BUFR_STREAM});
1816
1817    push @sec3, substr $self->{BUFR_STREAM},7,($sec3[0]-7)&0x0ffe; # $sec3[0]-7 will be reduced by one if odd integer,
1818                                                                   # so will not push last byte if length of sec3 is even,
1819                                                                   # which might happen for BUFR edition < 4 (padding byte)
1820    $self->{SEC3_STREAM} = substr $self->{BUFR_STREAM}, 0, $sec3[0];
1821    $self->{BUFR_STREAM} = substr $self->{BUFR_STREAM}, $sec3[0];
1822
1823    $self->{SEC3}             = \@sec3;
1824    $self->{NUM_SUBSETS}      = $sec3[2];
1825    $self->{OBSERVED_DATA}    = vec($sec3[3] & 0x80,0,1); # extract 1. bit
1826    $self->{COMPRESSED_DATA}  = vec($sec3[3] & 0x40,1,1); # extract 2. bit
1827    $self->_spew(2, "Length of section 3: %d", $sec3[0]) if $Spew;
1828    $self->_spew(2, "Number of subsets: %d Observed data: %d Compressed data: %d",
1829                 $self->{NUM_SUBSETS}, $self->{OBSERVED_DATA}, $self->{COMPRESSED_DATA}) if $Spew;
1830    _complain("0 subsets in BUFR message")
1831        if ($Strict_checking and $self->{NUM_SUBSETS} == 0);
1832    _complain("Bits 3-8 in octet 7 in section 3 are not 0 (octet 7 = $sec3[3])")
1833        if ($Strict_checking and ($sec3[3] & 0x3f) != 0);
1834    if ($Spew == 6 || $Nodata) {
1835        my @unexpanded = _int2fxy(unpack 'n*', $self->{SEC3}[4]);
1836        $self->{DESCRIPTORS_UNEXPANDED} = @unexpanded ?
1837            join(' ', @unexpanded) : '';
1838        $self->_spew(6, "%s", $self->dumpsection3());
1839    }
1840
1841    $self->{IS_FILTERED} = defined $self->{FILTER_CB}
1842        ? $self->{FILTER_CB}->(@{$self->{FILTER_ARGS}}) : 0;
1843    return if $self->{IS_FILTERED} || $Nodata;
1844
1845    ##  Decode Section 4 (Data Section)  ##
1846
1847    $self->_spew(2, "Decoding section 4") if $Spew;
1848
1849    my $sec4_len = unpack 'N', "\0$self->{BUFR_STREAM}";
1850    $self->_spew(2, "Length of section 4: %d", $sec4_len) if $Spew;
1851
1852    # Check that stated length of section 4 makes sense
1853    _croak "Length of section 4 too small (< 4): $sec4_len"
1854        if $sec4_len < 4;
1855    _croak "Rest of BUFR message (" . length($self->{BUFR_STREAM}) . " bytes)"
1856        . " shorter than stated length of section 4 ($sec4_len bytes)."
1857        . " Probably the BUFR message is truncated"
1858        if $sec4_len > length($self->{BUFR_STREAM});
1859
1860    $self->{SEC4_STREAM}  = substr $self->{BUFR_STREAM}, 0, $sec4_len;
1861    $self->{SEC4_RAWDATA} = substr $self->{BUFR_STREAM}, 4, $sec4_len-4;
1862    $self->{BUFR_STREAM}  = substr $self->{BUFR_STREAM}, $sec4_len;
1863
1864    ##  Decode Section 5 (End Section)  ##
1865
1866    $self->_spew(2, "Decoding section 5") if $Spew;
1867
1868    # Next 4 characters should be '7777' and these should be end of
1869    # message, but allow more characters (i.e. length of message in
1870    # section 0 has been set too big) if $Strict_checking not set
1871    my $str = $self->{BUFR_STREAM};
1872    my $len = length($str);
1873    if ($len > 4
1874        || ($len == 4 && substr($str,0,4) ne '7777')) {
1875        my $err_msg = "Section 5 is not '7777' but the $len"
1876            . " characters (in hex): "
1877                . join(' ', map {sprintf "0x%02X", $_} unpack('C*', $str));
1878        if ($len > 4 && substr($str,0,4) eq '7777') {
1879            _complain($err_msg);
1880        } elsif ($len == 4 && substr($str,0,4) ne '7777') {
1881            _croak($err_msg);
1882        }
1883    }
1884
1885    return;
1886}
1887
1888##  Read next BUFR message and decode. Set $self->{ERROR_IN_MESSAGE} if
1889##  anything goes seriously wrong, so that sub next_observation can use
1890##  this to skip to next message if user chooses to trap the call to
1891##  next_observation in an eval and then calls next_observation again.
1892sub _next_message {
1893    my $self = shift;
1894
1895    $self->_spew(2, "Reading next BUFR message") if $Spew;
1896
1897    $self->{ERROR_IN_MESSAGE} = 0;
1898    $self->{BAD_LENGTH} = 0;
1899
1900    my $msg;
1901    eval {
1902        # Read BUFR message and decode section 0 (needed to get length
1903        # of message)
1904        $msg = $self->_read_message();
1905
1906        # Unpack section 1-5
1907        $self->_decode_sections($msg) if $msg;
1908    };
1909    if ($@) {
1910        $self->{ERROR_IN_MESSAGE} = 1;
1911        $self->{CURRENT_MESSAGE}++;
1912        die $@;  # Could use croak, but then 2 "at ... line ..."  will
1913                 # be printed to STDERR
1914    }
1915    if (!$msg) {
1916        # Nothing to decode. $self->{EOF} should have been set
1917        $self->_spew(2, "No more BUFR messages found") if $Spew;
1918        return;
1919    }
1920
1921    $self->{CURRENT_MESSAGE}++;
1922
1923    return if $Nodata || $self->{IS_FILTERED};
1924
1925    # Load the relevant code tables
1926    my $table_version;
1927    eval { $table_version = $self->load_BDtables() };
1928    if ($@) {
1929        $self->{ERROR_IN_MESSAGE} = 1;
1930        die $@;
1931    }
1932
1933    # Get the data descriptors and expand them
1934    my @unexpanded = _int2fxy(unpack 'n*', $self->{SEC3}[4]);
1935    _croak "No data description in section 3" if !defined $unexpanded[0];
1936    # Using master table because local tables couldn't be found is
1937    # risky, so catch missing descriptors here to be able to give
1938    # informative error messages
1939    $self->_check_descriptors(\@unexpanded) if $self->{LOCAL_TABLES_NOT_FOUND};
1940    $self->{DESCRIPTORS_UNEXPANDED} = join ' ', @unexpanded;
1941    $self->_spew(2, "Unexpanded data descriptors: %s", $self->{DESCRIPTORS_UNEXPANDED}) if $Spew;
1942
1943    $self->_spew(2, "Expanding data descriptors") if $Spew;
1944    my $alias = "$table_version " . $self->{DESCRIPTORS_UNEXPANDED};
1945    if (exists $Descriptors_already_expanded{$alias}) {
1946        $self->{DESCRIPTORS_EXPANDED} = $Descriptors_already_expanded{$alias};
1947    } else {
1948        eval {
1949            $Descriptors_already_expanded{$alias} = $self->{DESCRIPTORS_EXPANDED}
1950                = join " ", _expand_descriptors($self->{D_TABLE}, @unexpanded);
1951        };
1952        if ($@) {
1953            $self->{ERROR_IN_MESSAGE} = 1;
1954            die $@;
1955        }
1956    }
1957
1958    # Unpack data from bitstream
1959    $self->_spew(2, "Unpacking data") if $Spew;
1960    eval {
1961        if ($self->{COMPRESSED_DATA}) {
1962            $self->_decompress_bitstream();
1963        } else {
1964            $self->_decode_bitstream();
1965        }
1966    };
1967    if ($@) {
1968        $self->{ERROR_IN_MESSAGE} = 1;
1969        die $@;
1970    }
1971
1972    return;
1973}
1974
1975## Check if all element and sequence descriptors given are found in
1976## B/D-tables (but skip check for those preceded by 206-operator)
1977sub _check_descriptors {
1978    my ($self,$unexpanded) = @_;
1979
1980    my $B_table = $self->{B_TABLE};
1981    my $D_table = $self->{D_TABLE};
1982    my $skip_next = 0;
1983    foreach my $id (@{$unexpanded}) {
1984        # Skip descriptors preceded by 206-operator
1985        if ($skip_next) {
1986            $skip_next = 0;
1987        } elsif (substr($id,0,3) eq '206') {
1988            $skip_next = 1;
1989        } elsif ( (substr($id,0,1) eq '0' && ! exists $B_table->{$id})
1990            || (substr($id,0,1) eq '3' && ! exists $D_table->{$id}) ) {
1991            my $version = ($BUFR_table{FORMAT} eq 'BUFRDC')
1992                ? substr($self->{LOCAL_TABLES_NOT_FOUND},1,-4)
1993                : $self->{LOCAL_TABLES_NOT_FOUND};
1994            undef $BUFR_table{"B$version"};
1995            undef $BUFR_table{"D$version"};
1996            $self->{ERROR_IN_MESSAGE} = 1;
1997            _croak("Data descriptor $id is not in master table."
1998                . " You need to get the local tables B/D$version.TXT");
1999        }
2000    }
2001    return;
2002}
2003
2004##  Get next observation, i.e. next subset in current BUFR message or
2005##  first subset in next message. Returns (reference to) data and
2006##  descriptors, or empty list if either no observation is found (in
2007##  which case $self->{EOF} should have been set) or if decoding of
2008##  section 4 is not requested (in which case all of sections 0-3 have
2009##  been decoded in next message).
2010sub next_observation {
2011    my $self = shift;
2012
2013    $self->_spew(2, "Fetching next observation") if $Spew;
2014
2015    # If an error occurred during decoding of previous message, we
2016    # don't know if stated length in section 0 is to be trusted,
2017    # so rewind to next 'BUFR', or setting EOF if no such exists
2018    if ($self->{ERROR_IN_MESSAGE}) {
2019        # First rewind to right after 'BUFR' in previous (faulty)
2020        # message. We cannot go further if file/buffer starts as
2021        # 'BUFRBUFR'
2022        my $pos = $self->{START_POS} + 4;
2023        seek($self->{FILEHANDLE}, $pos, 0) if $self->{FILEHANDLE};
2024        $self->_spew(2, "Error in processing BUFR message (check STDERR for "
2025                     . "details), rewinding to next 'BUFR'") if $Spew;
2026        # Prepare for (a possible) next call to _read_message by
2027        # advancing to next 'BUFR', not skipping a preceding ahl
2028        my $new_pos = $self->_find_next_BUFR($self->{FILEHANDLE},
2029                                                    $self->{IN_BUFFER},$pos,'at_ahl');
2030        if ($self->{EOF}) {
2031            $self->_spew(2, "Last BUFR message (reached end of file)") if $Spew;
2032            return;
2033        } else {
2034            $self->{POS} = $new_pos;
2035        }
2036    }
2037
2038    # Read next BUFR message
2039    if ($self->{CURRENT_MESSAGE} == 0
2040        or $self->{ERROR_IN_MESSAGE}
2041        or $self->{CURRENT_SUBSET} >= $self->{NUM_SUBSETS}) {
2042
2043        $self->{CURRENT_SUBSET} = 0;
2044        # The bit maps must be rebuilt for each message
2045        undef $self->{BITMAPS};
2046        undef $self->{BITMAP_OPERATORS};
2047        undef $self->{BITMAP_START};
2048        undef $self->{REUSE_BITMAP};
2049        $self->{NUM_BITMAPS} = 0;
2050        $self->{BACKWARD_DATA_REFERENCE} = 1;
2051        # Some more tidying after decoding of previous message might
2052        # be necessary
2053        $self->{NUM_CHANGE_OPERATORS} = 0;
2054        undef $self->{CHANGE_WIDTH};
2055        undef $self->{CHANGE_CCITTIA5_WIDTH};
2056        undef $self->{CHANGE_SCALE};
2057        undef $self->{CHANGE_REFERENCE_VALUE};
2058        undef $self->{NEW_REFVAL_OF};
2059        undef $self->{CHANGE_SRW};
2060        undef $self->{ADD_ASSOCIATED_FIELD};
2061        undef $self->{LOCAL_TABLES_NOT_FOUND};
2062        undef $self->{DATA};
2063        undef $self->{DESC};
2064        # Note that we should NOT undef metadata in section 1-3 here,
2065        # since if the next call (_next_message) finds no more
2066        # messages, we don't want to lose the metadata of the last
2067        # valid message extracted. sub join_subsets is based on this
2068        # assumption
2069
2070        $self->_next_message();
2071        return if $self->{EOF};
2072
2073        if ($Nodata || $self->{IS_FILTERED}) {
2074            # Make a simple check that section 4 and 5 are complete
2075            if ($self->{BAD_LENGTH}) {
2076                # We could have set $self->{ERROR_IN_MESSAGE} here and
2077                # let next_observation() take care of the rewinding.
2078                # But we don't want error messages to be displayed if
2079                # e.g. message is to be filtered
2080                $self->{POS} = $self->{START_POS} + 4;
2081                seek($self->{FILEHANDLE}, $self->{POS}, 0) if $self->{FILEHANDLE};
2082                $self->_spew(2, "Possibly truncated message found (last 4 bytes"
2083                             . " are not '7777'), so rewinding to position %d",
2084                             $self->{POS}) if $Spew;
2085            }
2086            # This will ensure next call to next_observation to read next message
2087            $self->{CURRENT_SUBSET} = $self->{NUM_SUBSETS};
2088            return;
2089        }
2090    }
2091
2092    $self->{CURRENT_SUBSET}++;
2093
2094    # Return references to data and descriptor arrays
2095    if ($self->{COMPRESSED_DATA}) {
2096        return ($self->{DATA}[$self->{CURRENT_SUBSET}],
2097                $self->{DESC});
2098    } else {
2099        return ($self->{DATA}[$self->{CURRENT_SUBSET}],
2100                $self->{DESC}[$self->{CURRENT_SUBSET}]);
2101    }
2102}
2103
2104# Dumping contents of a subset (including section 0, 1 and 3 if this is
2105# first subset) in a BUFR message, also displaying message number and
2106# ahl (if found) and subset number
2107sub dumpsections {
2108    my $self = shift;
2109    my $data = shift;
2110    my $descriptors = shift;
2111    my $options = shift || {};
2112
2113    my $width = $options->{width} || 15;
2114    my $bitmap = exists $options->{bitmap} ? $options->{bitmap} : 1;
2115
2116    my $current_subset_number = $self->get_current_subset_number();
2117    my $current_message_number = $self->get_current_message_number();
2118    my $current_ahl = $self->get_current_ahl() || '';
2119
2120    my $txt;
2121    if ($current_subset_number == 1) {
2122        $txt = "\nMessage $current_message_number";
2123        $txt .= defined $current_ahl ? "  $current_ahl\n" : "\n";
2124        $txt .= $self->dumpsection0() . $self->dumpsection1() . $self->dumpsection3();
2125    }
2126
2127    # If this is last message and there is a BUFR formatting error
2128    # caught by user with eval, we might end up here with current
2129    # subset number 0 (and no section 4 to dump)
2130    if ($current_subset_number > 0) {
2131        $txt .= "\nSubset $current_subset_number\n";
2132        $txt .= $bitmap ? $self->dumpsection4_with_bitmaps($data,$descriptors,
2133                                 $current_subset_number,$width)
2134                        : $self->dumpsection4($data,$descriptors,$width);
2135    }
2136
2137    return $txt;
2138}
2139
2140sub dumpsection0 {
2141    my $self = shift;
2142    _croak "BUFR object not properly initialized to call dumpsection0. "
2143        . "Did you forget to call next_observation()?" unless $self->{BUFR_LENGTH};
2144
2145    my $txt = <<"EOT";
2146
2147Section 0:
2148    Length of BUFR message:            $self->{BUFR_LENGTH}
2149    BUFR edition:                      $self->{BUFR_EDITION}
2150EOT
2151    return $txt;
2152}
2153
2154sub dumpsection1 {
2155    my $self = shift;
2156    _croak "BUFR object not properly initialized to call dumpsection1. "
2157        . "Did you forget to call next_observation()?" unless $self->{SEC1_STREAM};
2158
2159    my $txt;
2160    if ($self->{BUFR_EDITION} < 4) {
2161        $txt = <<"EOT";
2162
2163Section 1:
2164    Length of section:                 @{[ length $self->{SEC1_STREAM} ]}
2165    BUFR master table:                 $self->{MASTER_TABLE}
2166    Originating subcentre:             $self->{SUBCENTRE}
2167    Originating centre:                $self->{CENTRE}
2168    Update sequence number:            $self->{UPDATE_NUMBER}
2169    Optional section present:          $self->{OPTIONAL_SECTION}
2170    Data category (table A):           $self->{DATA_CATEGORY}
2171    Data subcategory:                  $self->{DATA_SUBCATEGORY}
2172    Master table version number:       $self->{MASTER_TABLE_VERSION}
2173    Local table version number:        $self->{LOCAL_TABLE_VERSION}
2174    Year of century:                   $self->{YEAR_OF_CENTURY}
2175    Month:                             $self->{MONTH}
2176    Day:                               $self->{DAY}
2177    Hour:                              $self->{HOUR}
2178    Minute:                            $self->{MINUTE}
2179EOT
2180    } else {
2181        $txt = <<"EOT";
2182
2183Section 1:
2184    Length of section:                 @{[ length $self->{SEC1_STREAM} ]}
2185    BUFR master table:                 $self->{MASTER_TABLE}
2186    Originating centre:                $self->{CENTRE}
2187    Originating subcentre:             $self->{SUBCENTRE}
2188    Update sequence number:            $self->{UPDATE_NUMBER}
2189    Optional section present:          $self->{OPTIONAL_SECTION}
2190    Data category (table A):           $self->{DATA_CATEGORY}
2191    International data subcategory:    $self->{INT_DATA_SUBCATEGORY}
2192    Local data subcategory:            $self->{LOC_DATA_SUBCATEGORY}
2193    Master table version number:       $self->{MASTER_TABLE_VERSION}
2194    Local table version number:        $self->{LOCAL_TABLE_VERSION}
2195    Year:                              $self->{YEAR}
2196    Month:                             $self->{MONTH}
2197    Day:                               $self->{DAY}
2198    Hour:                              $self->{HOUR}
2199    Minute:                            $self->{MINUTE}
2200    Second:                            $self->{SECOND}
2201EOT
2202    }
2203    # Last part of section 1: "Reserved for local use by ADP centres"
2204    # is considered so uninteresting (and rare), that it is displayed
2205    # only if verbose >= 2, in a _spew statement. Note that for BUFR
2206    # edition < 4 there is always one byte here (to make an even
2207    # number of bytes in section 1).
2208    $self->_spew(2, "Reserved for local use:             0x@{[unpack('H*', $self->{LOCAL_USE})]}")
2209        if $self->{LOCAL_USE} and length $self->{LOCAL_USE} > 1;
2210
2211    return $txt;
2212}
2213
2214sub dumpsection2 {
2215    my $self = shift;
2216    return '' if not defined $self->{SEC2};
2217
2218    my $sec2_code_ref = shift;
2219    _croak "dumpsection2: no code ref provided"
2220        unless defined $sec2_code_ref && ref($sec2_code_ref) eq 'CODE';
2221
2222    my $txt = <<"EOT";
2223
2224Section 2:
2225    Length of section:                 @{[ length $self->{SEC2_STREAM} ]}
2226EOT
2227
2228    return $txt . $sec2_code_ref->($self->{SEC2_STREAM}) . "\n";
2229}
2230
2231sub dumpsection3 {
2232    my $self = shift;
2233    _croak "BUFR object not properly initialized to call dumpsection3. "
2234        . "Did you forget to call next_observation()?" unless $self->{SEC3_STREAM};
2235    $self->{DESCRIPTORS_UNEXPANDED} ||= '';
2236
2237    my $txt = <<"EOT";
2238
2239Section 3:
2240    Length of section:                 @{[ length $self->{SEC3_STREAM} ]}
2241    Number of data subsets:            $self->{NUM_SUBSETS}
2242    Observed data:                     $self->{OBSERVED_DATA}
2243    Compressed data:                   $self->{COMPRESSED_DATA}
2244    Data descriptors unexpanded:       $self->{DESCRIPTORS_UNEXPANDED}
2245EOT
2246    return $txt;
2247}
2248
2249sub dumpsection4 {
2250    my $self = shift;
2251    my $data = shift;
2252    my $descriptors = shift;
2253    my $width = shift || 15;    # Optional argument
2254    # Since last (optional) argument to dumpsection() is an anonymous
2255    # hash, check that this is not mistakenly applied here also
2256    _croak "Last optional argument to dumpsection4 should be integer"
2257        if ref($width) || $width !~ /^\d+$/;
2258
2259    my $txt = "\n";
2260    my $B_table = $self->{B_TABLE};
2261    # Add the artificial descriptor for associated field
2262    $B_table->{999999} = "ASSOCIATED FIELD\0NUMERIC";
2263    my $C_table = $self->{C_TABLE} || '';
2264    my $idx = 0;
2265    my $line_no = 0;    # Precede each line with a line number, except
2266                        # for replication descriptors and for operator
2267                        # descriptors with no data value in section 4
2268  ID:
2269    foreach my $id (@{$descriptors}) {
2270        my $value = defined $data->[$idx] ? $data->[$idx] : 'missing';
2271        $idx++;
2272        my $f = substr($id, 0, 1);
2273        if ($f == 1) {
2274            $txt .= sprintf "        %6d\n", $id;
2275            next ID;
2276        } elsif ($f == 2) {
2277            if ($id =~ /^205/) {    # Character information operator
2278                $txt .= sprintf "%6d  %06d  %${width}.${width}s  %s\n",
2279                    ++$line_no, $id, $value, "CHARACTER INFORMATION";
2280                next ID;
2281            } else {
2282                my $operator_name = _get_operator_name($id);
2283                if ($operator_name) {
2284                    $txt .= sprintf "        %06d  %${width}.${width}s  %s\n",
2285                        $id, "", $operator_name;
2286                }
2287                next ID;
2288            }
2289        } elsif ($f == 9 && $id != 999999) {
2290            $txt .= sprintf "%6d  %06d  %${width}.${width}s  %s %06d\n",
2291                ++$line_no, $id, $value, 'NEW REFERENCE VALUE FOR', $id - 900000;
2292            next ID;
2293        } elsif ($id == 31031) { # This is the only data descriptor
2294                                 # where all bits set to one should
2295                                 # not be rendered as missing value
2296                                 # (for replication/repetition factors in
2297                                 # class 31 $value has been adjusted already)
2298            $value = 1 if $value eq 'missing';
2299        }
2300        _croak "Data descriptor $id is not present in BUFR table B"
2301            unless exists $B_table->{$id};
2302        my ($name, $unit, $bits) = (split /\0/, $B_table->{$id})[0,1,4];
2303        # Code or flag table number equals $id, so no need to display this in [unit]
2304        my $short_unit = $unit;
2305        my $unit_start = uc(substr($unit, 0, 4));
2306        if ($unit_start eq 'CODE') {
2307            $short_unit = 'CODE TABLE';
2308        } elsif ($unit_start eq 'FLAG') {
2309            $short_unit = 'FLAG TABLE';
2310        }
2311        $txt .= sprintf "%6d  %06d  %${width}.${width}s  %s\n",
2312            ++$line_no, $id, $value, "$name [$short_unit]";
2313
2314        # Check for illegal flag value
2315        if ($Strict_checking && $short_unit eq 'FLAG TABLE' && $bits > 1) {
2316            if ($value ne 'missing' && $value % 2) {
2317                $bits += 0; # get rid of spaces
2318                my $max_value = 2**$bits - 1;
2319                _complain("$id - $value: rightmost bit $bits is set indicating missing value"
2320                          . " but then value should be $max_value");
2321            }
2322        }
2323
2324        # Resolve flag and code table values if code table is loaded
2325        # (but don't bother about 031031 - too much uninformative output)
2326        if ($C_table && $id != 31031 && $value ne 'missing') {
2327            my $num_spaces = $width + 18;
2328            $txt .= _get_code_table_txt($id,$value,$unit,$B_table,$C_table,$num_spaces)
2329        }
2330    }
2331    return $txt;
2332}
2333
2334# Operators which should always be displayed in dumpsection4
2335my %OPERATOR_NAME_A =
2336    ( 222000 => 'QUALITY INFORMATION FOLLOW',
2337      223000 => 'SUBSTITUTED VALUES FOLLOW',
2338      224000 => 'FIRST ORDER STATISTICS FOLLOW',
2339      225000 => 'DIFFERENCE STATISTICAL VALUES FOLLOW',
2340      232000 => 'REPLACE/RETAINED VALUES FOLLOW',
2341      235000 => 'CANCEL BACKWARD DATA REFERENCE',
2342      236000 => 'DEFINE DATA PRESENT BIT MAP',
2343      237000 => 'USE PREVIOUSLY DEFINED BIT MAP',
2344 );
2345# Operators which should normally not be displayed in dumpsection4
2346my %OPERATOR_NAME_B =
2347    ( 201000 => 'CANCEL CHANGE DATA WIDTH',
2348      202000 => 'CANCEL CHANGE SCALE',
2349      203000 => 'CANCEL CHANGE REFERENCE VALUES',
2350      207000 => 'CANCEL INCREASE SCALE, REFERENCE VALUE AND DATA WIDTH',
2351      208000 => 'CANCEL CHANGE WIDTH OF CCITT IA5 FIELD',
2352      203255 => 'STOP CHANGING REFERENCE VALUES',
2353      223255 => 'SUBSTITUTED VALUES MARKER OPERATOR',
2354      224255 => 'FIRST ORDER STATISTICAL VALUES MARKER OPERATOR',
2355      225255 => 'DIFFERENCE STATISTICAL STATISTICAL VALUES MARKER OPERATOR',
2356      232255 => 'REPLACED/RETAINED VALUES MARKER OPERATOR',
2357      237255 => 'CANCEL DEFINED DATA PRESENT BIT MAP',
2358 );
2359# Operator classes which should normally not be displayed in dumpsection4
2360my %OPERATOR_NAME_C =
2361    ( 201 => 'CHANGE DATA WIDTH',
2362      202 => 'CHANGE SCALE',
2363      203 => 'CHANGE REFERENCE VALUES',
2364      204 => 'ADD ASSOCIATED FIELD',
2365      # This one is displayed, treated specially (and named CHARACTER INFORMATION)
2366##      205 => 'SIGNIFY CHARACTER',
2367      206 => 'SIGNIFY DATA WIDTH FOR THE IMMEDIATELY FOLLOWING LOCAL DESCRIPTOR',
2368      207 => 'INCREASE SCALE, REFERENCE VALUE AND DATA WIDTH',
2369      208 => 'CHANGE WIDTH OF CCITT IA5 FIELD',
2370      221 => 'DATA NOT PRESENT',
2371 );
2372sub _get_operator_name {
2373    my $id = shift;
2374    my $operator_name = '';
2375    if ($OPERATOR_NAME_A{$id}) {
2376        $operator_name = $OPERATOR_NAME_A{$id}
2377    } elsif ($Show_all_operators) {
2378        if ($OPERATOR_NAME_B{$id}) {
2379            $operator_name = $OPERATOR_NAME_B{$id}
2380        } else {
2381            my $fx = substr $id, 0, 3;
2382            if ($OPERATOR_NAME_C{$fx}) {
2383                $operator_name = $OPERATOR_NAME_C{$fx};
2384            }
2385        }
2386    }
2387    return $operator_name;
2388}
2389
2390## Display bit mapped values on same line as the original value. This
2391## offer a much shorter and easier to read dump of section 4 when bit
2392## maps has been used (i.e. for 222000 quality information, 223000
2393## substituted values, 224000 first order statistics, 225000
2394## difference statistics, 232000 replaced/retained values). '*******'
2395## is displayed if data is not present in bit map (bit set to 1 in
2396## 031031 or data not covered by the 031031 descriptors), 'missing' is
2397## displayed if value is missing.  But note that we miss other
2398## descriptors like 001031 and 001032 if these come after 222000 etc
2399## with the current implementation. And there are more shortcomings,
2400## described in CAVEAT section in POD for bufrread.pl
2401sub dumpsection4_with_bitmaps {
2402    my $self = shift;
2403    my $data = shift;
2404    my $descriptors = shift;
2405    my $isub = shift;
2406    my $width = shift || 15;    # Optional argument
2407
2408    # If no bit maps call the ordinary dumpsection4
2409    if (not defined $self->{BITMAPS}) {
2410        return $self->dumpsection4($data, $descriptors, $width);
2411    }
2412
2413    # $Show_all_operators must be turned off for this sub to work correctly
2414    _croak "Cannot dump section 4 properly with bitmaps"
2415        . " when Show_all_operators is set" if $Show_all_operators;
2416
2417    # The kind of bit maps (i.e. the operator descriptors) used in BUFR message
2418    my @bitmap_desc = @{ $self->{BITMAP_OPERATORS} };
2419
2420    my @bitmap_array; # Will contain for each bit map a reference to a hash with
2421                      # key: index (in data and descriptor arrays) for data value
2422                      # value: index for bit mapped value
2423
2424    # For compressed data all subsets use same bit map (we assume)
2425    $isub = 0 if $self->{COMPRESSED_DATA};
2426
2427    my $txt = "\n";
2428    my $space = ' ';
2429    my $line = $space x (17 + $width);
2430    foreach my $bitmap_num (0..$#bitmap_desc) {
2431        $line .= "  $bitmap_desc[$bitmap_num]";
2432        # Convert the sequence of ($data_idesc,$bitmapped_idesc) pairs into a hash
2433        my %hash = @{ $self->{BITMAPS}->[$bitmap_num + 1]->[$isub] };
2434        $bitmap_array[$bitmap_num] = \%hash;
2435    }
2436    # First make a line showing the operator descriptors using bit maps
2437    $txt .= "$line\n";
2438
2439    my $B_table = $self->{B_TABLE};
2440    # Add the artificial descriptor for associated field
2441    $B_table->{999999} = "ASSOCIATED FIELD\0Numeric";
2442    my $C_table = $self->{C_TABLE} || '';
2443
2444    my $idx = 0;
2445    # Loop over data descriptors
2446  ID:
2447    foreach my $id (@{$descriptors}) {
2448        # Stop printing when the bit map part starts
2449        last ID if (substr($id,0,1) eq '2'
2450                        and ($id =~ /^22[2-5]/ || $id =~ /^232/));
2451
2452        # Get the data value
2453        my $value = defined $data->[$idx] ? $data->[$idx] : 'missing';
2454        _croak "Data descriptor $id is not present in BUFR table B"
2455            unless exists $B_table->{$id};
2456        my ($name, $unit, $bits) = (split /\0/, $B_table->{$id})[0,1,4];
2457        $line = sprintf "%6d  %06d  %${width}.${width}s ",
2458            $idx+1, $id, $value;
2459
2460        # Then get the corresponding bit mapped values, using '*******'
2461        # if 'data not present' in bit map
2462        my $max_len = 7;
2463        foreach my $bitmap_num (0..$#bitmap_desc) {
2464            my $val;
2465            if ($bitmap_array[$bitmap_num]->{$idx}) {
2466                # data marked as 'data present' in bitmap
2467                my $bitmapped_idesc = $bitmap_array[$bitmap_num]->{$idx};
2468                $val = defined $data->[$bitmapped_idesc]
2469                    ? $data->[$bitmapped_idesc] : 'missing';
2470                $max_len = length($val) if length($val) > $max_len;
2471            } else {
2472                $val = '*******';
2473            }
2474            # If $max_len has been increased, this might not always
2475            # print very pretty, but at least there is no truncation
2476            # of digits in value
2477            $line .= sprintf " %${max_len}.${max_len}s", $val;
2478        }
2479        # Code or flag table number equals $id, so no need to display this in [unit]
2480        my $short_unit = $unit;
2481        my $unit_start = uc(substr($unit, 0, 4));
2482        if ($unit_start eq 'CODE') {
2483            $short_unit = 'CODE TABLE';
2484        } elsif ($unit_start eq 'FLAG') {
2485            $short_unit = 'FLAG TABLE';
2486        }
2487        $line .=  sprintf "  %s\n", "$name [$short_unit]";
2488        $txt .= $line;
2489
2490        # Check for illegal flag value
2491        if ($Strict_checking && $short_unit eq 'FLAG TABLE' && $bits > 1) {
2492            if ($value ne 'missing' and $value % 2) {
2493                my $max_value = 2**$bits - 1;
2494                $bits += 0; # get rid of spaces
2495                _complain("$id - $value: rightmost bit $bits is set indicating missing value"
2496                          . " but then value should be $max_value");
2497            }
2498        }
2499
2500        # Resolve flag and code table values if code table is loaded
2501        if ($C_table && $value ne 'missing') {
2502            my $num_spaces = $width + 19 + 7*@bitmap_desc;
2503            $txt .= _get_code_table_txt($id,$value,$unit,$B_table,$C_table,$num_spaces)
2504        }
2505        $idx++;
2506    }
2507    return $txt;
2508}
2509
2510## Return the text found in flag or code tables for value $value of
2511## descriptor $id. The empty string is returned if $unit is neither
2512## CODE TABLE nor FLAG TABLE, or if $unit is CODE TABLE but for this
2513## $value there is no text in C table. Returns a "... does not exist!"
2514## message if flag/code table is not found. If $check_illegal is
2515## defined, an 'Illegal value' message is returned if $value is bigger
2516## than allowed or has highest bit set without having all other bits
2517## set.
2518sub _get_code_table_txt {
2519    my ($id,$value,$unit,$B_table,$C_table,$num_spaces,$check_illegal) = @_;
2520
2521    my $txt = '';
2522    # Need case insensitive matching, since local tables from at least
2523    # DWD use 'Code table', not 'CODE TABLE', in the ECMWF ecCodes
2524    # distribution
2525    if ($unit =~ m/^CODE[ ]?TABLE/i) {
2526        my $code_table = sprintf "%06d", $id;
2527        return "Code table $code_table does not exist!\n"
2528            if ! exists $C_table->{$code_table};
2529        if ($C_table->{$code_table}{$value}) {
2530            my @lines = split "\n", $C_table->{$code_table}{$value};
2531            foreach (@lines) {
2532                $txt .= sprintf "%s   %s\n", ' ' x ($num_spaces), lc $_;
2533            }
2534        }
2535    } elsif ($unit =~ m/^FLAG[ ]?TABLE/i) {
2536        my $flag_table = sprintf "%06d", $id;
2537        return "Flag table $flag_table does not exist!\n"
2538            if ! exists $C_table->{$flag_table};
2539
2540        my $width = (split /\0/, $B_table->{$flag_table})[4];
2541        $width += 0;            # Get rid of spaces
2542        # Cannot handle more than 32 bits flags with current method
2543        _croak "Unable to handle > 32 bits flag; $id has width $width"
2544            if $width > 32;
2545
2546        my $max_value = 2**$width - 1;
2547
2548        if (defined $check_illegal and $value > $max_value) {
2549            $txt = "Illegal value: $value is bigger than maximum allowed ($max_value)\n";
2550        } elsif ($value == $max_value) {
2551            $txt = sprintf "%s=> %s", ' ' x ($num_spaces), "bit $width set:"
2552                . sprintf "%s   %s\n", ' ' x ($num_spaces), "missing value\n";
2553        } else {
2554            # Convert to bitstring and localize the 1 bits
2555            my $binary = pack "N", $value; # Packed as 32 bits in big-endian order
2556            my $bitstring = substr unpack('B*',$binary), 32-$width;
2557            for my $i (1..$width) {
2558                if (substr($bitstring, $i-1, 1) == 1) {
2559                    $txt .= sprintf "%s=> %s", ' ' x ($num_spaces),
2560                        "bit $i set";
2561                    if ($C_table->{$flag_table}{$i}) {
2562                        my @lines = split "\n", $C_table->{$flag_table}{$i};
2563                        $txt .= ': ' . lc (shift @lines) . "\n";
2564                        foreach (@lines) {
2565                            $txt .= sprintf "%s   %s\n", ' ' x ($num_spaces), lc $_;
2566                        }
2567                    } else {
2568                        $txt .= "\n";
2569                    }
2570                }
2571            }
2572            if (defined $check_illegal and $txt =~ /bit $width set/) {
2573                $txt = "Illegal value ($value): bit $width is set indicating missing value,"
2574                    . " but then value should be $max_value\n";
2575            }
2576        }
2577    }
2578    return $txt;
2579}
2580
2581##  Convert from integer to descriptor
2582sub _int2fxy {
2583    my @fxy = map {sprintf("%1d%02d%03d", ($_>>14)&0x3, ($_>>8)&0x3f, $_&0xff)} @_;
2584    return @_ > 1 ? @fxy : $fxy[0];
2585}
2586
2587##  Expand a list of descriptors using BUFR table D, also expanding
2588##  simple replication but not delayed replication
2589sub _expand_descriptors {
2590    my $D_table = shift;
2591    my @expanded = ();
2592
2593    for (my $di = 0; $di < @_; $di++) {
2594        my $descriptor = $_[$di];
2595        _croak "$descriptor is not a BUFR descriptor"
2596            if $descriptor !~ /^\d{6}$/;
2597        my $f = int substr($descriptor, 0, 1);
2598        if ($f == 1) {
2599            my $x = substr $descriptor, 1, 2; # Replicate next $x descriptors
2600            my $y = substr $descriptor, 3;    # Number of replications
2601            if ($y > 0) {
2602                # Simple replication (replicate next x descriptors y times)
2603                _croak "Cannot expand: Not enough descriptors following "
2604                    . "replication descriptor $descriptor (or there is "
2605                    . "a problem in nesting of replication)" if $di+$x+1 > @_;
2606                my @r = ();
2607                push @r, @_[($di+1)..($di+$x)] for (1..$y);
2608                # Recursively expand replicated descriptors $y times
2609                my @s = ();
2610                @s = _expand_descriptors($D_table, @r) if @r;
2611                if ($Show_replication) {
2612                    # Adjust x since replicated descriptors might have been expanded
2613                    # Unfortunately _spew is not available here to report the x>99 -> x=0 hack
2614                    my $z =  @s/$y > 99 ? 0 : @s/$y;
2615                    substr($_[$di], 1, 2) = sprintf "%02d", $z;
2616                    push @expanded, $_[$di];
2617                }
2618                push @expanded, @s if @s;
2619                $di += $x;
2620            } else {
2621                # Delayed replication. Next descriptor ought to be the
2622                # delayed descriptor replication (and data repetition)
2623                # factor, i.e. one of 0310(00|01|02|11|12), followed
2624                # by the x descriptors to be replicated
2625                if ($di+2 == @_ && $_[$di+1] =~ /^0310(00|01|02|11|12)$/) {
2626                    _complain "Missing the $x descriptors which should follow"
2627                        . " $descriptor $_[$di+1]";
2628                    push @expanded, @_[$di,$di+1];
2629                    last;
2630                }
2631                _croak "Cannot expand: Not enough descriptors following delayed"
2632                    . " replication descriptor $descriptor (or there is "
2633                    . "a problem in nesting of replication)" if $di+$x+1 > @_;
2634                _croak "Cannot expand: Delayed replication descriptor "
2635                    . "$descriptor is not followed by one of "
2636                    . "0310(00|01|02|11|12) but by $_[$di+1]"
2637                        if $_[$di+1] !~ /^0310(00|01|02|11|12)$/;
2638                my @r = @_[($di+2)..($di+$x+1)];
2639                # Here we just expand the D descriptors in the
2640                # descriptors to be replicated. The final expansion
2641                # using delayed replication factor has to wait until
2642                # data part is decoded
2643                my @s = ();
2644                @s = _expand_descriptors($D_table, @r) if @r;
2645                # Must adjust x since replicated descriptors might have been expanded
2646                substr($_[$di], 1, 2) = sprintf "%02d", scalar @s;
2647                push @expanded, @_[$di,$di+1], @s;
2648                $di += 1+$x; # NOTE: 1 is added to $di on next iteration
2649            }
2650            next;
2651        } elsif ($f == 3) {
2652            _croak "No sequence descriptor $descriptor in BUFR table D"
2653                if not exists $D_table->{$descriptor};
2654            # Expand recursively, if necessary
2655            push @expanded,
2656                _expand_descriptors($D_table, split /\s/, $D_table->{$descriptor});
2657        } else { # f=0,2
2658            push @expanded, $descriptor;
2659        }
2660    }
2661
2662    return @expanded;
2663}
2664
2665## Return a text string suitable for printing information about the given
2666## BUFR table descriptors
2667##
2668## $how = 'fully': Expand all D descriptors fully into B descriptors,
2669## with name, unit, scale, reference value and width (each on a
2670## numbered line, except for replication operators which are not
2671## numbered).
2672##
2673## $how = 'partially': Like 'fully, but expand D descriptors only once
2674## and ignore replication.
2675##
2676## $how = 'noexpand': Like 'partially', but do not expand D
2677## descriptors at all.
2678##
2679## $how = 'simply': Like 'partially', but list the descriptors on one
2680## single line with no extra information provided.
2681sub resolve_descriptor {
2682    my $self = shift;
2683    my $how = shift;
2684    foreach (@_) {
2685        _croak("'$_' is not an integer argument to resolve_descriptor!")
2686            unless /^\d+$/;
2687    }
2688    my @desc = map { sprintf "%06d", $_ } @_;
2689
2690    my @allowed_hows = qw( simply fully partially noexpand );
2691    _croak "First argument in resolve_descriptor must be one of"
2692        . " '@allowed_hows', is: '$how'"
2693            unless grep { $how eq $_ } @allowed_hows;
2694
2695    if (! $self->{B_TABLE}) {
2696        if ($BUFR_table{FORMAT} eq 'ECCODES' && $self->{LOCAL_TABLES_NOT_FOUND}) {
2697            _croak "Local table " . $self->{LOCAL_TABLES_NOT_FOUND} . " couldn't be found,"
2698                . " or you might need to load WMO master table also?";
2699        } else {
2700            _croak "No B table is loaded - did you forget to call load_BDtables?";
2701        }
2702    }
2703    my $B_table = $self->{B_TABLE};
2704
2705    # Some local tables are provided only for element descriptors, and
2706    # we might in fact not need the sequence descriptors for resolving
2707    my $D_table;
2708    my $need_Dtable = 0;
2709    foreach my $id (@desc) {
2710        if (substr($id,0,1) eq '3') {
2711            $need_Dtable = 1;
2712        }
2713    }
2714    if ($need_Dtable && ! $self->{D_TABLE}) {
2715        if ($BUFR_table{FORMAT} eq 'ECCODES' && $self->{LOCAL_TABLES_NOT_FOUND}) {
2716            _croak "Local table " . $self->{LOCAL_TABLES_NOT_FOUND} . " couldn't be found,"
2717                . " or you might need to load WMO master table also?";
2718        } else {
2719            _croak "No D table is loaded - did you forget to call load_BDtables?";
2720        }
2721    } else {
2722        # Could consider omitting this if $need_Dtable = 0 ...
2723        $D_table = $self->{D_TABLE};
2724    }
2725
2726    my $txt = '';
2727
2728    if ($how eq 'simply' or $how eq 'partially') {
2729        my @expanded;
2730        foreach my $id (@desc) {
2731            my $f = substr $id, 0, 1;
2732            if ($f == 3) {
2733                _croak "$id is not in table D, unable to expand"
2734                    unless $D_table->{$id};
2735                push @expanded, split /\s/, $D_table->{$id};
2736            } else {
2737                push @expanded, $id;
2738            }
2739        }
2740        if ($how eq 'simply') {
2741            return $txt = "@expanded\n";
2742        } else {
2743            @desc = @expanded;
2744        }
2745    }
2746    if ($how eq 'fully') {
2747        if (@desc == 1 and $desc[0] =~ /^1/) {
2748            # This is simply a replication descriptor; do not try to expand
2749        } else {
2750            @desc = _expand_descriptors($D_table, @desc);
2751        }
2752    }
2753
2754    my $count = 0;
2755    foreach my $id (@desc) {
2756        if ($id =~ /^[123]/) {
2757            $txt .= sprintf "    %06d\n", $id;
2758        } elsif ($B_table->{$id}) {
2759            my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$id};
2760            $txt .= sprintf "%3d %06d  %s [%s] %d %d %d\n",
2761                ++$count,$id,$name,$unit,$scale,$refval,$width;
2762        } else {
2763            $txt .= sprintf "%3d %06d  Not in table B\n",
2764                ++$count,$id;
2765        }
2766    }
2767    return $txt;
2768}
2769
2770## Return BUFR table B information for an element descriptor for the
2771## last table loaded, as an array of name, unit, scale, reference
2772## value and data width in bits. Returns false if the descriptor is
2773## not found or no data width is defined, or croaks if no table B has
2774## been loaded.
2775sub element_descriptor {
2776    my $self = shift;
2777    my $desc = shift;
2778    _croak "Argument to element_descriptor must be an integer\n"
2779        unless $desc =~ /^\d+$/;
2780    $desc = sprintf "%06d", $desc;
2781    _croak "No BUFR B table loaded\n" unless defined $self->{B_TABLE};
2782    return unless defined $self->{B_TABLE}->{$desc};
2783    my ($name, $unit, $scale, $refval, $width)
2784        = split /\0/, $self->{B_TABLE}->{$desc};
2785    return unless defined $width && $width =~ /\d+$/;
2786    return ($name, $unit, $scale+0, $refval+0, $width+0);
2787}
2788
2789## Return BUFR table D information for a sequence descriptor for the
2790## last table loaded, as a space separated string of the descriptors
2791## in the direct (nonrecursive) lookup in table D. Returns false if
2792## the sequence descriptor is not found, or croaks if no table D has
2793## been loaded.
2794sub sequence_descriptor {
2795    my $self = shift;
2796    my $desc = shift;
2797    _croak "Argument to element_descriptor must be an integer\n"
2798        unless $desc =~ /^\d+$/;
2799    _croak "No BUFR D table loaded\n" unless defined $self->{D_TABLE};
2800    return unless defined $self->{D_TABLE}->{$desc};
2801    if (wantarray) {
2802        return split / /, $self->{D_TABLE}->{$desc};
2803    } else {
2804        return $self->{D_TABLE}->{$desc};
2805    }
2806}
2807
2808## Return a text string telling which bits are set and the meaning of
2809## the bits set when $value is interpreted as a flag value, also
2810## checking for illegal values. The empty string is returned if $value=0.
2811sub resolve_flagvalue {
2812    my $self = shift;
2813    my ($value,$flag_table,$table,$default_table,$num_leading_spaces) = @_;
2814    _croak "Flag value can't be negative!\n" if $value < 0;
2815    $num_leading_spaces ||= 0;  # Default value
2816
2817    $self->load_Ctable($table,$default_table);
2818    my $C_table = $self->{C_TABLE};
2819
2820    # Number of bits used for the flag is hard to extract from C
2821    # table; it is much easier to obtain from B table
2822    $self->load_BDtables($table);
2823    my $B_table = $self->{B_TABLE};
2824
2825    my $unit = 'FLAG TABLE';
2826    return _get_code_table_txt($flag_table,$value,$unit,
2827                               $B_table,$C_table,$num_leading_spaces,'check_illegal');
2828}
2829
2830## Return the contents of code table $code_table, or empty string if
2831## code table is not found
2832sub dump_codetable {
2833    my $self = shift;
2834    my ($code_table,$table,$default_table) = @_;
2835    _croak("code_table '$code_table' is not a (positive) integer in dump_codetable()")
2836        unless $code_table =~ /^\d+$/;
2837    $code_table = sprintf "%06d", $code_table;
2838
2839    $self->load_Ctable($table,$default_table);
2840    my $C_table = $self->{C_TABLE};
2841
2842    return '' unless $C_table->{$code_table};
2843
2844    my $dump;
2845    foreach my $value (sort {$a <=> $b} keys %{ $C_table->{$code_table} }) {
2846        my $txt = $C_table->{$code_table}{$value};
2847        chomp $txt;
2848        $txt =~ s/\n/\n       /g;
2849        $dump .= sprintf "%3d -> %s\n", $value, $txt;
2850    }
2851    return $dump;
2852}
2853
2854## Decode bitstream (data part of section 4) while working through the
2855## (expanded) descriptors in section 3. The final data and
2856## corresponding descriptors are put in $self->{DATA} and
2857## $self->{DESC} (indexed by subset number)
2858sub _decode_bitstream {
2859    my $self = shift;
2860    $self->{CODING} = 'DECODE';
2861    my $bitstream = $self->{SEC4_RAWDATA} . "\0\0\0\0";
2862    my $maxpos = 8*length($self->{SEC4_RAWDATA});
2863    my $pos = 0;
2864    my @operators;
2865    my $ref_values_ref; # Hash ref to reference values with descriptors as keys;
2866                        # to be implemented later (not used yet)
2867    my @subset_data; # Will contain data values for subset 1,2...
2868    my @subset_desc; # Will contain the set of descriptors for subset 1,2...
2869                     # expanded to be in one to one correspondance with the data
2870    my $repeat_X; # Set to number of descriptors to be repeated if
2871                  # delayed descriptor and data repetition factor is
2872                  # in effect
2873    my $repeat_factor; # Set to number of times descriptors (and data)
2874                       # are to be repeated if delayed descriptor and
2875                       # data repetition factor is in effect
2876    my @repeat_desc; # The descriptors to be repeated
2877    my @repeat_data; # The data to be repeated
2878    my $B_table = $self->{B_TABLE};
2879
2880    # Has to fully expand @desc for each subset in turn, as delayed
2881    # replication factors might be different for each subset,
2882    # resulting in different full expansions. During the expansion the
2883    # effect of operator descriptors are taken into account, causing
2884    # most of them to be eliminated (unless $Show_all_operators is
2885    # set), so that @desc and the equivalent $subset_desc[$isub] ends
2886    # up being in one to one correspondence with the data values in
2887    # $subset_data[$isub] (the operators included having data value
2888    # '')
2889  S_LOOP: foreach my $isub (1..$self->{NUM_SUBSETS}) {
2890        $self->_spew(2, "Decoding subset number %d", $isub) if $Spew;
2891
2892        # Bit maps might vary from subset to subset, so must be rebuilt
2893        undef $self->{BITMAP_OPERATORS};
2894        undef $self->{BITMAP_START};
2895        undef $self->{REUSE_BITMAP};
2896        $self->{NUM_BITMAPS} = 0;
2897        $self->{BACKWARD_DATA_REFERENCE} = 1;
2898        $self->{NUM_CHANGE_OPERATORS} = 0;
2899
2900        my @desc = split /\s/, $self->{DESCRIPTORS_EXPANDED};
2901
2902        # Note: @desc as well as $idesc may be changed during this loop,
2903        # so we cannot use a foreach loop instead
2904      D_LOOP: for (my $idesc = 0; $idesc < @desc; $idesc++) {
2905            my $id = $desc[$idesc];
2906            my $f = substr($id,0,1);
2907            my $x = substr($id,1,2)+0;
2908            my $y = substr($id,3,3)+0;
2909
2910            if ($f == 1) {
2911                if ($Show_replication) {
2912                    push @{$subset_desc[$isub]}, $id;
2913                    push @{$subset_data[$isub]}, '';
2914                    $self->_spew(4, "X=0 in $id for F=1, might have been > 99 in expansion")
2915                        if $Spew && $x == 0;
2916                }
2917                next D_LOOP if $y > 0; # Nothing more to do for normal replication
2918
2919                if ($x == 0) {
2920                    _complain("Nonsensical replication of zero descriptors ($id)");
2921                    $idesc++;
2922                    next D_LOOP;
2923                }
2924
2925                $_ = $desc[$idesc+1];
2926                _croak "$id Erroneous replication factor"
2927                    unless /^0310(00|01|02|11|12)/ && exists $B_table->{$_};
2928
2929                my $width = (split /\0/, $B_table->{$_})[-1];
2930                my $factor = bitstream2dec($bitstream, $pos, $width);
2931                $pos += $width;
2932                # Delayed descriptor replication factors (and
2933                # associated fields) are the only values in section 4
2934                # where all bits being 1 is not to be interpreted as a
2935                # missing value
2936                if (not defined $factor) {
2937                    $factor = 2**$width - 1;
2938                }
2939                if ($Spew) {
2940                    if ($_ eq '031011' || $_ eq '031012') {
2941                        $self->_spew(4, "$_  Delayed repetition factor: %s", $factor);
2942                    } else {
2943                        $self->_spew(4, "$_  Delayed replication factor: %s", $factor);
2944                    }
2945                }
2946                # Include the delayed replication in descriptor and data list
2947                splice @desc, $idesc++, 0, $_;
2948                push @{$subset_desc[$isub]}, $_;
2949                push @{$subset_data[$isub]}, $factor;
2950
2951                if ($_ eq '031011' || $_ eq '031012') {
2952                    # For delayed repetition, descriptor *and* data are
2953                    # to be repeated
2954                    $repeat_X = $x;
2955                    $repeat_factor = $factor;
2956                }
2957                my @r = ();
2958                push @r, @desc[($idesc+2)..($idesc+$x+1)] while $factor--;
2959                splice @desc, $idesc, 2+$x, @r;
2960
2961                if ($repeat_factor) {
2962                    # Skip to the last set to be repeated, which will
2963                    # then be included $repeat_factor times
2964                    $idesc += $x * ($repeat_factor - 1);
2965                    $self->_spew(4, "Delayed repetition ($id $_ -> @r)") if $Spew;
2966                } else {
2967                    $self->_spew(4, "Delayed replication ($id $_ -> @r)") if $Spew;
2968                }
2969                if ($idesc < @desc) {
2970                    redo D_LOOP;
2971                } else {
2972                    last D_LOOP; # Might happen if delayed factor is 0
2973                }
2974
2975            } elsif ($f == 2) {
2976                my $flow;
2977                my $bm_idesc;
2978                ($pos, $flow, $bm_idesc, @operators)
2979                    = $self->_apply_operator_descriptor($id, $x, $y, $pos, $isub,
2980                                                        $desc[$idesc+1], @operators);
2981                if ($flow eq 'redo_bitmap') {
2982                    # Data value is associated with the descriptor
2983                    # defined by bit map. Remember original and new
2984                    # index in descriptor array for the bit mapped
2985                    # values ('dr' = data reference)
2986                    my $dr_idesc;
2987                    if (!defined $bm_idesc) {
2988                        $dr_idesc = shift @{$self->{REUSE_BITMAP}->[$isub]};
2989                    } elsif (!$Show_all_operators) {
2990                        $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
2991                            + $bm_idesc;
2992                    } else {
2993                        $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}];
2994                        # Skip operator descriptors
2995                        while ($bm_idesc-- > 0) {
2996                            $dr_idesc++;
2997                            $dr_idesc++ while ($desc[$dr_idesc] >= 200000);
2998                        }
2999                    }
3000                    push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[$isub] },
3001                         $dr_idesc, $idesc;
3002                    if ($Show_all_operators) {
3003                        push @{$subset_desc[$isub]}, $id;
3004                        push @{$subset_data[$isub]}, '';
3005                    }
3006                    $desc[$idesc] = $desc[$dr_idesc];
3007                    redo D_LOOP;
3008                } elsif ($flow eq 'signify_character') {
3009                    push @{$subset_desc[$isub]}, $id;
3010                    # Extract ASCII string
3011                    my $value = bitstream2ascii($bitstream, $pos, $y);
3012                    $pos += 8*$y;
3013                    # Trim string, also removing nulls
3014                    $value = _trim($value, $id);
3015                    push @{$subset_data[$isub]}, $value;
3016                    next D_LOOP;
3017                } elsif ($flow eq 'no_value') {
3018                    # Some operator descriptors ought to be included
3019                    # in expanded descriptors even though they have no
3020                    # corresponding data value, because they contain
3021                    # valuable information to be displayed in
3022                    # dumpsection4 (e.g. 222000 'Quality information follows')
3023                    push @{$subset_desc[$isub]}, $id;
3024                    push @{$subset_data[$isub]}, '';
3025                    next D_LOOP;
3026                }
3027
3028                if ($Show_all_operators) {
3029                    push @{$subset_desc[$isub]}, $id;
3030                    push @{$subset_data[$isub]}, '';
3031                } else {
3032                    # Remove operator descriptor from @desc
3033                    splice @desc, $idesc--, 1;
3034                }
3035
3036                next D_LOOP if $flow eq 'next';
3037                last D_LOOP if $flow eq 'last';
3038                if ($flow eq 'skip') {
3039                    $idesc++;
3040                    next D_LOOP;
3041                }
3042            }
3043
3044            if ($self->{CHANGE_REFERENCE_VALUE}) {
3045                # The data descriptor is to be associated with a new
3046                # reference value, which is fetched from data stream
3047                _croak "Change reference operator 203Y is not followed by element"
3048                    . " descriptor, but $id" if $f > 0;
3049                my $num_bits = $self->{CHANGE_REFERENCE_VALUE};
3050                my $new_refval = bitstream2dec($bitstream, $pos, $num_bits);
3051                $pos += $num_bits;
3052                # Negative value if most significant bit is set (one's complement)
3053                $new_refval = $new_refval & (1<<$num_bits-1)
3054                    ? -($new_refval & ((1<<$num_bits-1)-1))
3055                        : $new_refval;
3056                $self->_spew(4, "$id * Change reference value: ".
3057                             ($new_refval > 0 ? "+" : "")."$new_refval") if $Spew;
3058                $self->{NEW_REFVAL_OF}{$id}{$isub} = $new_refval;
3059                # Identify new reference values by setting f=9
3060                push @{$subset_desc[$isub]}, $id + 900000;
3061                push @{$subset_data[$isub]}, $new_refval;
3062                next D_LOOP;
3063            }
3064
3065            # If operator 204$y 'Add associated field is in effect',
3066            # each data value is preceded by $y bits which should be
3067            # decoded separately. We choose to provide a descriptor
3068            # 999999 in this case (like the ECMWF BUFRDC software)
3069            if ($self->{ADD_ASSOCIATED_FIELD} and $id ne '031021') {
3070                # First extract associated field
3071                my $width = $self->{ADD_ASSOCIATED_FIELD};
3072                my $value = bitstream2dec($bitstream, $pos, $width);
3073                # All bits set to 1 for associated field is NOT
3074                # interpreted as missing value
3075                $value = 2**$width - 1 if ! defined $value;
3076                $pos += $width;
3077                push @{$subset_desc[$isub]}, 999999;
3078                push @{$subset_data[$isub]}, $value;
3079                $self->_spew(4, "Added associated field: %s", $value) if $Spew;
3080            }
3081
3082            # We now have a "real" data descriptor
3083            push @{$subset_desc[$isub]}, $id;
3084
3085            # For quality information, if this relates to a bit map we
3086            # need to store index of the data ($data_idesc) for which
3087            # the quality information applies, as well as the new
3088            # index ($idesc) in the descriptor array for the bit
3089            # mapped values
3090            if (substr($id,0,3) eq '033'
3091                && defined $self->{BITMAP_OPERATORS}
3092                && $self->{BITMAP_OPERATORS}->[-1] eq '222000') {
3093                if (defined $self->{REUSE_BITMAP}) {
3094                    my $data_idesc = shift @{ $self->{REUSE_BITMAP}->[$isub] };
3095                    _croak "$id: Not enough quality values provided"
3096                        if not defined $data_idesc;
3097                    push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[$isub] },
3098                         $data_idesc, $idesc;
3099                } else {
3100                    my $data_idesc = shift @{ $self->{CURRENT_BITMAP} };
3101                    _croak "$id: Not enough quality values provided"
3102                        if not defined $data_idesc;
3103                    push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[$isub] },
3104                         $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
3105                             + $data_idesc, $idesc;
3106                }
3107            }
3108
3109            # Find the relevant entry in BUFR table B
3110            _croak "Data descriptor $id is not present in BUFR table B"
3111                unless exists $B_table->{$id};
3112            my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$id};
3113            $self->_spew(3, "%6s  %-20s  %s", $id, $unit, $name) if $Spew;
3114
3115            # Override Table B values if Data Description Operators are in effect
3116            if ($self->{NUM_CHANGE_OPERATORS} > 0) {
3117                if ($unit ne 'CCITTIA5' && $unit !~ /^(CODE|FLAG)/) {
3118                    if (defined $self->{CHANGE_SRW}) {
3119                        $scale += $self->{CHANGE_SRW};
3120                        $width += int((10*$self->{CHANGE_SRW}+2)/3);
3121                        $refval *= 10*$self->{CHANGE_SRW};
3122                    } else {
3123                        $scale += $self->{CHANGE_SCALE} if defined $self->{CHANGE_SCALE};
3124                        $width += $self->{CHANGE_WIDTH} if defined $self->{CHANGE_WIDTH};
3125                    }
3126                } elsif ($unit eq 'CCITTIA5' && defined $self->{CHANGE_CCITTIA5_WIDTH}) {
3127                    $width = $self->{CHANGE_CCITTIA5_WIDTH}
3128                }
3129                # To prevent autovivification (see perldoc -f exists) we
3130                # need this laborious test for defined
3131                $refval = $self->{NEW_REFVAL_OF}{$id}{$isub} if defined $self->{NEW_REFVAL_OF}{$id}
3132                    && defined $self->{NEW_REFVAL_OF}{$id}{$isub};
3133                # Difference statistical values use different width and reference value
3134                if ($self->{DIFFERENCE_STATISTICAL_VALUE}) {
3135                    $width += 1;
3136                    $refval = -2**$width;
3137                    undef $self->{DIFFERENCE_STATISTICAL_VALUE};
3138                    $self->{NUM_CHANGE_OPERATORS}--;
3139                }
3140            }
3141            _croak "$id Data width <= 0" if $width <= 0;
3142
3143            my $value;
3144            if ($unit eq 'CCITTIA5') {
3145                # Extract ASCII string
3146                _croak "Width for unit CCITTIA5 must be integer bytes\n"
3147                    . "is $width bits for descriptor $id" if $width % 8;
3148                $value = bitstream2ascii($bitstream, $pos, $width/8);
3149                $self->_spew(3, "  %s", defined $value ? $value : 'missing') if $Spew;
3150                # Trim string, also removing nulls
3151                $value = _trim($value, $id);
3152            } else {
3153                $value = bitstream2dec($bitstream, $pos, $width);
3154                if (defined $value) {
3155                    # Compute and format decoded value
3156                    ($scale) = $scale =~ /(-?\d+)/; # untaint
3157                    $value = $scale <= 0 ? ($value + $refval)/10**$scale
3158                        : sprintf "%.${scale}f", ($value + $refval)/10**$scale;
3159                }
3160                $self->_spew(3, "  %s", defined $value ? $value : 'missing') if $Spew;
3161            }
3162            $pos += $width;
3163            push @{$subset_data[$isub]}, $value;
3164            # $value = undef if missing value
3165
3166            if ($repeat_X) {
3167                # Delayed repetition factor (030011/030012) is in
3168                # effect, so descriptors and data are to be repeated
3169                push @repeat_desc, $id;
3170                push @repeat_data, $value;
3171                if (--$repeat_X == 0) {
3172                    # Store $repeat_factor repetitions of data and descriptors
3173                    # (one repetition has already been included)
3174                    while (--$repeat_factor) {
3175                        push @{$subset_desc[$isub]}, @repeat_desc;
3176                        push @{$subset_data[$isub]}, @repeat_data;
3177                    }
3178                    @repeat_desc = ();
3179                    @repeat_data = ();
3180                }
3181            }
3182
3183            if ($id eq '031031' and $self->{BUILD_BITMAP}) {
3184                # Store the index of expanded descriptors if data is
3185                # marked as present in data present indicator: 0 is
3186                # 'present', 1 (undef value) is 'not present'. E.g.
3187                # bitmap = 1100110 => (2,3,6) is stored in $self->{CURRENT_BITMAP}
3188                if (defined $value) {
3189                    push @{$self->{CURRENT_BITMAP}}, $self->{BITMAP_INDEX};
3190                }
3191                $self->{BITMAP_INDEX}++;
3192                if ($self->{BACKWARD_DATA_REFERENCE} == $self->{NUM_BITMAPS}) {
3193                    my $numb = $self->{NUM_BITMAPS};
3194                    if (!defined $self->{BITMAP_START}[$numb]) {
3195                        # Look up the element descriptor immediately
3196                        # preceding the bitmap operator
3197                        my $i = $idesc;
3198                        $i-- while ($desc[$i] ne $self->{BITMAP_OPERATORS}->[-1]
3199                                    && $i >=0);
3200                        $i-- while ($desc[$i] > 100000 && $i >=0);
3201                        _croak "No element descriptor preceding bitmap" if $i < 0;
3202                        $self->{BITMAP_START}[$numb] = $i;
3203                    } else {
3204                        $self->{BITMAP_START}[$numb]--;
3205                        _croak "Bitmap too big"
3206                            if $self->{BITMAP_START}[$numb] < 0;
3207                    }
3208                }
3209            } elsif ($self->{BUILD_BITMAP} and $self->{BITMAP_INDEX} > 0) {
3210                # We have finished building the bit map
3211                $self->{BUILD_BITMAP} = 0;
3212                $self->{BITMAP_INDEX} = 0;
3213                if ($self->{BACKWARD_DATA_REFERENCE} != $self->{NUM_BITMAPS}) {
3214                    $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
3215                        = $self->{BITMAP_START}[$self->{BACKWARD_DATA_REFERENCE}];
3216                }
3217            }
3218        } # End D_LOOP
3219    } # END S_LOOP
3220
3221    # Check that length of section 4 corresponds to what expected from section 3
3222    $self->_check_section4_length($pos,$maxpos);
3223
3224    $self->{DATA} = \@subset_data;
3225    $self->{DESC} = \@subset_desc;
3226    return;
3227}
3228
3229## Decode bitstream (data part of section 4 encoded using BUFR
3230## compression) while working through the (expanded) descriptors in
3231## section 3. The final data and corresponding descriptors are put in
3232## $self->{DATA} and $self->{DESC} (the data indexed by subset number)
3233sub _decompress_bitstream {
3234    my $self = shift;
3235    $self->{CODING} = 'DECODE';
3236    my $bitstream = $self->{SEC4_RAWDATA}."\0\0\0\0";
3237    my $nsubsets = $self->{NUM_SUBSETS};
3238    my $B_table = $self->{B_TABLE};
3239    my $maxpos = 8*length($self->{SEC4_RAWDATA});
3240    my $pos = 0;
3241    my @operators;
3242    my @subset_data;     # Will contain data values for subset 1,2...,
3243                         # i.e. $subset[$i] is a reference to an array
3244                         # containing the data values for subset $i
3245    my @desc_exp;        # Will contain the set of descriptors for one
3246                         # subset, expanded to be in one to one
3247                         # correspondance with the data, i.e. element
3248                         # descriptors only
3249    my $repeat_X; # Set to number of descriptors to be repeated if
3250                  # delayed descriptor and data repetition factor is
3251                  # in effect. Will be decremented while (repeated)
3252                  # data sets are extracted
3253    my $repeat_XX; # Like $repeat_X, but will not be decremented
3254    my $repeat_factor; # Set to number of times descriptors (and data)
3255                       # are to be repeated if delayed descriptor and
3256                       # data repetition factor is in effect
3257    my @repeat_desc; # The descriptors to be repeated
3258    my @repeat_data; # The data to be repeated (reference to an array
3259                     # containing the data values for subset $i)
3260
3261    _complain("Compression set in section 1 for one subset message")
3262        if $nsubsets == 1;
3263
3264    $#subset_data = $nsubsets;
3265
3266    my @desc = split /\s/, $self->{DESCRIPTORS_EXPANDED};
3267    # This will be further expanded to be in one to one correspondance
3268    # with the data, taking replication and table C operators into account
3269
3270    # All subsets in a compressed BUFR message must have exactly the same
3271    # fully expanded section 3, i.e. all replications factors must be the same
3272    # in all subsets. So, as opposed to noncompressed messages, it is enough
3273    # to run through the set of descriptors once.
3274  D_LOOP: for (my $idesc = 0; $idesc < @desc; $idesc++) {
3275        my $id = $desc[$idesc];
3276        my $f = substr($id,0,1);
3277        my $x = substr($id,1,2)+0;
3278        my $y = substr($id,3,3)+0;
3279
3280        if ($f == 1) {
3281            if ($Show_replication) {
3282                push @desc_exp, $id;
3283                foreach my $isub (1..$nsubsets) {
3284                    push @{$subset_data[$isub]}, '';
3285                }
3286                $self->_spew(4, "X=0 in $id for F=1, might have been > 99 in expansion")
3287                    if $Spew && $x == 0;
3288            }
3289            next D_LOOP if $y > 0; # Nothing more to do for normal replication
3290
3291            if ($x == 0) {
3292                _complain("Nonsensical replication of zero descriptors ($id)");
3293                $idesc++;
3294                next D_LOOP;
3295            }
3296
3297            $_ = $desc[$idesc+1];
3298            _croak "$id Erroneous replication factor"
3299                unless /^0310(00|01|02|11|12)/ && exists $B_table->{$_};
3300
3301            my $width = (split /\0/, $B_table->{$_})[-1];
3302            my $factor = bitstream2dec($bitstream, $pos, $width);
3303            $pos += $width + 6; # 6 bits for the bit count (which we
3304                                # skip because we know it has to be 0
3305                                # for delayed replication)
3306            # Delayed descriptor replication factors (and associated
3307            # fields) are the only values in section 4 where all bits
3308            # being 1 is not interpreted as a missing value
3309            if (not defined $factor) {
3310                $factor = 2**$width - 1;
3311            }
3312            # Include the delayed replication in descriptor and data list
3313            push @desc_exp, $_;
3314            splice @desc, $idesc++, 0, $_;
3315            foreach my $isub (1..$nsubsets) {
3316                push @{$subset_data[$isub]}, $factor;
3317            }
3318
3319            if ($_ eq '031011' || $_ eq '031012') {
3320                # For delayed repetition, descriptor *and* data is
3321                # to be repeated
3322                $repeat_X = $repeat_XX = $x;
3323                $repeat_factor = $factor;
3324                $self->_spew(4, "$_  Delayed repetition factor: $factor") if $Spew;
3325            } else {
3326                $self->_spew(4, "$_  Delayed replication factor: $factor") if $Spew;
3327            }
3328            my @r = ();
3329            push @r, @desc[($idesc+2)..($idesc+$x+1)] while $factor--;
3330            splice @desc, $idesc, 2+$x, @r;
3331            if ($Spew) {
3332                if ($repeat_factor) {
3333                    $self->_spew(4, "$_  Delayed repetition ($id $_ -> @r)");
3334                } else {
3335                    $self->_spew(4, "$_  Delayed replication ($id $_ -> @r)");
3336                }
3337            }
3338
3339            if ($idesc < @desc) {
3340                redo D_LOOP;
3341            } else {
3342                last D_LOOP; # Might happen if delayed factor is 0
3343            }
3344
3345        } elsif ($f == 2) {
3346            my $flow;
3347            my $bm_idesc;
3348            ($pos, $flow, $bm_idesc, @operators)
3349                = $self->_apply_operator_descriptor($id, $x, $y, $pos, 0,
3350                                                    $desc[$idesc+1], @operators);
3351            if ($flow eq 'redo_bitmap') {
3352                # Data value is associated with the descriptor
3353                # defined by bit map. Remember original and new
3354                # index in descriptor array for the bit mapped
3355                # values ('dr' = data reference)
3356                my $dr_idesc;
3357                if (!defined $bm_idesc) {
3358                    $dr_idesc = shift @{ $self->{REUSE_BITMAP}->[0] };
3359                } elsif (!$Show_all_operators) {
3360                    $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
3361                        + $bm_idesc;
3362                } else {
3363                    $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}];
3364                    # Skip operator descriptors
3365                    while ($bm_idesc-- > 0) {
3366                        $dr_idesc++;
3367                        $dr_idesc++ while ($desc[$dr_idesc] >= 200000);
3368                    }
3369                }
3370                push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] },
3371                     $dr_idesc, $idesc;
3372                if ($Show_all_operators) {
3373                    push @desc_exp, $id;
3374                    foreach my $isub (1..$nsubsets) {
3375                        push @{$subset_data[$isub]}, '';
3376                    }
3377                }
3378                $desc[$idesc] = $desc[$dr_idesc];
3379                redo D_LOOP;
3380            } elsif ($flow eq 'signify_character') {
3381                push @desc_exp, $id;
3382                $pos = $self->_extract_compressed_value($id, $idesc, $pos, $bitstream,
3383                                                $nsubsets, \@subset_data);
3384                next D_LOOP;
3385            } elsif ($flow eq 'no_value') {
3386                # Some operator descriptors ought to be included
3387                # in expanded descriptors even though they have no
3388                # corresponding data value, because they contain
3389                # valuable information to be displayed in
3390                # dumpsection4 (e.g. 222000 'Quality information follows')
3391                push @desc_exp, $id;
3392                foreach my $isub (1..$nsubsets) {
3393                    push @{$subset_data[$isub]}, '';
3394                }
3395                next D_LOOP;
3396            }
3397
3398            if ($Show_all_operators) {
3399                push @desc_exp, $id;
3400                foreach my $isub (1..$nsubsets) {
3401                    push @{$subset_data[$isub]}, '';
3402                }
3403            } else {
3404                # Remove operator descriptor from @desc
3405                splice @desc, $idesc--, 1;
3406            }
3407
3408            next D_LOOP if $flow eq 'next';
3409            last D_LOOP if $flow eq 'last';
3410            if ($flow eq 'skip') {
3411                $idesc++;
3412                next D_LOOP;
3413            }
3414        }
3415
3416        if ($self->{CHANGE_REFERENCE_VALUE}) {
3417            # The data descriptor is to be associated with a new
3418            # reference value, which is fetched from data stream
3419            _croak "Change reference operator 203Y is not followed by element"
3420                . " descriptor, but $id" if $f > 0;
3421            my $num_bits = $self->{CHANGE_REFERENCE_VALUE};
3422            my $new_refval = bitstream2dec($bitstream, $pos, $num_bits);
3423            $pos += $num_bits + 6;
3424            # Negative value if most significant bit is set (one's complement)
3425            $new_refval = $new_refval & (1<<$num_bits-1)
3426                ? -($new_refval & ((1<<$num_bits-1)-1))
3427                    : $new_refval;
3428            $self->_spew(4, "$id * Change reference value: ".
3429                         ($new_refval > 0 ? "+" : "")."$new_refval") if $Spew;
3430            $self->{NEW_REFVAL_OF}{$id} = $new_refval;
3431            # Identify new reference values by setting f=9
3432            push @desc_exp, $id + 900000;
3433            foreach my $isub (1..$nsubsets) {
3434                push @{$subset_data[$isub]}, $new_refval;
3435            }
3436            next D_LOOP;
3437        }
3438
3439        # If operator 204$y 'Add associated field is in effect',
3440        # each data value is preceded by $y bits which should be
3441        # decoded separately. We choose to provide a descriptor
3442        # 999999 in this case (like the ECMWF BUFRDC software)
3443        if ($self->{ADD_ASSOCIATED_FIELD} and $id ne '031021') {
3444            # First extract associated field
3445            push @desc_exp, 999999;
3446            $pos = $self->_extract_compressed_value(999999, $idesc, $pos, $bitstream,
3447                                                    $nsubsets, \@subset_data);
3448        }
3449
3450        # We now have a "real" data descriptor, so add it to the descriptor list
3451        push @desc_exp, $id;
3452
3453        $pos = $self->_extract_compressed_value($id, $idesc, $pos, $bitstream,
3454                                                $nsubsets, \@subset_data, \@desc);
3455        if ($repeat_X) {
3456            # Delayed repetition factor (030011/030012) is in
3457            # effect, so descriptors and data are to be repeated
3458            push @repeat_desc, $id;
3459            foreach my $isub (1..$nsubsets) {
3460                push @{$repeat_data[$isub]}, $subset_data[$isub]->[-1];
3461            }
3462            if (--$repeat_X == 0) {
3463                # Store $repeat_factor repetitions of data and descriptors
3464                # (one repetition has already been included)
3465                while (--$repeat_factor) {
3466                    push @desc_exp, @repeat_desc;
3467                    foreach my $isub (1..$nsubsets) {
3468                        push @{$subset_data[$isub]}, @{$repeat_data[$isub]};
3469                    }
3470                    $idesc += $repeat_XX;
3471                }
3472                @repeat_desc = ();
3473                @repeat_data = ();
3474                $repeat_XX = 0;
3475            }
3476        }
3477    }
3478
3479    # Check that length of section 4 corresponds to what expected from section 3
3480    $self->_check_section4_length($pos,$maxpos);
3481
3482    $self->{DATA} = \@subset_data;
3483    $self->{DESC} = \@desc_exp;
3484    return;
3485}
3486
3487## Extract the data values for descriptor $id (with index $idesc in
3488## the final expanded descriptor array) for each subset, into
3489## $subset_data_ref->[$isub], $isub = 1...$nsubsets (number of
3490## subsets). Extraction starts at position $pos in $bitstream.
3491sub _extract_compressed_value {
3492    my $self = shift;
3493    my ($id, $idesc, $pos, $bitstream, $nsubsets, $subset_data_ref, $desc_ref) = @_;
3494    my $B_table = $self->{B_TABLE};
3495
3496    # For quality information, if this relates to a bit map we
3497    # need to store index of the data ($data_idesc) for which
3498    # the quality information applies, as well as the new
3499    # index ($idesc) in the descriptor array for the bit
3500    # mapped values
3501    if (substr($id,0,3) eq '033'
3502        && defined $self->{BITMAP_OPERATORS}
3503        && $self->{BITMAP_OPERATORS}->[-1] eq '222000') {
3504        if (defined $self->{REUSE_BITMAP}) {
3505            my $data_idesc = shift @{ $self->{REUSE_BITMAP}->[0] };
3506            _croak "$id: Not enough quality values provided"
3507                if not defined $data_idesc;
3508            push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] },
3509                 $data_idesc, $idesc;
3510        } else {
3511            my $data_idesc = shift @{ $self->{CURRENT_BITMAP} };
3512            _croak "$id: Not enough quality values provided"
3513                if not defined $data_idesc;
3514            push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] },
3515                 $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
3516                     + $data_idesc, $idesc;
3517        }
3518    }
3519
3520    # Find the relevant entry in BUFR table B
3521    my ($name,$unit,$scale,$refval,$width);
3522    if ($id == 999999) {
3523        $name = 'ASSOCIATED FIELD';
3524        $unit = 'NUMERIC';
3525        $scale = 0;
3526        $refval = 0;
3527        $width = $self->{ADD_ASSOCIATED_FIELD};
3528    } elsif ($id =~ /^205(\d\d\d)/) { # Signify character
3529        $name = 'CHARACTER INFORMATION';
3530        $unit = 'CCITTIA5';
3531        $scale = 0;
3532        $refval = 0;
3533        $width = 8*$1;
3534    } else {
3535        _croak "Data descriptor $id is not present in BUFR table B"
3536            if not exists $B_table->{$id};
3537        ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$id};
3538
3539        # Override Table B values if Data Description Operators are in effect
3540        if ($self->{NUM_CHANGE_OPERATORS} > 0) {
3541            if ($unit ne 'CCITTIA5' && $unit !~ /^(CODE|FLAG)/) {
3542                if (defined $self->{CHANGE_SRW}) {
3543                    $scale += $self->{CHANGE_SRW};
3544                    $width += int((10*$self->{CHANGE_SRW}+2)/3);
3545                    $refval *= 10*$self->{CHANGE_SRW};
3546                } else {
3547                    $scale += $self->{CHANGE_SCALE} if defined $self->{CHANGE_SCALE};
3548                    $width += $self->{CHANGE_WIDTH} if defined $self->{CHANGE_WIDTH};
3549                }
3550            } elsif ($unit eq 'CCITTIA5' && defined $self->{CHANGE_CCITTIA5_WIDTH}) {
3551                $width = $self->{CHANGE_CCITTIA5_WIDTH}
3552            }
3553            $refval = $self->{NEW_REFVAL_OF}{$id} if defined $self->{NEW_REFVAL_OF}{$id};
3554            # Difference statistical values use different width and reference value
3555            if ($self->{DIFFERENCE_STATISTICAL_VALUE}) {
3556                $width += 1;
3557                $refval = -2**$width;
3558                undef $self->{DIFFERENCE_STATISTICAL_VALUE};
3559                $self->{NUM_CHANGE_OPERATORS}--;
3560            }
3561        }
3562    }
3563    $self->_spew(3, "%6s  %-20s   %s", $id, $unit, $name) if $Spew;
3564    _croak "$id Data width <= 0" if $width <= 0;
3565
3566    if ($unit eq 'CCITTIA5') {
3567        # Extract ASCII string ('minimum value')
3568        _croak "Width for unit CCITTIA5 must be integer bytes\n"
3569            . "is $width bits for descriptor $id" if $width % 8;
3570        my $minval = bitstream2ascii($bitstream, $pos, $width/8);
3571        if ($Spew) {
3572            if ($minval eq "\0" x ($width/8)) {
3573                $self->_spew(5, " Local reference value has all bits zero");
3574            } else {
3575                $self->_spew(5, " Local reference value: %s", $minval);
3576            }
3577        }
3578        $pos += $width;
3579        # Extract number of bytes for next subsets
3580        my $deltabytes = bitstream2dec($bitstream, $pos, 6);
3581        $self->_spew(5, " Increment width (bytes): %d", $deltabytes) if $Spew;
3582        $pos += 6;
3583        if ($deltabytes && defined $minval) {
3584            # Extract compressed data for all subsets. According
3585            # to 94.6.3 (2) (i) in FM 94 BUFR, the first value for
3586            # character data shall be set to all bits zero
3587            my $nbytes = $width/8;
3588            _complain("Local reference value for compressed CCITTIA5 data "
3589                      . "hasn't all bits set to zero, but is '$minval'")
3590                if $Strict_checking and $minval ne "\0" x $nbytes;
3591            my $incr_values;
3592            foreach my $isub (1..$nsubsets) {
3593                my $string = bitstream2ascii($bitstream, $pos, $deltabytes);
3594                if ($Spew) {
3595                    $incr_values .= defined $string ? "$string," : ',';
3596                }
3597                # Trim string, also removing nulls
3598                $string = _trim($string, $id);
3599                push @{$subset_data_ref->[$isub]}, $string;
3600                $pos += 8*$deltabytes;
3601            }
3602            if ($Spew) {
3603                chop $incr_values;
3604                $self->_spew(5, " Increment values: %s", $incr_values);
3605            }
3606        } else {
3607            # If min value is defined => All subsets set to min value
3608            # If min value is undefined => Data in all subsets are undefined
3609            my $value = defined $minval ? $minval : undef;
3610            # Trim string, also removing nulls
3611            $value = _trim($value, $id);
3612            foreach my $isub (1..$nsubsets) {
3613                push @{$subset_data_ref->[$isub]}, $value;
3614            }
3615            $pos += $nsubsets*8*$deltabytes;
3616        }
3617        $self->_spew(3, "  %s", join ',',
3618             map { defined($subset_data_ref->[$_][-1]) ?
3619                 $subset_data_ref->[$_][-1] : 'missing'} 1..$nsubsets) if $Spew;
3620    } else {
3621        # Extract minimum value
3622        my $minval = bitstream2dec($bitstream, $pos, $width);
3623        $minval += $refval if defined $minval;
3624        $pos += $width;
3625        $self->_spew(5, " Local reference value: %d", $minval) if $Spew && defined $minval;
3626
3627        # Extract number of bits for next subsets
3628        my $deltabits = bitstream2dec($bitstream, $pos, 6);
3629        $pos += 6;
3630        $self->_spew(5, " Increment width (bits): %d", $deltabits) if $Spew;
3631
3632        if ($deltabits && defined $minval) {
3633            # Extract compressed data for all subsets
3634            my $incr_values;
3635            foreach my $isub (1..$nsubsets) {
3636                my $value = bitstream2dec($bitstream, $pos, $deltabits);
3637                _complain("value " . ($value + $minval) . " in subset $isub for "
3638                          . "$id too big to be encoded without compression")
3639                    if ($Strict_checking && defined $value &&
3640                        ($value + $minval) > 2**$width);
3641                $incr_values .= defined $value ? "$value," : ',' if $Spew;
3642                if (defined $value) {
3643                    # Compute and format decoded value
3644                    ($scale) = $scale =~ /(-?\d+)/; # untaint
3645                    $value = $scale <= 0 ? ($value + $minval)/10**$scale
3646                        : sprintf "%.${scale}f", ($value + $minval)/10**$scale;
3647                }
3648                # All bits set to 1 for associated field is NOT
3649                # interpreted as missing value
3650                if ($id == 999999 and ! defined $value) {
3651                    $value = 2**$width - 1;
3652                }
3653                push @{$subset_data_ref->[$isub]}, $value;
3654                $pos += $deltabits;
3655            }
3656            if ($Spew) {
3657                chop $incr_values;
3658                $self->_spew(5, " Increment values: %s", $incr_values);
3659            }
3660        } else {
3661            # If minimum value is defined => All subsets set to minimum value
3662            # If minimum value is undefined => Data in all subsets are undefined
3663            my $value;
3664            if (defined $minval) {
3665                # Compute and format decoded value
3666                ($scale) = $scale =~ /(-?\d+)/; # untaint
3667                $value = $scale <= 0 ? $minval/10**$scale
3668                    : sprintf "%.${scale}f", $minval/10**$scale;
3669            }
3670            # Exception: all bits set to 1 for associated field is NOT
3671            # interpreted as missing value
3672            if ($id == 999999 and ! defined $value) {
3673                $value = 2**$width - 1;
3674            }
3675            foreach my $isub (1..$nsubsets) {
3676                push @{$subset_data_ref->[$isub]}, $value;
3677            }
3678            $pos += $nsubsets*$deltabits if defined $deltabits;
3679        }
3680
3681        # Bit maps need special treatment. We are only able to
3682        # handle those where all subsets have exactly the same
3683        # bit map with the present method.
3684        if ($id eq '031031' and $self->{BUILD_BITMAP}) {
3685            _croak "$id: Unable to handle bit maps which differ between subsets"
3686                . " in compressed data" if $deltabits;
3687            # Store the index of expanded descriptors if data is
3688            # marked as present in data present indicator: 0 is
3689            # 'present', 1 (undef value) is 'not present'
3690            if (defined $minval) {
3691                push @{$self->{CURRENT_BITMAP}}, $self->{BITMAP_INDEX};
3692            }
3693            $self->{BITMAP_INDEX}++;
3694            if ($self->{BACKWARD_DATA_REFERENCE} == $self->{NUM_BITMAPS}) {
3695                my $numb = $self->{NUM_BITMAPS};
3696                if (!defined $self->{BITMAP_START}[$numb]) {
3697                    # Look up the element descriptor immediately
3698                    # preceding the bitmap operator
3699                    my $i = $idesc;
3700                    $i-- while ($desc_ref->[$i] ne $self->{BITMAP_OPERATORS}->[-1]
3701                                && $i >=0);
3702                    $i-- while ($desc_ref->[$i] > 100000 && $i >=0);
3703                    _croak "No element descriptor preceding bitmap" if $i < 0;
3704                    $self->{BITMAP_START}[$numb] = $i;
3705                } else {
3706                    if ($Show_all_operators) {
3707                        my $i = $self->{BITMAP_START}[$numb] - 1;
3708                        $i-- while ($desc_ref->[$i] > 100000 && $i >=0);
3709                        $self->{BITMAP_START}[$numb] = $i;
3710                    } else {
3711                        $self->{BITMAP_START}[$numb]--;
3712                    }
3713                    _croak "Bitmap too big"
3714                        if $self->{BITMAP_START}[$numb] < 0;
3715                }
3716            }
3717        } elsif ($self->{BUILD_BITMAP} and $self->{BITMAP_INDEX} > 0) {
3718            # We have finished building the bit map
3719            $self->{BUILD_BITMAP} = 0;
3720            $self->{BITMAP_INDEX} = 0;
3721            if ($self->{BACKWARD_DATA_REFERENCE} != $self->{NUM_BITMAPS}) {
3722                $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
3723                    = $self->{BITMAP_START}[$self->{BACKWARD_DATA_REFERENCE}];
3724            }
3725        }
3726        $self->_spew(3, "  %s", join ' ',
3727             map { defined($subset_data_ref->[$_][-1]) ?
3728                 $subset_data_ref->[$_][-1] : 'missing'} 1..$nsubsets) if $Spew;
3729    }
3730    return $pos;
3731}
3732
3733## Takes a text $decoded_message as argument and returns BUFR messages
3734## which would give the same output as $decoded_message when running
3735## dumpsection0(), dumpsection1(), dumpsection3() and dumpsection4() in
3736## turn on each of the reencoded BUFR messages
3737sub reencode_message {
3738    my $self = shift;
3739    my $decoded_message = shift;
3740    my $width = shift || 15;    # Optional argument
3741    # Data values usually start at column 31, but if a $width
3742    # different from 15 was used in dumpsection4 you should use the
3743    # same value here
3744
3745    my @lines = split /\n/, $decoded_message;
3746    my $bufr_messages = '';
3747    my $i = 0;
3748
3749  MESSAGE: while ($i < @lines) {
3750        # Some tidying after decoding of previous message might be
3751        # necessary
3752        $self->{NUM_CHANGE_OPERATORS} = 0;
3753        undef $self->{CHANGE_WIDTH};
3754        undef $self->{CHANGE_CCITTIA5_WIDTH};
3755        undef $self->{CHANGE_SCALE};
3756        undef $self->{CHANGE_REFERENCE_VALUE};
3757        undef $self->{NEW_REFVAL_OF};
3758        undef $self->{CHANGE_SRW};
3759        undef $self->{ADD_ASSOCIATED_FIELD};
3760        undef $self->{BITMAPS};
3761        undef $self->{BITMAP_OPERATORS};
3762        undef $self->{REUSE_BITMAP};
3763        $self->{NUM_BITMAPS} = 0;
3764        # $self->{LOCAL_USE} is always set for BUFR edition < 4 in _encode_sec1
3765        undef $self->{LOCAL_USE};
3766
3767        # Extract section 0 info
3768        $i++ while $lines[$i] !~ /^Section 0/ and $i < @lines-1;
3769        last MESSAGE if $i >= @lines-1; # Not containing any decoded BUFR message
3770        $i++; # Skip length of BUFR message
3771        ($self->{BUFR_EDITION}) = $lines[++$i]
3772            =~ /BUFR edition:\s+(\d+)/;
3773        _croak "BUFR edition number not provided or is not a number"
3774            unless defined $self->{BUFR_EDITION};
3775
3776        # Extract section 1 info
3777        $i++ while $lines[$i] !~ /^Section 1/;
3778        _croak "reencode_message: Don't find decoded section 1" if $i >= @lines;
3779        $i++; # Skip length of section 1
3780        if ($self->{BUFR_EDITION} < 4 ) {
3781            ($self->{MASTER_TABLE}) = $lines[++$i]
3782                =~ /BUFR master table:\s+(\d+)/;
3783            ($self->{SUBCENTRE}) = $lines[++$i]
3784                =~ /Originating subcentre:\s+(\d+)/;
3785            ($self->{CENTRE}) = $lines[++$i]
3786                =~ /Originating centre:\s+(\d+)/;
3787            ($self->{UPDATE_NUMBER}) = $lines[++$i]
3788                =~ /Update sequence number:\s+(\d+)/;
3789            ($self->{OPTIONAL_SECTION}) = $lines[++$i]
3790                =~ /Optional section present:\s+(\d+)/;
3791            ($self->{DATA_CATEGORY}) = $lines[++$i]
3792                =~ /Data category \(table A\):\s+(\d+)/;
3793            ($self->{DATA_SUBCATEGORY}) = $lines[++$i]
3794                =~ /Data subcategory:\s+(\d+)/;
3795            ($self->{MASTER_TABLE_VERSION}) = $lines[++$i]
3796                =~ /Master table version number:\s+(\d+)/;
3797            ($self->{LOCAL_TABLE_VERSION}) = $lines[++$i]
3798                =~ /Local table version number:\s+(\d+)/;
3799            ($self->{YEAR_OF_CENTURY}) = $lines[++$i]
3800                =~ /Year of century:\s+(\d+)/;
3801            ($self->{MONTH}) = $lines[++$i]
3802                =~ /Month:\s+(\d+)/;
3803            ($self->{DAY}) = $lines[++$i]
3804                =~ /Day:\s+(\d+)/;
3805            ($self->{HOUR}) = $lines[++$i]
3806                =~ /Hour:\s+(\d+)/;
3807            ($self->{MINUTE}) = $lines[++$i]
3808                =~ /Minute:\s+(\d+)/;
3809            _croak "reencode_message: Something seriously wrong in decoded section 1"
3810                unless defined $self->{MINUTE};
3811        } elsif ($self->{BUFR_EDITION} == 4) {
3812            ($self->{MASTER_TABLE}) = $lines[++$i]
3813                =~ /BUFR master table:\s+(\d+)/;
3814            ($self->{CENTRE}) = $lines[++$i]
3815                =~ /Originating centre:\s+(\d+)/;
3816            ($self->{SUBCENTRE}) = $lines[++$i]
3817                =~ /Originating subcentre:\s+(\d+)/;
3818            ($self->{UPDATE_NUMBER}) = $lines[++$i]
3819                =~ /Update sequence number:\s+(\d+)/;
3820            ($self->{OPTIONAL_SECTION}) = $lines[++$i]
3821                =~ /Optional section present:\s+(\d+)/;
3822            ($self->{DATA_CATEGORY}) = $lines[++$i]
3823                =~ /Data category \(table A\):\s+(\d+)/;
3824            ($self->{INT_DATA_SUBCATEGORY}) = $lines[++$i]
3825                =~ /International data subcategory:\s+(\d+)/;
3826            ($self->{LOC_DATA_SUBCATEGORY}) = $lines[++$i]
3827                =~ /Local data subcategory:\s+(\d+)/;
3828            ($self->{MASTER_TABLE_VERSION}) = $lines[++$i]
3829                =~ /Master table version number:\s+(\d+)/;
3830            ($self->{LOCAL_TABLE_VERSION}) = $lines[++$i]
3831                =~ /Local table version number:\s+(\d+)/;
3832            ($self->{YEAR}) = $lines[++$i]
3833                =~ /Year:\s+(\d+)/;
3834            ($self->{MONTH}) = $lines[++$i]
3835                =~ /Month:\s+(\d+)/;
3836            ($self->{DAY}) = $lines[++$i]
3837                =~ /Day:\s+(\d+)/;
3838            ($self->{HOUR}) = $lines[++$i]
3839                =~ /Hour:\s+(\d+)/;
3840            ($self->{MINUTE}) = $lines[++$i]
3841                =~ /Minute:\s+(\d+)/;
3842            ($self->{SECOND}) = $lines[++$i]
3843                =~ /Second:\s+(\d+)/;
3844            _croak "reencode_message: Something seriously wrong in decoded section 1"
3845                unless defined $self->{SECOND};
3846        }
3847
3848        # Extract section 3 info
3849        $i++ while $lines[$i] !~ /^Section 3/;
3850        _croak "reencode_message: Don't find decoded section 3" if $i >= @lines;
3851        $i++; # Skip length of section 3
3852
3853        ($self->{NUM_SUBSETS}) = $lines[++$i]
3854            =~ /Number of data subsets:\s+(\d+)/;
3855        _croak "Don't support reencoding of 0 subset message"
3856            if $self->{NUM_SUBSETS} == 0;
3857        ($self->{OBSERVED_DATA}) = $lines[++$i]
3858            =~ /Observed data:\s+(\d+)/;
3859        ($self->{COMPRESSED_DATA}) = $lines[++$i]
3860            =~ /Compressed data:\s+(\d+)/;
3861        ($self->{DESCRIPTORS_UNEXPANDED}) = $lines[++$i]
3862            =~ /Data descriptors unexpanded:\s+(\d+.*)/;
3863        _croak "reencode_message: Something seriously wrong in decoded section 3"
3864            unless defined $self->{DESCRIPTORS_UNEXPANDED};
3865
3866        # Extract data values to use in section 4
3867        my ($data_refs, $desc_refs);
3868        my $subset = 0;
3869      SUBSET: while ($i < @lines-1) {
3870            $_ = $lines[++$i];
3871            next SUBSET if /^$/ or /^Subset/;
3872            last SUBSET if /^Message/;
3873            $_ = substr $_, 0, $width + 16;
3874            s/^\s+//;
3875            next SUBSET if not /^\d/;
3876            my ($n, $desc, $value) = split /\s+/, $_, 3;
3877            $subset++ if $n == 1;
3878            if (defined $value) {
3879                $value =~ s/\s+$//;
3880                $value = undef if $value eq '' or $value eq 'missing';
3881            } else {
3882                # Some descriptors are not numbered (like 222000)
3883                $desc = $n;
3884                $value = '';
3885            }
3886            push @{$data_refs->[$subset]}, $value;
3887            push @{$desc_refs->[$subset]}, $desc;
3888        }
3889
3890        # If optional section is present, pretend it is not, because we
3891        # are not able to encode this section
3892        if ($self->{OPTIONAL_SECTION}) {
3893            $self->{OPTIONAL_SECTION} = 0;
3894            carp "Warning: 'Optional section present' changed from 1 to 0'\n";
3895        }
3896
3897        $bufr_messages .= $self->encode_message($data_refs, $desc_refs);
3898    }
3899
3900    return $bufr_messages;
3901}
3902
3903
3904## Encode a new BUFR message. All relevant metadata
3905## ($self->{BUFR_EDITION} etc) must have been initialized already or
3906## else the _encode_sec routines will croak.
3907sub encode_message {
3908    my $self = shift;
3909    my ($data_refs, $desc_refs) = @_;
3910
3911    _croak "encode_message: No data/descriptors provided" unless $desc_refs;
3912
3913    $self->{MESSAGE_NUMBER}++;
3914    $self->_spew(2, "Encoding message number %d", $self->{MESSAGE_NUMBER}) if $Spew;
3915
3916    $self->load_BDtables();
3917
3918    $self->_spew(2, "Encoding section 1-3") if $Spew;
3919    my $sec1_stream = $self->_encode_sec1();
3920    my $sec2_stream = $self->_encode_sec2();
3921    my $sec3_stream = $self->_encode_sec3();
3922    $self->_spew(2, "Encoding section 4") if $Spew;
3923    my $sec4_stream = $self->_encode_sec4($data_refs, $desc_refs);
3924
3925    # Compute length of whole message and encode section 0
3926    my $msg_len = 8 + length($sec1_stream) + length($sec2_stream)
3927        + length($sec3_stream) + length($sec4_stream) + 4;
3928    my $msg_len_binary = pack("N", $msg_len);
3929    my $bufr_edition_binary = pack('n', $self->{BUFR_EDITION});
3930    my $sec0_stream = 'BUFR' . substr($msg_len_binary,1,3)
3931                             . substr($bufr_edition_binary,1,1);
3932
3933    my $new_message = $sec0_stream . $sec1_stream . $sec2_stream
3934        . $sec3_stream  . $sec4_stream  . '7777';
3935    return $new_message;
3936}
3937
3938## Encode and return section 1
3939sub _encode_sec1 {
3940    my $self = shift;
3941
3942    my $bufr_edition = $self->{BUFR_EDITION} or
3943        _croak "_encode_sec1: BUFR edition not defined";
3944
3945    my @keys = qw( MASTER_TABLE  CENTRE  SUBCENTRE  UPDATE_NUMBER
3946                   OPTIONAL_SECTION  DATA_CATEGORY  MASTER_TABLE_VERSION
3947                   LOCAL_TABLE_VERSION  MONTH  DAY  HOUR  MINUTE );
3948    if ($bufr_edition < 4) {
3949        push @keys, qw( DATA_SUBCATEGORY  YEAR_OF_CENTURY );
3950    } elsif ($bufr_edition == 4) {
3951        push @keys, qw( INT_DATA_SUBCATEGORY  LOC_DATA_SUBCATEGORY  YEAR  SECOND );
3952    }
3953
3954    # Check that the required variables for section 1 are provided
3955    foreach my $key (@keys) {
3956        _croak "_encode_sec1: $key not given"
3957            unless defined $self->{$key};
3958    }
3959
3960    $self->_validate_datetime() if ($Strict_checking);
3961
3962    my $sec1_stream;
3963    # Byte 4-
3964    if ($bufr_edition < 4) {
3965        $self->{LOCAL_USE} = "\0" if !defined $self->{LOCAL_USE};
3966        $sec1_stream = pack 'C14a*',
3967            $self->{MASTER_TABLE},
3968            $self->{SUBCENTRE},
3969            $self->{CENTRE},
3970            $self->{UPDATE_NUMBER},
3971            $self->{OPTIONAL_SECTION} ? 128 : 0,
3972            $self->{DATA_CATEGORY},
3973            $self->{DATA_SUBCATEGORY},
3974            $self->{MASTER_TABLE_VERSION},
3975            $self->{LOCAL_TABLE_VERSION},
3976            $self->{YEAR_OF_CENTURY},
3977            $self->{MONTH},
3978            $self->{DAY},
3979            $self->{HOUR},
3980            $self->{MINUTE},
3981            $self->{LOCAL_USE};
3982    } elsif ($bufr_edition == 4) {
3983        $sec1_stream = pack 'CnnC7nC5',
3984            $self->{MASTER_TABLE},
3985            $self->{CENTRE},
3986            $self->{SUBCENTRE},
3987            $self->{UPDATE_NUMBER},
3988            $self->{OPTIONAL_SECTION} ? 128 : 0,
3989            $self->{DATA_CATEGORY},
3990            $self->{INT_DATA_SUBCATEGORY},
3991            $self->{LOC_DATA_SUBCATEGORY},
3992            $self->{MASTER_TABLE_VERSION},
3993            $self->{LOCAL_TABLE_VERSION},
3994            $self->{YEAR},
3995            $self->{MONTH},
3996            $self->{DAY},
3997            $self->{HOUR},
3998            $self->{MINUTE},
3999            $self->{SECOND};
4000        $sec1_stream .= pack 'a*', $self->{LOCAL_USE}
4001            if defined $self->{LOCAL_USE};
4002    }
4003
4004    my $sec1_len = 3 + length $sec1_stream;
4005    if ($bufr_edition < 4) {
4006        # Each section should be an even number of octets
4007        if ($sec1_len % 2) {
4008            $sec1_stream .= "\0";
4009            $sec1_len++;
4010        }
4011    }
4012
4013    # Byte 1-3
4014    my $sec1_len_binary = substr pack("N", $sec1_len), 1, 3;
4015
4016    return $sec1_len_binary . $sec1_stream;
4017}
4018
4019## Encode and return section 2 (empty string if no optional section)
4020sub _encode_sec2 {
4021    my $self = shift;
4022    if ($self->{OPTIONAL_SECTION}) {
4023        _croak "_encode_sec2: No optional section provided"
4024            unless defined  $self->{SEC2_STREAM};
4025        return $self->{SEC2_STREAM};
4026    } else {
4027        return '';
4028    }
4029}
4030
4031## Encode and return section 3
4032sub _encode_sec3 {
4033    my $self = shift;
4034
4035    # Check that the required variables for section 3 are provided
4036    foreach my $key (qw(NUM_SUBSETS OBSERVED_DATA COMPRESSED_DATA
4037                        DESCRIPTORS_UNEXPANDED)) {
4038        _croak "_encode_sec3: $key not given"
4039            unless defined $self->{$key};
4040    }
4041
4042    my @desc = split / /, $self->{DESCRIPTORS_UNEXPANDED};
4043
4044    # Byte 5-6
4045    my $nsubsets_binary = pack "n", $self->{NUM_SUBSETS};
4046
4047    # Byte 7
4048    my $flag = pack 'C', $self->{OBSERVED_DATA}*128 +
4049                         $self->{COMPRESSED_DATA}*64;
4050
4051    # Byte 8-
4052    my $desc_binary = "\0\0" x @desc;
4053    my $pos = 0;
4054    foreach my $desc (@desc) {
4055        my $f = substr($desc,0,1);
4056        my $x = substr($desc,1,2)+0;
4057        my $y = substr($desc,3,3)+0;
4058        dec2bitstream($f, $desc_binary, $pos, 2);
4059        $pos += 2;
4060        dec2bitstream($x, $desc_binary, $pos, 6);
4061        $pos += 6;
4062        dec2bitstream($y, $desc_binary, $pos, 8);
4063        $pos += 8;
4064    }
4065
4066    my $sec3_len = 7 + length $desc_binary;
4067    if ($self->{BUFR_EDITION} < 4) {
4068        # Each section should be an even number of octets
4069        if ($sec3_len % 2) {
4070            $desc_binary .= "\0";
4071            $sec3_len++;
4072        }
4073    }
4074
4075    # Byte 1-4
4076    my $sec3_len_binary = pack("N", $sec3_len);
4077    my $sec3_start = substr($sec3_len_binary, 1, 3) . "\0";
4078
4079    return $sec3_start . $nsubsets_binary . $flag . $desc_binary;
4080}
4081
4082## Encode and return section 4
4083sub _encode_sec4 {
4084    my $self = shift;
4085    my ($data_refs, $desc_refs) = @_;
4086
4087    # Check that dimension of argument arrays agrees with number of
4088    # subsets in section 3
4089    my $nsubsets = $self->{NUM_SUBSETS};
4090    _croak "Wrong number of subsets ($nsubsets) in section 3?\n"
4091        . "Disagrees with dimension of descriptor array used as argument "
4092            . "to encode_message()"
4093                unless @$desc_refs == $nsubsets + 1;
4094
4095    my ($bitstream, $byte_len) = $self->{COMPRESSED_DATA}
4096        ? $self->_encode_compressed_bitstream($data_refs, $desc_refs)
4097            : $self->_encode_bitstream($data_refs, $desc_refs);
4098
4099    my $sec4_len = $byte_len + 4;
4100    my $sec4_len_binary = pack("N", $sec4_len);
4101    my $sec4_stream = substr($sec4_len_binary, 1, 3) . "\0" . $bitstream;
4102
4103    return $sec4_stream;
4104}
4105
4106## Encode a nil message, i.e. all values set to missing except delayed
4107## replication factors and the (descriptor, value) pairs in the hash
4108## ref $stationid_ref. Delayed replication factors will all be set to
4109## 1 unless $delayed_repl_ref is provided, in which case the
4110## descriptors 031001 and 031002 will get the values contained in
4111## @$delayed_repl_ref. Note that data in section 1 and 3 must have
4112## been set before calling this method.
4113sub encode_nil_message {
4114    my $self = shift;
4115    my ($stationid_ref, $delayed_repl_ref) = @_;
4116
4117    _croak "encode_nil_message: No station descriptors provided"
4118        unless $stationid_ref;
4119
4120    my $bufr_edition = $self->{BUFR_EDITION} or
4121        _croak "encode_nil_message: BUFR edition not defined";
4122
4123    # Since a nil message necessarily is a one subset message, some
4124    # metadata might need to be adjusted (saving the user for having
4125    # to remember this)
4126    $self->set_number_of_subsets(1);
4127    $self->set_compressed_data(0);
4128
4129    $self->load_BDtables();
4130
4131    $self->_spew(2, "Encoding NIL message") if $Spew;
4132    my $sec1_stream = $self->_encode_sec1();
4133    my $sec3_stream = $self->_encode_sec3();
4134    my $sec4_stream = $self->_encode_nil_sec4($stationid_ref,
4135                                              $delayed_repl_ref);
4136
4137    # Compute length of whole message and encode section 0
4138    my $msg_len = 8 + length($sec1_stream) + length($sec3_stream)
4139        + length($sec4_stream) + 4;
4140    my $msg_len_binary = pack("N", $msg_len);
4141    my $bufr_edition_binary = pack('n', $bufr_edition);
4142    my $sec0_stream = 'BUFR' . substr($msg_len_binary,1,3)
4143                             . substr($bufr_edition_binary,1,1);
4144
4145    my $new_message = $sec0_stream . $sec1_stream . $sec3_stream . $sec4_stream
4146        . '7777';
4147    return $new_message;
4148}
4149
4150## Encode and return section 4 with all values set to missing except
4151## delayed replication factors and the (descriptor, value) pairs in
4152## the hash ref $stationid_ref. Delayed replication factors will all
4153## be set to 1 unless $delayed_repl_ref is provided, in which case the
4154## descriptors 031001 and 031002 will get the values contained in
4155## @$delayed_repl_ref (in that order).
4156sub _encode_nil_sec4 {
4157    my $self = shift;
4158    $self->{CODING} = 'ENCODE';
4159    my ($stationid_ref, $delayed_repl_ref) = @_;
4160    my @delayed_repl = defined $delayed_repl_ref ? @$delayed_repl_ref : ();
4161
4162    # Get the expanded list of descriptors (i.e. expanded with table D)
4163    if (not $self->{DESCRIPTORS_EXPANDED}) {
4164        _croak "_encode_nil_sec4: DESCRIPTORS_UNEXPANDED not given"
4165            unless $self->{DESCRIPTORS_UNEXPANDED};
4166        my @unexpanded = split / /, $self->{DESCRIPTORS_UNEXPANDED};
4167        _croak "_encode_nil_sec4: D_TABLE not given"
4168            unless $self->{D_TABLE};
4169        my $alias = "$self->{TABLE_VERSION} " . $self->{DESCRIPTORS_UNEXPANDED};
4170        if (exists $Descriptors_already_expanded{$alias}) {
4171            $self->{DESCRIPTORS_EXPANDED} = $Descriptors_already_expanded{$alias};
4172        } else {
4173            $Descriptors_already_expanded{$alias} = $self->{DESCRIPTORS_EXPANDED}
4174                = join " ", _expand_descriptors($self->{D_TABLE}, @unexpanded);
4175        }
4176    }
4177
4178    # The rest is very similar to sub _decode_bitstream, except that we
4179    # now are encoding, not decoding a bitstream, with most values set
4180    # to missing value, and we do not need to fully expand the
4181    # descriptors.
4182    my $B_table = $self->{B_TABLE};
4183    my @operators;
4184    my $bitstream = chr(255) x 65536; # one bits only
4185    my $pos = 0;
4186
4187    my @desc = split /\s/, $self->{DESCRIPTORS_EXPANDED};
4188  D_LOOP: for (my $idesc = 0; $idesc < @desc; $idesc++) {
4189
4190        my $id = $desc[$idesc];
4191        my $f = substr($id,0,1);
4192        my $x = substr($id,1,2)+0;
4193        my $y = substr($id,3,3)+0;
4194
4195        if ($f == 1) {
4196            # Delayed replication
4197            if ($x == 0) {
4198                _complain("Nonsensical replication of zero descriptors ($id)");
4199                $idesc++;
4200                next D_LOOP;
4201            }
4202            _croak "$id _expand_descriptors() did not do its job"
4203                if $y > 0;
4204
4205            $_ = $desc[$idesc+1];
4206            _croak "$id Erroneous replication factor"
4207                unless /^0310(00|01|02|11|12)/ && exists $B_table->{$_};
4208            my $factor = 1;
4209            if (@delayed_repl && /^03100(1|2)/) {
4210                $factor = shift @delayed_repl;
4211                croak "Delayed replication factor must be positive integer in "
4212                    . "encode_nil_message, is '$factor'"
4213                        if ($factor !~ /^\d+$/ || $factor == 0);
4214            }
4215            my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$_};
4216            if ($Spew) {
4217                $self->_spew(3, "%6s  %-20s   %s", $id, $unit, $name);
4218                $self->_spew(3, "  %s", $factor);
4219            }
4220            dec2bitstream($factor, $bitstream, $pos, $width);
4221            $pos += $width;
4222            # Include the delayed replication in descriptor list
4223            splice @desc, $idesc++, 0, $_;
4224
4225            my @r = ();
4226            push @r, @desc[($idesc+2)..($idesc+$x+1)] while $factor--;
4227            $self->_spew(4, "Delayed replication ($id $_ -> @r)") if $Spew;
4228            splice @desc, $idesc, 2+$x, @r;
4229
4230            if ($idesc < @desc) {
4231                redo D_LOOP;
4232            } else {
4233                last D_LOOP; # Might happen if delayed factor is 0
4234            }
4235
4236        } elsif ($f == 2) {
4237            my $next_id = $desc[$idesc+1];
4238            my $flow;
4239            my $bm_idesc;
4240            ($pos, $flow, $bm_idesc, @operators)
4241                = $self->_apply_operator_descriptor($id, $x, $y, $pos, 0,
4242                                                    $next_id, @operators);
4243            next D_LOOP if $flow eq 'next';
4244        }
4245
4246        # We now have a "real" data descriptor
4247
4248        # Find the relevant entry in BUFR table B
4249        _croak "Data descriptor $id is not present in BUFR table B"
4250            unless exists $B_table->{$id};
4251        my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$id};
4252        $self->_spew(3, "%6s  %-20s   %s", $id, $unit, $name) if $Spew;
4253
4254        # Override Table B values if Data Description Operators are in effect
4255        if ($self->{NUM_CHANGE_OPERATORS} > 0) {
4256            if ($unit ne 'CCITTIA5' && $unit !~ /^(CODE|FLAG)/) {
4257                if (defined $self->{CHANGE_SRW}) {
4258                    $scale += $self->{CHANGE_SRW};
4259                    $width += int((10*$self->{CHANGE_SRW}+2)/3);
4260                    $refval *= 10*$self->{CHANGE_SRW};
4261                } else {
4262                    $scale += $self->{CHANGE_SCALE} if defined $self->{CHANGE_SCALE};
4263                    $width += $self->{CHANGE_WIDTH} if defined $self->{CHANGE_WIDTH};
4264                }
4265            } elsif ($unit eq 'CCITTIA5' && defined $self->{CHANGE_CCITTIA5_WIDTH}) {
4266                $width = $self->{CHANGE_CCITTIA5_WIDTH}
4267            }
4268            $refval = $self->{NEW_REFVAL_OF}{$id} if defined $self->{NEW_REFVAL_OF}{$id};
4269        }
4270        _croak "$id Data width <= 0" if $width <= 0;
4271
4272        if ($stationid_ref->{$id}) {
4273            my $value = $stationid_ref->{$id};
4274            $self->_spew(3, "  %s", $value) if $Spew;
4275            if ($unit eq 'CCITTIA5') {
4276                # Encode ASCII string in $width bits (left justified,
4277                # padded with spaces)
4278                my $num_bytes = int($width/8);
4279                _croak "Ascii string too long to fit in $width bits: $value"
4280                    if length($value) > $num_bytes;
4281                $value .= ' ' x ($num_bytes - length($value));
4282                ascii2bitstream($value, $bitstream, $pos, $num_bytes);
4283            } else {
4284                # Encode value as integer in $width bits
4285                $value = int($value * 10**$scale - $refval + 0.5);
4286                _croak "Data value no $id is negative: $value"
4287                    if $value < 0;
4288                dec2bitstream($value, $bitstream, $pos, $width);
4289            }
4290        } else {
4291            # Missing value is encoded as 1 bits
4292        }
4293        $pos += $width;
4294    }
4295
4296    # Pad with 0 bits if necessary to get an even or integer number of
4297    # octets, depending on bufr edition
4298    my $padnum = $self->{BUFR_EDITION} < 4 ? (16-($pos%16)) % 16 : (8-($pos%8)) % 8;
4299    if ($padnum > 0) {
4300        null2bitstream($bitstream, $pos, $padnum);
4301    }
4302    my $len = ($pos + $padnum)/8;
4303    $bitstream = substr $bitstream, 0, $len;
4304
4305    # Encode section 4
4306    my $sec4_len_binary = pack("N", $len + 4);
4307    my $sec4_stream = substr($sec4_len_binary, 1, 3) . "\0" . $bitstream;
4308
4309    return $sec4_stream;
4310}
4311
4312## Encode bitstream using the data values in $data_refs, first
4313## expanding section 3 fully (and comparing with $desc_refs to check
4314## for consistency). This sub is very similar to sub _decode_bitstream
4315sub _encode_bitstream {
4316    my $self = shift;
4317    $self->{CODING} = 'ENCODE';
4318    my ($data_refs, $desc_refs) = @_;
4319
4320    # Expand section 3 except for delayed replication and operator descriptors
4321    my @unexpanded = split / /, $self->{DESCRIPTORS_UNEXPANDED};
4322    my $alias = "$self->{TABLE_VERSION} " . $self->{DESCRIPTORS_UNEXPANDED};
4323    if (exists $Descriptors_already_expanded{$alias}) {
4324        $self->{DESCRIPTORS_EXPANDED} = $Descriptors_already_expanded{$alias};
4325    } else {
4326        $Descriptors_already_expanded{$alias} = $self->{DESCRIPTORS_EXPANDED}
4327            = join " ", _expand_descriptors($self->{D_TABLE}, @unexpanded);
4328    }
4329
4330    my $nsubsets = $self->{NUM_SUBSETS};
4331    my $B_table = $self->{B_TABLE};
4332    my $maxlen = 1024;
4333    my $bitstream = chr(255) x $maxlen; # one bits only
4334    my $pos = 0;
4335    my @operators;
4336
4337  S_LOOP: foreach my $isub (1..$nsubsets) {
4338        $self->_spew(2, "Encoding subset number %d", $isub) if $Spew;
4339
4340        # Bit maps might vary from subset to subset, so must be rebuilt
4341        undef $self->{BITMAP_OPERATORS};
4342        undef $self->{BITMAP_START};
4343        undef $self->{REUSE_BITMAP};
4344        $self->{NUM_BITMAPS} = 0;
4345        $self->{BACKWARD_DATA_REFERENCE} = 1;
4346        $self->{NUM_CHANGE_OPERATORS} = 0;
4347
4348        # The data values to use for this subset
4349        my $data_ref = $data_refs->[$isub];
4350        # The descriptors from expanding section 3
4351        my @desc = split /\s/, $self->{DESCRIPTORS_EXPANDED};
4352        # The descriptors to compare with for this subset
4353        my $desc_ref = $desc_refs->[$isub];
4354
4355        # Note: @desc as well as $idesc may be changed during this loop,
4356        # so we cannot use a foreach loop instead
4357      D_LOOP: for (my $idesc = 0; $idesc < @desc; $idesc++) {
4358            my $id = $desc[$idesc]
4359                || _croak("No descriptor no. $idesc defined. Consider using --strict_checking 2"
4360                          . " or --verbose 4 to explore what went wrong in the encoding");
4361            my $f = substr($id,0,1);
4362            my $x = substr($id,1,2)+0;
4363            my $y = substr($id,3,3)+0;
4364
4365            if ($f == 1) {
4366                # Delayed replication
4367                if ($x == 0) {
4368                    _complain("Nonsensical replication of zero descriptors ($id)");
4369                    $idesc++;
4370                    next D_LOOP;
4371                }
4372                _croak "$id _expand_descriptors() did not do its job"
4373                    if $y > 0;
4374
4375                my $next_id = $desc[$idesc+1];
4376                _croak "$id Erroneous replication factor"
4377                    unless $next_id =~ /^0310(00|01|02|11|12)/ && exists $B_table->{$next_id};
4378                _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected $next_id"
4379                    if $desc_ref->[$idesc] != $next_id;
4380                my $factor = $data_ref->[$idesc];
4381                my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$next_id};
4382                if ($Spew) {
4383                    $self->_spew(3, "%6s  %-20s  %s", $next_id, $unit, $name);
4384                    $self->_spew(3, "  %s", $factor);
4385                }
4386                ($bitstream, $pos, $maxlen)
4387                    = $self->_encode_value($factor,$isub,$unit,$scale,$refval,
4388                                           $width,$next_id,$bitstream,$pos,$maxlen);
4389                # Include the delayed replication/repetition in descriptor list
4390                splice @desc, $idesc++, 0, $next_id;
4391
4392                my @r = ();
4393                push @r, @desc[($idesc+2)..($idesc+$x+1)] while $factor--;
4394                splice @desc, $idesc, 2+$x, @r;
4395
4396                if ($next_id eq '031011' || $next_id eq '031012') {
4397                    # For delayed repetition we should include data just
4398                    # once, so skip to the last set in data array
4399                    $idesc += $x * ($data_ref->[$idesc-1] - 1);
4400                    # We ought to check that the data sets we skipped are
4401                    # indeed equal to the last set!
4402                    $self->_spew(4, "Delayed repetition ($id $next_id -> @r)") if $Spew;
4403                } else {
4404                    $self->_spew(4, "Delayed replication ($id $next_id -> @r)") if $Spew;
4405                }
4406                if ($idesc < @desc) {
4407                    redo D_LOOP;
4408                } else {
4409                    last D_LOOP; # Might happen if delayed factor is 0
4410                }
4411
4412            } elsif ($f == 2) {
4413                my $flow;
4414                my $bm_idesc;
4415                ($pos, $flow, $bm_idesc, @operators)
4416                    = $self->_apply_operator_descriptor($id, $x, $y, $pos, $isub,
4417                                                        $desc[$idesc+1], @operators);
4418                if ($flow eq 'redo_bitmap') {
4419                    # Data value is associated with the descriptor
4420                    # defined by bit map. Remember original and new
4421                    # index in descriptor array for the bit mapped
4422                    # values ('dr' = data reference)
4423                    my $dr_idesc;
4424                    if (!defined $bm_idesc) {
4425                        $dr_idesc = shift @{ $self->{REUSE_BITMAP}->[$isub]};
4426                    } elsif (!$Show_all_operators) {
4427                        $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
4428                            + $bm_idesc;
4429                    } else {
4430                        $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}];
4431                        # Skip operator descriptors
4432                        while ($bm_idesc-- > 0) {
4433                            $dr_idesc++;
4434                            $dr_idesc++ while ($desc[$dr_idesc] >= 200000);
4435                        }
4436                    }
4437                    push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[$isub] },
4438                         $dr_idesc, $idesc;
4439                    $desc[$idesc] = $desc[$dr_idesc];
4440                    redo D_LOOP;
4441                } elsif ($flow eq 'signify_character') {
4442                    _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected $id"
4443                        if $desc_ref->[$idesc] != $id;
4444                    # Get ASCII string
4445                    my $value = $data_ref->[$idesc];
4446                    my $name = 'SIGNIFY CHARACTER';
4447                    my $unit = 'CCITTIA5';
4448                    my ($scale, $refval, $width) = (0, 0, 8*$y);
4449                    ($bitstream, $pos, $maxlen)
4450                        = $self->_encode_value($value,$isub,$unit,$scale,$refval,$width,"205$y",$bitstream,$pos,$maxlen);
4451                    next D_LOOP;
4452                } elsif ($flow eq 'no_value') {
4453                    next D_LOOP;
4454                }
4455
4456                # Remove operator descriptor from @desc
4457                splice @desc, $idesc--, 1;
4458
4459                next D_LOOP if $flow eq 'next';
4460                last D_LOOP if $flow eq 'last';
4461            }
4462
4463            if ($self->{CHANGE_REFERENCE_VALUE}) {
4464                # The data descriptor is to be associated with a new
4465                # reference value, which is fetched from data stream,
4466                # possibly with f=9 instead of f=0 for descriptor
4467                $id -= 900000 if $id =~ /^9/;
4468                _croak "Change reference operator 203Y is not followed by element"
4469                    . " descriptor, but $id" if $f > 0;
4470                my $new_refval = $data_ref->[$idesc];
4471                $self->{NEW_REFVAL_OF}{$id}{$isub} = $new_refval;
4472                ($bitstream, $pos, $maxlen)
4473                    = $self->_encode_reference_value($new_refval,$id,$bitstream,$pos,$maxlen);
4474                next D_LOOP;
4475            }
4476
4477            # If operator 204$y 'Add associated field' is in effect,
4478            # each data value is preceded by $y bits which should be
4479            # encoded separately. We choose to provide a descriptor
4480            # 999999 in this case (like the ECMWF BUFRDC software)
4481            if ($self->{ADD_ASSOCIATED_FIELD} and $id ne '031021') {
4482                # First encode associated field
4483                _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected 999999"
4484                    if $desc_ref->[$idesc] != 999999;
4485                my $value = $data_ref->[$idesc];
4486                my $name = 'ASSOCIATED FIELD';
4487                my $unit = 'NUMERIC';
4488                my ($scale, $refval) = (0, 0);
4489                my $width = $self->{ADD_ASSOCIATED_FIELD};
4490                $self->_spew(4, "Added associated field: %s", $value) if $Spew;
4491                ($bitstream, $pos, $maxlen)
4492                    = $self->_encode_value($value,$isub,$unit,$scale,$refval,$width,999999,$bitstream,$pos,$maxlen);
4493                # Insert the artificial 999999 descriptor for the
4494                # associated value and increment $idesc to prepare for
4495                # handling the 'real' value below
4496                splice @desc, $idesc++, 0, 999999;
4497            }
4498
4499
4500
4501            # For quality information, if this relates to a bit map we
4502            # need to store index of the data ($data_idesc) for which
4503            # the quality information applies, as well as the new
4504            # index ($idesc) in the descriptor array for the bit
4505            # mapped values
4506            if (substr($id,0,3) eq '033'
4507                && defined $self->{BITMAP_OPERATORS}
4508                && $self->{BITMAP_OPERATORS}->[-1] eq '222000') {
4509                if (defined $self->{REUSE_BITMAP}) {
4510                    my $data_idesc = shift @{ $self->{REUSE_BITMAP}->[$isub] };
4511                    _croak "$id: Not enough quality values provided"
4512                        if not defined $data_idesc;
4513                    push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[$isub] },
4514                         $data_idesc, $idesc;
4515                } else {
4516                    my $data_idesc = shift @{ $self->{CURRENT_BITMAP} };
4517                    _croak "$id: Not enough quality values provided"
4518                        if not defined $data_idesc;
4519                    push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[$isub] },
4520                         $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
4521                             + $data_idesc, $idesc;
4522                }
4523            }
4524
4525            my $value = $data_ref->[$idesc];
4526
4527            if ($id eq '031031' and $self->{BUILD_BITMAP}) {
4528                # Store the index of expanded descriptors if data is
4529                # marked as present in data present indicator: 0 is
4530                # 'present', 1 (undef value) is 'not present'. E.g.
4531                # bitmap = 1100110 => (2,3,6) is stored in $self->{CURRENT_BITMAP}
4532                if (defined $value and $value == 0) {
4533                    push @{$self->{CURRENT_BITMAP}}, $self->{BITMAP_INDEX};
4534                }
4535                $self->{BITMAP_INDEX}++;
4536                if ($self->{BACKWARD_DATA_REFERENCE} == $self->{NUM_BITMAPS}) {
4537                    my $numb = $self->{NUM_BITMAPS};
4538                    if (!defined $self->{BITMAP_START}[$numb]) {
4539                        # Look up the element descriptor immediately
4540                        # preceding the bitmap operator
4541                        my $i = $idesc;
4542                        $i-- while ($desc[$i] ne $self->{BITMAP_OPERATORS}->[-1]
4543                                    && $i >=0);
4544                        $i-- while ($desc[$i] > 100000 && $i >=0);
4545                        _croak "No element descriptor preceding bitmap" if $i < 0;
4546                        $self->{BITMAP_START}[$numb] = $i;
4547                    } else {
4548                        $self->{BITMAP_START}[$numb]--;
4549                        _croak "Bitmap too big"
4550                            if $self->{BITMAP_START}[$numb] < 0;
4551                    }
4552                }
4553            } elsif ($self->{BUILD_BITMAP} and $self->{BITMAP_INDEX} > 0) {
4554                # We have finished building the bit map
4555                $self->{BUILD_BITMAP} = 0;
4556                $self->{BITMAP_INDEX} = 0;
4557                if ($self->{BACKWARD_DATA_REFERENCE} != $self->{NUM_BITMAPS}) {
4558                    $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
4559                        = $self->{BITMAP_START}[$self->{BACKWARD_DATA_REFERENCE}];
4560                }
4561            }
4562
4563            _croak "Not enough descriptors provided (expected no $idesc to be $id)"
4564                unless exists $desc_ref->[$idesc];
4565            _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected $id"
4566                    if $desc_ref->[$idesc] != $id;
4567
4568            # Find the relevant entry in BUFR table B
4569            _croak "Error: Data descriptor $id is not present in BUFR table B"
4570                unless exists $B_table->{$id};
4571            my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$id};
4572            $refval = $self->{NEW_REFVAL_OF}{$id}{$isub} if defined $self->{NEW_REFVAL_OF}{$id}
4573                && defined $self->{NEW_REFVAL_OF}{$id}{$isub};
4574            if ($Spew) {
4575                $self->_spew(3, "%6s  %-20s  %s", $id, $unit, $name);
4576                $self->_spew(3, "  %s", defined $value ? $value : 'missing');
4577            }
4578########### call to_encode_value inlined for speed
4579    # Override Table B values if Data Description Operators are in
4580    # effect (except for associated fields)
4581    if ($self->{NUM_CHANGE_OPERATORS} > 0 && $id != 999999) {
4582        if ($unit ne 'CCITTIA5' && $unit !~ /^(CODE|FLAG)/) {
4583            if (defined $self->{CHANGE_SRW}) {
4584                $scale += $self->{CHANGE_SRW};
4585                $width += int((10*$self->{CHANGE_SRW}+2)/3);
4586                $refval *= 10*$self->{CHANGE_SRW};
4587            } else {
4588                $scale += $self->{CHANGE_SCALE} if defined $self->{CHANGE_SCALE};
4589                $width += $self->{CHANGE_WIDTH} if defined $self->{CHANGE_WIDTH};
4590            }
4591        } elsif ($unit eq 'CCITTIA5' && defined $self->{CHANGE_CCITTIA5_WIDTH}) {
4592            $width = $self->{CHANGE_CCITTIA5_WIDTH}
4593        }
4594        _croak "$id Data width is $width which is <= 0" if $width <= 0;
4595        $refval = $self->{NEW_REFVAL_OF}{$id}{$isub} if defined $self->{NEW_REFVAL_OF}{$id}
4596        && defined $self->{NEW_REFVAL_OF}{$id}{$isub};
4597        # Difference statistical values use different width and reference value
4598        if ($self->{DIFFERENCE_STATISTICAL_VALUE}) {
4599            $width += 1;
4600            $refval = -2**$width;
4601            undef $self->{DIFFERENCE_STATISTICAL_VALUE};
4602            $self->{NUM_CHANGE_OPERATORS}--;
4603        }
4604    }
4605
4606    # Ensure that bitstream is big enough to encode $value
4607    while ($pos + $width > $maxlen*8) {
4608        $bitstream .= chr(255) x $maxlen;
4609        $maxlen *= 2;
4610    }
4611
4612    if (not defined($value)) {
4613        # Missing value is encoded as 1 bits
4614        $pos += $width;
4615    } elsif ($unit eq 'CCITTIA5') {
4616        # Encode ASCII string in $width bits (left justified,
4617        # padded with spaces)
4618        my $num_bytes = int ($width/8);
4619        _croak "Ascii string too long to fit in $width bits: $value"
4620            if length($value) > $num_bytes;
4621        $value .= ' ' x ($num_bytes - length($value));
4622        ascii2bitstream($value, $bitstream, $pos, $num_bytes);
4623        $pos += $width;
4624    } else {
4625        # Encode value as integer in $width bits
4626        _croak "Value '$value' is not a number for descriptor $id"
4627            unless looks_like_number($value);
4628        $value = int($value * 10**$scale - $refval + 0.5);
4629        _croak "Encoded data value for $id is negative: $value" if $value < 0;
4630        my $max_value = 2**$width - 1;
4631        _croak "Encoded data value for $id is too big to fit in $width bits: $value"
4632            if $value > $max_value;
4633        # Check for illegal flag value
4634        if ($Strict_checking && $unit =~ /^FLAG[ ]?TABLE/ && $width > 1
4635            && $value < $max_value && $value % 2) {
4636            _complain("$id - $value: rightmost bit $width is set indicating missing value"
4637                      . " but then value should be $max_value");
4638        }
4639        dec2bitstream($value, $bitstream, $pos, $width);
4640        $pos += $width;
4641    }
4642########### end inlining of_encode_value
4643        } # End D_LOOP
4644    } # END S_LOOP
4645
4646
4647
4648
4649    # Pad with 0 bits if necessary to get an even or integer number of
4650    # octets, depending on bufr edition
4651    my $padnum = $self->{BUFR_EDITION} < 4 ? (16-($pos%16)) % 16 : (8-($pos%8)) % 8;
4652    if ($padnum > 0) {
4653        null2bitstream($bitstream, $pos, $padnum);
4654    }
4655    my $len = ($pos + $padnum)/8;
4656    $bitstream = substr $bitstream, 0, $len;
4657
4658    return ($bitstream, $len);
4659}
4660
4661sub _encode_reference_value {
4662    my $self = shift;
4663    my ($refval,$id,$bitstream,$pos,$maxlen) = @_;
4664
4665    my $width = $self->{CHANGE_REFERENCE_VALUE};
4666
4667    # Ensure that bitstream is big enough to encode $value
4668    while ($pos + $width > $maxlen*8) {
4669        $bitstream .= chr(255) x $maxlen;
4670        $maxlen *= 2;
4671    }
4672
4673    $self->_spew(4, "Encoding new reference value %d for %6s in %d bits",
4674                 $refval, $id, $width) if $Spew;
4675    if ($refval >= 0) {
4676        _croak "Encoded reference value for $id is too big to fit "
4677            . "in $width bits: $refval"
4678                if $refval > 2**$width - 1;
4679        dec2bitstream($refval, $bitstream, $pos, $width);
4680    } else {
4681        # Negative reference values should be encoded by setting first
4682        # bit to 1 and then encoding absolute value
4683        _croak "Encoded reference value for $id is too big to fit "
4684            . "in $width bits: $refval"
4685                if -$refval > 2**($width-1) - 1;
4686        dec2bitstream(-$refval, $bitstream, $pos+1, $width-1);
4687    }
4688    $pos += $width;
4689
4690    return ($bitstream, $pos, $maxlen);
4691}
4692
4693sub _encode_value {
4694    my $self = shift;
4695    my ($value,$isub,$unit,$scale,$refval,$width,$id,$bitstream,$pos,$maxlen) = @_;
4696
4697    # Override Table B values if Data Description Operators are in
4698    # effect (except for associated fields)
4699    if ($self->{NUM_CHANGE_OPERATORS} > 0 && $id != 999999) {
4700        if ($unit ne 'CCITTIA5' && $unit !~ /^(CODE|FLAG)/) {
4701            if (defined $self->{CHANGE_SRW}) {
4702                $scale += $self->{CHANGE_SRW};
4703                $width += int((10*$self->{CHANGE_SRW}+2)/3);
4704                $refval *= 10*$self->{CHANGE_SRW};
4705            } else {
4706                $scale += $self->{CHANGE_SCALE} if defined $self->{CHANGE_SCALE};
4707                $width += $self->{CHANGE_WIDTH} if defined $self->{CHANGE_WIDTH};
4708            }
4709        } elsif ($unit eq 'CCITTIA5' && defined $self->{CHANGE_CCITTIA5_WIDTH}) {
4710            $width = $self->{CHANGE_CCITTIA5_WIDTH}
4711        }
4712        _croak "$id Data width is $width which is <= 0" if $width <= 0;
4713        $refval = $self->{NEW_REFVAL_OF}{$id}{$isub} if defined $self->{NEW_REFVAL_OF}{$id}
4714        && defined $self->{NEW_REFVAL_OF}{$id}{$isub};
4715        # Difference statistical values use different width and reference value
4716        if ($self->{DIFFERENCE_STATISTICAL_VALUE}) {
4717            $width += 1;
4718            $refval = -2**$width;
4719            undef $self->{DIFFERENCE_STATISTICAL_VALUE};
4720            $self->{NUM_CHANGE_OPERATORS}--;
4721        }
4722    }
4723
4724    # Ensure that bitstream is big enough to encode $value
4725    while ($pos + $width > $maxlen*8) {
4726        $bitstream .= chr(255) x $maxlen;
4727        $maxlen *= 2;
4728    }
4729
4730    if (not defined($value)) {
4731        # Missing value is encoded as 1 bits
4732        $pos += $width;
4733    } elsif ($unit eq 'CCITTIA5') {
4734        # Encode ASCII string in $width bits (left justified,
4735        # padded with spaces)
4736        my $num_bytes = int ($width/8);
4737        _croak "Ascii string too long to fit in $width bits: $value"
4738            if length($value) > $num_bytes;
4739        $value .= ' ' x ($num_bytes - length($value));
4740        ascii2bitstream($value, $bitstream, $pos, $num_bytes);
4741        $pos += $width;
4742    } else {
4743        # Encode value as integer in $width bits
4744        _croak "Value '$value' is not a number for descriptor $id"
4745            unless looks_like_number($value);
4746        $value = int($value * 10**$scale - $refval + 0.5);
4747        _croak "Encoded data value for $id is negative: $value" if $value < 0;
4748        my $max_value = 2**$width - 1;
4749        _croak "Encoded data value for $id is too big to fit in $width bits: $value"
4750            if $value > $max_value;
4751        # Check for illegal flag value
4752        if ($Strict_checking && $unit =~ /^FLAG[ ]?TABLE/ && $width > 1
4753            && $value < $max_value && $value % 2) {
4754            _complain("$id - $value: rightmost bit $width is set indicating missing value"
4755                      . " but then value should be $max_value");
4756        }
4757        dec2bitstream($value, $bitstream, $pos, $width);
4758        $pos += $width;
4759    }
4760
4761    return ($bitstream, $pos, $maxlen);
4762}
4763
4764# Encode reference value using BUFR compression, assuming all subsets
4765# have same reference value
4766sub _encode_compressed_reference_value {
4767    my $self = shift;
4768    my ($refval,$id,$nsubsets,$bitstream,$pos,$maxlen) = @_;
4769
4770    my $width = $self->{CHANGE_REFERENCE_VALUE};
4771
4772    # Ensure that bitstream is big enough to encode $value
4773    while ($pos + ($nsubsets+1)*$width + 6 > $maxlen*8) {
4774        $bitstream .= chr(255) x $maxlen;
4775        $maxlen *= 2;
4776    }
4777
4778    $self->_spew(4, "Encoding new reference value %d for %6s in %d bits",
4779                 $refval, $id, $width) if $Spew;
4780    # Encode value as integer in $width bits
4781    if ($refval >= 0) {
4782        _croak "Encoded reference value for $id is too big to fit "
4783            . "in $width bits: $refval" if $refval > 2**$width - 1;
4784        dec2bitstream($refval, $bitstream, $pos, $width);
4785    } else {
4786        # Negative reference values should be encoded by setting first
4787        # bit to 1 and then encoding absolute value
4788        _croak "Encoded reference value for $id is too big to fit "
4789            . "in $width bits: $refval" if -$refval > 2**($width-1) - 1;
4790        dec2bitstream(-$refval, $bitstream, $pos+1, $width-1);
4791    }
4792    $pos += $width;
4793
4794    # Increment width set to 0
4795    dec2bitstream(0, $bitstream, $pos, 6);
4796    $pos += 6;
4797
4798    return ($bitstream, $pos, $maxlen);
4799}
4800
4801sub _encode_compressed_value {
4802    my $self = shift;
4803    my ($bitstream,$pos,$maxlen,$unit,$scale,$refval,$width,$id,$data_refs,$idesc,$nsubsets) = @_;
4804
4805    # Override Table B values if Data Description Operators are in
4806    # effect (except for associated fields)
4807    if ($self->{NUM_CHANGE_OPERATORS} > 0 && $id != 999999) {
4808        if ($unit ne 'CCITTIA5' && $unit !~ /^(CODE|FLAG)/) {
4809            if (defined $self->{CHANGE_SRW}) {
4810                $scale += $self->{CHANGE_SRW};
4811                $width += int((10*$self->{CHANGE_SRW}+2)/3);
4812                $refval *= 10*$self->{CHANGE_SRW};
4813            } else {
4814                $scale += $self->{CHANGE_SCALE} if defined $self->{CHANGE_SCALE};
4815                $width += $self->{CHANGE_WIDTH} if defined $self->{CHANGE_WIDTH};
4816            }
4817        } elsif ($unit eq 'CCITTIA5' && defined $self->{CHANGE_CCITTIA5_WIDTH}) {
4818            $width = $self->{CHANGE_CCITTIA5_WIDTH}
4819        }
4820        _croak "$id Data width <= 0" if $width <= 0;
4821        $refval = $self->{NEW_REFVAL_OF}{$id} if defined $self->{NEW_REFVAL_OF}{$id};
4822        # Difference statistical values use different width and reference value
4823        if ($self->{DIFFERENCE_STATISTICAL_VALUE}) {
4824            $width += 1;
4825            $refval = -2**$width;
4826            undef $self->{DIFFERENCE_STATISTICAL_VALUE};
4827            $self->{NUM_CHANGE_OPERATORS}--;
4828        }
4829    }
4830
4831    # Ensure that bitstream is big enough to encode $value
4832    while ($pos + ($nsubsets+1)*$width + 6 > $maxlen*8) {
4833        $bitstream .= chr(255) x $maxlen;
4834        $maxlen *= 2;
4835    }
4836
4837    # Get all values for this descriptor
4838    my @values;
4839    my $first_value = $data_refs->[1][$idesc];
4840    my $all_equal = 1;        # Set to 0 if at least 2 elements differ
4841    foreach my $value (map { $data_refs->[$_][$idesc] } 2..$nsubsets) {
4842        if (defined $value && $unit ne 'CCITTIA5' && !looks_like_number($value)) {
4843            _croak "Value '$value' is not a number for descriptor $id"
4844        }
4845        # This used to be a sub (_check_equality), but inlined for speed
4846        if ($all_equal) {
4847            if (defined $value && defined $first_value) {
4848                if ($unit eq 'CCITTIA5') {
4849                    $all_equal = 0 if $value ne $first_value;
4850                } else {
4851                    $all_equal = 0 if $value != $first_value;
4852                }
4853            } elsif (defined $value || defined $first_value) {
4854                $all_equal = 0;
4855            }
4856        }
4857        if (not defined $value) {
4858            push @values, undef;
4859        } elsif ($unit eq 'CCITTIA5') {
4860            push @values, $value;
4861        } else {
4862            push @values, int($value * 10**$scale - $refval + 0.5);
4863        }
4864        # Check for illegal flag value
4865        if ($Strict_checking and $unit =~ /^FLAG[ ]?TABLE/ and $width > 1) {
4866            if (defined $value and $value ne 'missing' and $value % 2) {
4867                my $max_value = 2**$width - 1;
4868                _complain("$id - value $value in subset $_:\n"
4869                          . "rightmost bit $width is set indicating missing value"
4870                          . " but then value should be $max_value");
4871            }
4872        }
4873    }
4874
4875    if ($all_equal) {
4876        # Same value in all subsets. No need to calculate or store increments
4877        if (defined $first_value) {
4878            if ($unit eq 'CCITTIA5') {
4879                # Encode ASCII string in $width bits (left justified,
4880                # padded with spaces)
4881                my $num_bytes = int ($width/8);
4882                _croak "Ascii string too long to fit in $width bits: $first_value"
4883                    if length($first_value) > $num_bytes;
4884                $first_value .= ' ' x ($num_bytes - length($first_value));
4885                ascii2bitstream($first_value, $bitstream, $pos, $num_bytes);
4886            } else {
4887                # Encode value as integer in $width bits
4888                _croak "First value '$first_value' is not a number for descriptor $id"
4889                    unless looks_like_number($first_value);
4890                $first_value = int($first_value * 10**$scale - $refval + 0.5);
4891                _croak "Encoded data value for $id is negative: $first_value"
4892                    if $first_value < 0;
4893                _croak "Encoded data value for $id is too big to fit "
4894                    . "in $width bits: $first_value"
4895                        if $first_value > 2**$width - 1;
4896                dec2bitstream($first_value, $bitstream, $pos, $width);
4897            }
4898        } else {
4899            # Missing value is encoded as 1 bits, but bitstream is
4900            # padded with 1 bits already
4901        }
4902        $pos += $width;
4903        # Increment width set to 0
4904        dec2bitstream(0, $bitstream, $pos, 6);
4905        $pos += 6;
4906    } else {
4907        if ($unit eq 'CCITTIA5') {
4908            unshift @values, $first_value;
4909            # Local reference value set to 0 bits
4910            null2bitstream($bitstream, $pos, $width);
4911            $pos += $width;
4912            # Do not store more characters than needed: remove leading
4913            # and trailing spaces, then right pad with spaces so that
4914            # all strings has same length as largest string
4915            my $largest_length = _trimpad(\@values);
4916            dec2bitstream($largest_length, $bitstream, $pos, 6);
4917            $pos += 6;
4918            # Store the character values
4919            foreach my $value (@values) {
4920                if (defined $value) {
4921                    # Encode ASCII string in $largest_length bytes
4922                    ascii2bitstream($value, $bitstream, $pos, $largest_length);
4923                } else {
4924                    # Missing value is encoded as 1 bits, but
4925                    # bitstream is padded with 1 bits already
4926                }
4927                $pos += $largest_length * 8;
4928            }
4929        } else {
4930            _croak "First value '$first_value' is not a number for descriptor $id"
4931                if defined($first_value) && !looks_like_number($first_value);
4932            unshift @values, defined $first_value
4933                ? int($first_value * 10**$scale - $refval + 0.5)
4934                    : undef;
4935            # Numeric data. First find minimum value
4936            my ($min_value, $isub) = _minimum(\@values);
4937            _croak "Encoded data value for $id and subset $isub is negative: $min_value"
4938                if $min_value < 0;
4939            my @inc_values =
4940                map { defined $_ ? $_ - $min_value : undef } @values;
4941            # Find how many bits are required to hold the increment
4942            # values (or rather: the highest increment value pluss one
4943            # (except for associated values), to be able to store
4944            # missing values also)
4945            my $max_inc = _maximum(\@inc_values);
4946            my $deltabits = ($id eq '999999')
4947                ?_get_number_of_bits_to_store($max_inc)
4948                    : _get_number_of_bits_to_store($max_inc + 1);
4949            # Store local reference value
4950            $self->_spew(5, " Local reference value: %d", $min_value) if $Spew;
4951            dec2bitstream($min_value, $bitstream, $pos, $width);
4952            $pos += $width;
4953            # Store increment width
4954            $self->_spew(5, " Increment width (bits): %d", $deltabits) if $Spew;
4955            dec2bitstream($deltabits, $bitstream, $pos, 6);
4956            $pos += 6;
4957            # Store values
4958            $self->_spew(5, " Increment values: %s",
4959                         join(',', map { defined $inc_values[$_]
4960                         ? $inc_values[$_] : ''} 0..$#inc_values))
4961                         if $Spew;
4962            foreach my $value (@inc_values) {
4963                if (defined $value) {
4964                    _complain("value " . ($value + $min_value) . " for $id too big"
4965                              . " to be encoded without compression")
4966                        if ($Strict_checking && ($value + $min_value) > 2**$width -1);
4967                    dec2bitstream($value, $bitstream, $pos, $deltabits);
4968                } else {
4969                    # Missing value is encoded as 1 bits, but
4970                    # bitstream is padded with 1 bits already
4971                }
4972                $pos += $deltabits;
4973            }
4974        }
4975    }
4976
4977    return ($bitstream, $pos, $maxlen);
4978}
4979
4980## Encode bitstream using the data values in $data_refs, first
4981## expanding section 3 fully (and comparing with $desc_refs to check
4982## for consistency). This sub is very similar to sub
4983## _decompress_bitstream
4984sub _encode_compressed_bitstream {
4985    my $self = shift;
4986    $self->{CODING} = 'ENCODE';
4987    my ($data_refs, $desc_refs) = @_;
4988
4989    # Expand section 3 except for delayed replication and operator
4990    # descriptors. This expansion is the same for all subsets, since
4991    # delayed replication has to be the same (this needs to be
4992    # checked) for compression to be possible
4993    my @unexpanded = split / /, $self->{DESCRIPTORS_UNEXPANDED};
4994    my $alias = "$self->{TABLE_VERSION} " . $self->{DESCRIPTORS_UNEXPANDED};
4995    if (exists $Descriptors_already_expanded{$alias}) {
4996        $self->{DESCRIPTORS_EXPANDED} = $Descriptors_already_expanded{$alias};
4997    } else {
4998        $Descriptors_already_expanded{$alias} = $self->{DESCRIPTORS_EXPANDED}
4999            = join " ", _expand_descriptors($self->{D_TABLE}, @unexpanded);
5000    }
5001    my @desc = split /\s/, $self->{DESCRIPTORS_EXPANDED};
5002
5003    my $nsubsets = $self->{NUM_SUBSETS};
5004    my $B_table = $self->{B_TABLE};
5005    my $maxlen = 1024;
5006    my $bitstream = chr(255) x $maxlen; # one bits only
5007    my $pos = 0;
5008    my @operators;
5009
5010    my $desc_ref = $desc_refs->[1];
5011
5012    # All subsets should have same set of expanded descriptors. This
5013    # is checked later, but we also need to check that the number of
5014    # descriptors in each subset is the same for all subsets
5015    my $num_desc = @{$desc_ref};
5016    foreach my $isub (2..$nsubsets) {
5017        my $num_d = @{$desc_refs->[$isub]};
5018        _croak "Compression impossible: Subset 1 contains $num_desc descriptors,"
5019            . " while subset $isub contains $num_d descriptors"
5020                if $num_d != $num_desc;
5021    }
5022
5023
5024  D_LOOP: for (my $idesc = 0; $idesc < @desc; $idesc++) {
5025        my $id = $desc[$idesc];
5026        my $f = substr($id,0,1);
5027        my $x = substr($id,1,2)+0;
5028        my $y = substr($id,3,3)+0;
5029
5030        if ($f == 1) {
5031            # Delayed replication
5032            if ($x == 0) {
5033                _complain("Nonsensical replication of zero descriptors ($id)");
5034                $idesc++;
5035                next D_LOOP;
5036            }
5037            _croak "$id _expand_descriptors() did not do its job"
5038                if $y > 0;
5039
5040            my $next_id = $desc[$idesc+1];
5041            _croak "$id Erroneous replication factor"
5042                unless $next_id =~ /^0310(00|01|02|11|12)/ && exists $B_table->{$next_id};
5043            _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected $next_id"
5044                if $desc_ref->[$idesc] != $next_id;
5045            my $factor = $data_refs->[1][$idesc];
5046            my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$next_id};
5047            if ($Spew) {
5048                $self->_spew(3, "%6s  %-20s  %s", $next_id, $unit, $name);
5049                $self->_spew(3, "  %s", $factor);
5050            }
5051            ($bitstream, $pos, $maxlen)
5052                = $self->_encode_compressed_value($bitstream,$pos,$maxlen,
5053                                                  $unit,$scale,$refval,$width,
5054                                                  $next_id,$data_refs,$idesc,$nsubsets);
5055            # Include the delayed replication/repetition in descriptor list
5056            splice @desc, $idesc++, 0, $next_id;
5057
5058            my @r = ();
5059            push @r, @desc[($idesc+2)..($idesc+$x+1)] while $factor--;
5060            splice @desc, $idesc, 2+$x, @r;
5061
5062            if ($next_id eq '031011' || $next_id eq '031012') {
5063                # For delayed repetition we should include data just
5064                # once, so skip to the last set in data array
5065                $idesc += $x * ($data_refs->[1][$idesc-1] - 1);
5066                # We ought to check that the data sets we skipped are
5067                # indeed equal to the last set!
5068                $self->_spew(4, "Delayed repetition ($id $next_id -> @r)") if $Spew;
5069            } else {
5070                $self->_spew(4, "Delayed replication ($id $next_id -> @r)") if $Spew;
5071            }
5072            if ($idesc < @desc) {
5073                redo D_LOOP;
5074            } else {
5075                last D_LOOP; # Might happen if delayed factor is 0
5076            }
5077
5078        } elsif ($f == 2) {
5079            my $flow;
5080            my $bm_idesc;
5081            ($pos, $flow, $bm_idesc, @operators)
5082                = $self->_apply_operator_descriptor($id, $x, $y, $pos, 0,
5083                                                    $desc[$idesc+1], @operators);
5084            if ($flow eq 'redo_bitmap') {
5085                # Data value is associated with the descriptor
5086                # defined by bit map. Remember original and new
5087                # index in descriptor array for the bit mapped
5088                # values ('dr' = data reference)
5089                my $dr_idesc;
5090                if (!defined $bm_idesc) {
5091                    $dr_idesc = shift @{ $self->{REUSE_BITMAP}->[0] };
5092                } elsif (!$Show_all_operators) {
5093                    $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
5094                        + $bm_idesc;
5095                } else {
5096                    $dr_idesc = $self->{BITMAP_START}[$self->{NUM_BITMAPS}];
5097                    # Skip operator descriptors
5098                    while ($bm_idesc-- > 0) {
5099                        $dr_idesc++;
5100                        $dr_idesc++ while ($desc[$dr_idesc] >= 200000);
5101                    }
5102                }
5103                push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] },
5104                     $dr_idesc, $idesc;
5105                $desc[$idesc] = $desc[$dr_idesc];
5106                redo D_LOOP;
5107            } elsif ($flow eq 'signify_character') {
5108                _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected $id"
5109                    if $desc_ref->[$idesc] != $id;
5110                # Get ASCII string
5111                my @values = map { $data_refs->[$_][$idesc] } 1..$nsubsets;
5112                my $name = 'SIGNIFY CHARACTER';
5113                my $unit = 'CCITTIA5';
5114                my ($scale, $refval, $width) = (0, 0, 8*$y);
5115                ($bitstream, $pos, $maxlen)
5116                    = $self->_encode_compressed_value($bitstream,$pos,$maxlen,
5117                                                      $unit,$scale,$refval,$width,
5118                                                      "205$y",$data_refs,$idesc,$nsubsets);
5119                next D_LOOP;
5120            } elsif ($flow eq 'no_value') {
5121                next D_LOOP;
5122            }
5123
5124            # Remove operator descriptor from @desc
5125            splice @desc, $idesc--, 1;
5126
5127            next D_LOOP if $flow eq 'next';
5128            last D_LOOP if $flow eq 'last';
5129        }
5130
5131        if ($self->{CHANGE_REFERENCE_VALUE}) {
5132            # The data descriptor is to be associated with a new
5133            # reference value, which is fetched from data stream,
5134            # possibly with f=9 instead of f=0 for descriptor
5135            $id -= 900000 if $id =~ /^9/;
5136            _croak "Change reference operator 203Y is not followed by element"
5137                . " descriptor, but $id" if $f > 0;
5138            my @new_ref_values = map { $data_refs->[$_][$idesc] } 1..$nsubsets;
5139            my $new_refval = $new_ref_values[0];
5140            # Check that they are all the same
5141            foreach my $val (@new_ref_values[1..$#new_ref_values]) {
5142                _croak "Change reference value differ between subsets"
5143                    . " which cannot be combined with BUFR compression"
5144                        if $val != $new_refval;
5145            }
5146            $self->{NEW_REFVAL_OF}{$id} = $new_refval;
5147            ($bitstream, $pos, $maxlen)
5148                = $self->_encode_compressed_reference_value($new_refval,$id,$nsubsets,$bitstream,$pos,$maxlen);
5149            next D_LOOP;
5150        }
5151
5152        # If operator 204$y 'Add associated field' is in effect,
5153        # each data value is preceded by $y bits which should be
5154        # encoded separately. We choose to provide a descriptor
5155        # 999999 in this case (like the ECMWF BUFRDC software)
5156        if ($self->{ADD_ASSOCIATED_FIELD} and $id ne '031021') {
5157            # First encode associated field
5158            _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected 999999"
5159                if $desc_ref->[$idesc] != 999999;
5160            my @values = map { $data_refs->[$_][$idesc] } 1..$nsubsets;
5161            my $name = 'ASSOCIATED FIELD';
5162            my $unit = 'NUMERIC';
5163            my ($scale, $refval) = (0, 0);
5164            my $width = $self->{ADD_ASSOCIATED_FIELD};
5165            if ($Spew) {
5166                $self->_spew(3, "%6s  %-20s  %s", $id, $unit, $name);
5167                $self->_spew(3, "  %s", 999999);
5168            }
5169            ($bitstream, $pos, $maxlen)
5170                = $self->_encode_compressed_value($bitstream,$pos,$maxlen,
5171                                                  $unit,$scale,$refval,$width,
5172                                                  999999,$data_refs,$idesc,$nsubsets);
5173            # Insert the artificial 999999 descriptor for the
5174            # associated value and increment $idesc to prepare for
5175            # handling the 'real' value below
5176            splice @desc, $idesc++, 0, 999999;
5177        }
5178
5179
5180
5181        # For quality information, if this relates to a bit map we
5182        # need to store index of the data ($data_idesc) for which
5183        # the quality information applies, as well as the new
5184        # index ($idesc) in the descriptor array for the bit
5185        # mapped values
5186        if (substr($id,0,3) eq '033'
5187            && defined $self->{BITMAP_OPERATORS}
5188            && $self->{BITMAP_OPERATORS}->[-1] eq '222000') {
5189            if (defined $self->{REUSE_BITMAP}) {
5190                my $data_idesc = shift @{ $self->{REUSE_BITMAP}->[0] };
5191                _croak "$id: Not enough quality values provided"
5192                    if not defined $data_idesc;
5193                push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] },
5194                     $data_idesc, $idesc;
5195            } else {
5196                my $data_idesc = shift @{ $self->{CURRENT_BITMAP} };
5197                _croak "$id: Not enough quality values provided"
5198                    if not defined $data_idesc;
5199                push @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}]->[0] },
5200                     $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
5201                         + $data_idesc, $idesc;
5202            }
5203        }
5204
5205        if ($id eq '031031' and $self->{BUILD_BITMAP}) {
5206            # Store the index of expanded descriptors if data is
5207            # marked as present in data present indicator: 0 is
5208            # 'present', 1 (undef value) is 'not present'. E.g.
5209            # bitmap = 1100110 => (2,3,6) is stored in $self->{CURRENT_BITMAP}
5210
5211            # NB: bit map might vary betwen subsets!!!!????
5212            if ($data_refs->[1][$idesc] == 0) {
5213                push @{$self->{CURRENT_BITMAP}}, $self->{BITMAP_INDEX};
5214            }
5215            $self->{BITMAP_INDEX}++;
5216            if ($self->{BACKWARD_DATA_REFERENCE} == $self->{NUM_BITMAPS}) {
5217                my $numb = $self->{NUM_BITMAPS};
5218                if (!defined $self->{BITMAP_START}[$numb]) {
5219                    # Look up the element descriptor immediately
5220                    # preceding the bitmap operator
5221                    my $i = $idesc;
5222                    $i-- while ($desc[$i] ne $self->{BITMAP_OPERATORS}->[-1]
5223                                && $i >=0);
5224                    $i-- while ($desc[$i] > 100000 && $i >=0);
5225                    _croak "No element descriptor preceding bitmap" if $i < 0;
5226                    $self->{BITMAP_START}[$numb] = $i;
5227                } else {
5228                    $self->{BITMAP_START}[$numb]--;
5229                    _croak "Bitmap too big"
5230                        if $self->{BITMAP_START}[$numb] < 0;
5231                }
5232            }
5233        } elsif ($self->{BUILD_BITMAP} and $self->{BITMAP_INDEX} > 0) {
5234            # We have finished building the bit map
5235            $self->{BUILD_BITMAP} = 0;
5236            $self->{BITMAP_INDEX} = 0;
5237            if ($self->{BACKWARD_DATA_REFERENCE} != $self->{NUM_BITMAPS}) {
5238                $self->{BITMAP_START}[$self->{NUM_BITMAPS}]
5239                    = $self->{BITMAP_START}[$self->{BACKWARD_DATA_REFERENCE}];
5240            }
5241        }
5242
5243        # We now have a "real" data descriptor
5244        _croak "Descriptor no $idesc is $desc_ref->[$idesc], expected $id"
5245            if $desc_ref->[$idesc] != $id;
5246
5247        # Find the relevant entry in BUFR table B
5248        _croak "Data descriptor $id is not present in BUFR table B"
5249            unless exists $B_table->{$id};
5250        my ($name,$unit,$scale,$refval,$width) = split /\0/, $B_table->{$id};
5251        if ($Spew) {
5252            $self->_spew(3, "%6s  %-20s  %s", $id, $unit, $name);
5253            $self->_spew(3, "  %s", join ' ',
5254                         map { defined($data_refs->[$_][$idesc]) ?
5255                                   $data_refs->[$_][$idesc] : 'missing'} 1..$nsubsets );
5256        }
5257        ($bitstream, $pos, $maxlen)
5258            = $self->_encode_compressed_value($bitstream,$pos,$maxlen,
5259                                              $unit,$scale,$refval,$width,
5260                                              $id,$data_refs,$idesc,$nsubsets);
5261    } # End D_LOOP
5262
5263    # Pad with 0 bits if necessary to get an even or integer number of
5264    # octets, depending on bufr edition
5265    my $padnum = $self->{BUFR_EDITION} < 4 ? (16-($pos%16)) % 16 : (8-($pos%8)) % 8;
5266    if ($padnum > 0) {
5267        null2bitstream($bitstream, $pos, $padnum);
5268    }
5269    my $len = ($pos + $padnum)/8;
5270    $bitstream = substr $bitstream, 0, $len;
5271
5272    return ($bitstream, $len);
5273}
5274
5275## Check that the length of data section computed from expansion of
5276## section 3 ($comp_len) equals actual length of data part of section
5277## 4, allowing for padding zero bits according to BUFR Regulation 94.1.3
5278## Strict checking should also check that padding actually consists of
5279## zero bits only.
5280sub _check_section4_length {
5281    my $self = shift;
5282    my ($comp_len, $actual_len) = @_;
5283
5284    if ($comp_len > $actual_len) {
5285        _croak "More descriptors in expansion of section 3"
5286            . " than what can fit in the given length of section 4"
5287                . " ($comp_len versus $actual_len bits)";
5288    } else {
5289        return if not $Strict_checking; # Excessive bytes in section 4
5290                                        # does not prevent further decoding
5291        return if $Noqc;  # No more sensible checks to do in this case
5292
5293        my $bufr_edition = $self->{BUFR_EDITION};
5294        my $actual_bytes = $actual_len/8; # This is sure to be an integer
5295        if ($bufr_edition < 4 and $actual_bytes % 2) {
5296            _complain("Section 4 is odd number ($actual_bytes) of bytes,"
5297                      . " which is an error in BUFR edition $bufr_edition");
5298        }
5299        my $comp_bytes = int($comp_len/8);
5300        $comp_bytes++ if $comp_len % 8; # Need to pad with zero bits
5301        $comp_bytes++ if $bufr_edition < 4 and $comp_bytes % 2; # Need to pad with an extra byte of zero bits
5302        if ($actual_bytes > $comp_bytes) {
5303            _complain("Binary data part of section 4 longer ($actual_bytes bytes)"
5304                      . " than expected from section 3 ($comp_bytes bytes)");
5305        }
5306    }
5307    return;
5308}
5309
5310# Trim string, also removing nulls (and _complain if nulls found).
5311# If strict_checking, checks also for bit 1 set in each character
5312sub _trim {
5313    my ($str, $id) = @_;
5314    return unless defined $str;
5315    if ($str =~ /\0/) {
5316        (my $str2 = $str) =~ s|\0|\\0|g;
5317        _complain("Nulls (" . '\0'
5318                  . ") found in string '$str2' for descriptor $id");
5319        $str =~ s/\0//g;
5320    } elsif ($Strict_checking && $str =~/^ +$/) {
5321        _complain("Only spaces ('$str') found for descriptor $id, "
5322                  . "ought to have been encoded as missing value ");
5323    }
5324
5325    $str =~ s/\s+$//;
5326    $str =~ s/^\s+//;
5327
5328    if ($Strict_checking && $str ne '') {
5329        foreach my $char (split //, $str) {
5330            if (ord($char) > 127) {
5331                _complain("Character $char (ascii value " . ord($char) .
5332                          ") in string '$str' is not allowed in CCITTIA5");
5333                last; # Don't want to warn for every bad character
5334            }
5335        }
5336    }
5337    return $str;
5338}
5339
5340## Remove leading and trailing spaces in the strings provided, then add
5341## spaces if necessary so that all strings have same length as largest
5342## trimmed string. This length (in bytes) is returned
5343sub _trimpad {
5344    my $string_ref = shift;
5345    my $largest_length = 0;
5346    foreach my $string (@{$string_ref}) {
5347        if (defined $string) {
5348            $string =~ s/^\s+//;
5349            $string =~ s/\s+$//;
5350            if (length $string > $largest_length) {
5351                $largest_length = length $string;
5352            }
5353        }
5354    }
5355    foreach my $string (@{$string_ref}) {
5356        if (defined $string) {
5357            $string .= ' ' x ($largest_length - length $string);
5358        }
5359    }
5360    return $largest_length;
5361}
5362
5363## Use timegm in Time::Local to validate date and time in section 1
5364sub _validate_datetime {
5365    my $self = shift;
5366    my $bufr_edition = $self->{BUFR_EDITION};
5367    my $year = $bufr_edition < 4 ? $self->{YEAR_OF_CENTURY} + 2000
5368                                 : $self->{YEAR};
5369    my $month = $self->{MONTH} - 1;
5370    my $second = $bufr_edition == 4 ? $self->{SECOND} : 0;
5371
5372    # All datetime variables set to 0 should be considered ok
5373    return if ($self->{MINUTE} == 0 && $self->{HOUR} == 0
5374           && $self->{DAY} == 0 && $self->{MONTH} == 0
5375           && $second == 0 && ($year == 0 || $year == 2000));
5376
5377    eval {
5378        my $dummy = timegm($second,$self->{MINUTE},$self->{HOUR},
5379                           $self->{DAY},$month,$year);
5380    };
5381
5382    _complain("Invalid date in section 1: $@") if $@;
5383}
5384
5385## Return number of bits necessary to store the nonnegative number $n
5386## (1 for 0,1, 2 for 2,3, 3 for 4,5,6,7 etc)
5387sub _get_number_of_bits_to_store {
5388    my $n = shift;
5389    return 1 if $n == 0;
5390    my $x = 1;
5391    my $i = 0;
5392    while ($x < $n) {
5393        $i++;
5394        $x *= 2;
5395    }
5396    return $x==$n ? $i+1 : $i;
5397}
5398
5399## Find minimum value among set of numbers (undefined values
5400## permitted, but at least one value must be defined). Also returns
5401## for which number the minimum occurs (counting from 1).
5402sub _minimum {
5403    my $v_ref = shift;
5404    my $min = 2**63;
5405    my $idx = 0;
5406    my $i=0;
5407    foreach my $v (@{$v_ref}) {
5408        $i++;
5409        next if not defined $v;
5410        if ($v < $min) {
5411            $min = $v;
5412            $idx = $i;
5413        }
5414    }
5415    return ($min, $idx);
5416}
5417
5418## Find maximum value among set of nonnegative numbers or undefined values
5419sub _maximum {
5420    my $v_ref = shift;
5421    my $max = 0;
5422    foreach my $v (@{$v_ref}) {
5423        next if not defined $v;
5424        if ($v > $max) {
5425            $max = $v;
5426        }
5427    }
5428    _croak "Internal error: Found no maximum value" if $max < 0;
5429    return $max;
5430}
5431
5432## Return index of first occurrence av $value in $list, undef if no match
5433sub _get_index_in_list {
5434    my ($list, $value) = @_;
5435    for (my $i=0; $i <= $#{$list}; $i++) {
5436        if ($list->[$i] eq $value) { # Match
5437            return $i;
5438        }
5439    }
5440    # No match
5441    return undef;
5442}
5443
5444## Apply the operator descriptor $id, adjusting $pos and
5445## @operators. Also returning $bm_idesc (explained in start of module)
5446## and a hint of what to do next in $flow
5447sub _apply_operator_descriptor {
5448    my $self = shift;
5449    my ($id, $x, $y, $pos, $isub, $next_id, @operators) = @_;
5450    # $isub should be 0 for compressed messages, else subset number
5451
5452    my $flow = '';
5453    my $bm_idesc = '';
5454
5455    if ($y == 0 && $x =~ /^[12378]$/) {
5456        # 20[12378]000 Cancellation of a data descriptor operator
5457        _complain("$id Cancelling unused operator")
5458            if $Strict_checking and !grep {$_ == $x} @operators;
5459        @operators = grep {$_ != $x} @operators;
5460        if ($x == 1) {
5461            $self->{NUM_CHANGE_OPERATORS}-- if $self->{CHANGE_WIDTH};
5462            undef $self->{CHANGE_WIDTH};
5463        } elsif ($x == 2) {
5464            $self->{NUM_CHANGE_OPERATORS}-- if $self->{CHANGE_SCALE};
5465            undef $self->{CHANGE_SCALE};
5466        } elsif ($x == 3) {
5467            $self->{NUM_CHANGE_OPERATORS}-- if $self->{NEW_REFVAL_OF};
5468            undef $self->{NEW_REFVAL_OF};
5469        } elsif ($x == 7) {
5470            $self->{NUM_CHANGE_OPERATORS}-- if $self->{CHANGE_SRW};
5471            undef $self->{CHANGE_SRW};
5472        } elsif ($x == 8) {
5473            $self->{NUM_CHANGE_OPERATORS}-- if $self->{CHANGE_CCITTIA5_WIDTH};
5474            undef $self->{CHANGE_CCITTIA5_WIDTH};
5475        }
5476        $self->_spew(4, "$id * Reset %s",
5477                     ("width of CCITTIA5 field","data width","scale","reference values",0,0,0,
5478                     "increase of scale, reference value and data width")[$x % 8]) if $Spew;
5479        $flow = 'next';
5480    } elsif ($x == 1) {
5481        # ^201 Change data width
5482        _croak "201 operator cannot be nested within 207 operator"
5483            if grep {$_ == 7} @operators;
5484        $self->{NUM_CHANGE_OPERATORS}++ if !$self->{CHANGE_WIDTH};
5485        $self->{CHANGE_WIDTH} = $y-128;
5486        $self->_spew(4, "$id * Change data width: %d", $self->{CHANGE_WIDTH}) if $Spew;
5487        push @operators, $x;
5488        $flow = 'next';
5489    } elsif ($x == 2) {
5490        # ^202 Change scale
5491        _croak "202 operator cannot be nested within 207 operator"
5492            if grep {$_ == 7} @operators;
5493        $self->{NUM_CHANGE_OPERATORS}++ if !$self->{CHANGE_SCALE};
5494        $self->{CHANGE_SCALE} = $y-128;
5495        $self->_spew(4, "$id * Change scale: %d", $self->{CHANGE_SCALE}) if $Spew;
5496        push @operators, $x;
5497        $flow = 'next';
5498    } elsif ($x == 3 && $y == 255) {
5499        # 203255 Terminate change reference value definition
5500        $self->_spew(4, "$id * Terminate reference value definition %s",
5501                     '203' . (defined $self->{CHANGE_REFERENCE_VALUE}
5502                     ? sprintf("%03d", $self->{CHANGE_REFERENCE_VALUE}) : '???')) if $Spew;
5503        _complain("$id no current change reference value to terminate")
5504            unless defined $self->{CHANGE_REFERENCE_VALUE};
5505        undef $self->{CHANGE_REFERENCE_VALUE};
5506        $flow = 'next';
5507    } elsif ($x == 3) {
5508        # ^203 Change reference value
5509        _croak "203 operator cannot be nested within 207 operator"
5510            if grep {$_ == 7} @operators;
5511        $self->_spew(4, "$id * Change reference value") if $Spew;
5512        # Get reference value from data stream ($y == number of bits)
5513        $self->{NUM_CHANGE_OPERATORS}++ if !$self->{CHANGE_REFERENCE_VALUE};
5514        $self->{CHANGE_REFERENCE_VALUE} = $y;
5515        push @operators, $x;
5516        $flow = 'next';
5517    } elsif ($x == 4) {
5518        # ^204 Add associated field
5519        if ($y > 0) {
5520            _croak "$id Nesting of Add associated field is not implemented"
5521                if $self->{ADD_ASSOCIATED_FIELD};
5522            $self->{ADD_ASSOCIATED_FIELD} = $y;
5523            $flow = 'next';
5524        } else {
5525            _complain "$id No previous Add associated field"
5526                unless defined $self->{ADD_ASSOCIATED_FIELD};
5527            undef $self->{ADD_ASSOCIATED_FIELD};
5528            $flow = 'next';
5529        }
5530    } elsif ($x == 5) {
5531        # ^205 Signify character (i.e. the following $y bytes is
5532        # character information)
5533        $flow = 'signify_character';
5534    } elsif ($x == 6) {
5535        # ^206 Signify data width for the immediately following local
5536        # descriptor. If we find this local descriptor in BUFR table B
5537        # with data width $y bits, we assume we can use this table
5538        # entry to decode/encode the value properly, and can just
5539        # ignore the operator descriptor. Else we skip the local
5540        # descriptor and the corresponding value if decoding, or have
5541        # to give up if encoding
5542        my $ff = substr($next_id,0,1);
5543        _croak("Descriptor $next_id following Signify data width"
5544                  . "  operator $_ is not an element descriptor")
5545            if $ff != 0;
5546        if ($Strict_checking) {
5547            my $xx = substr($next_id,1,2);
5548            my $yy = substr($next_id,3,3);
5549            _complain("Descriptor $next_id following Signify data width"
5550                  . "  operator $id is not a local descriptor")
5551                if ($xx < 48 && $yy < 192);
5552        }
5553        if (exists $self->{B_TABLE}->{$next_id}
5554            and (split /\0/, $self->{B_TABLE}->{$next_id})[-1] == $y) {
5555            $self->_spew(4, "Found $next_id with data width $y, ignoring $id") if $Spew;
5556            $flow = 'next';
5557        } else {
5558            _croak "Cannot encode descriptor $next_id (following $id), not found in table B"
5559                if $self->{CODING} eq 'ENCODE';
5560            $self->_spew(4, "$_: Did not find $next_id in table B."
5561                         . " Skipping $id and $next_id.") if $Spew;
5562            $pos += $y;  # Skip next $y bits in bitstream if decoding
5563            $flow = 'skip';
5564        }
5565
5566    } elsif ($x == 7) {
5567        # ^207 Increase scale, reference value and data width
5568        _croak "207 operator cannot be nested within 201/202/203 operators"
5569            if grep {$_ == 1 || $_ == 2 || $_ == 3} @operators;
5570        $self->{NUM_CHANGE_OPERATORS}++ if !$self->{CHANGE_SRW};
5571        $self->{CHANGE_SRW} = $y;
5572        $self->_spew(4, "$id * Increase scale, reference value and data width: %d", $y) if $Spew;
5573        push @operators, $x;
5574        $flow = 'next';
5575    } elsif ($x == 8) {
5576        # ^208 Change data width for ascii data
5577        $self->{NUM_CHANGE_OPERATORS}++ if !$self->{CHANGE_CCITTIA5_WIDTH};
5578        $self->{CHANGE_CCITTIA5_WIDTH} = $y*8;
5579        $self->_spew(4, "$id * Change width for CCITTIA5 field: %d bytes", $y) if $Spew;
5580        push @operators, $x;
5581        $flow = 'next';
5582    } elsif ($x == 9) {
5583        # ^209 IEEE floating point representation
5584        _croak "$id IEEE floating point representation (not implemented)";
5585    } elsif ($x == 21) {
5586        # ^221 Data not present
5587        _croak "$id Data not present (not implemented)";
5588    } elsif ($x == 22 && $y == 0) {
5589        # 222000 Quality information follows
5590        push @{ $self->{BITMAP_OPERATORS} }, '222000';
5591        $self->{NUM_BITMAPS}++;
5592        # Mark that a bit map probably needs to be built
5593        $self->{BUILD_BITMAP} = 1;
5594        $self->{BITMAP_INDEX} = 0;
5595        $flow = $Noqc ? 'last' : 'no_value';
5596    } elsif ($x == 23 && $y == 0) {
5597        # 223000 Substituted values follow, each one following a
5598        # descriptor 223255. Which value they are a substitute for is
5599        # defined by a bit map, which already may have been defined
5600        # (if descriptor 23700 is encountered), or will shortly be
5601        # defined by data present indicators (031031)
5602        push @{ $self->{BITMAP_OPERATORS} }, '223000';
5603        $self->{NUM_BITMAPS}++;
5604        # Mark that a bit map probably needs to be built
5605        $self->{BUILD_BITMAP} = 1;
5606        $self->{BITMAP_INDEX} = 0;
5607        $flow = 'no_value';
5608    } elsif ($x == 23 && $y == 255) {
5609        # 223255 Substituted values marker operator
5610        _croak "$id No bit map defined"
5611            unless (defined $self->{CURRENT_BITMAP} || defined $self->{REUSE_BITMAP})
5612            && $self->{BITMAP_OPERATORS}[-1] eq '223000';
5613        if (defined $self->{REUSE_BITMAP}) {
5614            _croak "More 223255 encountered than current bit map allows"
5615                unless @{ $self->{REUSE_BITMAP}->[$isub] };
5616            $bm_idesc = undef;
5617        } else {
5618            _croak "More 223255 encountered than current bit map allows"
5619                unless @{$self->{CURRENT_BITMAP}};
5620            $bm_idesc = shift @{$self->{CURRENT_BITMAP}};
5621        }
5622        $flow = 'redo_bitmap';
5623    } elsif ($x == 24 && $y == 0) {
5624        # 224000 First order statistical values follow
5625        push @{ $self->{BITMAP_OPERATORS} }, '224000';
5626        $self->{NUM_BITMAPS}++;
5627        # Mark that a bit map probably needs to be built
5628        $self->{BUILD_BITMAP} = 1;
5629        $self->{BITMAP_INDEX} = 0;
5630        $flow = 'no_value';
5631    } elsif ($x == 24 && $y == 255) {
5632        # 224255 First order statistical values marker operator
5633        _croak "$id No bit map defined"
5634            unless (defined $self->{CURRENT_BITMAP} || defined $self->{REUSE_BITMAP})
5635            && $self->{BITMAP_OPERATORS}[-1] eq '224000';
5636        if (defined $self->{REUSE_BITMAP}) {
5637            _croak "More 224255 encountered than current bit map allows"
5638                unless @{ $self->{REUSE_BITMAP}->[$isub] };
5639            $bm_idesc = undef;
5640        } else {
5641            _croak "More 224255 encountered than current bit map allows"
5642                unless @{$self->{CURRENT_BITMAP}};
5643            $bm_idesc = shift @{$self->{CURRENT_BITMAP}};
5644        }
5645        $flow = 'redo_bitmap';
5646    } elsif ($x == 25 && $y == 0) {
5647        # 225000 Difference statistical values follow
5648        push @{ $self->{BITMAP_OPERATORS} }, '225000';
5649        $self->{NUM_BITMAPS}++;
5650        # Mark that a bit map probably needs to be built
5651        $self->{BUILD_BITMAP} = 1;
5652        $self->{BITMAP_INDEX} = 0;
5653        $flow = 'no_value';
5654    } elsif ($x == 25 && $y == 255) {
5655        # 225255 Difference statistical values marker operator
5656        _croak "$id No bit map defined\n"
5657            unless (defined $self->{CURRENT_BITMAP} || defined $self->{REUSE_BITMAP})
5658            && $self->{BITMAP_OPERATORS}[-1] eq '225000';
5659        if (defined $self->{REUSE_BITMAP}) {
5660            _croak "More 225255 encountered than current bit map allows"
5661                unless @{ $self->{REUSE_BITMAP}->[$isub] };
5662            $bm_idesc = undef;
5663        } else {
5664            _croak "More 225255 encountered than current bit map allows"
5665                unless @{$self->{CURRENT_BITMAP}};
5666            $bm_idesc = shift @{$self->{CURRENT_BITMAP}};
5667        }
5668        # Must remember to change data width and reference value
5669        $self->{NUM_CHANGE_OPERATORS}++ if !$self->{DIFFERENCE_STATISTICAL_VALUE};
5670        $self->{DIFFERENCE_STATISTICAL_VALUE} = 1;
5671        $flow = 'redo_bitmap';
5672    } elsif ($x == 32 && $y == 0) {
5673        # 232000 Replaced/retained values follow, each one following a
5674        # descriptor 232255. Which value they are a replacement for is
5675        # defined by a bit map, which already may have been defined
5676        # (if descriptor 23700 is encountered), or will shortly be
5677        # defined by data present indicators (031031)
5678        push @{ $self->{BITMAP_OPERATORS} }, '232000';
5679        $self->{NUM_BITMAPS}++;
5680        # Mark that a bit map probably needs to be built
5681        $self->{BUILD_BITMAP} = 1;
5682        $self->{BITMAP_INDEX} = 0;
5683        $flow = 'no_value';
5684    } elsif ($x == 32 && $y == 255) {
5685        # 232255 Replaced/retained values marker operator
5686        _croak "$id No bit map defined"
5687            unless (defined $self->{CURRENT_BITMAP} || defined $self->{REUSE_BITMAP})
5688            && $self->{BITMAP_OPERATORS}[-1] eq '232000';
5689        if (defined $self->{REUSE_BITMAP}) {
5690            _croak "More 232255 encountered than current bit map allows"
5691                unless @{ $self->{REUSE_BITMAP}->[$isub] };
5692            $bm_idesc = undef;
5693        } else {
5694            _croak "More 232255 encountered than current bit map allows"
5695                unless @{$self->{CURRENT_BITMAP}};
5696            $bm_idesc = shift @{$self->{CURRENT_BITMAP}};
5697        }
5698        $flow = 'redo_bitmap';
5699    } elsif ($x == 35 && $y == 0) {
5700        # 235000 Cancel backward data reference
5701        undef $self->{REUSE_BITMAP};
5702        $self->{BACKWARD_DATA_REFERENCE} = $self->{NUM_BITMAPS} + 1;
5703        $flow = 'no_value';
5704    } elsif ($x == 36 && $y == 0) {
5705        # 236000 Define data present bit map
5706        undef $self->{CURRENT_BITMAP};
5707        $self->{BUILD_BITMAP} = 1;
5708        $self->{BITMAP_INDEX} = 0;
5709        $flow = 'no_value';
5710    } elsif ($x == 37 && $y == 0) {
5711        # 237000 Use defined data present bit map
5712        _croak "$id No previous bit map defined"
5713            unless defined $self->{BITMAPS};
5714        my %hash = @{ $self->{BITMAPS}->[$self->{NUM_BITMAPS}-1]->[$isub] };
5715        $self->{REUSE_BITMAP}->[$isub] = [sort {$a <=> $b} keys %hash];
5716        $flow = 'no_value';
5717    } elsif ($x == 37 && $y == 255) {
5718        # 237255 Cancel 'use defined data present bit map'
5719        _complain("$id No data present bit map to cancel")
5720            unless defined $self->{REUSE_BITMAP};
5721        undef $self->{REUSE_BITMAP};
5722        $flow = 'next';
5723    } elsif ($x == 41 && $y == 0) {
5724        # 241000 Define event
5725        _croak "$id Define event (not implemented)";
5726    } elsif ($x == 41 && $y == 255) {
5727        # 241255 Cancel define event
5728        _croak "$id Cancel define event (not implemented)";
5729    } elsif ($x == 42 && $y == 0) {
5730        # 242000 Define conditioning event
5731        _croak "$id Define conditioning event (not implemented)";
5732    } elsif ($x == 42 && $y == 255) {
5733        # 242255 Cancel define conditioning event
5734        _croak "$id Cancel define conditioning event (not implemented)";
5735    } elsif ($x == 43 && $y == 0) {
5736        # 243000 Categorial forecast values follow
5737        _croak "$id Categorial forecast values follow (not implemented)";
5738    } elsif ($x == 43 && $y == 255) {
5739        # 243255 Cancel categorial forecast values follow
5740        _croak "$id Cancel categorial forecast values follow (not implemented)";
5741    } else {
5742        _croak "$id Unknown data description operator";
5743    }
5744
5745    return ($pos, $flow, $bm_idesc, @operators);
5746}
5747
5748## Extract data from selected subsets in selected bufr objects, joined
5749## into a single ($data_refs, $desc_refs), to later be able to make a
5750## single BUFR message by calling encode_message. Also returns number
5751## of subsets extracted.
5752sub join_subsets {
5753    my $self = shift;
5754    my (@bufr, @subset_list);
5755    my $last_arg_was_bufr;
5756    my $num_objects = 0;
5757    while (@_) {
5758        my $arg = shift;
5759        if (ref($arg) eq 'Geo::BUFR') {
5760            $bufr[$num_objects++] = $arg;
5761            $last_arg_was_bufr = 1;
5762        } elsif (ref($arg) eq 'ARRAY') {
5763            _croak "Wrong input (multiple array refs) to join_subsets"
5764                unless $last_arg_was_bufr;
5765            $subset_list[$num_objects-1] = $arg;
5766            $last_arg_was_bufr = 0;
5767        } else {
5768            _croak "Input is not Geo::BUFR object or array ref in join_subsets";
5769        }
5770    }
5771
5772    my ($data_refs, $desc_refs);
5773    my $n = 1; # Number of subsets included
5774    # Ought to check for common section 3 also?
5775    for (my $i=0; $i < $num_objects; $i++) {
5776        $bufr[$i]->rewind();
5777        my $isub = 1;
5778        if (!exists $subset_list[$i]) { # grab all subsets from this object
5779            while (not $bufr[$i]->eof()) {
5780                my ($data, $descriptors) = $bufr[$i]->next_observation();
5781                last if !$data;
5782                $self->_spew(2, "Joining subset %d from bufr object %d", $isub, $i) if $Spew;
5783                $data_refs->[$n] = $data;
5784                $desc_refs->[$n++] = $descriptors;
5785                $isub++;
5786            }
5787        } else { # grab the subsets specified, also inserting them in the specified order
5788            my $num_found = 0;
5789            while (not $bufr[$i]->eof()) {
5790                my ($data, $descriptors) = $bufr[$i]->next_observation();
5791                last if !$data;
5792                my $index = _get_index_in_list($subset_list[$i], $isub);
5793                if (defined $index) {
5794                    $self->_spew(2, "Joining subset %d from subset %d"
5795                                 . " in bufr object %d", $isub, $index, $i) if $Spew;
5796                    $data_refs->[$n + $index] = $data;
5797                    $desc_refs->[$n + $index] = $descriptors;
5798                    $num_found++;
5799                }
5800                $isub++;
5801            }
5802            _croak "Mismatch between number of subsets found ($num_found) and "
5803                . "expected from argument [@{$subset_list[$i]}] to join_subsets"
5804                    if $num_found != @{$subset_list[$i]};
5805            $n += $num_found;
5806        }
5807        $bufr[$i]->rewind();
5808    }
5809    $n--;
5810    return ($data_refs, $desc_refs, $n)
5811}
5812
58131;  # Make sure require or use succeeds.
5814
5815
5816__END__
5817# Below is documentation for the module. You'd better read it!
5818
5819=pod
5820
5821=encoding utf8
5822
5823=head1 NAME
5824
5825Geo::BUFR - Perl extension for handling of WMO BUFR files.
5826
5827=head1 SYNOPSIS
5828
5829  # A simple program to print decoded contents of a BUFR file. Note
5830  # that a more sophisticated program (bufrread.pl) is included in the
5831  # package
5832
5833  use Geo::BUFR;
5834
5835  Geo::BUFR->set_tableformat('BUFRDC'); # ECCODES is also possible
5836  Geo::BUFR->set_tablepath('path to BUFR tables');
5837
5838  my $bufr = Geo::BUFR->new();
5839
5840  $bufr->fopen('name of BUFR file');
5841
5842  while (not $bufr->eof()) {
5843      my ($data, $descriptors) = $bufr->next_observation();
5844      print $bufr->dumpsections($data, $descriptors) if $data;
5845  }
5846
5847  $bufr->fclose();
5848
5849
5850=head1 DESCRIPTION
5851
5852B<BUFR> = B<B>inary B<U>niversal B<F>orm for the B<R>epresentation of
5853meteorological data. BUFR is approved by WMO (World Meteorological
5854Organization) as the standard universal exchange format for
5855meteorological observations, gradually replacing a lot of older
5856alphanumeric data formats.
5857
5858This module provides methods for decoding and encoding BUFR messages,
5859and for displaying information in BUFR B and D tables and in BUFR flag
5860and code tables.
5861
5862Installing this module also installs some programs: C<bufrread.pl>,
5863C<bufrresolve.pl>, C<bufrextract.pl>, C<bufrencode.pl>,
5864C<bufr_reencode.pl> and C<bufralter.pl>. See
5865L<https://wiki.met.no/bufr.pm/start> for examples of use. For the
5866majority of potential users of Geo::BUFR I would expect these programs
5867to be all that you will need Geo::BUFR for.
5868
5869Note that being Perl, this module cannot compete in speed with for
5870example the (free) ECMWF BUFRDC Fortran library. Still, some effort
5871has been invested in making the module reasonable fast in that the
5872core routines for encoding and decoding bitstreams are implemented in
5873C.
5874
5875
5876=head1 METHODS
5877
5878The C<get_> methods will return undef if the requested information is
5879not available. The C<set_> methods as well as C<fopen>, C<fclose>,
5880C<copy_from> and C<rewind> will always return 1, or croak if failing.
5881
5882Create a new object:
5883
5884  $bufr = Geo::BUFR->new();
5885  $bufr = Geo::BUFR->new($BUFRmessages);
5886
5887The second form of C<new> is useful if you want to provide the BUFR
5888messages to decode directly as an input buffer (string). Note that
5889merely calling C<new($BUFRmessages)> will not decode anything in the
5890BUFR messages, for that you need to call C<next_observation()> from
5891the newly created object. You also have the option of providing the
5892BUFR messages in a file, using the no argument form of C<new()> and
5893then calling C<fopen>.
5894
5895Associate the object with a file for reading of BUFR messages:
5896
5897  $bufr->fopen($filename);
5898
5899Close the associated file that was opened by fopen:
5900
5901  $bufr->fclose();
5902
5903Check for end-of-file (or end of the input buffer provided as argument
5904to C<new>):
5905
5906  $bufr->eof();
5907
5908Returns true if end-of-file (or end of input buffer) is reached, false
5909if not.
5910
5911Ensure that next call to C<next_observation> will decode first subset
5912in first BUFR message:
5913
5914  $bufr->rewind();
5915
5916Copy from an existing object:
5917
5918  $bufr1->copy_from($bufr2,$what);
5919
5920If $what is 'all' or not provided, will copy everything in $bufr2 into
5921$bufr1, i.e. making a clone. If $what is 'metadata', only the metadata
5922in section 0, 1 and 3 will be copied (and all of section 2 if present).
5923
5924Load B and D tables:
5925
5926  $bufr->load_BDtables($table);
5927
5928$table is optional, and should for BUFRDC be (base)name of a file
5929containing a BUFR table B or D, using the ECMWF BUFRDC naming
5930convention, i.e. [BD]'table_version'.TXT. For ECCODES, use last part
5931of path, e.g. on UNIX-like systems '0/wmo/18' for master tables and
5932'0/local/8/78/236' for local tables, or both if that is needed,
5933e.g. '0/wmo/18,0/local/8/78/236'. If no argument is provided,
5934C<load_BDtables()> will use BUFR section 1 information in the $bufr
5935object to decide which tables to load (which for ECCODES might be up
5936to 4 table files, both local and master tables). Previously loaded
5937tables are kept in memory, and C<load_BDtables> will return
5938immediately if the tables already have been loaded. Will die (croak)
5939if tables cannot be found, but (in the no argument version) not if
5940these are local tables (Local table version number > 0) and the
5941corresponding master tables exist (Local table version number = 0),
5942which then will be loaded instead. Returns table version for the
5943tables loaded (see C<get_table_version>).
5944
5945Load C table:
5946
5947  $bufr->load_Ctable($table,$default_table);
5948
5949Both $table and $default_table are optional. This will load the flag
5950and code tables (if not already loaded), which in ECMWF BUFRDC are put
5951in tables C'table_version'.TXT (not to be confused with WMO BUFR table
5952C, which contains the operator descriptors). $default_table will be
5953used if $table is not found. For $table and $default_table in ECCODES,
5954use (just like for C<load_BDtables>) last part of path, e.g. on
5955UNIX-like systems '0/wmo/18' for master tables and '0/local/8/78/236'
5956for local tables, or both if that is needed,
5957e.g. '0/wmo/18,0/local/8/78/236'. Will for ECCODES then load all
5958tables in the codetables subdirectory. If no arguments are provided,
5959C<load_Ctable()> will use BUFR section 1 information in the $bufr
5960object to decide which table(s) to load. Will die (croak) if table
5961cannot be found, but not if this is a local table and the
5962corresponding master table exists, which then will be loaded
5963instead. Returns table version for the table loaded.
5964
5965Get next observation (next subset in current BUFR message or first subset
5966in next message):
5967
5968  ($data, $descriptors) = $bufr->next_observation();
5969
5970where $descriptors is a reference to the array of fully expanded
5971descriptors for this subset, $data is a reference to the corresponding
5972values. This method is meant to be used to iterate through all BUFR
5973messages in the file or input buffer (see C<new>) associated with the
5974$bufr object, see example program in L</SYNOPSIS>. Whenever a new BUFR
5975message is reached, section 0-3 will also be decoded, the contents of
5976which is then available through the access methods listed below. This
5977is the main BUFR decoding routine in Geo::BUFR, and will call
5978C<load_BDtables()> internally (unless decoding of section 4 has been
5979turned off by use of C<set_nodata> or C<set_filter_db>), but not
5980C<load_Ctable>. Consult L</"DECODING/ENCODING"> if you want more
5981precise info about what is returned in $data and $descriptors.
5982
5983C<next_observation> will return the empty list (so both $data and
5984$descriptors will be undef) in the following cases: if there are no
5985more BUFR messages in file/input buffer (so next call to C<eof()> will
5986return false), if no decoding of section 4 was requested in
5987C<set_nodata>, if filtering was turned on in C<set_filter_db> and the
5988BUFR message met the filter criteria in the user defined callback
5989function, or if the BUFR message contained 0 subsets. If you need to
5990distinguish the first case from the rest, one way would be to check
5991C<get_current_subset_number()> which will return 0 only in this first
5992case.
5993
5994If an error is met during decoding, it is possible to trap the error
5995in an eval and then continue calling C<next_observation> (as
5996demonstrated in source code of C<bufrread.pl>). Care has been taken
5997that BUFR messages with incorrectly stated BUFR length should not
5998cause later proper BUFR messages to be skipped. But the possibility of
5999an erroneous last BUFR message in file led to abandonment of the
6000convenient feature retained until Geo::BUFR version 1:25 of C<eof>
6001always returning false if there were no more BUFR messages in
6002file/input buffer. Instead you should expect last call to
6003C<next_observation> to return false (empty list).
6004
6005Filter BUFR messages:
6006
6007  $bufr->set_filter_cb(\&callback,@args);
6008
6009Here user is responsible for defining the callback subroutine. This
6010subroutine will then be called in C<next_observation> (with arguments
6011@args if provided) right after section 3 is decoded, and, if returning
6012true, will cause C<next_observation> to return immediately, without
6013even trying to decode section 4 (the data section). Here is a simple
6014example of such a callback (without arguments), filtering on AHL and
6015Data category (table A) of the BUFR message.
6016
6017  sub callback {
6018      my $obj = shift;
6019      return 1 if $obj->get_data_category != 0;
6020      my $ahl = $obj->get_current_ahl() || '';
6021      return ($ahl =~ /^IS.... (ENMI|TEST)/);
6022  }
6023
6024Check result of filtering:
6025
6026  $bufr->is_filtered();
6027
6028Will return true (1) if C<next_observation> returned immediately as
6029described for C<set_filter_cb> above. But calling C<is_filtered>
6030should rarely be needed, as in most cases the simple check 'next if
6031!$data' after calling C<next_observation> would be the natural way to
6032proceed.
6033
6034Print the contents of a subset in BUFR message:
6035
6036  print $bufr->dumpsections($data,$descriptors,$options);
6037
6038$options is optional. If this is first subset in message, will start
6039by printing message number and, if this is first message in a GTS
6040bulletin, AHL (Abbreviated Header Line), as well as contents of
6041sections 0, 1 and 3. For section 4, will also print subset
6042number. $options should be an anonymous hash with possible keys
6043'width' and 'bitmap', e.g. { width => 20, bitmap => 0 }. 'bitmap'
6044controls which of C<dumpsection4> and C<dumpsection4_with_bitmaps>
6045will be called internally by C<dumpsections>. Default value for
6046'bitmap' is 1, causing C<dumpsection4_with_bitmaps> to be
6047called. 'width' controls the value of $width used by the
6048C<dumpsection4...> methods, default is 15. If you intend to provide
6049the output from C<dumpsections> as input to C<reencode_message>, be
6050sure to set 'bitmap' to 0, and 'width' not smaller than the largest
6051data width in bytes among the descriptors with unit CCITTIA5 occuring
6052in the message.
6053
6054Normally C<dumpsections> is called after C<next_observation>, with
6055same arguments $data,$descriptors as returned from this call. From the
6056examples given at L<https://wiki.met.no/bufr.pm/start#bufrreadpl> you
6057can get an impression of what the output might look like. If
6058C<dumpsections> does not give you exactly what you want, you might
6059prefer to instead call the individual dumpsection methods below.
6060
6061Print the contents of sections 0-3 in BUFR message:
6062
6063  print $bufr->dumpsection0();
6064  print $bufr->dumpsection1();
6065  print $bufr->dumpsection2($sec2_code_ref);
6066  print $bufr->dumpsection3();
6067
6068C<dumpsection2> returns an empty string if there is no optional
6069section in the message. The argument should be a reference to a
6070subroutine which takes the optional section as (a string) argument and
6071returns the text you want displayed after the 'Length of section:'
6072line. For general BUFR messages probably the best you can do is
6073displaying a hex dump, in which case
6074
6075  sub {return '    Hex dump:' . ' 'x26 . unpack('H*',substr(shift,4))}
6076
6077might be a suitable choice for $sec2_code_ref. For most applications
6078there should be no real need to call C<dumpsection2>.
6079
6080Print the data of a subset (descriptor, value, name and unit):
6081
6082  print $bufr->dumpsection4($data,$descriptors,$width);
6083  print $bufr->dumpsection4_with_bitmaps($data,$descriptors,$width);
6084
6085$width fixes the number of characters used for displaying the data
6086values, and is optional (defaults to 15). $data and $descriptors are
6087references to arrays of data values and BUFR descriptors respectively,
6088likely to have been fetched from C<next_observation>. Code and flag
6089values will be resolved if a C table has been loaded, i.e. if
6090C<load_Ctable> has been called earlier on. C<dumpsection4_with_bitmaps>
6091will display the bit-mapped values side by side with the corresponding
6092data values. If there is no bit-map in the BUFR message,
6093C<dumpsection4_with_bitmaps> will provide same output as
6094C<dumpsection4>. See L</"DECODING/ENCODING"> for some more information
6095about what is printed, and
6096L<https://wiki.met.no/bufr.pm/start#bufrreadpl> for real life examples
6097of output.
6098
6099Set verbose level:
6100
6101  Geo::BUFR->set_verbose($level); # 0 <= $level <= 6
6102  $bufr->set_verbose($level);
6103
6104Some info about what is going on in Geo::BUFR will be printed to
6105STDOUT if $level > 0. With $level set to 1, all that is printed is the
6106B, C and D tables used (with full path). Each line of verbose output
6107starts with 'BUFR.pm: ', except for the level 6 specific
6108output. Setting verbose level > 1 might be helpful when debugging, or
6109for example if you want to extract as much information as possible
6110from an incorrectly formatted BUFR message.
6111
6112No decoding of section 4 (data section):
6113
6114  Geo::BUFR->set_nodata($n);
6115 - $n=1 (or not provided): Skip decoding of section 4 (might speed up
6116   processing considerably if only metadata in section 1-3 is sought for)
6117 - $n=0: Decode section 4 (default in Geo::BUFR)
6118
6119No decoding of quality information:
6120
6121  Geo::BUFR->set_noqc($n);
6122 - $n=1 (or not provided): Don't decode quality information (more
6123   specifically: skip all descriptors after 222000)
6124 - $n=0: Decode quality information (default in Geo::BUFR)
6125
6126Enable/disable strict checking of BUFR format for recoverable errors
6127(like using BUFR compression for one subset message etc):
6128
6129  Geo::BUFR->set_strict_checking($n);
6130 - $n=0: disable checking (default in Geo::BUFR)
6131 - $n=1: warn (carp) if error but continue decoding
6132 - $n=2: die (croak) if error
6133
6134Confer L</STRICT CHECKING> for details of what is being checked if
6135strict checking is enabled.
6136
6137Show all BUFR table C operators (data description operators, F=2) as well
6138as all replication descriptors (F=1) when calling dumpsection4:
6139
6140  Geo::BUFR->set_show_all_operators($n);
6141 - $n=1 (or not provided): Show replication descriptors and all operators
6142 - $n=0: Show no replication descriptors and only the really informative
6143         data description operators (default in Geo::BUFR)
6144
6145C<set_show_all_operators(1)> cannot be combined with C<dumpsections>
6146with bitmap option set (which is the default).
6147
6148Set or get tableformat:
6149
6150  Geo::BUFR->set_tableformat($tableformat);
6151  $tableformat = Geo::BUFR->get_tableformat();
6152
6153Set or get tablepath:
6154
6155  Geo::BUFR->set_tablepath($tablepath);
6156  $tablepath = Geo::BUFR->get_tablepath();
6157
6158Get table version:
6159
6160  $table_version = $bufr->get_table_version($table);
6161
6162$table is optional. Return table version from $table if provided, or
6163else from section 1 information in the currently processed BUFR
6164message. For BUFRDC, this is a stripped down version of table name. If
6165for example $table = 'B0000000000088013001.TXT', will return
6166'0000000000088013001'. For ECCODES, this is last path of table
6167location (e.g. '0/wmo/29'), and a stringified list of two such paths
6168(master and local) if local tables are used
6169(e.g. '0/wmo/29,0/local/8/78/236'). Returns undef if impossible to
6170determine table version.
6171
6172Get number of subsets:
6173
6174  $nsubsets = $bufr->get_number_of_subsets();
6175
6176Get current subset number:
6177
6178  $subset_no = $bufr->get_current_subset_number();
6179
6180If decoding of section 4 has been skipped (due to use of C<set_nodata>
6181or C<set_filter_cb>), will return number of subsets. For a BUFR
6182message with 0 subsets, will actually return 1 (a bit weird perhaps,
6183but then this is a really weird kind of BUFR message to handle).
6184
6185Get current message number:
6186
6187  $message_no = $bufr->get_current_message_number();
6188
6189Get current BUFR message:
6190
6191    $binary_msg = get_bufr_message();
6192
6193This returns the original raw (binary, not the decoded) BUFR
6194message. An empty string will be returned if no BUFR message is found,
6195or if the currently processed BUFR message is erroneous (even if
6196section 4 is not decoded, there will at least be a check for finding
6197'7777' at expected end of BUFR message, as calculated from length of
6198BUFR message decoded from section 0).
6199
6200Get Abbreviated Header Line (AHL) before current message:
6201
6202  $ahl = $bufr->get_current_ahl();
6203
6204If there is no AHL immediately preceding current message, default is
6205for C<get_current_ahl> to return undef. Sometimes that might not be
6206what you want, e.g. when processing a file with GTS bulletins with
6207possibly more than one BUFR message in each bulletin, and especially
6208so if filtering on AHL using C<set_filter_cb>.
6209
6210  Geo::BUFR->reuse_current_ahl($n);
6211 - $n=1 (or not provided): Will cause C<get_current_ahl> to return last
6212   AHL extracted and not undef if currently processed BUFR message has
6213   no (immediately preceding) AHL
6214 - $n=0: Reset C<get_current_ahl> to default behaviour as described
6215   above
6216
6217Check if AHL has been reused:
6218
6219   $bufr->ahl_is_reused();
6220
6221Will return true (1) if the AHL returned by C<get_current_ahl> is a
6222reused one, i.e. the AHL is not immediately preceding the current BUFR
6223message.
6224
6225Check length of BUFR message (as stated in section 0):
6226
6227    $bufr->bad_bufrlength();
6228
6229Will return true (1) if no '7777' is found at the end of BUFR message
6230(as calculated from the stated length of BUFR message in section 0),
6231which usually means that the BUFR message is badly corrupted
6232(e.g. truncated). But note that there should be no need to call
6233C<bad_bufrlength> if section 4 is decoded, as in this case you should
6234expect C<next_observation> to die with a more precise error message
6235describing the kind of corruption found. If no decoding of section 4
6236is done (because C<set_nodata> or C<set_filter_cb> were called),
6237however, C<next_observation> is likely not to throw an error, and you
6238can use C<bad_bufrlength> to decide what to do next (see source code of
6239C<bufrextract.pl> for example of use).
6240
6241Accessor methods for section 0-3:
6242
6243  $bufr->set_<variable>($variable);
6244  $variable = $bufr->get_<variable>();
6245
6246where E<lt>variableE<gt> is one of
6247
6248  bufr_length (get only)
6249  bufr_edition
6250  master_table
6251  subcentre
6252  centre
6253  update_sequence_number
6254  optional_section (0 or 1)
6255  data_category
6256  int_data_subcategory
6257  loc_data_subcategory
6258  data_subcategory
6259  master_table_version
6260  local_table_version
6261  year_of_century
6262  year
6263  month
6264  day
6265  hour
6266  minute
6267  second
6268  local_use
6269  number_of_subsets
6270  observed_data (0 or 1)
6271  compressed_data (0 or 1)
6272  descriptors_unexpanded
6273
6274C<set_year_of_century(0)> will set year of century to 100.
6275C<get_year_of_century> will for BUFR edition 4 calculate year of
6276century from year in section 1.
6277
6278
6279Encode a new BUFR message:
6280
6281  $new_message = $bufr->encode_message($data_refs,$desc_refs);
6282
6283where $desc_refs->[$i] is a reference to the array of fully expanded
6284descriptors for subset number $i ($i=1 for first subset),
6285$data_refs->[$i] is a reference to the corresponding values, using
6286undef for missing values. The required metadata in section 0, 1 and 3
6287must have been set in $bufr before calling this method. See
6288L</"DECODING/ENCODING"> for meaning of 'fully expanded descriptors'.
6289
6290Encode a (single subset) NIL message:
6291
6292  $new_message = $bufr->encode_nil_message($stationid_ref,$delayed_repl_ref);
6293
6294$delayed_repl_ref is optional. In section 4 all values will be set to
6295missing except delayed replication factors and the (descriptor, value)
6296pairs in the hashref $stationid_ref. $delayed_repl_ref (if provided)
6297should be a reference to an array of data values for all descriptors
6298031001 and 031002 occuring in the message (these values must all be
6299nonzero), e.g. [3,1,2] if there are 3 such descriptors which should
6300have values 3, 1 and 2, in that succession. If $delayed_repl_ref is
6301omitted, all delayed replication factors will be set to 1. The
6302required metadata in section 0, 1 and 3 must have been set in $bufr
6303before calling this method (although number of subsets and BUFR
6304compression will automatically be set to 1 and 0 respectively,
6305whatever value they had before).
6306
6307Reencode BUFR message(s):
6308
6309  $new_messages = $bufr->reencode_message($decoded_messages,$width);
6310
6311$width is optional. Takes a text $decoded_messages as argument and
6312returns a (binary) string of BUFR messages which, when printed to file
6313and then processed by C<bufrread.pl> with no output modifying options set
6314(except possibly C<--width>), would give output equal to
6315$decoded_messages. If C<bufrread.pl> is to be called with C<--width
6316$width>, this $width must be provided to C<reencode_message> also.
6317
6318Join subsets from several messages:
6319
6320 ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr_1,$subset_ref_1,
6321     ... $bufr_n,$subset_ref_n);
6322
6323where each $subset_ref_i is optional. Will return the data and
6324descriptors needed by C<encode_message> to encode a multi subset
6325message, extracting the subsets from the first message of each $bufr_i
6326object. All subsets in (first message of) $bufr_i will be used, unless
6327next argument is an array reference $subset_ref_i, in which case only
6328the subset numbers listed will be included, in the order specified. On
6329return $nsub will contain the total number of subsets thus
6330extracted. After a call to C<join_subsets>, the metadata (of the first
6331message) in each object will be available through the C<get_>-methods,
6332while a call to C<next_observation> will start extracting the first
6333subset in the first message. Here is an example of use, fetching first
6334subset from bufr object 1, all subsets from bufr object 2, and subsets
63354 and 2 from bufr object 3, then building up a new multi subset BUFR
6336message (which will succeed only if the bufr objects all have the same
6337descriptors in section 3):
6338
6339  my ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr1,
6340      [1],$bufr2,$bufr3,[4,2]);
6341  my $new_bufr = Geo::BUFR->new();
6342  # Get metadata from one of the objects, then reset those metadata
6343  # which might not be correct for the new message
6344  $new_bufr->copy_from($bufr1,'metadata');
6345  $new_bufr->set_number_of_subsets($nsub);
6346  $new_bufr->set_update_sequence_number(0);
6347  $new_bufr->set_compressed_data(0);
6348  my $new_message = $new_bufr->encode_message($data_refs,$desc_refs);
6349
6350Extract BUFR table B information for an element descriptor:
6351
6352  ($name,$unit,$scale,$refval,$width) = $bufr->element_descriptor($desc);
6353
6354Will fetch name, unit, scale, reference value and data width in bits
6355for element descriptor $desc in the last table B loaded in the $bufr
6356object. Returns false if the descriptor is not found.
6357
6358Extract BUFR table D information for a sequence descriptor:
6359
6360  @descriptors = $bufr->sequence_descriptor($desc);
6361  $string = $bufr->sequence_descriptor($desc);
6362
6363Will return the descriptors in a direct (nonrecursive) lookup for the
6364sequence descriptor $desc in the last table D loaded in the $bufr
6365object. In scalar context the descriptors will be returned as a space
6366separated string. Returns false if the descriptor is not found.
6367
6368Resolve BUFR table descriptors (for printing):
6369
6370  print $bufr->resolve_descriptor($how,@descriptors);
6371
6372where $how is one of 'fully', 'partially', 'simply' and 'noexpand'.
6373Returns a text string suitable for printing information about the BUFR
6374table descriptors given. $how = 'fully': Expand all D descriptors
6375fully into B descriptors, with name, unit, scale, reference value and
6376width (each on a numbered line, except for replication operators which
6377are not numbered). $how = 'partially': Like 'fully', but expand D
6378descriptors only once and ignore replication. $how = 'noexpand': Like
6379'partially', but do not expand D descriptors at all. $how = 'simply':
6380Like 'partially', but list the descriptors on one single line with no
6381extra information provided. The relevant B/D table must have been
6382loaded before calling C<resolve_descriptor>.
6383
6384Resolve flag table value (for printing):
6385
6386  print $bufr->resolve_flagvalue($value,$flag_table,$B_table,
6387                                 $default_B_table,$num_leading_spaces);
6388
6389Last 2 arguments are optional. $default_B_table will be used if
6390$B_table is not found, $num_leading_spaces defaults to 0.
6391Examples:
6392
6393  print $bufr->resolve_flagvalue(4,8006,'B0000000000098013001.TXT') # BUFRDC
6394  print $bufr->resolve_flagvalue(4,8006,'0/wmo/13')       # ECCODES, master table
6395  print $bufr->resolve_flagvalue(4,8193,'0/local/1/98/0') # ECCODES, local table
6396
6397Print the contents of BUFR code (or flag) table:
6398
6399  print $bufr->dump_codetable($code_table,$table,$default_table);
6400
6401where in BUFRDC $table is (base)name of the C...TXT file containing the code
6402tables, optionally followed by a default table which will be used if
6403$table is not found.
6404
6405C<resolve_flagvalue> and C<dump_codetable> will return empty string if
6406flag value or code table is not found.
6407
6408
6409Manipulate binary data (these are implemented in C for speed and primarily
6410intended as module internal subroutines):
6411
6412  $value = Geo::BUFR->bitstream2dec($bitstream,$bitpos,$num_bits);
6413
6414Extracts $num_bits bits from $bitstream, starting at bit $bitpos. The
6415extracted bits are interpreted as a nonnegative integer.  Returns
6416undef if all bits extracted are 1 bits.
6417
6418  $ascii = Geo::BUFR->bitstream2ascii($bitstream,$bitpos,$num_bytes);
6419
6420Extracts $num_bytes bytes from bitstream, starting at $bitpos, and
6421interprets the extracted bytes as an ascii string. Returns undef if
6422the extracted bytes are all 1 bits.
6423
6424  Geo::BUFR->dec2bitstream($value,$bitstream,$bitpos,$bitlen);
6425
6426Encodes nonnegative integer value $value in $bitlen bits in
6427$bitstream, starting at bit $bitpos. Last byte will be padded with 1
6428bits. $bitstream must have been initialized to a string long enough to
6429hold $value. The parts of $bitstream before $bitpos and after last
6430encoded byte are not altered.
6431
6432  Geo::BUFR->ascii2bitstream($ascii,$bitstream,$bitpos,$width);
6433
6434Encodes ASCII string $ascii in $width bytes in $bitstream, starting at
6435$bitpos. Last byte will be padded with 1 bits. $bitstream must have
6436been initialized to a string long enough to hold $ascii. The parts of
6437$bitstream before $bitpos and after last encoded byte are not altered.
6438
6439  Geo::BUFR->null2bitstream($bitstream,$bitpos,$num_bits);
6440
6441Sets $num_bits bits in bitstream starting at bit $bitpos to 0 bits.
6442Last byte affected will be padded with 1 bits. $bitstream must be at
6443least $bitpos + $num_bits bits long. The parts of $bitstream before
6444$bitpos and after last encoded byte are not altered.
6445
6446=head1 DECODING/ENCODING
6447
6448The term 'fully expanded descriptors' used in the description of
6449C<encode_message> (and C<next_observation>) in L</METHODS> might need
6450some clarification. The short version is that the list of descriptors
6451should be exactly those which will be written out by running
6452C<dumpsection4> (or C<bufrread.pl> without any modifying options set)
6453on the encoded message. If you don't have a similar BUFR message at
6454hand to use as an example when wanting to encode a new message, you
6455might need a more specific prescription. Which is that for every data
6456value which occurs in the section 4 bitstream, you should include the
6457corresponding BUFR descriptor, using the artificial 999999 for
6458associated fields following the 204Y operator, I<and> including the
6459data operator descriptors 22[2345]000 and 23[2567]000 with data value
6460set to the empty string, if these occurs among the descriptors in
6461section 3 (rather: in the expansion of these, use C<bufrresolve.pl> to
6462check!). Element descriptors defining new reference values (following
6463the 203Y operator) will have F=0 (first digit in descriptor) replaced
6464with F=9 in C<next_observation>, while in C<encode_message> both F=0
6465and F=9 will be accepted for new reference values. When encoding
6466delayed repetition you should repeat the set of data (and descriptors)
6467to be repeated the number of times indicated by 031011 or 031012 (if
6468given the feedback that this is considered cumbersome, an option for
6469including the set of data/descriptors just once might be added later,
6470both for encoding end decoding).
6471
6472Some words about the procedure used for decoding and encoding data in
6473section 4 might shed some light on this choice of design.
6474
6475When decoding section 4 for a subset, first of all the BUFR
6476descriptors provided in section 3 are expanded as far as possible
6477without looking at the actual bitstream, i.e. by eliminating
6478nondelayed replication descriptors (F=1) and by using BUFR table D to
6479expand sequence descriptors (F=3). Then, for each of the thus expanded
6480descriptors, the data value is fetched from the bitstream according to
6481the prescriptions in BUFR table B, applying the data operator
6482descriptors (F=2) from BUFR table C as they are encountered, and
6483reexpanding the remaining descriptors every time a delayed replication
6484factor is fetched from bitstream. The resulting set of data values is
6485returned in an array @data, with the corresponding B (and sometimes
6486also some C) BUFR table descriptors in an array
6487@descriptors. C<next_observation> returns references to these two
6488arrays. For convenience, some of the data operator descriptors without
6489a corresponding data value (like 222000) are included in the
6490@descriptors because they are considered to provide valuable
6491information to the user, with corresponding value in @data set to the
6492empty string. These descriptors without a value are written by the
6493dumpsection4 methods on unnumbered lines, thereby distinguishing them
6494from descriptors corresponding to 'real' data values in section 4,
6495which are numbered consecutively.
6496
6497Encoding a subset is done in a very similar way, by expanding the
6498descriptors in section 3 as described above, but instead fetching the
6499data values from the @data array that the user supplies (actually
6500@{$data_refs->{$i}} where $i is subset number), and then finally
6501encoding this value to bitstream.
6502
6503The input parameter $desc_ref to C<encode_message> is in fact not
6504strictly necessary to be able to encode a new BUFR message. But there
6505is a good reason for requiring it. During encoding the descriptors
6506from expanding section 3 will consecutively be compared with the
6507descriptors in the user supplied $desc_ref, and if these at some point
6508differ, encoding will be aborted with an error message stating the
6509first descriptor which deviated from the expected one. By requiring
6510$desc_ref as input, the risk for encoding an erroneous section 4 is
6511thus greatly reduced, and also provides the user with highly valuable
6512debugging information if encoding fails.
6513
6514When decoding character data (unit CCITTIA5), any null characters
6515found are silently (unless $Strict_checking is set) removed, as well
6516as leading and trailing white space.
6517
6518=head1 BUFR TABLE FILES
6519
6520The BUFR table files should follow the format and naming conventions
6521used by one of these two ECMWF software packages: either BUFRDC
6522(download from https://confluence.ecmwf.int/display/BUFR/Releases), or
6523ecCodes (download from https://confluence.ecmwf.int/display/ECC/Releases).
6524
6525The utility programs in Geo::BUFR will look for table files by default
6526in the standard installation directories, which in Unix-like systems
6527will be /usr/local/lib/bufrtables for BUFRDC and
6528/usr/local/share/eccodes/definitions/bufr/tables for eCcodes. You can
6529change that behaviour by either providing the environment variable
6530BUFR_TABLES, or setting path explicitly by using the
6531C<--tablepath>. Note that while BUFR_TABLES is a well known concept in
6532BUFRDC software, the closest you get in eCcodes is probably
6533ECCODES_DEFINITION_PATH (see
6534e.g. https://confluence.ecmwf.int/display/ECC/BUFR%3A+Local+configuration),
6535for which BUFR_TABLES should (or could) be set to
6536ECCODES_DEFINITION_PATH/bufr/tables (again in Unix-like systems).
6537
6538=head1 STRICT CHECKING
6539
6540The package global $Strict_checking defaults to
6541
6542  0: Ignore recoverable errors in BUFR format met during decoding or encoding
6543
6544but can be changed to
6545
6546  1: Issue warning (carp) but continue decoding/encoding
6547
6548  2: Croak (die) instead of carp
6549
6550by calling C<set_strict_checking>. The following is checked for when
6551$Strict_checking is set to 1 or 2:
6552
6553=over
6554
6555=item *
6556
6557Total length of BUFR message as stated in section 0 bigger than actual length
6558
6559=item *
6560
6561Excessive bytes in section 4 (section longer than computed from section 3)
6562
6563=item *
6564
6565Compression set in section 3 for one subset message (BUFR reg. 94.6.3.2)
6566
6567=item *
6568
6569Bits 3-8 in octet 7 in section 3 not set to zero
6570
6571=item *
6572
6573Local reference value for compressed character data not having all
6574bits set to zero (94.6.3.2.i)
6575
6576=item *
6577
6578Illegal flag values (rightmost bit set for non-missing values) (Note (9)
6579to Table B in FM 94 BUFR)
6580
6581=item *
6582
6583Character data not being CCITTIA5 (Note (9) in FM 94 BUFR first page)
6584
6585=item *
6586
6587Null characters in CCITTIA5 data (Note (4) to Table B in FM 94 BUFR)
6588
6589=item *
6590
6591Missing CCITTIA5 value encoded as spaces
6592
6593=item *
6594
6595Invalid date and/or time in section 1
6596
6597=item *
6598
6599Cancellation operators (20[1-4]00, 203255 etc) when there is nothing to cancel
6600
6601=item *
6602
66030 subsets in message. This may not break any formal rules, but is
6604likely to cause problems in further data processing (and Geo::BUFR
6605will not allow you to encode or reencode such a message anyway).
6606
6607=item *
6608
6609Leaving out descriptors to be repeated when corresponding delayed
6610replication/repetition factor in section 4 is 0 and this is last data
6611item. E.g. ending 'Data descriptors unexpanded' in section 3 with
6612'106000 031001' when data value for 031001 is 0. This (mal)practice,
6613however, defies the very point of replication operations (BUFR
6614reg. 94.5.4). Presumably the purpose is to save some space in the BUFR
6615message, but then why not leave out also '106000 031001' and the (0)
6616data value for 031001?
6617
6618=item *
6619
6620Value encoded using BUFR compression which would be too big to encode
6621without compression. For example, for a data descriptor with data
6622width 9 bits a value of 510 ought to be the biggest value possible to
6623encode, but in a multisubset message using BUFR compression it is
6624possible to encode almost arbitrarily large values in single subsets
6625as long as the average over all subsets is contained within 9
6626bits. This is not breaking any formal rules, but almost certainly not
6627desirable.
6628
6629=back
6630
6631Plus some few more checks not considered interesting enough to be
6632mentioned here.
6633
6634=begin more_on_strict_checking
6635
6636These are:
6637- Replication of 0 descriptors (F=1, X=0)
6638- year_of_century > 100
6639- 206Y operator is not followed by a local descriptor
6640
6641
6642=end more_on_strict_checking
6643
6644=head1 BUGS OR MISSING FEATURES
6645
6646Some BUFR table C operators are not implemented or are untested,
6647mainly because I do not have access to BUFR messages containing such
6648operators. If you happen to come over a BUFR message which the current
6649module fails to decode properly, I would therefore highly appreciate
6650if you could mail me this.
6651
6652=head1 AUTHOR
6653
6654Pål Sannes E<lt>pal.sannes@met.noE<gt>
6655
6656=head1 CREDITS
6657
6658I am very grateful to Alvin Brattli, who (while employed as a
6659researcher at the Norwegian Meteorological Institute) wrote the first
6660version of this module, with the sole purpose of being able to decode
6661some very specific BUFR satellite data, but still provided the main
6662framework upon which this module is built.
6663
6664=head1 SEE ALSO
6665
6666Guide to WMO Table Driven Code Forms: FM 94 BUFR and FM 95 CREX; Layer 3:
6667Detailed Description of the Code Forms (for programmers of encoder/decoder
6668software)
6669
6670L<https://wiki.met.no/bufr.pm/start>
6671
6672=head1 COPYRIGHT
6673
6674Copyright (C) 2010-2020 MET Norway
6675
6676This module is free software; you can redistribute it and/or
6677modify it under the same terms as Perl itself.
6678
6679=cut
6680