1Spreadsheet::ReadSXC - Extract OpenOffice 1.x spreadsheet data
2
3DESCRIPTION
4
5Spreadsheet::ReadSXC extracts data from OpenOffice 1.x spreadsheet
6files (.sxc). It exports the function read_sxc() which takes a
7filename and an optional reference to a hash of options as
8arguments and returns a reference to a hash of references to
9two-dimensional arrays. The hash keys correspond to the names of
10worksheets in the OpenOffice workbook. The two-dimensional arrays
11correspond to rows and cells in the respective spreadsheets. If
12you don't like this because the order of sheets is not preserved
13in a hash, read on. The 'OrderBySheet' option provides an array
14of hashes instead.
15
16If you prefer to unpack the .sxc file yourself, you can use the
17function read_xml_file() instead and pass the path to content.xml
18as an argument. Or you can extract the XML string from content.xml
19and pass the string to the function read_xml_string(). Both
20functions also take a reference to a hash of options as an
21optional second argument.
22
23Spreadsheet::ReadSXC uses XML::Twig to parse the XML
24contained in .sxc files. Only the contents of text:p elements are
25returned, not the actual values of table:value attributes. For
26example, a cell might have a table:value-type attribute of
27"currency", a table:value attribute of "-1500.99" and a
28table:currency attribute of "USD". The text:p element would
29contain "-$1,500.99". This is the string which is returned by the
30read_sxc() function, not the value of -1500.99.
31
32Spreadsheet::ReadSXC was written with data import into an SQL
33database in mind. Therefore empty spreadsheet cells correspond to
34undef values in array rows. The example code above shows how to
35replace undef values with empty strings.
36
37If the .sxc file contains an empty spreadsheet its hash element will
38point to an empty array (unless you use the 'NoTruncate' option in
39which case it will point to an array of an array containing one
40undefined element).
41
42OpenOffice uses UTF-8 encoding. It depends on your environment how
43the data returned by the XML Parser is best handled:
44
45  use Unicode::String qw(latin1 utf8);
46  $unicode_string = utf8($$workbook_ref{"Sheet1"}[0][0])->as_string;
47
48  # this will not work for characters outside ISO-8859-1:
49
50  $latin1_string = utf8($$workbook_ref{"Sheet1"}[0][0])->latin1;
51
52Of course there are other modules than Unicode::String on CPAN that
53handle conversion between encodings. It's your choice.
54
55Table rows in .sxc files may have a "table:number-rows-repeated"
56attribute, which is often used for consecutive empty rows. When you
57format whole rows and/or columns in OpenOffice, it sets the numbers
58of rows in a worksheet to 32,000 and the number of columns to 256, even
59if only a few lower-numbered rows and cells actually contain data.
60Spreadsheet::ReadSXC truncates such sheets so that there are no empty
61rows after the last row containing data and no empty columns after the
62last column containing data (unless you use the 'NoTruncate' option).
63
64Still it is perfectly legal for an .sxc file to apply the
65"table:number-rows-repeated" attribute to rows that actually contain
66data (although I have only been able to produce such files manually,
67not through OpenOffice itself). To save on memory usage in these cases,
68Spreadsheet::ReadSXC does not copy rows by value, but by reference
69(remember that multi-dimensional arrays in Perl are really arrays of
70references to arrays). Therefore, if you change a value in one row, it
71is possible that you find the corresponding value in the next row
72changed, too:
73
74  $$workbook_ref{"Sheet1"}[0][0] = 'new string';
75  print $$workbook_ref{"Sheet1"}[1][0];
76
77As of version 0.20 the references returned by read_sxc() et al. remain
78valid after subsequent calls to the same function. In earlier versions,
79calling read_sxc() with a different file as the argument would change
80the data referenced by the original return value, so you had to
81derefence it before making another call. Thanks to H. Merijn Brand for
82fixing this.
83
84
85INSTALLATION
86
87This is a Perl module distribution. It should be installed with whichever
88tool you use to manage your installation of Perl, e.g. any of
89
90  cpanm .
91  cpan  .
92  cpanp -i .
93
94Consult https://www.cpan.org/modules/INSTALL.html for further instruction.
95Should you wish to install this module manually, the procedure is
96
97  perl Makefile.PL
98  make
99  make test
100  make install
101
102
103
104
105
106
107BUG TRACKER
108
109Please report bugs in this module via the Github bug queue at
110L<https://github.com/Corion/Spreadsheet-ReadSXC/issues>
111
112
113SEE ALSO
114
115L<https://www.openoffice.org/xml/general.html> has extensive documentation
116of the OpenOffice 1.x XML file format (soon to be replaced by the
117OASIS file format (ODS), see L<http://docs.oasis-open.org/office/v1.2/OpenDocument-v1.2.pdf>).
118
119AUTHOR
120
121Christoph Terhechte, E<lt>terhechte@cpan.orgE<gt>
122
123
124COPYRIGHT AND LICENSE
125
126Copyright 2005-2019 by Christoph Terhechte
127
128Copyright 2019- by Max Maischein
129
130This library is free software; you can redistribute it and/or modify
131it under the same terms as Perl itself.
132