1Spreadsheet::ReadSXC - Extract OpenOffice 1.x spreadsheet data 2 3DESCRIPTION 4 5Spreadsheet::ReadSXC extracts data from OpenOffice 1.x spreadsheet 6files (.sxc). It exports the function read_sxc() which takes a 7filename and an optional reference to a hash of options as 8arguments and returns a reference to a hash of references to 9two-dimensional arrays. The hash keys correspond to the names of 10worksheets in the OpenOffice workbook. The two-dimensional arrays 11correspond to rows and cells in the respective spreadsheets. If 12you don't like this because the order of sheets is not preserved 13in a hash, read on. The 'OrderBySheet' option provides an array 14of hashes instead. 15 16If you prefer to unpack the .sxc file yourself, you can use the 17function read_xml_file() instead and pass the path to content.xml 18as an argument. Or you can extract the XML string from content.xml 19and pass the string to the function read_xml_string(). Both 20functions also take a reference to a hash of options as an 21optional second argument. 22 23Spreadsheet::ReadSXC uses XML::Twig to parse the XML 24contained in .sxc files. Only the contents of text:p elements are 25returned, not the actual values of table:value attributes. For 26example, a cell might have a table:value-type attribute of 27"currency", a table:value attribute of "-1500.99" and a 28table:currency attribute of "USD". The text:p element would 29contain "-$1,500.99". This is the string which is returned by the 30read_sxc() function, not the value of -1500.99. 31 32Spreadsheet::ReadSXC was written with data import into an SQL 33database in mind. Therefore empty spreadsheet cells correspond to 34undef values in array rows. The example code above shows how to 35replace undef values with empty strings. 36 37If the .sxc file contains an empty spreadsheet its hash element will 38point to an empty array (unless you use the 'NoTruncate' option in 39which case it will point to an array of an array containing one 40undefined element). 41 42OpenOffice uses UTF-8 encoding. It depends on your environment how 43the data returned by the XML Parser is best handled: 44 45 use Unicode::String qw(latin1 utf8); 46 $unicode_string = utf8($$workbook_ref{"Sheet1"}[0][0])->as_string; 47 48 # this will not work for characters outside ISO-8859-1: 49 50 $latin1_string = utf8($$workbook_ref{"Sheet1"}[0][0])->latin1; 51 52Of course there are other modules than Unicode::String on CPAN that 53handle conversion between encodings. It's your choice. 54 55Table rows in .sxc files may have a "table:number-rows-repeated" 56attribute, which is often used for consecutive empty rows. When you 57format whole rows and/or columns in OpenOffice, it sets the numbers 58of rows in a worksheet to 32,000 and the number of columns to 256, even 59if only a few lower-numbered rows and cells actually contain data. 60Spreadsheet::ReadSXC truncates such sheets so that there are no empty 61rows after the last row containing data and no empty columns after the 62last column containing data (unless you use the 'NoTruncate' option). 63 64Still it is perfectly legal for an .sxc file to apply the 65"table:number-rows-repeated" attribute to rows that actually contain 66data (although I have only been able to produce such files manually, 67not through OpenOffice itself). To save on memory usage in these cases, 68Spreadsheet::ReadSXC does not copy rows by value, but by reference 69(remember that multi-dimensional arrays in Perl are really arrays of 70references to arrays). Therefore, if you change a value in one row, it 71is possible that you find the corresponding value in the next row 72changed, too: 73 74 $$workbook_ref{"Sheet1"}[0][0] = 'new string'; 75 print $$workbook_ref{"Sheet1"}[1][0]; 76 77As of version 0.20 the references returned by read_sxc() et al. remain 78valid after subsequent calls to the same function. In earlier versions, 79calling read_sxc() with a different file as the argument would change 80the data referenced by the original return value, so you had to 81derefence it before making another call. Thanks to H. Merijn Brand for 82fixing this. 83 84 85INSTALLATION 86 87This is a Perl module distribution. It should be installed with whichever 88tool you use to manage your installation of Perl, e.g. any of 89 90 cpanm . 91 cpan . 92 cpanp -i . 93 94Consult https://www.cpan.org/modules/INSTALL.html for further instruction. 95Should you wish to install this module manually, the procedure is 96 97 perl Makefile.PL 98 make 99 make test 100 make install 101 102 103 104 105 106 107BUG TRACKER 108 109Please report bugs in this module via the Github bug queue at 110L<https://github.com/Corion/Spreadsheet-ReadSXC/issues> 111 112 113SEE ALSO 114 115L<https://www.openoffice.org/xml/general.html> has extensive documentation 116of the OpenOffice 1.x XML file format (soon to be replaced by the 117OASIS file format (ODS), see L<http://docs.oasis-open.org/office/v1.2/OpenDocument-v1.2.pdf>). 118 119AUTHOR 120 121Christoph Terhechte, E<lt>terhechte@cpan.orgE<gt> 122 123 124COPYRIGHT AND LICENSE 125 126Copyright 2005-2019 by Christoph Terhechte 127 128Copyright 2019- by Max Maischein 129 130This library is free software; you can redistribute it and/or modify 131it under the same terms as Perl itself. 132