=head1 NAME

ODF::lpOD - An OpenDocument management interface

=head1 SYNOPSIS

    use ODF::lpOD;

    my $document = odf_document->get("report.odt");

    my $meta = $document->get_part(META);
    $meta->set_title("The best document format");

    my $content = $document->get_part(CONTENT);
    my $context = $content->get_body;
    my $paragraph = $context->get_paragraph(
        content => "I look for it"
        );
    $paragraph->set_text("I found it");
    $paragraph->set_style("Standout");
    my $new_paragraph = odf_paragraph->create(
        style   => "Standard",
        text    => "A new content"
        );
    $context->append_element($new_paragraph);
    my $table = odf_table->create(
        "Main Figures", height => 20, width => 16
        );
    $context->insert_element($table, before => $paragraph);
    my $cell = $table->get_cell("B4");
    $cell->set_text("Here B4");

    $document->save;
    exit;

The code example above loads a document from an existing "report.odt" file,
updates various data in the document, then saves the changes. The following
actions are performed on the document:

1) The title is set to "The best document format";

2) The first paragraph containing "I look for it" is retrieved (this paragraph
is assumed to exist; otherwise C<get_paragraph> would return C<undef>);

3) The content of the found paragraph is replaced by "I found it", and its
style is set to "Standout" (this style is assumed to exist or to be defined
later);

4) A new paragraph, whose text is "A new content" and whose style is
"Standard", is created, then appended to the document body;

5) A new table named "Main Figures", whose size is 20x16, is created, then
inserted just before the retrieved paragraph;

6) The "B4" cell (i.e. the cell in the 4th row and the 2nd column, whatever
the document type) is retrieved, and its content is set to "Here B4" (the
cell data type is automatically set to C<'string'>).
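
The "B4" notation follows the usual spreadsheet convention: the letters name
the column (A, B, ..., Z, AA, ...) and the digits name the row. As a sketch of
that arithmetic only, the hypothetical C<cell_ref_to_indices> helper below
(it is not part of the lpOD API, and it does not show how lpOD parses such
references internally) converts a reference into a zero-based (row, column)
pair, so that "B4" gives (3, 1).

```perl

    use strict;
    use warnings;

    # Hypothetical helper, not part of ODF::lpOD: converts a spreadsheet-style
    # reference such as "B4" into zero-based (row, column) indices.
    sub cell_ref_to_indices {
        my ($ref) = @_;
        my ($letters, $digits) = uc($ref) =~ /^([A-Z]+)([1-9][0-9]*)$/
            or die "Invalid cell reference: $ref\n";
        # Letters are read as a bijective base-26 number: A=1 ... Z=26, AA=27.
        my $column = 0;
        $column = $column * 26 + ord($_) - ord('A') + 1 for split //, $letters;
        return ($digits - 1, $column - 1);
    }

    my ($row, $column) = cell_ref_to_indices("B4");
    print "row $row, column $column\n";    # prints "row 3, column 1"

```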

=head1 DESCRIPTION

This module is an office document management interface. It allows users to
create or transform office documents, or to extract data from them. It can
handle files which comply with the ODF standard and whose type is I<text> (odt),
I<spreadsheet> (ods), I<presentation> (odp) or I<drawing> (odg). It interacts
directly with the files and doesn't depend on any particular office software.

=head1 ABOUT lpOD

This is the Perl implementation of the lpOD project.

lpOD is a Free Software project that offers, for high level use cases, an
application programming interface dedicated to document processing with the
Python, Perl and Ruby languages. It complies with the I<OASIS Open Document
Format> (ODF), i.e. the I<ISO/IEC 26300> international standard.

lpOD is designed according to a top-down approach. The API is bound to the
functional structure of the document and to the user's point of view. As a
consequence, it may be used without full knowledge of the ODF specification,
and it allows the application developer to focus on business needs instead of
low level storage concerns.

The lpOD API is object oriented.

=head1 Basic document access principles

Documents are accessed through the C<odf_document> class. Before processing a
document, an odf_document instance must be created using one of the available
constructors. While an odf_document object encapsulates the physical resource
access logic, the real data must be handled through document I<parts>, each
part representing a specialized aspect of the document.
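
Physically, an ODF package is a zip archive, and the parts handled by lpOD are
XML members of that archive (typically C<content.xml>, C<styles.xml>,
C<meta.xml>, C<settings.xml> and C<META-INF/manifest.xml>). The sketch below
does not use lpOD at all; it lists the members of a package with the core
C<IO::Uncompress::Unzip> module, which may help to see what lies behind the
part-based interface. C<list_package_members> is a hypothetical helper name.

```perl

    use strict;
    use warnings;
    use IO::Uncompress::Unzip qw($UnzipError);

    # Hypothetical helper, not part of ODF::lpOD: returns the names of the
    # members of a zip-based ODF package, in archive order. The source may be
    # a file path or an open file handle.
    sub list_package_members {
        my ($source) = @_;
        # Transparent => 0 makes the constructor fail on non-zip input
        # instead of passing the raw bytes through.
        my $unzip = IO::Uncompress::Unzip->new($source, Transparent => 0)
            or die "Cannot read package: $UnzipError\n";
        my @names;
        my $status;
        # nextStream() returns 1 when it moves to another member, 0 at the
        # end of the archive and -1 on error.
        for ($status = 1; $status > 0; $status = $unzip->nextStream) {
            push @names, $unzip->getHeaderInfo->{Name};
        }
        die "Error while reading package: $UnzipError\n" if $status < 0;
        $unzip->close;
        return @names;
    }

    if (@ARGV) {
        print "$_\n" for list_package_members($ARGV[0]);
    }

```

Usage: C<perl list_members.pl report.odt> (the script name is arbitrary).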

Each part contains a set of C<odf_element> objects; odf_element is the common
base class for any kind of simple or complex document element. An odf_element
may be a visible object, such as a paragraph or a table, or a piece of data
that specifies the layout or the behavior of other objects, such as a text
style or a page layout. Each part contains a I<root> element, i.e. a special
odf_element containing all the elements of the part. A part may also contain
a I<body> element, which is a more restricted, but in some cases more
relevant, context than the root.

lpOD is a read-write API. However, the changes made by applications aren't
automatically persistent. The API provides methods that insert, delete, or
update elements in memory, but these changes must be explicitly committed
using other, package-oriented methods (such as C<save>) in order to become
persistent.

=head2 Global document initialization

A few specialized constructors may be used in order to create odf_document
objects. All these constructors return an odf_document object in case of
success, a FALSE value otherwise.

Once an odf_document is created, its content may be written back to a
persistent storage using its C<save> method.

=head3 odf_get_document(source)

Instantiates an C<odf_document> object which is a read-write interface to
an existing ODF package corresponding to the given source. The package should
be an ODF-compliant zip file (odt, ods, odp, and so on). Example:

    my $document = odf_get_document('C:\Path\Doc.odt');

C<odf_get_document()> is just a functional way to call the C<get()> constructor
of the C<odf_document> class; so the example above produces the same effect as
the following one:

    my $document = odf_document->get('C:\Path\Doc.odt');

The source argument must be provided either as a regular file path or as an
C<IO::File> object.
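
Because the source may be an C<IO::File> object, an application can open the
file itself, run a cheap sanity check, and then pass the same handle to
C<< odf_document->get() >>. The sketch below relies only on core modules, and
C<odf_package_type> is a hypothetical helper, not part of lpOD. It is based on
the ODF packaging rule that the first member of a package is a file named
C<mimetype> holding the media type of the document (for example
C<application/vnd.oasis.opendocument.text> for an odt file); whether lpOD
performs a similar check by itself is not stated here.

```perl

    use strict;
    use warnings;
    use IO::File;
    use IO::Uncompress::Unzip qw($UnzipError);

    # Hypothetical helper, not part of ODF::lpOD. Returns the content of the
    # leading "mimetype" member, or undef if the source (a path or an open
    # handle) is not a zip archive starting with such a member.
    sub odf_package_type {
        my ($source) = @_;
        my $unzip = IO::Uncompress::Unzip->new($source, Transparent => 0)
            or return;
        my $type;
        if ($unzip->getHeaderInfo->{Name} eq 'mimetype') {
            $type = '';
            # read() returns the number of bytes read, 0 at the end of the
            # current member and a negative value on error.
            while ($unzip->read(my $buffer, 4096) > 0) {
                $type .= $buffer;
            }
        }
        $unzip->close;    # an input file handle is left open
        return $type;
    }

    if (@ARGV) {
        my $fh = IO::File->new($ARGV[0], 'r')
            or die "Cannot open $ARGV[0]: $!\n";
        binmode $fh;
        my $type = odf_package_type($fh);
        print defined $type ? "$type\n" : "not an ODF package\n";
        # Rewind before handing the same handle to odf_document->get($fh).
        seek $fh, 0, 0;
    }

```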

=head3 odf_new_document(document_type)

Returns a new odf_document corresponding to the given ODF document type.
The allowed document types are presently C<'text'>, C<'spreadsheet'>,
C<'presentation'>, and C<'drawing'>. Example:

    my $document = odf_new_document('spreadsheet');

Knowing that this functional constructor is just a way to call the C<create()>
method of the C<odf_document> class, the following code is equivalent:

    my $document = odf_document->create('spreadsheet');

Technically, the new document is generated as a clone of an existing template
document, provided with the lpOD distribution. It operates in the same way as
C<odf_new_document_from_template>, but the user doesn't need to provide the
template document.

=head3 odf_new_document_from_template(source)

Returns a new odf_document instantiated from an existing ODF template
package. Same as C<odf_get_document>, but the source package is read-only.

=head3 save([target => destination])

This function is a I<method>. It must be called from an odf_document instance.

Without argument, it attempts to write its content back to the resource that
was used to create it. A warning is issued and nothing is done if the document
has been created without a source file or from a read-only template (i.e.
through C<odf_new_document> or C<odf_new_document_from_template>).

This method produces a file whose basic format is the same as the format of
the source document or template (whatever the target file name, if any).

If the optional C<target> parameter is provided, its value is regarded as the
storage destination. This value may be a regular file path or an C<IO::File>
object. This parameter is mandatory if the C<odf_document> instance has been
created through C<odf_new_document_from_template> or C<odf_new_document>.

Example:

    $document->save(target => "/myfiles/target.odt");

=head2 Document part initialization and handling

A regular ODF document contains various I<parts>, some of them mandatory.
The parts that are relevant in the lpOD scope are C<'content'>, C<'styles'>,
C<'meta'>, C<'settings'>, and C<'manifest'>.

The odf_document class provides a C<get_part()> method, which must be called
with an argument that specifies the needed part. Example:

    my $content = $document->get_part(CONTENT);
    my $meta    = $document->get_part(META);

The sequence above gives access to the content and meta parts of a previously
created C<odf_document> instance.

Beware: if C<get_part()> is called twice or more from the same C<odf_document>
instance and with the same part designation, it returns the same object. As a
consequence, after the sequence below, C<$p1> and C<$p2> will be synonyms:

    my $p1 = $document->get_part(CONTENT);
    my $p2 = $document->get_part(CONTENT);

The C<serialize()> method of a part returns an XML export of the whole part
(the application is then responsible for the fate of this export). An optional
C<pretty> argument, if set to TRUE, specifies that the XML output must be
human-readable. Example:

    my $content = $document->get_part(CONTENT);
    # here some content processing
    my $xml = $content->serialize(pretty => TRUE);

=head1 Basic ODF element handling

Every C<odf_part> object provides a low level C<get_element> method whose
first argument is an XPath expression and whose second argument is a numeric
position. The numeric argument specifies the order number of the required
element among the set of elements matching the XPath. If the order number is
negative, the position is counted backward from the end. The position is
zero-based (i.e. a zero value means the first matching element). As an
example, the code below returns the last paragraph of the document.

    my $document = odf_document->get($source);
    my $content  = $document->get_part(CONTENT);
    my $p        = $content->get_element("//text:p", -1);

However, this way is not the smartest one because it requires knowledge
of the ODF schema (and some XPath skills for more complicated cases). There
are better ways to select the last paragraph of a document (and various other
objects at any position in a document).

lpOD provides more user-friendly, XPath-free methods for the most frequently
used elements in the C<CONTENT> part of a document. These methods are provided
through the C<odf_element> class. Any individual element in a part is an
C<odf_element> object. There is a shortcut to get the top (or root) element of
any part: the C<get_root()> method. Once selected, the top element provides
all the I<context methods> of the lpOD API.

A I<context method> is a method owned by an element (the context) and whose
effect is related to the children and descendants of this element. So, the
C<get_xxx> methods of a given element are retrieval methods intended to
select something I<below> the current element. Thanks to the C<get_paragraph>
method provided by the C<odf_element> class, the last example could be written
as shown below:

    my $document = odf_document->get($source);
    my $context  = $document->get_part(CONTENT)->get_root;
    my $p        = $context->get_paragraph(-1);

In most cases (including the previous example), C<get_root> may be replaced
by C<get_body>, which returns a context containing all the visible elements
(including the paragraphs).

There is a generic context-based C<get_element> method that differs from the
part-based one. It allows the user to select an element according to its text
content, one of its attributes, and/or its sequential position in the context.
As an example, the sequence below displays the name of the last page that uses
the draw page style "dp1" (assuming we are using a presentation or drawing
document):

    my $context = $document->get_part(CONTENT)->get_body;
    my $page = $context->get_element(
        'draw:page',
        attribute   => 'style name',
        value       => 'dp1',
        position    => -1
        );
    say $page->get_attribute('name');

lpOD provides special name-based retrieval methods for some elements that own
unique names. For example, the instruction below selects the table whose name
is "T1" (if any):

    my $table = $context->get_table_by_name("T1");

Unlike other parts such as C<content>, the C<meta> document part provides
direct C<get> and C<set> accessors for the usual metadata, so there is no need
for a context element, as shown in the following example that displays the
title of a document:

    my $document = odf_document->get($source);
    my $meta     = $document->get_part(META);
    say $meta->get_title;

The title (like any other metadata value) may be updated or created with the
corresponding C<set> accessor:

    $meta->set_title("The new title");

All the properties of a previously selected element are stored in one or more
I<attributes> and in a I<text>. So, for any C<odf_element>, lpOD provides
corresponding C<get> and C<set> accessors.

C<get_text> returns the current text, while C<set_text> replaces the current
content with a new text (possibly empty). Without argument, C<get_text> returns
the text directly contained in the calling element, but with the optional
C<recursive> named parameter set to C<TRUE>, it returns the concatenated texts
of all the descendants of the calling element. On the other hand, C<set_text>
deletes any previous content (i.e. direct text content and embedded elements
such as bookmarks, variable fields, text segments with special styles, and
so on).
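
The difference between direct and recursive text, and the destructive effect
of C<set_text>, can be illustrated without lpOD by a toy model. In the sketch
below, which is a deliberate simplification and not how lpOD is implemented,
an element is a hash whose C<children> list mixes plain strings (text nodes)
and nested elements; C<toy_get_text> and C<toy_set_text> are hypothetical
names that mimic the behavior described above.

```perl

    use strict;
    use warnings;

    # Toy model, not lpOD's implementation: returns the direct text of an
    # element, or the text of all its descendants in document order when
    # the "recursive" option is true.
    sub toy_get_text {
        my ($element, %opt) = @_;
        my $text = '';
        for my $child (@{ $element->{children} }) {
            if (ref $child) {
                $text .= toy_get_text($child, %opt) if $opt{recursive};
            }
            else {
                $text .= $child;
            }
        }
        return $text;
    }

    # Replaces the whole content (text and embedded elements) by one text.
    sub toy_set_text {
        my ($element, $text) = @_;
        $element->{children} = [ defined $text ? $text : '' ];
        return;
    }

    my $paragraph = {
        tag      => 'text:p',
        children => [
            'The ',
            { tag => 'text:span', children => ['quick'] },
            ' brown fox',
        ],
    };
    print toy_get_text($paragraph), "\n";                  # "The  brown fox"
    print toy_get_text($paragraph, recursive => 1), "\n";  # "The quick brown fox"
    toy_set_text($paragraph, 'A new text');                # the span is gone
    print toy_get_text($paragraph, recursive => 1), "\n";  # "A new text"

```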

The C<get_attribute> and C<set_attribute> methods require the name of the
needed attribute. This name may be the technical name according to the
OpenDocument specification, or a simpler and more meaningful name. For example,
assuming C<$item> is a I<list item>, and knowing that such an object may own a
so-called C<text:restart-numbering> attribute telling that the list numbering
must be restarted at this point from a given value, the following instruction
sets this value to 6:

    $item->set_attribute('restart numbering' => 6);

C<set_attribute> deletes an existing attribute as soon as the given value is
C<undef>; so the instruction below cancels the C<restart numbering> feature:

    $item->set_attribute('restart numbering' => undef);

Note that C<set_attribute>, provided with a defined value, automatically
I<creates> the attribute if it doesn't exist; there is no need to separately
check an attribute for existence and create it before setting a value.

It's possible to get or set more than one attribute in a single call using
C<get_attributes> or C<set_attributes>. The first one returns the attributes
as a hash reference (with the real ODF names), while the second one requires
a hash reference as argument.

An element may be removed (with all its descendants) using its C<delete> method.
(Beware: the deletion of a high level element may destroy a lot of content!)
It's possible to delete the whole content of an element without removing the
element itself by issuing a C<set_text> with an empty string.

The user is allowed to create a new element using the C<odf_create_element>
constructor, which requires an appropriate ODF tag (corresponding to the type
of element) or a valid XML string. Fortunately, lpOD provides a set of
specialized constructors (such as C<odf_create_paragraph>, C<odf_create_table>,
and so on) that may be used without knowledge of the XML stuff. Once created
through such a constructor, the new element is not automatically included in
a document. To include it, lpOD provides the C<insert_element> and
C<append_element> methods, both context-based, i.e. called from an existing
element that becomes the parent of the new element. As an example, the sequence
below creates a new paragraph (with given style and content), then appends it
to a selected section:

    my $document  = odf_document->get($source);
    my $context   = $document->get_part(CONTENT)->get_body;
    my $section   = $context->get_section("Prologue");
    my $paragraph = odf_paragraph->create(
        style => "Standard", text => "The End of the Beginning"
        );
    $section->append_element($paragraph);

Elements may be created by replication of existing elements, thanks to the
C<clone> method. The result of the instruction below is a copy of an existing
section (with all its content); this copy is a "free" element (i.e. it's not
included in any document, and it has no link with its prototype element), so
it may be inserted elsewhere in the same document or in another document:

    my $section = $context->get_section("Reusable");
    my $free_section = $section->clone;

=head1 Getting started

=head2 The "Hello World" example

Unsurprisingly, we suggest testing your lpOD installation and your knowledge
of the big picture through this simple program:

    use ODF::lpOD;

    my $doc = odf_document->create('text');
    my $content = $doc->get_part(CONTENT);
    my $context = $content->get_body;
    $context->append_element(
        odf_paragraph->create(
            style   => "Standard",
            text    => "Hello World !"
            )
        );
    $doc->save(target => "helloworld.odt");
    exit;

If this script runs without warning, open the "helloworld.odt" file using your
favorite ODF-compliant text processor, and look at the text content. You may
then introduce more sophistication using the metadata part of the document.
To do so, you can (for example) insert the lines below somewhere before the
C<save> instruction (and after the C<< odf_document->create() >> one).

    my $meta = $doc->get_part(META);
    $meta->set_title("Hello World Test");
    $meta->set_creator("Me");

After execution of the extended version, check the author's name and the
title through the I<File/Properties> dialog of your ODF text editor.

=head2 Using the documentation

The L<ODF::lpOD::Tutorial> is a recommended first reading that may help
to quickly gain a basic understanding and get started with lpOD. The
reference documentation is split into the following manual chapters:

=over

=item *

L<ODF::lpOD::Document>: General document packaging and metadata handling.

=item *

L<ODF::lpOD::Element>: Common features, available with any element.

=item *

L<ODF::lpOD::TextElement>: Text containers (paragraphs, headings), and various
elements that may take place in paragraphs (bookmarks, index marks,
bibliography marks, text variables and fields).

=item *

L<ODF::lpOD::Table>: Access to tables and their content.

=item *

L<ODF::lpOD::StructuredContainer>: High-level structures such as sections,
lists, draw pages, shapes, image or text frames, tables of contents.

=item *

L<ODF::lpOD::Style>: Style retrieval, update, or creation.

=item *

L<ODF::lpOD::Common>: Common utility functions.

=back

An alternative tutorial, intended for French-reading users, is available at
L<http://jean.marie.gouarne.online.fr/doc/introduction_lpod_perl.pdf>

=head1 AUTHOR/COPYRIGHT

Developer/Maintainer: Jean-Marie Gouarne L<http://jean.marie.gouarne.online.fr>
Contact: jmgdoc@cpan.org

Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.
Copyright (c) 2014 Jean-Marie Gouarne.

This work was sponsored by the Agence Nationale de la Recherche
(L<http://www.agence-nationale-recherche.fr>).

License: GPL v3, Apache v2.0 (see LICENSE).

=cut