=head1	NAME

ODF::lpOD - An OpenDocument management interface

=head1	SYNOPSIS

        use ODF::lpOD;

        my $document = odf_document->get("report.odt");

        my $meta = $document->get_part(META);
        $meta->set_title("The best document format");

        my $content = $document->get_part(CONTENT);
        my $context = $content->get_body;
        my $paragraph = $context->get_paragraph(
                content => "I look for it"
                );
        $paragraph->set_text("I found it");
        $paragraph->set_style("Standout");
        my $new_paragraph = odf_paragraph->create (
                                style => "Standard",
                                text => "A new content"
                                );
        $context->append_element($new_paragraph);
        my $table = odf_table->create (
                "Main Figures", height => 20, width => 16
                );
        $context->insert_element($table, before => $paragraph);
        my $cell = $table->get_cell("B4");
        $cell->set_text("Here B4");

        $document->save;
        exit;

The code example above loads a document from an existing "report.odt" file,
updates various data in the document, then saves the changes. The following
actions are performed in the document:

1) The title is set to "The best document format";

2) The first paragraph containing "I look for it" is retrieved (this paragraph
is supposed to exist; otherwise C<get_paragraph> would return undef);

3) The content of the found paragraph is replaced by "I found it", and its
style is set to "Standout" (this style is supposed to exist or to be defined
later);

4) A new paragraph, whose text is "A new content" and style is "Standard",
is created then appended to the document body;

5) A new table whose name is "Main Figures" and size is 20 rows by 16 columns
is created then inserted just before the first retrieved paragraph;

6) The "B4" cell (i.e. the cell belonging to the 4th row and the 2nd column,
whatever the document type) is retrieved, and its content is set to "Here B4"
(the cell data type is automatically set to C<'string'>).
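
Because C<get_paragraph> returns C<undef> when nothing matches, a script
should check its result before calling a method on it. The following is a
minimal sketch of that guard. To stay self-contained it builds a throwaway
text document in memory instead of loading "report.odt", and all the texts
are arbitrary examples.

        use strict;
        use warnings;
        use ODF::lpOD;

        # Throwaway in-memory text document (stands in for "report.odt")
        my $doc     = odf_document->create('text');
        my $context = $doc->get_part(CONTENT)->get_body;
        $context->append_element(
                odf_paragraph->create(text => "I look for it", style => "Standard")
                );

        # get_paragraph returns undef when nothing matches: test before use
        my $paragraph = $context->get_paragraph(content => "I look for it");
        if ($paragraph) {
                $paragraph->set_text("I found it");
        }
        else {
                warn "No matching paragraph\n";
        }

        # A search that cannot match anything in this document
        my $missing = $context->get_paragraph(content => "not there at all");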

=head1	DESCRIPTION

This module is an office document management interface. It allows users to
create or transform office documents, or to extract data from them. It can
handle files which comply with the ODF standard and whose type is I<text> (odt),
I<spreadsheet> (ods), I<presentation> (odp) or I<drawing> (odg). It interacts
directly with the files and doesn't depend on any particular office software.

=head1  ABOUT lpOD

This is the Perl implementation of the lpOD project.

lpOD is a Free Software project that offers, for high level use cases, an
application programming interface dedicated to document processing with the
Python, Perl and Ruby languages. It complies with the I<OASIS Open Document
Format> (ODF), i.e. the I<ISO/IEC 26300> international standard.

lpOD is designed according to a top-down approach. The API is bound to the
document functional structure and the user's point of view. As a consequence,
it may be used without full knowledge of the ODF specification, and allows the
application developer to focus on the business needs instead of the low
level storage concerns.

The lpOD API is object oriented.

=head1  Basic document access principles

The general access to the documents uses the C<odf_document> class. Before
processing a document, an odf_document instance must be created using one of
the allowed constructors. While an odf_document object encapsulates the physical
resource access logic, the real data must be handled through document I<parts>,
each part representing a specialized aspect of the document.

Each part contains a set of C<odf_element> objects, odf_element being the
common base class for any kind of simple or complex document element (an
odf_element may be a visible object, such as a paragraph or a table, as well as
a piece of data that specifies the layout or the behavior of other objects,
such as a text style or a page layout). Each part contains a I<root> element,
that is a special odf_element containing all the elements of the part. A part
may contain a I<body> element, that is a more restricted but in some cases more
interesting context than the root.

lpOD is a read-write API. However, the changes made by the applications aren't
automatically persistent. The API provides methods that insert, delete, or
update elements in memory, but these changes must be explicitly committed using
other, package-oriented methods (such as C<save>), in order to become persistent.
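
The sketch below illustrates this principle under a few assumptions: it works
in a temporary directory created with the core File::Temp module, and the file
name "persist.odt" and the paragraph text are arbitrary. The change made in
memory only reaches the file through C<save>, which the sketch then confirms by
reloading the file.

        use strict;
        use warnings;
        use File::Spec;
        use File::Temp qw(tempdir);
        use ODF::lpOD;

        # Temporary location, removed when the script ends
        my $dir  = tempdir(CLEANUP => 1);
        my $path = File::Spec->catfile($dir, "persist.odt");

        # The change below only exists in memory...
        my $doc = odf_document->create('text');
        $doc->get_part(CONTENT)->get_body->append_element(
                odf_paragraph->create(text => "Saved text", style => "Standard")
                );
        # ...until it is committed to the package
        $doc->save(target => $path);

        # Reloading the file shows the committed change
        my $reloaded = odf_document->get($path);
        my $found    = $reloaded->get_part(CONTENT)->get_body
                                ->get_paragraph(content => "Saved text");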

=head2  Global document initialization

A few specialized constructors may be used in order to create odf_document
objects. All these constructors return an odf_document object in case of
success, a FALSE value otherwise.

Once an odf_document is created, its content may be written back to a
persistent storage using its C<save> method.

=head3  odf_get_document(source)

Instantiates an C<odf_document> object which is a read-write interface to
an existing ODF package corresponding to the given source. The package should
be an ODF-compliant zip file (odt, ods, odp, and so on). Example:

        my $document = odf_get_document('C:\Path\Doc.odt');

Note the single quotes: in a double-quoted string, Perl would treat the
backslashes of this Windows path as escape characters.

C<odf_get_document()> is just a functional way to call the C<get()> constructor
of the C<odf_document> class; so the example above produces the same effect as
the following one:

        my $document = odf_document->get('C:\Path\Doc.odt');

The source argument must be provided either as a regular file path or as an
C<IO::File> object.
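
Since the constructors return a false value on failure, the result should be
tested before use. A minimal sketch, assuming a hypothetical missing file named
"no-such-file.odt". The C<eval> is only a defensive extra in case an error is
raised instead of returned, a case this documentation doesn't specify.

        use strict;
        use warnings;
        use ODF::lpOD;

        # Hypothetical path that doesn't exist
        my $source = "no-such-file.odt";

        # A false result (or, defensively, an exception) means failure
        my $document = eval { odf_document->get($source) };
        unless ($document) {
                warn "Cannot open $source\n";
        }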

=head3  odf_new_document(document_type)

Returns a new odf_document corresponding to the given ODF document type.
Allowed document types are presently C<'text'>, C<'spreadsheet'>,
C<'presentation'>, and C<'drawing'>. Example:

        my $document = odf_new_document('spreadsheet');

As this functional constructor is just a way to call the C<create()> method of
the C<odf_document> class, the following code is equivalent:

        my $document = odf_document->create('spreadsheet');

Technically, the new document is generated as a clone of an existing template
document, provided with the lpOD distribution. It operates in the same way as
C<odf_new_document_from_template>, but the user doesn't need to provide the
template document.
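
A minimal sketch that creates one document of each allowed type and saves it.
The file names and the type-to-extension table are illustrative choices. A
C<target> is given because such documents have no source file to be written
back to.

        use strict;
        use warnings;
        use File::Spec;
        use File::Temp qw(tempdir);
        use ODF::lpOD;

        my $dir = tempdir(CLEANUP => 1);

        # Usual file extension for each allowed document type
        my %extension = (
                text            => 'odt',
                spreadsheet     => 'ods',
                presentation    => 'odp',
                drawing         => 'odg',
                );

        my %saved;
        foreach my $type (sort keys %extension) {
                my $doc = odf_document->create($type)
                        or die "Cannot create a $type document\n";
                my $path = File::Spec->catfile($dir, "new.$extension{$type}");
                # No source file: save() needs an explicit target
                $doc->save(target => $path);
                $saved{$type} = $path;
        }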

=head3  odf_new_document_from_template(source)

Returns a new odf_document instantiated from an existing ODF template
package. Same as C<odf_get_document>, but the source package is read-only.

=head3  save([target => destination])

This function is a I<method>. It must be called from an odf_document instance.

Without argument, it attempts to write its content back to the resource that
was used to create it. A warning is issued and nothing is done if the document
has been created without a source file or from a read-only template (i.e.
through C<odf_new_document> or C<odf_new_document_from_template>).

This method produces a file whose basic format is the same as the format of
the source document or template (whatever the target file name, if any).

If the optional parameter C<target> is provided, it's regarded as the storage
destination. Its value may be a regular file path or an C<IO::File> object.
This parameter is mandatory if the C<odf_document> instance has been created
through C<odf_new_document_from_template> or C<odf_new_document>.

Example:

        $document->save(target => "/myfiles/target.odt");

=head2  Document part initialization and handling

A regular ODF document contains various I<parts>, some of them mandatory.
The interesting parts in the lpOD scope are C<'content'>, C<'styles'>, C<'meta'>,
C<'settings'>, and C<'manifest'>.

The odf_document class provides a C<get_part()> method, that must be used with
an argument that specifies the needed part. Example:

        my $content = $document->get_part(CONTENT);
        my $meta = $document->get_part(META);

The sequence above gives access to the content and meta parts of a previously
created C<odf_document> instance.

Beware: if C<get_part()> is called twice or more from the same C<odf_document>
instance and with the same part designation, it returns the same object. As a
consequence, after the sequence below, C<$p1> and C<$p2> will be synonyms:

        my $p1 = $document->get_part(CONTENT);
        my $p2 = $document->get_part(CONTENT);

The C<serialize()> method of a part returns an XML export of the whole part
(the application is then responsible for the fate of this export). An optional
C<pretty> argument, if set to TRUE, specifies that the XML output must be
human-readable. Example:

        my $content = $document->get_part(CONTENT);
        # here some content processing
        my $xml = $content->serialize(pretty => TRUE);
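
The following minimal sketch checks both behaviors described in this section:
C<get_part> returning the same object for the same part, and C<serialize>
exporting the whole part. Comparing addresses with C<refaddr> (from the core
Scalar::Util module) is this example's own choice, and the expectation that the
export contains the C<office:document-content> root element relies on the
standard layout of the ODF content part.

        use strict;
        use warnings;
        use Scalar::Util qw(refaddr);
        use ODF::lpOD;

        my $document = odf_document->create('text');

        # Same part designation, same object
        my $p1   = $document->get_part(CONTENT);
        my $p2   = $document->get_part(CONTENT);
        my $same = refaddr($p1) == refaddr($p2);

        # A change made through $p1 shows up in the export made through $p2
        $p1->get_body->append_element(
                odf_paragraph->create(text => "Exported", style => "Standard")
                );
        my $xml = $p2->serialize(pretty => TRUE);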

=head1  Basic ODF element handling

Every C<odf_part> object provides a low level C<get_element> method whose
first argument is an XPath expression and the second one a numeric position.
The numeric argument specifies the order number of the required element among
the set of elements matching the XPath. If the order number is negative, the
position is regarded as counted backward from the end. The position is
zero-based (i.e. a zero value means the first matching element). As an
example, the code below returns the last paragraph of the document.

        my $document = odf_document->get($source);
        my $content = $document->get_part(CONTENT);
        my $p = $content->get_element("//text:p", -1);

However, this way is not the smartest one because it requires the knowledge
of the ODF schema (and some XPath skills for more complicated cases). There
are better ways to select the last paragraph of a document (and various other
objects at any position in a document).

lpOD provides more user-friendly, XPath-free methods for the most commonly used
elements in the C<CONTENT> part of a document. These methods are provided
through the C<odf_element> class. Any individual element in a part is an
C<odf_element> object. There is a shortcut to get the top (or root) element of
any part: the C<get_root()> method. Once selected, the top element provides all
the I<context methods> of the lpOD API.

A I<context method> is a method owned by an element (the context) and whose
effect is related to the children and descendants of this element. So, the
C<get_xxx> method of a given element is a retrieval method intended to
select something I<below> the current element. Thanks to the C<get_paragraph>
method provided by the C<odf_element> class, the last example could be written
as shown below:

        my $document = odf_document->get($source);
        my $context = $document->get_part(CONTENT)->get_root;
        my $p = $context->get_paragraph(-1);

In most cases (including the previous example), C<get_root> may be replaced
by C<get_body>, which returns a context containing all the visible elements
(including the paragraphs).
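
To illustrate positions, the sketch below appends three paragraphs to a new
text document and retrieves the last one from both the body and the root. It
passes the position as a named C<position> option (the form used with
C<get_element> later in this document). The paragraph texts are arbitrary.

        use strict;
        use warnings;
        use ODF::lpOD;

        my $doc     = odf_document->create('text');
        my $context = $doc->get_part(CONTENT)->get_body;

        # Append three paragraphs, in this order
        foreach my $text ("first", "second", "third") {
                $context->append_element(
                        odf_paragraph->create(text => $text, style => "Standard")
                        );
        }

        # A negative position counts backward: -1 is the last paragraph
        my $last      = $context->get_paragraph(position => -1);
        my $from_root = $doc->get_part(CONTENT)->get_root
                                ->get_paragraph(position => -1);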

There is a generic context-based C<get_element> that differs from the part-based
one. It allows the user to select an element according to its text content, one
of its attributes, and/or its sequential position in the context. As an example,
the sequence below displays the name of the last page that uses the draw page
style "dp1" (assuming we are using a presentation or drawing document):

        my $context = $document->get_part(CONTENT)->get_body;
        my $page = $context->get_element(
                'draw:page',
                attribute       => 'style name',
                value           => 'dp1',
                position        => -1
        );
        say $page->get_attribute('name');
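
The same kind of selection can be tried in a text document, where paragraphs
carry a style name. This is a minimal sketch, assuming that C<'style name'> is
resolved for C<text:p> elements the same way as for C<draw:page> ones. The
style names are arbitrary; they only serve as attribute values here.

        use strict;
        use warnings;
        use ODF::lpOD;

        # A text document stands in for the presentation used above
        my $doc     = odf_document->create('text');
        my $context = $doc->get_part(CONTENT)->get_body;
        foreach my $spec (["A", "Standout"], ["B", "Standard"], ["C", "Standout"]) {
                $context->append_element(
                        odf_paragraph->create(text => $spec->[0], style => $spec->[1])
                        );
        }

        # Last paragraph whose style name is "Standout"
        my $p = $context->get_element(
                'text:p',
                attribute       => 'style name',
                value           => 'Standout',
                position        => -1
        );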

lpOD provides special name-based retrieval methods for some elements that own
unique names. For example the instruction below selects the table whose name
is "T1" (if any):

        my $table = $context->get_table_by_name("T1");

The C<meta> document part, unlike others such as the C<content> one, provides
direct C<get> and C<set> accessors for the content of the usual metadata, so
there is no need for a context element, as shown in the following example
that displays the title of a document:

        my $document = odf_document->get($source);
        my $meta = $document->get_part(META);
        say $meta->get_title;

The title (like any other metadata value) may be updated or created with the
corresponding C<set> accessor:

        $meta->set_title("The new title");
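
A minimal sketch of the metadata round trip in a new, in-memory document (the
title string is arbitrary):

        use strict;
        use warnings;
        use ODF::lpOD;

        my $document = odf_document->create('text');
        my $meta     = $document->get_part(META);

        # Create or replace the title, then read it back
        $meta->set_title("The new title");
        my $title = $meta->get_title;
        print "$title\n";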

All the properties of a previously selected element are stored in one or more
I<attributes> and in a I<text>. So, for any C<odf_element> lpOD provides
corresponding C<get> and C<set> accessors.

C<get_text> returns the current text, while C<set_text> replaces the current
content by a new text (possibly empty). Without argument, C<get_text> returns
the text directly contained in the calling element, but with a C<recursive>
optional named parameter set to C<TRUE>, it returns the concatenated texts of
all the descendants of the calling element. On the other hand, C<set_text>
deletes any previous content (i.e. direct text content and embedded elements
such as bookmarks, variable fields, text segments with special styles, and
so on).
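
The sketch below contrasts the direct text of a paragraph, the recursive text
of the body, and the effect of C<set_text>. It only assumes that the recursive
result contains each paragraph's text, not any particular separator between
them; the texts are arbitrary.

        use strict;
        use warnings;
        use ODF::lpOD;

        my $doc     = odf_document->create('text');
        my $context = $doc->get_part(CONTENT)->get_body;
        my $p = odf_paragraph->create(text => "Alpha", style => "Standard");
        $context->append_element($p);
        $context->append_element(
                odf_paragraph->create(text => "Beta", style => "Standard")
                );

        # Text directly contained in one paragraph
        my $direct = $p->get_text;
        # Concatenated text of all the descendants of the body
        my $all = $context->get_text(recursive => TRUE);
        # set_text replaces the whole previous content
        $p->set_text("Gamma");
        my $replaced = $p->get_text;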

The C<get_attribute> method requires the name of the needed attribute. This
name may be the technical name according to the OpenDocument specification, or
a simpler and more significant name. For example, assuming C<$item> is a
I<list item>, and knowing that such an object may own a so-called
C<text:restart-numbering> attribute telling that the list numbering must be
restarted at this point from a given value, the following instruction sets
this value to 6:

        $item->set_attribute('restart numbering' => 6);

C<set_attribute> deletes an existing attribute as soon as the given value is
C<undef>; so the instruction below cancels the C<restart numbering> feature:

        $item->set_attribute('restart numbering' => undef);

Note that C<set_attribute>, provided with a defined value, automatically
I<creates> the attribute if it doesn't exist; there is no need to separately
check an attribute for existence and create it before setting a value.
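
A minimal sketch of the create-then-delete behavior, using the style name of a
free paragraph instead of a list item (an illustrative choice; the style name
"Standout" is arbitrary):

        use strict;
        use warnings;
        use ODF::lpOD;

        my $p = odf_paragraph->create(text => "Styled");

        # The attribute doesn't exist yet: set_attribute creates it
        $p->set_attribute('style name' => 'Standout');
        my $before = $p->get_attribute('style name');

        # An undef value removes the attribute
        $p->set_attribute('style name' => undef);
        my $after = $p->get_attribute('style name');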

It's possible to get or set more than one attribute in a single call using
C<get_attributes> or C<set_attributes>. The first one returns the attributes
as a hash reference (with the real ODF names), while the second one requires
a hash reference as argument.
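
A minimal sketch, assuming C<set_attributes> accepts the real ODF attribute
names as hash keys (here C<text:style-name>, with an arbitrary value):

        use strict;
        use warnings;
        use ODF::lpOD;

        my $p = odf_paragraph->create(text => "Several attributes");

        # set_attributes takes a hash reference
        $p->set_attributes({ 'text:style-name' => 'Standout' });

        # get_attributes returns a hash reference keyed by real ODF names
        my $attributes = $p->get_attributes;
        foreach my $name (sort keys %$attributes) {
                print "$name = $attributes->{$name}\n";
        }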

An element may be removed (with all its descendants) using its C<delete> method.
(Beware: the deletion of a high level element may destroy a lot of content!)
It's possible to delete the whole content of an element without removing the
element itself by issuing a C<set_text> with an empty string.
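
The sketch below contrasts both operations on two paragraphs of a new
document. The texts are arbitrary, and the check on the emptied paragraph
accepts either an empty string or C<undef> from C<get_text>.

        use strict;
        use warnings;
        use ODF::lpOD;

        my $doc     = odf_document->create('text');
        my $context = $doc->get_part(CONTENT)->get_body;
        my $keep    = odf_paragraph->create(text => "Emptied", style => "Standard");
        my $drop    = odf_paragraph->create(text => "Removed", style => "Standard");
        $context->append_element($_) for $keep, $drop;

        # Remove one paragraph together with its descendants
        $drop->delete;
        # Empty the other one but leave the paragraph in place
        $keep->set_text("");

        my $gone = $context->get_paragraph(content => "Removed");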

The user is allowed to create a new element using the C<odf_create_element>
constructor, that requires an appropriate ODF tag (corresponding to the type
of element) or a valid XML string. Fortunately, lpOD provides a set of
specialized constructors (such as C<odf_create_paragraph>, C<odf_create_table>,
and so on) that may be used without knowledge of the XML details. Once created
through such a constructor, the new element is not automatically included in
a document. To do so, lpOD provides the C<insert_element> and C<append_element>
methods, both context-based, i.e. called from an existing element that will
become the parent of the new element. As an example, the sequence below creates
a new paragraph (with given style and content), then appends it to a selected
section:

        my $document = odf_document->get($source);
        my $context = $document->get_part(CONTENT)->get_body;
        my $section = $context->get_section("Prologue");
        my $paragraph = odf_paragraph->create(
                style => "Standard", text => "The End of the Beginning"
                );
        $section->append_element($paragraph);

Elements may be created by replication of existing elements, thanks to the
C<clone> method. The result of the instruction below is a copy of an existing
section (with all its content); this copy is a "free" element (i.e. it's not
included in any document, and it has no link with its prototype element), so
it may be inserted elsewhere in the same document or in another document:

        my $section = $context->get_section("Reusable");
        my $free_section = $section->clone;
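
Following the same principle with paragraphs instead of sections, this minimal
sketch clones a paragraph from one in-memory document and appends the copy to
another one; modifying the copy leaves the prototype unchanged. Both documents
and all the texts are illustrative.

        use strict;
        use warnings;
        use ODF::lpOD;

        # Source and target documents, both created in memory
        my $source   = odf_document->create('text');
        my $target   = odf_document->create('text');
        my $original = odf_paragraph->create(text => "Reusable", style => "Standard");
        $source->get_part(CONTENT)->get_body->append_element($original);

        # The copy is a free element, independent of its prototype
        my $copy = $original->clone;
        $copy->set_text("Reused");
        $target->get_part(CONTENT)->get_body->append_element($copy);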

=head1  Getting started

=head2  The "Hello World" example

Unsurprisingly, we suggest that you test your lpOD installation and your
understanding of the big picture through this simple program:

        use ODF::lpOD;

        my $doc = odf_document->create('text');
        my $content = $doc->get_part(CONTENT);
        my $context = $content->get_body;
        $context->append_element(
                odf_paragraph->create(
                        style => "Standard",
                        text => "Hello World !"
                        )
                );
        $doc->save(target => "helloworld.odt");
        exit;

If this script runs without warning, open the "helloworld.odt" file using your
favorite ODF-compliant text processor, and look at the text content. You may
then introduce more sophistication using the metadata part of the document.
To do so, you can (for example) insert the lines below somewhere before the
C<save> instruction (and after the C<< odf_document->create() >> one).

        my $meta = $doc->get_part(META);
        $meta->set_title("Hello World Test");
        $meta->set_creator("Me");

After execution of the extended version, check the author's name and the
title through the I<File/Properties> dialog of your ODF text editor.

=head2  Using the documentation

The L<ODF::lpOD::Tutorial> is a recommended first reading that may help
to quickly gain a basic understanding and get started with lpOD. The
reference documentation is split into the following manual chapters:

=over

=item *

L<ODF::lpOD::Document>: General document packaging and metadata handling.

=item *

L<ODF::lpOD::Element>: Common features, available with any element.

=item *

L<ODF::lpOD::TextElement>: Text containers (paragraphs, headings), and various
elements that may take place in paragraphs (bookmarks, index marks,
bibliography marks, text variables and fields).

=item *

L<ODF::lpOD::Table>: Access to tables and their content.

=item *

L<ODF::lpOD::StructuredContainer>: High-level structures such as sections,
lists, draw pages, shapes, image or text frames, tables of contents.

=item *

L<ODF::lpOD::Style>: Style retrieval, update, or creation.

=item *

L<ODF::lpOD::Common>: Common utility functions.

=back

An alternative tutorial, intended for French-reading users, is available at
L<http://jean.marie.gouarne.online.fr/doc/introduction_lpod_perl.pdf>

=head1	AUTHOR/COPYRIGHT

Developer/Maintainer: Jean-Marie Gouarne L<http://jean.marie.gouarne.online.fr>
Contact: jmgdoc@cpan.org

Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.
Copyright (c) 2014 Jean-Marie Gouarne.

This work was sponsored by the Agence Nationale de la Recherche
(L<http://www.agence-nationale-recherche.fr>).

License: GPL v3, Apache v2.0 (see LICENSE).

=cut
444