NAME
    Convert::UUlib - Perl interface to the uulib library (a.k.a.
    uudeview/uuenview).

SYNOPSIS
     use Convert::UUlib ':all';

     # read all the files named on the commandline and decode them
     # into the CURRENT directory. See below for a longer example.
     LoadFile $_ for @ARGV;

     for my $uu (GetFileList) {
        if ($uu->state & FILE_OK) {
          $uu->decode;
          print $uu->filename, "\n";
        }
     }

DESCRIPTION
    For in-depth information about the C library used by this interface,
    read the file doc/library.pdf from the distribution. Then read the rest
    of this document, especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite ex. files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle Mime preambles/epilogues
      OPT_TINYB64   detect short B64 outside of Mime
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes
      OPT_WBUF      set default write I/O buffer size in bytes
      OPT_AUTOCHECK automatically check file list after every loadfile

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
     This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

     The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists
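
    Because the state codes are or'ed together, a state value is usually
    tested bit by bit. The following is a minimal sketch of that idea; the
    helper "describe_state" and its label texts are illustrative only and
    not part of the module:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# each state bit paired with a short, made-up label
my @flags = (
   [FILE_MISPART, "missing parts"],
   [FILE_NOBEGIN, "no begin"],
   [FILE_NOEND,   "no end"],
   [FILE_NODATA,  "no data"],
   [FILE_OK,      "ok"],
   [FILE_ERROR,   "error"],
   [FILE_DECODED, "decoded"],
   [FILE_TMPFILE, "tempfile"],
);

# return the labels of all bits set in $state, or a note if none is set
sub describe_state {
   my ($state) = @_;
   my @set = map { $_->[1] } grep { $state & $_->[0] } @flags;
   return @set ? join (", ", @set) : "read in, not processed further";
}

LoadFile $_ for @ARGV;

for my $uu (GetFileList) {
   printf "%s: %s\n", $uu->filename // "(no filename)", describe_state ($uu->state);
}
```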

  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    Initialize is automatically called when the module is loaded and
    allocates quite a small amount of memory for today's machines ;) CleanUp
    releases that again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    CleanUp
        Release memory, file items and clean up files. Should be called
        after a decoding run, if you want to start a new one.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.
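
    A short sketch of querying and setting options. The directory name
    "out/" is a placeholder chosen for this example, and OPT_VERSION is
    only queried because it is marked read-only (ro) above:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# read-only option: query it, never set it
print "uulib version: ", GetOption (OPT_VERSION), "\n";

# do not overwrite existing files when decoding
SetOption OPT_OVERWRITE, 0;

# prefix for files saved to disk ("out/" is a hypothetical directory)
SetOption OPT_SAVEPATH, "out/";

print "save path is now: ", GetOption (OPT_SAVEPATH), "\n";
```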

  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

    $retval = Smerge $pass
        If you are desperate, try to call "Smerge" with increasing $pass
        values, beginning at 0, to try to merge parts that usually would not
        have been merged.

        Most probably this will result in garbled files, so never do this by
        default, except:

        If the "OPT_AUTOCHECK" option has been disabled (by default it is
        enabled) to speed up file loading, then you *have* to call "Smerge
        -1" after loading all files as an additional pre-pass (which is
        normally done by "LoadFile").
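
    The "OPT_AUTOCHECK" case could look roughly like this. It is a sketch
    that takes the files to load from the command line and only reports
    what was found:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# skip the file list check after every LoadFile to speed up loading
SetOption OPT_AUTOCHECK, 0;

for my $file (@ARGV) {
   my ($retval, $count) = LoadFile $file;
   warn "$file: ", strerror ($retval), "\n" if $retval != RET_OK;
}

# the mandatory pre-pass that LoadFile would otherwise have done
Smerge (-1);

my @items = GetFileList;
printf "%d file(s) found\n", scalar @items;
```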

    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".

        This function has to walk the linear list of files on each access, so
        if you want to iterate over all items, it is usually faster to use
        "GetFileList".
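
    As an illustration of the "count up until undef" iteration described
    above (a sketch only, for whole-list iteration "GetFileList" remains
    the faster choice):

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile $_ for @ARGV;

# numbers start at 0 and have no holes, so stop at the first undef
my $i = 0;
while (defined (my $item = GetFileListItem ($i))) {
   printf "%d: %s\n", $i, $item->filename // "(no filename)";
   $i++;
}

print "$i item(s) in total\n";
```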

    @items = GetFileList
        Similar to "GetFileListItem", but returns all files in one go.

  Decoding files
    $retval = $item->rename ($newname)
        Change the ondisk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location, use "$item->infile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode ([$target_path])
        Decode the file to its destination, or the given target path.

    $retval = $item->info (callback-function)

  Querying (and setting) item attributes
    $state = $item->state
    $mode = $item->mode ([newmode])
    $uudet = $item->uudet
    $size = $item->size
    $filename = $item->filename ([newfilename])
    $subfname = $item->subfname
    $mimeid = $item->mimeid
    $mimetype = $item->mimetype
    $binfile = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

         {
           partno   => <integer describing the part number, starting with 1>,
           # the following members only exist when they contain useful information
           sfname   => <local pathname of the file where this part is from>,
           filename => <the ondisk filename of the decoded file>,
           subfname => <used to cluster postings, possibly the posting filename>,
           subject  => <the subject of the posting/mail>,
           origin   => <the possible source (From) address>,
           mimetype => <the possible mimetype of the decoded file>,
           mimeid   => <the id part of the Content-Type>,
         }

        Usually you are mostly interested in the "sfname" and possibly the
        "partno" and "filename" members.
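
    A sketch of walking the parts of every item and printing where each
    part came from. Since only "partno" is always present, the optional
    "sfname" member gets a fallback text that is made up for this example:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile $_ for @ARGV;

for my $uu (GetFileList) {
   print $uu->filename // "(no filename)", ":\n";

   # list the source parts in part number order
   for my $part (sort { $a->{partno} <=> $b->{partno} } $uu->parts) {
      printf "  part %d from %s\n",
         $part->{partno},
         $part->{sfname} // "(unknown source)";
   }
}
```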

  Functions below are not documented and not very well tested - feedback welcome
      QuickDecode
      EncodeMulti
      EncodePartial
      EncodeToStream
      EncodeToFile
      E_PrepSingle
      E_PrepPartial

  EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Returns the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If it
        doesn't return anything (not even "undef"!), then nothing happens,
        so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

          sub cb {
             return unless $_[1]; # skip "Re:"-plies et al.
             my ($subject, $filename) = @_;
             # if we find some *.rar, take it
             return $1 if $subject =~ /(\w+\.rar)/;
             # otherwise just pass what we have
             return ();
          }
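
        The three return conventions (empty list, "undef", filename) can
        be tried out without the library. In this sketch the callback
        "filename_cb", its rules and the sample subject lines are all made
        up, and calling it in list context is an assumption about how the
        empty list is told apart from "undef":

```perl
use strict;
use warnings;

# hypothetical callback showing all three kinds of return value
sub filename_cb {
   my ($subject, $candidate) = @_;

   # explicit undef: declare that there is no filename
   return undef if $subject =~ /^\s*Re:/i;

   # a string: use this as the filename
   return $1 if $subject =~ /"([^"]+\.\w{2,4})"/;

   # empty list: no opinion, keep whatever was found already
   return ();
}

for my $case (
   ['Re: thanks for the post',    undef],
   ['pics - "holiday.jpg" (1/3)', undef],
   ['misc stuff (2/5)',           "stuff.bin"],
) {
   my @r = filename_cb (@$case);
   print "$case->[0] => ",
      !@r ? "(unchanged)" : defined $r[0] ? $r[0] : "(no filename)", "\n";
}
```

        Such a callback would be installed with "SetFileNameCallback
        \&filename_cb".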

LARGE EXAMPLE DECODER
    The general workflow for decoding is like this:

    1. Configure options with "SetOption" or "SetXXXCallback".
    2. Load all source files with "LoadFile".
    3. Optionally "Smerge".
    4. Iterate over all "GetFileList" items (i.e. result files).
    5. "CleanUp" to delete files and free items.

    What follows is the file "example-decoder" from the distribution that
    illustrates the above workflow in a non-trivial example.

       #!/usr/bin/perl

       # decode all the files in the directory uusrc/ and copy
       # the resulting files to uudst/

       use Convert::UUlib ':all';

       sub namefilter {
          my ($path) = @_;

          # strip any leading directory components
          $path=~s/^.*[\/\\]//;

          $path
       }

       sub busycb {
          my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
          $_[0]=straction($action);
          print "busy_callback(", (join ",",@_), ")\n";
          0
       }

       SetOption OPT_RBUF, 128*1024;
       SetOption OPT_WBUF, 1024*1024;
       SetOption OPT_IGNMODE, 1;
       SetOption OPT_VERBOSE, 1;

       # show the three ways you can set callback functions. I normally
       # prefer the one with the sub inplace.
       SetFNameFilter \&namefilter;

       SetBusyCallback "busycb", 333;

       SetMsgCallback sub {
          my ($msg, $level) = @_;
          print uc (strmsglevel $level), ": $msg\n";
       };

       # the following non-trivial FileNameCallback takes care
       # of some subject lines not detected properly by uulib:
       SetFileNameCallback sub {
          return unless $_[1]; # skip "Re:"-plies et al.
          local $_ = $_[0];

          # the following rules are rather effective on some newsgroups,
          # like alt.binaries.games.anime, where non-mime, uuencoded data
          # is very common

          # if we find some *.rar, take it as the filename
          return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

          # one common subject format
          return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

          # - filename.par (04/55)
          return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

          # - (xxx) No. 1 sayuri81.jpg 756565 bytes
          # - (20 files) No.17 Roseanne.jpg [2/2]
          return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

          # try to detect some common forms of filenames
          return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

          # otherwise just pass what we have
          ()
       };

       # now read all files in the directory uusrc/*
       for (<uusrc/*>) {
          # tag each file with its own name as id; the true $delflag lets
          # uulib delete the source file once it is no longer necessary
          my ($retval, $count) = LoadFile ($_, $_, 1);
          print "file($_), status(", strerror ($retval), ") parts($count)\n";
       }

       SetOption OPT_SAVEPATH, "uudst/";

       # now wade through all files and their source parts
       for my $uu (GetFileList) {
          print "file ", $uu->filename, "\n";
          print " state ", $uu->state, "\n";
          print " mode ", $uu->mode, "\n";
          print " uudet ", strencoding ($uu->uudet), "\n";
          print " size ", $uu->size, "\n";
          print " subfname ", $uu->subfname, "\n";
          print " mimeid ", $uu->mimeid, "\n";
          print " mimetype ", $uu->mimetype, "\n";

          # print additional info about all parts
          print " parts";
          for ($uu->parts) {
             for my $k (sort keys %$_) {
                print " $k=$_->{$k}";
             }
             print "\n";
          }

          $uu->remove_temp;

          if (my $err = $uu->decode) {
             print " ERROR ", strerror ($err), "\n";
          } else {
             print " successfully saved as uudst/", $uu->filename, "\n";
          }
       }

       print "cleanup...\n";

       CleanUp;

PERLMULTICORE SUPPORT
    This module supports the perlmulticore standard (see
    <http://perlmulticore.schmorp.de/> for more info) for the following
    functions - generally these are functions accessing the disk and/or
    using considerable CPU time:

       LoadFile
       $item->decode
       $item->decode_temp
       $item->remove_temp
       $item->info

    The perl interpreter will be reacquired/released on every callback
    invocation, so for performance reasons, callbacks should be avoided if
    that is costly.

    Future versions might enable multicore support for more functions.

BUGS AND LIMITATIONS
    The original uulib library this module uses was written at a time when
    main memory was measured in megabytes and buffer overflows as a security
    thing didn't exist. While a lot of security fixes have been applied over
    the years (including some defense in depth mechanism that can shield
    against a lot of as-of-yet undetected bugs), using this library for
    security purposes requires care.

    Likewise, file sizes when the uulib library was written were tiny
    compared to today, so do not expect this library to handle files larger
    than 2GB.
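
    One defensive pattern is to check input sizes before handing files to
    "LoadFile". The sketch below is pure Perl; the helper
    "partition_by_size" and the exact cut-off of 2**31 - 1 bytes are
    assumptions derived from the "2GB" figure above, and it only looks at
    input files, not at the size of the decoded result:

```perl
use strict;
use warnings;

# assumed cut-off, taken from the "2GB" figure above
use constant MAX_BYTES => 2**31 - 1;

# split paths into those small enough to load and those to skip;
# paths whose size cannot be determined are left for LoadFile to report
sub partition_by_size {
   my (@paths) = @_;
   my (@ok, @too_big);

   for my $path (@paths) {
      my $size = -s $path;
      if (defined $size && $size > MAX_BYTES) {
         push @too_big, $path;
      } else {
         push @ok, $path;
      }
   }

   return (\@ok, \@too_big);
}

my ($ok, $too_big) = partition_by_size (@ARGV);
warn "skipping $_: larger than 2GB\n" for @$too_big;
print "would load: $_\n" for @$ok;
```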

    Lastly, this module uses a very "C-like" interface, which means it
    doesn't protect you from invalid pointers as you might expect from "more
    perlish" modules - for example, accessing a file item object after
    calling "CleanUp" will likely result in crashes, memory corruption, or
    worse.
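
    A simple way to stay clear of this is to copy whatever is needed out
    of the items into plain Perl values and to keep the item objects in a
    scope that ends before "CleanUp" is called. A minimal sketch of that
    pattern:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile $_ for @ARGV;

# plain strings only, safe to keep after CleanUp
my @decoded;

# item objects live only inside this loop
for my $uu (GetFileList) {
   next unless $uu->state & FILE_OK;

   # decode returns a false value (RET_OK) on success
   my $err = $uu->decode;
   push @decoded, $uu->filename unless $err;
}

# no item object is reachable any more at this point
CleanUp;

print "$_\n" for @decoded;
```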

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at <http://www.fpx.de/fp/Software/UUDeview/>.