README for Sort::ArbBiLex
                                        Time-stamp: "2004-03-27 17:19:01 AST"

			    Sort::ArbBiLex
	      (arbitrary bi-level lexicographic sorting)

[Partially excerpted from the POD.]

NAME
    Sort::ArbBiLex -- make sort functions for arbitrary sort orders

SYNOPSIS
      use Sort::ArbBiLex;

      *fulani_sort = Sort::ArbBiLex::maker(  # defines &fulani_sort
        "a A
         c C
         ch Ch CH
         ch' Ch' CH'
         e E
         l L
         lh Lh LH
         n N
         r R
         s S
         u U
         z Z
        "
      );
      chomp(@words = <>);   # read all input lines, minus their newlines
      @stuff = fulani_sort(@words);
      foreach (@stuff) { print "<$_>\n" }

CONCEPTS
    Writing systems for different languages usually have specific sort
    orders for the glyphs (characters, or clusters of characters) that each
    writing system uses. For well-known national languages, these different
    sort orders (or someone's idea of them) are formalized in the locale for
    each such language, on operating system flavors that support locales.
    However, there are problems with locales; cf. the perllocale manpage.
    Chief among the problems relevant here are:

    * The basic concept of "locale" conflates language/dialect, writing
    system, and character set -- and country/region, to a certain extent.
    This may be inappropriate for the text you want to sort. Notably, this
    assumes standardization where none may exist (what's THE sort order for
    a language that has five different Roman-letter-based writing systems in
    use?).

    * On many OS flavors, there is no locale support.

    * Even on many OS flavors that do support locales, the user cannot create
    his own locales as needed.

    * The "scope" of a locale may not be what the user wants -- if you want,
    in a single program, to sort the array @foo by one locale, and an array
    @bar by another locale, this may prove difficult or impossible.

    In other words, locales (even if available) may not sort the way you
    want, and are not portable in any case.

    This module is meant to provide an alternative to locale-based sorting.

    This module makes functions for you that implement bi-level
    lexicographic sorting according to a sort order you specify.
    "Lexicographic sorting" means comparing the letters (or properly,
    "glyphs") in strings, starting from the start of the string (so that
    "apple" comes after "apoplexy", say) -- as opposed to, say, sorting by
    numeric value. "Lexicographic sorting" is sometimes used to mean just
    "ASCIIbetical sorting", but I use it to mean the sort order used by
    *lexicograph*ers, in dictionaries (at least for alphabetic languages).

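    As a point of comparison, Perl's own built-in operators show the
    difference between plain character-by-character ordering and numeric
    ordering. This small sketch uses only core Perl (cmp and <=>, with no
    "use locale" in effect), not Sort::ArbBiLex; the word lists are
    arbitrary examples:

```perl
use strict;
use warnings;

# Perl's built-in string comparison, cmp, works character by character
# from the start of each string, by character code (no "use locale").
my @lex = sort { $a cmp $b } qw(apple apoplexy);
print "@lex\n";      # apoplexy apple  ('o' comes before 'p')

# Numeric comparison, <=>, orders by value instead; cmp does not.
my @num = sort { $a <=> $b } qw(10 9 100);
my @str = sort { $a cmp $b } qw(10 9 100);
print "@num\n";      # 9 10 100
print "@str\n";      # 10 100 9

# "ASCIIbetical" order puts every capital before every lowercase letter,
# which is not what a dictionary does.
my @ascii = sort { $a cmp $b } qw(apple Zebra);
print "@ascii\n";    # Zebra apple
```

    The last case is the kind of result that makes plain ASCIIbetical
    order unsuitable as dictionary order.
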
    Consider the words "resume" and "résumé" (the latter should display on
    your POD viewer with acute accents on the e's). If you declare a sort
    order such that e-acute ("é") is a letter after e (no accent), then
    "résumé" (with accents) would sort after every word starting with "re"
    (no accent) -- so "résumé" (with accents) would come after "reward".

    If, however, you treated e (no accent) and e-acute as the same letter,
    the ordering of "resume" and "résumé" (with accents) would be
    unpredictable, since they would count as the same thing -- whereas
    "resume" should always come before "résumé" (with accents) in English
    dictionaries.

    What bi-level lexicographic sorting means is that you can stipulate that
    two letters like e (no accent) and e-acute ("é") generally count as the
    same letter (so that they both sort before "reward"), but that when
    there's a tie based on comparison that way (like the tie between
    "resume" and "résumé" (with accents)), the tie is broken by a
    stipulation that at a second level, e (no accent) does come before
    e-acute ("é").

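    A minimal sketch of how such a two-level comparison can work, written
    in plain Perl. It is not Sort::ArbBiLex's actual implementation: the
    order string, the rank tables %primary and %secondary, and the
    subroutines ranks, cmp_keys, and bilevel_cmp are all hypothetical. It
    assumes an order string shaped like the one in the SYNOPSIS, read as one
    first-level letter per line with that letter's variants listed in
    second-level order, and for brevity treats each character as one glyph.

```perl
use strict;
use warnings;
use utf8;                              # this source contains "é"
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical order string, modeled on the SYNOPSIS.  Assumption: each
# line is one first-level letter, and the glyphs on that line are its
# variants in second-level order.  Single-character glyphs only.
my $order = "
  a A
  c C
  d D
  e E é É
  m M
  r R
  s S
  u U
  w W
";

# Give every glyph a first-level rank (its line) and a second-level rank
# (its position on that line).
my (%primary, %secondary);
my $line_no = 0;
for my $line (split /\n/, $order) {
    my @variants = split ' ', $line;
    next unless @variants;
    @secondary{@variants} = (0 .. $#variants);
    $primary{$_} = $line_no for @variants;
    $line_no++;
}

# Map a word to its list of ranks from one table; unknown glyphs are fatal.
sub ranks {
    my ($table, $word) = @_;
    my @r;
    for my $g (split //, $word) {
        die "glyph '$g' not in sort order\n" unless exists $table->{$g};
        push @r, $table->{$g};
    }
    return \@r;
}

# Compare two rank lists element by element; if one is a prefix of the
# other, the shorter list sorts first.
sub cmp_keys {
    my ($x, $y) = @_;
    my $n = @$x < @$y ? @$x : @$y;
    for my $i (0 .. $n - 1) {
        my $c = $x->[$i] <=> $y->[$i];
        return $c if $c;
    }
    return @$x <=> @$y;
}

sub bilevel_cmp {
    my ($s, $t) = @_;
    # Level 1: e and é share a rank here, so only the letters count.
    my $c = cmp_keys(ranks(\%primary, $s), ranks(\%primary, $t));
    return $c if $c;
    # Level 2: reached only on a first-level tie; e (0) precedes é (2).
    return cmp_keys(ranks(\%secondary, $s), ranks(\%secondary, $t));
}

my @sorted = sort { bilevel_cmp($a, $b) } qw(reward résumé resume);
print "@sorted\n";    # resume résumé reward
```

    Because the second level is consulted only after a first-level tie,
    "résumé" still sorts before "reward", yet never before "resume".
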
    (Some systems of sort order description allow for any number of levels
    in sort orders -- but I can't imagine a case where this gets you
    anything over a two-level sort.)

    Moreover, the units of sorting for a writing system may not be
    characters exactly. In some forms of Spanish, ch, while two characters,
    counts as one glyph -- a "letter" after c (at the first level, not just
    the second, like the e in the paragraph above). So "cuerno" comes
    *before* "chile". A character-based sort would not be able to see that
    "ch" should count as anything but "c" and "h". So this library doesn't
    assume that the units of comparison are individual characters.

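    One way to handle multi-character glyphs is to tokenize each string
    before comparing. The sketch below is a hypothetical, first-level-only
    illustration in plain Perl, not the module's code: the alphabet fragment
    in @order and the subroutines glyphs and key_cmp are assumptions made
    for the example. A regex alternation that lists longer glyphs first
    makes "ch" match as one unit, so "cuerno" sorts before "chile", whereas
    Perl's default sort puts "chile" first.

```perl
use strict;
use warnings;

# Hypothetical first-level alphabet fragment in which "ch" is one letter
# that sorts after "c" (as in traditional Spanish collation).
my @order = qw(a b c ch d e h i l n o r u);
my %rank;
@rank{@order} = (0 .. $#order);

# Alternation with longer glyphs first, so "ch" wins over "c" + "h".
my $alt      = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @order;
my $glyph_re = qr/$alt/;

# Split a word into glyphs by repeated anchored matches.
sub glyphs {
    my ($word) = @_;
    my @g;
    while ($word =~ /\G($glyph_re)/gc) {
        push @g, $1;
    }
    my $end = pos($word) || 0;
    die "unrecognized text in '$word'\n" if $end < length $word;
    return @g;
}

# First-level comparison of two words, glyph by glyph.
sub key_cmp {
    my ($x, $y) = @_;
    my @kx = map { $rank{$_} } glyphs($x);
    my @ky = map { $rank{$_} } glyphs($y);
    while (@kx && @ky) {
        my $c = shift(@kx) <=> shift(@ky);
        return $c if $c;
    }
    return @kx <=> @ky;
}

my @words  = qw(chile dedo cuerno);
my @plain  = sort @words;                          # character by character
my @glyphy = sort { key_cmp($a, $b) } @words;      # "ch" as one glyph
print "@plain\n";                                  # chile cuerno dedo
print "@glyphy\n";                                 # cuerno chile dedo
print join('|', glyphs('chile')), "\n";            # ch|i|l|e
```
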
[end POD excerpt]


PREREQUISITES

This suite requires Perl 5; I've only used it under Perl 5.004, so for
anything lower, you're on your own.

Sort::ArbBiLex doesn't use any nonstandard modules.


INSTALLATION

You install Sort::ArbBiLex, as you would install any perl module
library, by running these commands:

   perl Makefile.PL
   make
   make test
   make install

If you want to install a private copy of Sort::ArbBiLex in your home
directory, then you should try to produce the initial Makefile with
something like this command:

  perl Makefile.PL LIB=~/perl

Then you may need something like
  setenv PERLLIB "$HOME/perl"
in your shell initialization file (e.g., ~/.cshrc).

For further information, see perldoc perlmodinstall.


DOCUMENTATION

POD-format documentation is included in ArbBiLex.pm.  POD is readable
with the 'perldoc' utility.  See ChangeLog for recent changes.


MACPERL INSTALLATION NOTES

Don't bother with the makefiles.  Just make a Sort directory in your
MacPerl site_lib or lib directory, and move ArbBiLex.pm into there.


SUPPORT

Questions, bug reports, useful code bits, and suggestions for
Sort::ArbBiLex should just be sent to me at sburke@cpan.org


AVAILABILITY

The latest version of Sort::ArbBiLex is available from the
Comprehensive Perl Archive Network (CPAN).  Visit
<http://www.perl.com/CPAN/> to find a CPAN site near you.


COPYRIGHT

Copyright 1999-2004, Sean M. Burke <sburke@cpan.org>, all rights
reserved.  This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.

AUTHOR

Sean M. Burke <sburke@cpan.org>
