1=encoding utf8
2
3=head1 OSCON scripts
4
5Because I misplaced the first description of these scripts, I ended up writing it twice.
6
7=head2 OSCON scripts description #1/2
8
9    HOLY TRIO OF INDISPENSABLE TOOLS FOR UNDERSTANDING THE UCD & UNICODE IN GENERAL
10        unichars - show which code points match arbitrary criteria
11        uniprops - show which props a code point has (by number or name, etc)
12        uninames - intelligrep the now-excised NamesList.txt (included)
13
14    REWRITES OF CRITICAL UNIX PROGRAMS:
15        uniquote - replacement for od(1) or -v option to cat(1), but for Unicode
16        tcgrep - very ancient grep(1) replacement, needs rewrite but now supports named characters
17        unilook - look(1) rewrite but with grep and agrep support; requires included words.utf8 file
18        ucsort - sort(1) rewrite using the UCA, includes Unicode locales, and intelligent --pre stuff
19        unifmt - fmt(1) rewrite, using the ULA; both smarter and dumber than Damian's
20        rename - ancient rewrite of Larry's old rename(1) rewrite; might help Unicode filesyssues
21        uniwc - wc(1) rewrite for Unicode, includes \R support, graphemes, etc; needs refactoring
22
23    PROGRAMS FOR NORMALIZATION FILTERS, CHECKER
24        nfd, nfc, nfkd, nfkc - Unicode normalization filters
25        nfcheck - report which of NF{,K}[DC] apply to any given file
26          % nfcheck leo hantest nunez tc macroman
27            leo:        NFC      NFD
28            hantest:    NFC
29            nunez:      NFC NFKC
30            tc:         NFC NFKC NFD NFKD
31
32    (RE)CASING FILTER PROGRAMS:
33        lc - filter to do the Unicode toLower casemapping
34            % echo "Filter to Convert a Title's Words to the Right Case" | lc
35              filter to convert a title's words to the right case
36        tc - filter to do the Unicode toTitle casemapping (intelligently)
37            % echo "filter to convert a title's words to the right case" | tc
38              Filter To Convert A Title's Words To The Right Case
39        titulate - \u\L-converts string args to English **HEADLINE** case (NB: headline != titlecase)
40            % titulate "filter to CONVERT a title's words to the right case"
41              Filter to Convert a Title's Words to the Right Case
42        uc - filter to do the Unicode toUpper casemapping
43            % echo "filter to convert a title's words to the right case" | uc
44              FILTER TO CONVERT A TITLE'S WORDS TO THE RIGHT CASE
45
46    FONT GAME PROGRAMS:
47        leo - uʍopəpᴉsdn sƃuᴉɥʇ əʇᴉɹʍ oʇ ɹəʇlᴉɟ
48        unifont - filter for showing all Unicode "alternate font" letters
49            % echo hic sunt data unicodica | unifont
50                    Double-Struck: 𝕙𝕚𝕔 𝕤𝕦𝕟𝕥 𝕕𝕒𝕥𝕒 𝕦𝕟𝕚𝕔𝕠𝕕𝕚𝕔𝕒
51                        Monospace: 𝚑𝚒𝚌 𝚜𝚞𝚗𝚝 𝚍𝚊𝚝𝚊 𝚞𝚗𝚒𝚌𝚘𝚍𝚒𝚌𝚊
52                       Sans-Serif: 𝗁𝗂𝖼 𝗌𝗎𝗇𝗍 𝖽𝖺𝗍𝖺 𝗎𝗇𝗂𝖼𝗈𝖽𝗂𝖼𝖺
53                Sans-Serif Italic: 𝘩𝘪𝘤 𝘴𝘶𝘯𝘵 𝘥𝘢𝘵𝘢 𝘶𝘯𝘪𝘤𝘰𝘥𝘪𝘤𝘢
54                  Sans-Serif Bold: 𝗵𝗶𝗰 𝘀𝘂𝗻𝘁 𝗱𝗮𝘁𝗮 𝘂𝗻𝗶𝗰𝗼𝗱𝗶𝗰𝗮
55           Sans-Serif Bold Italic: 𝙝𝙞𝙘 𝙨𝙪𝙣𝙩 𝙙𝙖𝙩𝙖 𝙪𝙣𝙞𝙘𝙤𝙙𝙞𝙘𝙖
56                           Script: 𝒽𝒾𝒸 𝓈𝓊𝓃𝓉 𝒹𝒶𝓉𝒶 𝓊𝓃𝒾𝒸ℴ𝒹𝒾𝒸𝒶
57                           Italic: h𝑖𝑐 𝑠𝑢𝑛𝑡 𝑑𝑎𝑡𝑎 𝑢𝑛𝑖𝑐𝑜𝑑𝑖𝑐𝑎
58                             Bold: 𝐡𝐢𝐜 𝐬𝐮𝐧𝐭 𝐝𝐚𝐭𝐚 𝐮𝐧𝐢𝐜𝐨𝐝𝐢𝐜𝐚
59                      Bold Italic: 𝒉𝒊𝒄 𝒔𝒖𝒏𝒕 𝒅𝒂𝒕𝒂 𝒖𝒏𝒊𝒄𝒐𝒅𝒊𝒄𝒂
60                          Fraktur: 𝔥𝔦𝔠 𝔰𝔲𝔫𝔱 𝔡𝔞𝔱𝔞 𝔲𝔫𝔦𝔠𝔬𝔡𝔦𝔠𝔞
61                     Bold Fraktur: 𝖍𝖎𝖈 𝖘𝖚𝖓𝖙 𝖉𝖆𝖙𝖆 𝖚𝖓𝖎𝖈𝖔𝖉𝖎𝖈𝖆
62        unicaps - Fɪʟᴛᴇʀ ᴛᴏ ᴄᴏɴᴠᴇʀᴛ ᴛᴏ sᴍᴀʟʟ ᴄᴀᴘs
63        unisubs, unisupers - filter to show subscripted₁₉₈₇ and ˢᵘᵖᵉʳˢᶜʳⁱᵖᵗᵉᵈ versions
64        unititle - prototype to over/underline things (real version in progress)
65        uniwide, uninarrow - reversible filters for converting to FULLWIDTH equivs
66
67    TEST AND DEMO PROGRAMS:
68        macroman - show mapping between MacRoman and Unicode
69        byte2uni - early prototype of a general-purpose version of macroman
70            DEMO: byte2uni -a -ecp1252
71        es-sort - how to do fancy UCA sorts, using Iberian city-names
72        hantest - demo of Unihan stuff and Unicode::{LineBreak, GCString}
73        havshpx - vs lbh unir gb nfx, lbh qb abg jnag gb xabj
74        hypertest - demo of support for trans-Unicode code points
75        nunez - demo accent-insensitive searches; ¡MUY BIEN COMENTADO!
76        vowel-sigs - show how to create your own properties; also, regex subroutines
77
78    MODULES
79        ForbidUnderscore.pm - "no Underscore;" forbids unlocalized $_ access
80        FixString.pm - tries to sort text items with numbers, including Roman, intelligently,
81                       includes support for Unicode Romans, and for Romans written in Latin
82                       script, but requires Roman.pm module for the latter.  Falls back to the UCA.
83        tchrist-unicode-charclasses__alpha.java - EGAD! I talked them into making most of
84                       this functionality part of JDK7.
85
86    LIBRARIES:
87        unicore/{all,html,uwords}_alias.pl - a forgotten charnames facility
88
89    FILES:
90        words.utf8 - dictionary list needed for unilook
91
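The report that nfcheck prints can be approximated with the core Unicode::Normalize module. What follows is a minimal sketch, not the nfcheck source: the helper name nf_forms is hypothetical, the sample strings are made up, and a quick-check answer of MAYBE is treated here as "not confirmed".

```perl
use strict;
use warnings;
use Unicode::Normalize qw(checkNFC checkNFD checkNFKC checkNFKD);

# Return the names of the normalization forms that $text already satisfies.
# $text must be a decoded character string, not raw bytes.
# The check* functions return true for YES, false for NO, and undef for MAYBE;
# this sketch counts only a definite YES.
sub nf_forms {
    my ($text) = @_;
    my %check = (
        NFC  => \&checkNFC,
        NFD  => \&checkNFD,
        NFKC => \&checkNFKC,
        NFKD => \&checkNFKD,
    );
    return grep { $check{$_}->($text) } qw(NFC NFKC NFD NFKD);
}

printf "%-12s %s\n", "ascii:",      join " ", nf_forms("cafe");
printf "%-12s %s\n", "composed:",   join " ", nf_forms("caf\x{E9}");
printf "%-12s %s\n", "decomposed:", join " ", nf_forms("cafe\x{301}");
```

Pure ASCII text satisfies all four forms at once, which is why a file such as tc can be listed under every column.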
92=head2 OSCON scripts description #2/2
93
94    Modules:
95        FixString.pm            - program & module to do "logical" sorting w/numbers
96        ForbidUnderscore.pm     - forbid unlocalized $_ with no Underscore
97
98    Libraries:
99        unicore/html_alias.pl   - allows for custom character names, \N{egrave} etc
100        unicore/uwords_alias.pl - ditto with specials for unilook, like \N{spu}
101        unicore/all_alias.pl    - both the above
102
103    Programs for probing the UCD:
104        unichars                - list characters for one or more properties
105        uniprops                - list regex properties of one or more characters
106        uninames                - search the current Unicode NamesList
107
108    Encoding Demos
109        macroman                - how the MacRoman encoding maps to Unicode
110        byte2uni                - generalized `macroman` program; try `byte2uni -a -cp1252`
111
112    Unix Tool Rewrites
113        (not fmt  but) unifmt       - like `fmt` but uses the Unicode Linebreaking Algorithm (ULA)
114        (not grep but) tcgrep       - like `grep`, but groks unicode patterns and data
115        (not look but) unilook      - improved `look` + `grep` + `agrep` on included `words.utf8`
116        (not mv   but) rename       - a better version of rename, takes a perl pattern
117        (not od   but) uniquote     - like `cat -v` or `od`, but better
118        (not sort but) ucsort       - `sort` input records according to the Unicode Collation Algorithm (UCA)
119        (not wc   but) uniwc        - Unicode rewrite of `wc` (needs nonslurpy rewrite)
120
121    Casing Filters
122        lc                          - filter into Unicode lowercase
123        uc                          - filter into Unicode uppercase
124        tc                          - filter into Unicode titlecase+lowercase
125        titulate                    - like tc but uses English headline rules
126
127    Normalization Filters
128        nfc                         - convert to NFC
129        nfd                         - convert to NFD
130        nfkc                        - convert to NFKC
131        nfkd                        - convert to NFKD
132        nfcheck                     - report which NF forms file(s) are in
133
134    Font Games
135        unifont                     - display equivalent Math/fonted versions
136        leo                         - write like Leonardo
137        unicaps                     - convert lowercase to Unicode small caps
138        unisubs                     - show equivalent subscripts where available
139        unisupers                   - show equivalent superscripts where available
140        uniwide                     - convert regular text to full-width if possible
141        uninarrow                   - convert full-width to regular width if possible
142        unititle                    - prototype to use combining underlines
143
144    Demos and Test Programs
145        es-sort                 - demo how to use a custom UCA on Spanish cities
146        hantest                 - demo various Unihan bits, including the ULA
147        havshpx                 - you have to figure this one out yourself
148        hypertest               - demo forbidden Unicode chars, like supers and hypers
149        vowel-sigs              - get the CVCVVC signatures for words
150        nunez                   - demo how to use the UCA for accent-insensitive searching
151
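Both es-sort and nunez lean on the Unicode Collation Algorithm. As a rough illustration of the idea, and not code taken from either script, the core Unicode::Collate module can sort by the UCA and compare strings at primary strength so that accents and case are ignored. The helper names uca_sort and same_letters and the city list are assumptions made for this sketch.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

# Full-strength collator for ordering, plus a primary-strength (level 1)
# collator that ignores accent and case differences.
my $full  = Unicode::Collate->new;
my $loose = Unicode::Collate->new(level => 1);

sub uca_sort     { return $full->sort(@_) }
sub same_letters { return $loose->eq($_[0], $_[1]) }

binmode STDOUT, ':encoding(UTF-8)';
my @cities = ("Ávila", "Zaragoza", "Cádiz", "Almería", "Córdoba");
print join(", ", uca_sort(@cities)), "\n";
print same_letters("Cádiz", "cadiz") ? "match\n" : "no match\n";
```

A plain code point sort would push Ávila past Zaragoza; the UCA keeps it among the other A names.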
152=head1 Uses of Unicode in Perl identifiers in OSCON scripts
153
154A few use Unicode not just in literals, but in identifiers, too:
155
156    % tcgrep '((^\h*sub\h+)|[\$\@%&])\p{ASCII}*\P{ASCII}' *
157    byte2uni:           $display_char = "\N{SYMBOL FOR SUBSTITUTE FORM TWO}",        # ␦
158    hantest:$path = "婴儿服饰";
159    hypertest:my @ὑπέρμεγας = (
160    hypertest:    ὑπέρμεγας           => \@ὑπέρμεγας,
161    leo:        my    $ʇndʇno = uʍopəpᴉƨdn($input);
162    leo:        say   $ʇndʇno;
163    leo:sub uʍopəpᴉƨdn($) {
164    leo:    tr [-¯_#&'"“”‘’!¡?¿,.]
165    mismaps:my @ɪsᴏ = map { "iso-$_" } ratsort qw{
166    mismaps:my @μsoft = map { "cp$_"} ratsort qw{
167    mismaps:my @鯉 = ratsort <koi8-{f,u,r}>;
168    mismaps:my @all_tests =  (@μsoft, @ɪsᴏ, @apple, @鯉, @etc);
169    mismaps:        dos         => \@μsoft,
170    mismaps:        microsoft   => \@μsoft,
171    mismaps:        ms          => \@μsoft,
172    mismaps:        windows     => \@μsoft,
173    mismaps:        win         => \@μsoft,
174    mismaps:        posix       => \@ɪsᴏ,
175    mismaps:        iso         => \@ɪsᴏ,
176    mismaps:        standard    => \@ɪsᴏ,
177    mismaps:        std         => \@ɪsᴏ,
178    mismaps:        koi         => \@鯉,
179    nunez:my $INCLUÍR_NINGUNOS               = 0;
180    nunez:my $SI_IMPORTAN_MARCAS_DIACRÍTICAS = 0;
181    nunez:sub sí_ó_no(_) { $_[0] ? "sí" : "no" }
182    nunez:my @ciudades_españolas = ordenar_a_la_española(<<'LA_ÚLTIMA' =~ /\S.*\S/g);
183    nunez:my $cmáx = -(2 + max map { length } @ciudades_españolas);
184    nunez:my @búsquedas = < {A,E,I,O,U}N AL >;
185    nunez:my $bmáx = -(2 + max map { length } @búsquedas);
186    nunez:for my $aldea (@ciudades_españolas) {
187    nunez:    my $déjà_imprimée;  # Mais oui!  C’est en français celle‐ci!
188    nunez:    for my $búsqueda (@búsquedas) {
189    nunez:        my @resultados = $ordenador->gmatch($aldea, $búsqueda);
190    nunez:        next unless @resultados || $INCLUÍR_NINGUNOS;
191    nunez:                $cmáx => !$déjà_imprimée++ && encomillar($aldea),
192    nunez:                $bmáx => "/$búsqueda/",
193    nunez:sub cuántos_sitios {
194    nunez:sub ordenar_a_la_española {
195    nunez:    state $ordenador_a_la_española = new Unicode::Collate::
196    nunez:    return $ordenador_a_la_española->sort(@lista);
197    ucsort:    ($OFS, $IFS)  if /\x{FFFF}/;  # déjà vu
198    uniquote:        my $fh = $file;   # is *so* a lexical filehandle! ☺
199    uniquote:sub commaʼd_list {
200
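Identifiers like these only compile when the source file is declared as UTF-8 with the utf8 pragma. Here is a minimal sketch that borrows two names from the nunez lines above; the values assigned to them are invented for illustration.

```perl
use strict;
use warnings;
use utf8;                              # the source text itself is UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # encode output on the way out

# With "use utf8" in effect, variable and subroutine names may contain
# non-ASCII word characters.
my @búsquedas = qw(AN EN IN);          # illustrative values only
sub sí_ó_no { $_[0] ? "sí" : "no" }

print "¿Hay búsquedas? ", sí_ó_no(scalar @búsquedas), "\n";
```

Without the pragma, perl reads the file as bytes and rejects the accented names as syntax errors.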
201=head1 Demos of OSCON Unicode scripts
202
203There's absolutely nothing like examples, so here are five sets.
204
205=head2  Demo of uniprops
206
207    uniprops '['
208    uniprops '[' '{' ')'
209    uniprops '[' '{' '<'
210    uniprops '[' '{' '>'
211    uniprops ']'
212    uniprops 08
213    uniprops a8
214    uniprops 00ff
215    uniprops 180B
216    uniprops 180B 303E
217    uniprops 180B 303E FE01
218    uniprops 180B 303E FE01 E0101
219    uniprops 2026
220    uniprops 202e
221    uniprops 2058
222    uniprops 2060
223    uniprops 2062
224    uniprops 20a8
225    uniprops 20e0
226    uniprops 2163
227    uniprops 2241
228    uniprops 2421
229    uniprops 2461
230    uniprops 26bd
231    uniprops 2e2c
232    uniprops 3000
233    uniprops fb01
234    uniprops feff
235    uniprops ffef
236    uniprops FFFD
237    uniprops 1011c
238    uniprops 1101c
239    uniprops 12000
240    uniprops 13000
241    uniprops 1F42A
242    uniprops 1F42A '$'
243    uniprops 1F42A '$' % @
244    uniprops 1F4A9
245    uniprops 1F608
246    uniprops 4
247    uniprops -a 03 08
248    uniprops -a 1F42a
249    uniprops -a 2062
250    uniprops -a 20e0
251    uniprops -a 3350
252    uniprops -a 3 8
253    uniprops -a 4
254    uniprops -a FFFD
255    uniprops -a 'MATHEMATICAL BOLD FRAKTUR CAPITAL T'
256    uniprops -ga 4
257    uniprops -gl | less -r
258    uniprops HYPHEN
259    uniprops -l
260    uniprops 'LADY BEETLE'
261    uniprops -l | less -r
262    uniprops -taw75 20e0
263    uniprops -w75 -a 20e0
264
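For a feel of what a lookup like these involves, the core Unicode::UCD module exposes a handful of per-character fields. The sketch below is far less thorough than uniprops and is not derived from it; describe is a hypothetical helper, and it only accepts a hex code point, not a literal character or a name.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Summarize a few core properties of one code point given as a hex string.
# charinfo() returns undef for code points with no entry in the UCD.
sub describe {
    my ($hex) = @_;
    my $info = charinfo(hex $hex)
        or return sprintf "U+%04X has no UCD entry", hex $hex;
    return sprintf "U+%s %s: category=%s script=%s block=%s",
        @{$info}{qw(code name category script block)};
}

print describe($_), "\n" for qw(00ff 2163 FFFD);
```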
265=head2  Demo of uninames
266
267    uninames ancient
268    uninames ankh
269    uninames arrow
270    uninames ass
271    uninames AT
272    uninames atom
273    uninames AT SIGN
274    uninames '\bAA\b'
275    uninames ball
276    uninames BALL
277    uninames balls
278    uninames '\bALPHA\b'
279    uninames '\band\b'
280    uninames '\bAND\b'
281    uninames '\bAT\b'
282    uninames beetle
283    uninames '\b[IJ]\b' -WITH
284    uninames bird
285    uninames black letter
286    uninames BLACK LETTER
287    uninames BLACK-LETTER
288    uninames '\bNO\b'
289    uninames BOLD TWO
290    uninames book
291    uninames brac
292    uninames brace
293    uninames brok
294    uninames '\bSCRIPT\b'
295    uninames '\bSCRIPT\b' -MATHEMATICAL
296    uninames '\bTAU\b'
297    uninames '\bT\b' WITH
298    uninames bug
299    uninames bullet
300    uninames burro
301    uninames '\bY\b' | tcgrep -v '^\t' | ucsort | less -r
302    uninames '\bz\b'
303    uninames '\bZ\b'
304    uninames '\bz\b' | tcgrep -v '^\t' | ucsort | less -r
305    uninames '\bZ\b' | tcgrep -v '^\t' | ucsort | less -r
306    uninames camel
307    uninames CAMEL
308    uninames care
309    uninames CARE
310    uninames caution
311    uninames chi
312    uninames CHI
313    uninames CIRCL
314    uninames circled
315    uninames circle k
316    uninames circle 'one|two'
317    uninames clown
318    uninames colon
319    uninames COMB 'HOOK|TAIL|CURV'
320    uninames COMBIN
321    uninames combin ferm
322    uninames combin hacek
323    uninames combining
324    uninames COMBINING
325    uninames COMBINING DOTS
326    uninames combining enclosing
327    uninames combining enclosing prohib
328    uninames COMBIN REV SOL
329    uninames COMB LINE
330    uninames COMB 'MACRO|LINE'
331    uninames commer
332    uninames commonly abbreviated
333    uninames coptic
334    uninames cross
335    uninames crying
336    uninames CUEN | tcgrep -v '^\t'
337    uninames cun
338    uninames CUN
339    uninames CUNE
340    uninames CUNEI | tcgrep -v '^\t'
341    uninames CUNI
342    uninames CUNIE
343    uninames curren
344    uninames currenc
345    uninames d7
346    uninames dagger
347    uninames dash
348    uninames dead
349    uninames desert
350    uninames destr
351    uninames diaer
352    uninames DIAER
353    uninames diag
354    uninames divis
355    uninames does not prevent
356    uninames does not prevent | fmt
357    uninames dog
358    uninames donkey
359    uninames DOT ABOVE
360    uninames DOT CIRC
361    uninames dots
362    uninames double
363    uninames DOUBLE
364    uninames DOUBLE HY
365    uninames DOUBLE MATH
366    uninames DOUBLE QUOT
367    uninames DOUBLE STR
368    uninames DOUBLE STR CAPITAL -MATH
369    uninames DOUBLE STR CAPITAL -MATH | tail
370    uninames double struc
371    uninames DOUBLE STRUCK
372    uninames double struct
373    uninames DOUBL ITALI
374    uninames earth
375    uninames edit
376    uninames EIGHTEEN
377    uninames ellip
378    uninames em
379    uninames ENC CIR
380    uninames EQUAL
381    uninames EQUIV
382    uninames evil
383    uninames example
384    uninames exclam
385    uninames EYE
386    uninames face
387    uninames FACE
388    uninames fair
389    uninames farthing
390    uninames feather
391    uninames fem
392    uninames fermata
393    uninames ff lig
394    uninames flash
395    uninames flip
396    uninames fl lig
397    uninames four
398    uninames fractu
399    uninames FRAKT
400    uninames fraktu
401    uninames fraktur
402    uninames fullwidth
403    uninames gothic
404    uninames GOTHIC
405    uninames GREEK LETTER WITH
406    uninames GREEK LETTER WITH | tcgrep -v '^\t' | ucsort | less -r
407    uninames GREEK PHI
408    uninames GREEK SUBSCRIPT
409    uninames greek yp
410    uninames GREEK YP
411    uninames gun
412    uninames hallo
413    uninames hazard
414    uninames head
415    uninames HEAD
416    uninames heart
417    uninames HEART
418    uninames HIERO
419    uninames horse
420    uninames hurt
421    uninames hyphen
422    uninames HYPHEN
423    uninames ideo stop
424    uninames insect
425    uninames insters
426    uninames INSULAR
427    uninames INSULAR | lc
428    uninames INSULAR | lc | tcgrep -v '^\t'
429    uninames INSULAR | uc
430    uninames INSULAR | uc | tcgrep -v '^\t'
431    uninames intro
432    uninames invis
433    uninames invisible
434    uninames iota sub
435    uninames iso
436    uninames jackol
437    uninames jong
438    uninames lake
439    uninames left bracket
440    uninames left single quot
441    uninames LESS
442    uninames LESS THAN
443    uninames LIG
444    uninames liga ff
445    uninames liga gg
446    uninames ligat fi
447    uninames ligature
448    uninames ligature -arabic
449    uninames light
450    uninames magic
451    uninames mah jong
452    uninames mah jong | tcgrep -v '^\t'
453    uninames male
454    uninames MATH CAPIT FRAK
455    uninames MATH DIGIT
456    uninames MATHE
457    uninames MATHEM
458    uninames MATHEM '\bA\b'
459    uninames MATHEM CAPITA '\bA\b'
460    uninames MATHEM CAPITA '\bA\b' | grep font
461    uninames MATHEM CAPITA '\bA\b' | grep -v font
462    uninames MATH FRACTU BOLD
463    uninames MATH SCRIPT CAPITAL
464    uninames -MATH SCRIPT E
465    uninames MATH SCRIPT E
466    uninames MATH SCRIPT SMALL
467    uninames -MATH -SUB -SUPER SCRIPT
468    uninames -MATH -SUB -SUPER SCRIPT E
469    uninames 'MODIFIER|(?i:superscript)'
470    uninames MODIFIER -LETTER
471    uninames moon
472    uninames mountain
473    uninames multi
474    uninames music
475    uninames music -combin
476    uninames music -combin | tcgrep -v '^\t'
477    uninames MUSIC SHARP
478    uninames NL
479    uninames no one under
480    uninames numeral
481    uninames oasis
482    uninames one
483    uninames one way
484    uninames ordina
485    uninames pain
486    uninames pen
487    uninames people
488    uninames person
489    uninames PG
490    uninames phi
491    uninames PILE POO
492    uninames '\pL' '\p{Latin}' | less -r
493    uninames plum
494    uninames PLUS
495    uninames poo
496    uninames POO
497    uninames power
498    uninames pumpkin
499    uninames punct
500    uninames quill
501    uninames quot
502    uninames radio
503    uninames right bracket
504    uninames right left
505    uninames RIGHT LEFT
506    uninames roman
507    uninames ROMAN NUM
508    uninames roman numeral
509    uninames round
510    uninames rx
511    uninames Rx
512    uninames SAME
513    uninames santa
514    uninames script
515    uninames SCRIPT
516    uninames set
517    uninames SET
518    uninames sex
519    uninames Sigma
520    uninames skull
521    uninames slash
522    uninames soccer
523    uninames space
524    uninames SPACE
525    uninames spanish
526    uninames sphere
527    uninames square
528    uninames st
529    uninames start
530    uninames st lig
531    uninames stlig
532    uninames subscript -SUBSCRIPT
533    uninames sun
534    uninames SUN
535    uninames 'SU(PER|B)SCRIPT|MODIFIER' '\b[AT]'
536    uninames 'SU(PER|B)SCRIPT|MODIFIER LETTER' '\b[AT]'
537    uninames superscript
538    uninames switch
539    uninames SYM DEL
540    uninames teste
541    uninames testi
542    uninames thin
543    uninames tilde
544    uninames times
545    uninames tongue
546    uninames traffic
547    uninames TWO DOT LEAD
548    uninames VARIATION
549    uninames VARIATION | grep -c VARIATION
550    uninames wand
551    uninames warn
552    uninames WIDTH
553    uninames -WITH '\bAND\b'
554    uninames WITH BAR
555    uninames WITH SLASH
556    uninames WITH STROKE
557    uninames WITH STROKE '\b[BROKENNESS]\b'
558    uninames wiz
559    uninames writ
560    uninames wrong
561    uninames yuan
562    uninames zero
563    uninames ZERO
564
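The searches above combine several terms and sometimes prefix one with a minus sign. Reading the demos, the sketch below assumes that every plain term must match and that a term written as -TERM excludes matches; that reading is an assumption, not documented behavior. It also searches only character names obtained from charnames::viacode, whereas uninames greps the NamesList file with its annotations. The helper search_names and its range arguments are hypothetical.

```perl
use strict;
use warnings;
use charnames ();

# Return "U+XXXX NAME" lines for code points in [$lo, $hi] whose names match
# every plain pattern and none of the patterns given with a leading "-".
# Matching is case-insensitive in this sketch.
sub search_names {
    my ($lo, $hi, @terms) = @_;
    my @want = map { qr/$_/i } grep { !/^-/ } @terms;
    my @skip = map { qr/$_/i } map { /^-(.+)/ ? $1 : () } @terms;
    my @hits;
    for my $cp ($lo .. $hi) {
        my $name = charnames::viacode($cp);
        next unless defined $name;               # unassigned or unnamed
        next if grep { $name !~ $_ } @want;      # a required term is missing
        next if grep { $name =~ $_ } @skip;      # an excluded term is present
        push @hits, sprintf "U+%04X %s", $cp, $name;
    }
    return @hits;
}

print "$_\n" for search_names(0x2160, 0x2188, 'ROMAN NUMERAL', '-SMALL');
```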
565=head2  Demo of unichars
566
567unichars is the most important and useful of these programs,
568so here are 861 sample invocations of it, ucsorted, of course.
569
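The core idea, listing every code point whose character satisfies all of the given criteria, can be sketched in a few lines. This is an illustration under stated assumptions rather than the unichars implementation: it accepts only regex criteria (no Perl expressions such as NAME or NFD tests), it takes an explicit code point range, and matching_chars is a made-up name.

```perl
use strict;
use warnings;
use charnames ();

# List code points in [$lo, $hi] whose character matches every regex given.
sub matching_chars {
    my ($lo, $hi, @criteria) = @_;
    my @regexes = map { qr/$_/ } @criteria;
    my @hits;
    for my $cp ($lo .. $hi) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        my $char = chr $cp;
        next if grep { $char !~ $_ } @regexes;     # some criterion failed
        push @hits, sprintf "U+%04X %s", $cp,
            charnames::viacode($cp) // '<unnamed>';
    }
    return @hits;
}

# Dash-like characters that are not in the Dash_Punctuation category.
print "$_\n" for matching_chars(0x0000, 0x2FFF, '\p{Dash}', '\P{Pd}');
```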
570    unichars -aBbs '\p{Age=6}'
571    unichars -aBbs '\p{Age=6}' '\P{Miscellaneous_Symbols_And_Pictographs}' > /tmp/u6
572    unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' > /tmp/na
573    unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 - > /tmp/ua
574    unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua
575    unichars -ac 'checkFCC' | less -r
576    unichars -ac 'checkFCC(NFD)'
577    unichars -ac 'checkFCC NFD' | less -r
578    unichars -ac 'checkFCC(NFD)' | less -r
579    unichars -ac 'checkFCD' | less -r
580    unichars -ac 'checkNFD'
581    unichars -ac '! checkNFD' | less
582    unichars -ac 'checkNFD' | less
583    unichars -ac '! checkNFD' | less -r
584    unichars -ac 'Comp_Ex' | less -r
585    unichars -ac 'Exclusion' | less -r
586    unichars -ac 'isExclusion' | less -r
587    unichars -ac 'isSingleton'
588    unichars -ac 'isSingleton()'
589    unichars -ac 'isSingleton()' | less -r
590    unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' > /tmp/n1
591    unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --level=1 --upper-before-lower --preprocess='s/..\K.*//' > /tmp/u2
592    unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | wc -l
593    unichars -ac 'NFC_NO' | less -r
594    unichars -ac 'NFD_NO' | less -r
595    unichars -ac 'NFD =~ /\pM/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
596    unichars -ac 'NFD =~ /\pM/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less -r
597    unichars -ac 'NFD =~ /\pM/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less -r
598    unichars -ac 'NFD =~ /^\PM\pM*\z/ && NFD !~ /^(?:\p{Grapheme_Base}\p{Grapheme_Extend}*|\p{Grapheme_Extend})\z/' | less -r
599    unichars -ac 'NFD =~ /^\PM\pM*\z/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*\z/' | less -r
600    unichars -ac 'NFD =~ /^\X$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
601    unichars -ac 'NFD =~ /^\X$/ && NFD !~ /^\PM\pM*\z/' | less -r
602    unichars -ac 'NFD =~ /^\X$/ && NFD =~ /^\PM\pM*\z/' | less -r
603    unichars -ac 'NFKC_NO' | less -r
604    unichars -ac 'NonStDecomp' | less -r
605    unichars -ac 'not checkFCC' | less -r
606    unichars -ac 'not checkFCD' | less -r
607    unichars -ac 'not checkNFD' | less
608    unichars -ac 'ord>0xffff && /\p{Latin}/'
609    unichars -ac '\p{cased}' '\PL' | less
610    unichars -ac '\p{cased}' '\P{upper}' | less
611    unichars -ac '\p{cased}' '\P{upper}' '\P{Lower}' | less
612    unichars -ac '\p{Greek}' | less
613    unichars -ac '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s
614    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
615    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --normalization=NFKD > /tmp/uk
616    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
617    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADP]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
618    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
619    unichars -ac '/\pL/ && '[\p{Latin}\p{Common}]' && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
620    unichars -ac '/\pL/ && [\p{Latin}\p{Common}] && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
621    unichars -ac '/[\pM\pL]/ && NAME =~ /\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua
622    unichars -ac '/[\pM\pL]/ && NAME =~ /\b(MATHEMATICAL|COMBINING|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua
623    unichars -ac '\p{Upper}' 'NAME !~ /CAPITAL/'
624    unichars -acsbBCnf '\p{Cased}' '[^\p{Ll}\p{Lu}]'
625    unichars -acsbBCnf '/\p{CWCF}/ != /\p{CWCM}/'
626    unichars -acsbBCnf '/\p{CWCF}/ != \p{CWCM}'
627    unichars -a -dgfs '\p{Cased}' '\PL'
628    unichars -a -gfs '\p{Cased}' '\PL'
629    unichars -a -gfs '\p{Cased}' '[^\p{Upper}\p{Title}]'
630    unichars -ags 'length NFKD > 5'
631    unichars -a -gs 'length(uc) > 1'
632    unichars -a -gs 'length(ucfirst) > 1' | wc -l
633    unichars -agsn NUM
634    unichars -ags '\p{lowercase}' '\P{Ll}'
635    unichars -ags '\p{lowercase}' '\P{Ll}' | wc -l
636    unichars -ags '\p{uppercase}' '\P{Lu}'
637    unichars -ags '\p{uppercase}' '\P{Lu}' | wc -l
638    unichars -a 'NAME =~ /BALL/'
639    unichars -a 'NAME =~ /EARTH GLOBE/'
640    unichars -anc 'NUM && (10*NUM) !~ /0/'
641    unichars -anc 'UCA =~  UCA("d")'
642    unichars -a 'NFKD =~ /\[/'
643    unichars -a -ngfs 'ord > 0xFFFF' '\p{Cased}'
644    unichars -a -ngfs 'ord > 0xFFFF' '\p{Cased}' '\PL'
645    unichars -a -ngfs '\p{Cased}' '\PL'
646    unichars -a 'ord > 0xffff' 'NAME =~ /FACE/'
647    unichars -a '\p{Age:6.0}' '\P{Numeric_Value=NaN}'
648    unichars -a '\P{Alnum}' '\w' | wc -l
649    unichars -a '\P{Bidi_Class=NSM}' '\p{Mn}'
650    unichars -a '\P{Bidi_Class=NSM}' '\p{Mn}' | wc -l
651    unichars -a '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' | wc -l
652    unichars -a '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' | wc -l
653    unichars -a '\p{Cased}' '\p{Lm}' | wc -l
654    unichars -a '\p{Cased}' '\PL' | wc -l
655    unichars -a '\p{InMiscellaneousSymbolsAnd_Pictographs}'
656    unichars -a '\p{InMiscellaneousSymbolsAnd_Pictographs}' > /tmp/emoji
657    unichars -a '\p{IsThai}' '\P{InThai}'
658    unichars -a '\P{IsThai}' '\p{InThai}'
659    unichars -a '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d
660    unichars -a '\p{Latin}' '\w' | wc -l
661    unichars -a '\p{Lower}' '\P{CWU}' | wc -l
662    unichars -a '\PL' '\p{Alphabetic}' | wc -l
663    unichars -a '\p{Nchar}'
664    unichars -a '\pN' '\W' | wc -l
665    unichars -a '\p{Other_Alphabetic}' '\PM' | less
666    unichars -a '\p{Other_Alphabetic}' '\PM' | M
667    unichars -a '[\p{Pf}\p{Pi}]'
668    unichars -a '[\p{Pi}]'
669    unichars -a '\p{Po}'
670    unichars -a '\p{Title}' '[^\p{CWL}\p{CWU}]' | wc -l
671    unichars -a '\p{Upper}' '\P{CWL}' | wc -l
672    unichars -a 'UCA1 eq UCA1("a")'
673    unichars -a 'UCA1 eq UCA1("a")' | cat -n
674    unichars -a 'UCA1 eq UCA1("a")' | less
675    unichars -a 'UCA1 eq UCA1("d")' | cat -n
676    unichars -a 'UCA1 eq UCA1("e")' | cat -n
677    unichars -a 'UCA1 eq UCA1("g")' | cat -n
678    unichars -a 'UCA1 eq UCA1("m")' | cat -n
679    unichars -a 'UCA1 eq UCA1("p")' | cat -n
680    unichars -a 'UCA eq UCA("d")'
681    unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i'
682    unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' | ucsort
683    unichars -a 'UCA eq UCA("d")' > /tmp/d
684    unichars -a '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r
685    unichars -a '\w' '[^_\p{Alphabetic}\p{Nd}]' | wc -l
686    unichars -a '\w' '\PM' 'ord > 0xffff' '\PN' | less
687    unichars -a '\w' '\PM' '\PL'
688    unichars -a '\w' '\PM' '\PL' '\PN' | less
689    unichars -Bbs '\p{Age=6}'
690    unichars -Bbs '\p{Age=6}'o
691    unichars -BCgsa '[\p{CCC=224}\p{CCC=226}'
692    unichars -BCgsa '[\p{CCC=224}\p{CCC=226}]'
693    unichars -BCgsa '[\p{CCC=Left}\p{CCC=Right}]'
694    unichars -BCgsa '\p{Mn}'
695    unichars --bmp --smp 'UCA1 eq UCA1("d")' | cat -n >> /tmp/lets
696    unichars --bmp --smp 'UCA1 eq UCA1("e")' | cat -n
697    unichars --bmp --smp 'UCA1 eq UCA1("e")' | cat -n >> /tmp/lets
698    unichars --bmp --smp 'UCA1(NFKD) eq UCA1("d")' | cat -n >> /tmp/lets2
699    unichars --bmp --smp 'UCA1(NFKD) eq UCA1("e")' | cat -n >> /tmp/lets2
700    unichars -bs .
701    unichars -bs 1
702    unichars '\bSCRIPT\b' '[CEFHILMRego]'
703    unichars -bs '\p{Age=6}'
704    unichars -Bs '\p{Bidiclass:M}'
705    unichars -Bs '\p{BidiM}'
706    unichars -Bs '\p{BI:M}'
707    unichars -B '\w'
708    unichars -B '/\w/'
709    unichars -c
710    unichars -c '\D' NUM
711    unichars -Cgas '\pM'
712    unichars -Cgas '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k1,1 | less -r
713    unichars -Cgas '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|SOLIDUS|STROKE|LINE/' | sort -t= -k4,4n -k2,2
714    unichars -Cgas '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2
715    unichars -Cgsa '\p{Mc}'
716    unichars -Cgsa '\p{Mn}'
717    unichars -Cgs '\p{Me}' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less
718    unichars -Cgs '\pM' 'NAME =~ /above/' | sort -t= -k4,4n -k2,2
719    unichars -Cgs '\pM' 'NAME =~ /ABOVE/' | sort -t= -k4,4n -k2,2
720    unichars -Cgs '\pM' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2
721    unichars -Cgs '\pM' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less
722    unichars -Cgs '\pM' 'NAME =~ /SLASH/' | sort -t= -k4,4n -k2,2
723    unichars -Cgs '\pM' 'NAME =~ /TILDE/' | sort -t= -k4,4n -k2,2
724    unichars -Cgs '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k2,2 | less -r
725    unichars -Cgs '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2
726    unichars -Cgs '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less
727    unichars -Cgs '\pM' | sort -t= -k4,4n -k2,2 | grep -i TILDE
728    unichars -Cgs '\pM' | sort -t= -k4,4n -k2,2 | less -r
729    unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/'
730    unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | less
731    unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | wc -l
732    unichars -c 'NAME =~ /LATIN LETTER SMALL CAPITAL/' | less -r
733    unichars -c 'NAME =~ /ORD/'
734    unichars -c 'NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/'
735    unichars -c 'NFD =~ /^\PM\pM*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
736    unichars -c 'NFD =~ /^\X*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
737    unichars -c 'NFD =~ /^\X+$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
738    unichars -c 'NFD =~ /^\X$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
739    unichars -c 'NFD =~ /^\X$/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
740    unichars -c NUM
741    unichars -c 'NUM && (10*NUM) !~ /0/'
742    unichars -c 'NUM && 10*NUM !~ /0/'
743    unichars -c 'ord>0xffff && /\p{Latin}/'
744    unichars -c 'ord == 640'
745    unichars -c '\p{Alphabetic}'
746    unichars -c '\p{Alphabetic}' | head -1000 | tail
747    unichars -c '\p{Alphabetic}' | head -3000 | tail
748    unichars -c '\p{Alphabetic}' | less -r
749    unichars -c '\p{Alphabetic}' '\pM'
750    unichars -c '\p{Alphabetic}' '\pM' | less
751    unichars -c '\p{Alphabetic}' '\pM' | less -r
752    unichars -c '\p{cased}' '[^\p{CWU}\p{CWL}\p{CWT}]' | less -r
753    unichars -c '\p{cased}' '\PL'
754    unichars -c '\p{cased}' '\PL' | less
755    unichars -c '\p{Dash}' '\P{Pd}'
756    unichars -c '\p{Greek}'
757    unichars -c '\p{Greek}' | less
758    unichars -c '\p{Greek}' '\p{Lower}' 'ord <= ord("\N{greek:alpha}")' 'ord >= ord "\N{omega}"'
759    unichars -c '\p{Greek}' '\p{Lower}' 'ord() < ord("\N{greek:alpha}") || ord > ord "\N{omega}"'
760    unichars -c '\p{Greek}' '\p{Lower}' 'ord() <= ord("\N{greek:alpha}") || ord >= ord "\N{omega}"'
761    unichars -c '\p{Greek}' '\p{Lower}' 'ord() <= ord("\N{greek:alpha}")' 'ord >= ord "\N{omega}"'
762    unichars -c '\p{IDC}' '\W'
763    unichars -c '\p{IDC}' '\W' | cat -n
764    unichars -c '\p{IDC}' '\W' | wc -l
765    unichars -c '\p{InEnclosed_Alphanumerics}'
766    unichars -c '\p{InEnclosed_Alphanumerics}' '\p{lower}'
767    unichars -c '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/uu
768    unichars -c '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s
769    unichars -c '\p{lower}' '\P{CWU}'
770    unichars -c '\p{lower}' '\P{CWU}' | less
771    unichars -c '\p{lower}' '\p{Lm}' | less
772    unichars -c '\p{lower}' '\p{Lm}' | less -r
774    unichars -c '\p{lower}' '\p{Lm}' | perl -pe 's/.//' | ucsort --reverse-fields | less
775    unichars -c '\p{lower}' '\p{Lm}' | perl -pe 's/.//' | ucsort --reverse-fields | less -r
776    unichars -c '\p{Mc}'
777    unichars -c '\pM' '\P{Diacritic}'
778    unichars -c '\PM' '\p{Diacritic}'
779    unichars -c '\p{No}'
780    unichars -c '\p{No}' | head
781    unichars -c '\p{No}' | less
782    unichars -c '\p{No}' '\w'
783    unichars -c '\p{No}' '\W'
784    unichars -c '\p{No}' '\w' | head
785    unichars -c '\p{No}' '\W' | head
786    unichars -c '\p{No}' '\W' | less
787    unichars -c '[\p{Pi}\p{Ps}]' 'NAME =~ /VERTICAL/'
788    unichars -c '\pP' '\P{QMark}' 'NAME =~ /QUOT/'
789    unichars -c '\p{Upper}' 'NAME !~ /CAPITAL/'
790    unichars -cs /k/i
791    unichars -cs 'NFD \!~ /d/i' 'NFKD \!~ /d/i' 'UCA eq UCA("d")'
792    unichars -cs 'NFD \!~ /d/i' 'NFKD =~ /d/i' 'UCA eq UCA("d")'
793    unichars -cs 'NFD \!~ /d/i' 'NFKD =~ /d/' 'UCA eq UCA("d")'
794    unichars -cs 'NFD =~ /d/i' 'NFKD =~ /d/' 'UCA eq UCA("d")'
795    unichars -cs 'NFD \!~ /o/i' 'NFKD \!~ /o/i' 'UCA eq UCA("o")'
796    unichars -cs 'NFKD !~ /a/i' 'UCA eq UCA("a")'
797    unichars -cs 'NFKD \!~ /a/i' 'UCA eq UCA("a")'
798    unichars -cs 'NFKD \!~ /a/i' 'UCA =~ UCA("a")'
799    unichars -cs 'NFKD \!~ /a/i' 'UCA =~ UCA("ae")'
800    unichars -cs 'NFKD \!~ /b/i' 'UCA eq UCA("b")'
801    unichars -cs 'NFKD \!~ /c/i' 'UCA eq UCA("c")'
802    unichars -cs 'NFKD \!~ /d/i' 'UCA eq UCA("d")'
803    unichars -cs 'NFKD \!~ /e/i' 'UCA eq UCA("e")'
804    unichars -cs 'NFKD \!~ /f/i' 'UCA eq UCA("f")'
805    unichars -cs 'NFKD \!~ /f/i' 'UCA =~ UCA("f")'
806    unichars -cs 'NFKD \!~ /g/i' 'UCA eq UCA("g")'
807    unichars -cs 'NFKD \!~ /h/i' 'UCA eq UCA("h")'
808    unichars -cs 'NFKD \!~ /o/i' 'UCA eq UCA("o")'
809    unichars -cs 'NFKD \!~ /o/i' 'UCA =~ UCA("oe")'
810    unichars -cs '\P{ASCII}' '(lc() . uc) =~ /\p{ASCII}/'
812    unichars -cs '\P{ASCII}' 'NFD \!~ /\p{ASCII}/' 'NFKD =~ /\p{ASCII}/'
813    unichars -cs /s/i
814    unichars -cs 'UCA eq UCA("o")'
815    unichars -cs 'UCA =~ UCA ( "a" ) '
817    unichars -cs 'UCA =~ UCA ( "ae" ) '
818    unichars -c '\w' '\W'
819    unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W'
820    unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' | head
821    unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' | less
822    unichars --debug '\p{No}' '\W' | head
823    unichars --debug '\p{No}' '\W' | less
824    unichars --debug '\p{No}' '\W' 'ord < 0xFF || die' | head
825    unichars --debug '\p{No}' '\W' 'ord > 0xFF && die' | head
826    unichars --debug '\p{No}' '\W' 'ord() < 0xFF || die' | head
827    unichars 'defined(NUM) && () ~~ [1..10]' | less -r
828    unichars 'defined(NUM) && [1..10] ~~ NUM' | less -r
829    unichars 'defined(NUM) && [1..1] ~~ NUM' | less -r
830    unichars 'defined(NUM) && ! (NUM() ~~ [0..10])' | less -r
831    unichars 'defined(NUM) && ! NUM() ~~ [0..10]' | less -r
832    unichars 'defined(NUM) && NUM <= 10'
833    unichars 'defined(NUM) && NUM <= 10' | less
834    unichars 'defined(NUM) && NUM <= 10' | less -r
835    unichars 'defined(NUM) && ! NUM() ~~ [1..10]' | less -r
836    unichars 'defined(NUM) && NUM() ~~ [1..10]' | less -r
837    unichars '\d' '\p{common}'
839    unichars '\d' '\p{Latin}'
840    unichars -fgas 'CF =~  "."'
841    unichars -fgas 'CF eq "C"'
842    unichars -fgas 'CF eq "F"'
843    unichars -fgas 'length(uc . ucfirst . lc) != 3'
844    unichars -fgas 'length(uc . ucfirst . lc) != length NFKD * 3'
845    unichars -fgas 'length(uc . ucfirst . lc) != length(NFKD) * 3'
846    unichars -fgas 'not /\U\Q$_/i'
847    unichars -fgas '/\U$_/i'
848    unichars -fgas '/\U\Q$_/i'
849    unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper}' | ucsort | cat -n | less -r
851    unichars -fgns '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r
852    unichars -fgns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r
853    unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}'
854    unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' | ucsort | cat -n | less -r
855    unichars -gacs '\p{Cased}' '\P{CWCM}' | cat -n | less -r
856    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '^[\p{Ll}\p{Lu}]'
857    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}]'
858    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]'
859    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l
860    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l
861    unichars -ga '\P{ASCII}' '\p{Common}' '\pM'
862    unichars -ga '\P{ASCII}' '\p{Common}' '[\pP\pS]'
863    unichars -ga '\P{ASCII}' '\p{Inherited}' '\PL'
864    unichars -ga '\P{ASCII}' '\p{Inherited}' '\pM'
865    unichars -ga '\P{ASCII}' '[\pP\pS]'
866    unichars -ga '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/'
867    unichars -ga '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l
868    unichars -gas 'length(uc) > 1'
869    unichars -gas 'NFKD eq "."'
870    unichars -gasn 'not /\d/' 'NFKD =~ /\d/'
871    unichars -gasn 'not /\d/' 'NFKD =~ /\d/' | wc -l
872    unichars -gasn 'not /\pN/' 'NFKD =~ /^(?=\D*$)\pN/'
873    unichars -gasn 'not /\pN/' 'NFKD =~ /\pN/'
874    unichars -gasn 'not /\pN/' 'NFKD =~ /\pN/' | wc -l
875    unichars -gasn NUM
876    unichars -gasn 'NUM && NUM < 0'
877    unichars -gasn 'NUM || (/\pN/ && /\p{Enclosed_Alphanumerics}/)'
879    unichars -gasn NUM > /tmp/num
880    unichars -gasn NUM | wc -l
881    unichars -gasn '\pN' 'not NUM'
882    unichars -gasn '\PN' 'NUM'
883    unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r
884    unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort --upper-before-lower | less -r
885    unichars -gas '\p{di}'
886    unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BK}]'
887    unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BK}]' '\V'
888    unichars -gas '[\P{LB=CR}\P{LB=LF}\P{LB=NL}\P{LB=BK}]' '\v'
889    unichars -gas '[\P{LB=CR}\P{LB=LF}\P{LB=NL}\P{LB=BK}]' '\V'
891    unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BR}]'
892    unichars -gas '\p{LB=LF}'
893    unichars -gas '\p{LB=NL}'
894    unichars -gas '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r
895    unichars -gas '\pL' 'NAME =~ /\bSCRIPT/'
896    unichars -gas '\PL' 'NAME =~ /\bSCRIPT/'
897    unichars -gas '\p{Lower}' '\P{Ll}'
898    unichars -gas '\p{Lower}' '\P{Ll}' | ucsort | less
899    unichars -gas '\PL' 'uc =~ /\p{Upper}/'
900    unichars -gas '\pM'
901    unichars -gas '\p{Me}'
902    unichars -gas '\p{Other_Lowercase}'
903    unichars -gas '\p{Other_Lowercase}' | wc -l
904    unichars -gas '\p{SB=AT}'
905    unichars -gas '\p{SB=ST}'
906    unichars -gas '\p{sc=greek}' '\P{blk=greek}'
907    unichars -gas '\P{Upper}' '\PL' 'uc =~ /\p{Upper}/'
908    unichars -gas '\R'
909    unichars -gas 'UCA eq UCA("d")'
910    unichars -gas 'UCA eq UCA("d")' 'NFKD !~ /d/i'
911    unichars -gas 'UCA eq UCA("o")' 'NFKD !~ /o/i'
912    unichars -gbas '\p{sc=greek}' '\P{blk=greek}'
913    unichars -gbas '\P{sc=greek}' '\p{blk=greek}'
914    unichars -gcas '\pM'
915    unichars -gCas '\pM'
916    unichars -gc '\p{Control}'
917    unichars -gcs '\p{Cased}' '\P{CWCF}' | cat -n | less -r
918    unichars -gcs '\p{Cased}' '\P{CWCM}' | cat -n | less -r
919    unichars -gcs '\p{Cased}' '[^\p{CWU}\p{CWD' | cat -n | less -r
920    unichars -gcs '\p{Cased}' '\PL' | cat -n | less -r
921    unichars -gCs '\pM' '\P{CCC=0}' | sort -k5.3,5n | less -r
922    unichars -gCs '\pM' '\P{CCC=0}' | sort -k5.4,5n | less -r
923    unichars -gCs '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k1,1 | less -r
924    unichars -gCs '\pM' | sort -k5.4,5n | less -r
925    unichars -gCs '\pM' | sort -k5.4n | less -r
926    unichars -gCs '\pM' | sort -k5.5n | less
927    unichars -gCs '\pM' | sort -k5.5n | less -r
928    unichars -gcs '\p{Titlecase}'
929    unichars -gcs '\p{Titlecase}' | wc -l
930    unichars -gfns '/\p{Lower}/ && /\p{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r
931    unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r
932    unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort --upper | less -r
933    unichars -gfs '\p{Cased}'
934    unichars -gfs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]'
935    unichars -gfs '\p{Cased}' '[^\p{Upper}\p{Lower}]'
936    unichars -gfs '\p{Cased}' '[^\p{Upper}\p{Title}]'
937    unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r
938    unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | less -r
939    unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l
940    unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l
941    unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFKD !~ /\p{ASCII}/' | wc -l
942    unichars -g '\P{ASCII}' '\p{Common}' '[\pP\pS]'
943    unichars -g '\p{Cased}' '\P{Alphabetic}'
944    unichars -g '\p{Cased}' '\PL' | cat -n | less -r
945    unichars -g '\p{InHalfwidthAndFullwidthForms}' '\p{bidim}'
947    unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{alpha}+\z/'
948    unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{perlword}+\z/'
949    unichars -gs '/\A\p{alpha}+\z/ && NFD !~ /\A\p{perlword}+\z/'
950    unichars -gs '/\A\p{alpha}+\z/ && NFKC !~ /\A\p{perlword}+\z/'
951    unichars -gs '/\A\p{alpha}+\z/ && NFKD =~ /\W/'
952    unichars -gs '/\A\p{alpha}+\z/ && ! /\w/'
953    unichars -gsCB 'ord ~~ (0x345,0x37A)'
954    unichars -gsCB 'ord ~~ [0x345,0x37A]'
955    unichars -gsCB 'ord == 0x345 || ord == 0x37A'
956    unichars -gs '\d'
957    unichars -gsfB 'ord == 0x345 || ord == 0x37A'
958    unichars -gsf '/\p{Upper}/ && /\P{CWL}/'
959    unichars -gs 'length(lc) > 1' | wc -l
960    unichars -gs 'length NFKD == 2'
961    unichars -gs 'length NFKD > 4'
962    unichars -gs 'length NFKD > 5'
963    unichars -gs 'length NFKD > 6'
964    unichars -gs 'length NFKD > 7'
965    unichars -gs 'length NFL
966    unichars -gs 'length(uc) > 1'
967    unichars -gs 'length(uc) > 1' 'length(ucfirst) == 1'
968    unichars -gs 'length(uc) > 1' | wc -l
969    unichars -gs 'length(ucfirst) > 1'
971    unichars -gs --locale=de_phonebook 'UCA1 eq UCA1 ( "ae" ) '
972    unichars -gs '    NFKD !~ /d/i && UCA1 eq UCA1("d")'
973    unichars -gs 'NFKD !~ /d/i' 'UCA1 eq UCA1("d")'
974    unichars -gs 'NFKD !~ /f/i && UCA1 eq UCA1("f")'
975    unichars -gs 'NFKD !~ /h/i && UCA1 eq UCA1("h")'
976    unichars -gs 'NFKD !~ /,/i' 'UCA1 eq UCA1(",")'
977    unichars -gs 'NFKD !~ /;/i' 'UCA1 eq UCA1(";")'
978    unichars -gs 'NFKD !~ /\?/i' 'UCA1 eq UCA1("?")'
979    unichars -gs 'NFKD !~ /\./i' 'UCA1 eq UCA1(".")'
980    unichars -gs 'NFKD !~ /o/i && UCA1 eq UCA1("o")'
981    unichars -gs '    NFKD(string) !~ /d/i && UCA1 eq UCA1("d")'
982    unichars -gs 'NFKD(string) !~ /d/i' 'UCA1 eq UCA1("d")'
983    unichars -gsn NUM
984    unichars -gsn 'NUM && NUM < 0'
985    unichars -gs '\P{ASCII}' '\p{Common}' '\pP'
986    unichars -gs '\p{Bidi_Class=NSM}' '\P{Mn}'
987    unichars -gs '\P{Bidi_Class=NSM}' '\p{Mn}'
988    unichars -gs '\p{bidim}'
989    unichars -gs '[\p{bidim}\p{Ps}]'
990    unichars -gs '[\p{bidim}\p{Ps}\p{Pe}]'
991    unichars -gs '\p{Cased}' 'Comp_Ex()'
992    unichars -gs '\p{Cased}' 'Exclusion()'
993    unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]'
    unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' 'NFKD =~ /\W/'
995    unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less
996    unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r
997    unichars -gs '\p{Cased}' 'Singleton()'
998    unichars -gs '\p{Inherited}'
999    unichars -gs '/(?=\P{Ll})\p{Lower}|(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r
1001    unichars -gs '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r
1002    unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r
1003    unichars -gs '\p{Lower}'
1004    unichars -gs '\p{lowercase}' '\P{Ll}'
1005    unichars -gs '\p{Lower}' '\P{CWCM}'
1006    unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less
1007    unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less -r
1008    unichars -gs '\p{Lower}' '\p{CWU}' | wc -l
1009    unichars -gs '\p{Lower}' '\P{Ll}' | ucsort | less -r
1010    unichars -gs '\PL' '\p{Lower}' '\p{CWCF}'
1011    unichars -gs '\PL' '\p{Lower}' '\p{CWCM}'
1012    unichars -gs '\PL' '\p{Lower}' '\P{CWCM}'
1013    unichars -gs '\PL' '\p{Lower}' '\p{CWU}'
1014    unichars -gs '\pL' '\p{Lower}' '\p{CWU}' | wc -l
1015    unichars -gs '\PL' '\p{Lower}' '\p{CWU}' | wc -l
1016    unichars -gs '\pL' '\p{Lower}' '\P{Ll}'
1017    unichars -gs '\pL' '\p{Lower}' '\P{Ll}' '\p{CWU}' | wc -l
1018    unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower
1019    unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r
1020    unichars -gs '\pS' 'NFKD !~ /\pS/'
1021    unichars -gs '[\pS\pP]' 'NFKD !~ /[\pS\pP]/'
1022    unichars -gs '\p{Symbol}'
1023    unichars -gs '\p{uppercase}' '\P{Lu}' | wc -l
1024    unichars -gs '/\p{Upper}/ && /\P{CWL}/'
1025    unichars -gs '/\p{Upper}/ && /\P{CWL}/' | ucsort | less
1026    unichars -gs '/\p{Upper}/ && /\P{CWT}/' | ucsort | less
1027    unichars -gsS '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]'
1028    unichars -gs 'UCA1 eq UCA1(";")'
1029    unichars -gs 'UCA eq UCA("&")'
1030    unichars -gs 'UCA eq UCA("d")'
1031    unichars -gua '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l
1032    unichars -gua '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l
1033    unichars --help
1034    unichars '/ij/i'
1036    unichars 'length(lc) > 1'
1037    unichars 'length(lcfirst) != length'
1038    unichars 'length(lcfirst) != length(uc)'
1039    unichars 'length(lc) < length(uc)'
1040    unichars 'length(NFD) == 1 && length(NFC) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less -r
1041    unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less -r
1042    unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | ucsort
1043    unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | wc -l
1044    unichars 'length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' > /tmp/ndc &
1045    unichars 'length(uc) > 1'
1046    unichars 'length(ucfirst) > 1'
    unichars --locale=de__phonebook 'NFD =~ /a/i'
1048    unichars --locale=de__phonebook 'NFD() =~ /a/i'
1049    unichars --locale=de__phonebook 'NFD() =~ /a/i' | less -r
1050    unichars --locale=de__phonebook 'UCA eq UCA("ae")' 'NFKD !~ /d/i' | ucsort
1051    unichars --locale=de__phonebook 'UCA eq UCA("ae")' 'NFKD !~ /d/i' | ucsort --locale=de__phonebook
1052    unichars --locale=de__phonebook 'UCA(NFKD) =~ UCA("a WITH DIAERESIS")'
1053    unichars --locale=de__phonebook 'UCA() =~ UCA("a")'
1054    unichars --locale=de__phonebook 'UCA() =~ UCA("a")' | less -r
1055    unichars --locale=de__phonebook 'UCA =~ UCA("a WITH DIAERESIS")'
1056    unichars --locale=en 'UCA eq UCA("ae")'
1058    unichars 'NAME =~ /BALL/'
1059    unichars 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1'
1060    unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/'
1061    unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l
1062    unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH\b.*\b[ABCD]\b/'
1063    unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH\b.*\b[ABCD]\b/' | less -r
1064    unichars 'NAME =~ /PRIME/'
1065    unichars -nc 'NUM && (10*NUM) !~ /0/'
1066    unichars -nc 'UCA eq UCA("d")'
1067    unichars 'NFD =~ /ij/i'
1068    unichars 'NFD ne NFKD'
1069    unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/'
1070    unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less
1071    unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' > /tmp/ndc &
1072    unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | wc -l
1073    unichars 'NFD !~ /\pM/ && NAME =~ /LATIN.*LETTER WITH/'
1074    unichars 'NFD =~ /^\PM\pM*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/'
1075    unichars 'NFKD =~ /\[/'
1076    unichars 'NFKD eq ","'
1077    unichars 'NFKD eq ":"'
1078    unichars 'NFKD eq ".."'
1079    unichars 'NFKD eq "*"'
1080    unichars 'NFKD eq 'comma'
1081    unichars 'NFKD eq "\N{PRIME}"'
1082    unichars 'NFKD =~ /ij/i'
1083    unichars 'NFKD \!~ /s/i and UCA =~ UCA "s"'
1084    unichars 'NFKD \!~ /s/i || UCA =~ UCA "s"'
1085    unichars 'NFKD =~ /s/i || UCA =~ UCA "s"'
1086    unichars -ngas 'NUM && not NUM ~~ [ 0..10 ]'
1087    unichars -ngas 'NUM && not NUM ~~ [ 1..10 ]'
1088    unichars --nopager -gaBsn 'NUM && int(NUM) != NUM'
1089    unichars --nopager -gaBsn 'NUM && NUM == 100'
1090    unichars --nopager -gasn 'NUM && NUM == 100'
1091    unichars --nopager -gsn 'NUM && int(NUM) != NUM'
1092    unichars --nopager -gsn 'NUM && NUM < 0'
1093    unichars --nopager --locale=de__phonebook 'UCA eq UCA("ae")'
1094    unichars --nopager --locale=en 'UCA eq UCA("ae")'
1095    unichars --nopager --locale=is 'UCA eq UCA("ae")'
1096    unichars --nopager 'UCA eq UCA("ae")'
1097    unichars --nopager 'UCA eq UCA("ae")' | ucsort
1098    unichars --nopager 'UCA eq UCA("ae")' | ucsort --upper
1099    unichars 'not /\d/' 'NFKD =~ /\d/'
1100    unichars 'not /\w/'
1101    unichars 'not /\w/' 'not /\W/'
1102    unichars -nsag '\p{Cased}' 'NUM'
1103    unichars -nsag '\p{Lower}' '\P{CWU}'
1104    unichars -ns 'UCA eq UCA("d")'
1105    unichars NUM
1106    unichars 'ord ~~ [ 0x2622, 0x26bd]'
1107    unichars 'ord ~~ [ 0x2622, 0x2bbd]'
1108    unichars 'ord ==   0x2622 || ord == 0x26bd'
1109    unichars 'ord>0xffff' '\p{Po}'
1110    unichars 'ord < 255' '\p{pattern_syntax}'
1111    unichars 'ord() < 255' '\p{pattern_syntax}'
1112    unichars 'ord() < 255' '\p{pattern_syntax}' | less
1113    unichars 'ord() < 255' '\p{pattern_syntax}' | wc -l
1114    unichars '\p{Age:6.0}'
1115    unichars '\p{Age:6.0}' '\p{Numeric_Value=NaN}'
1116    unichars '\p{Age:6.0}' '\P{Numeric_Value=NaN}'
1117    unichars '\p{alnum}' '\P{word}'
1118    unichars '\P{alnum}' '\p{word}'
1119    unichars '\p{alnum}' '\W'
1120    unichars '\p{Alnum}' '\W'
1121    unichars '\P{Alnum}' '\w'
1122    unichars '\P{Alnum}' '\w' | less
1123    unichars '\p{alnum}' '\W' | wc -l
1124    unichars '\P{alnum}' '\w' | wc -l
1125    unichars '\P{alnum}' '\W' | wc -l
1126    unichars '\P{Alnum}' '\w' | wc -l
1127    unichars '\p{Alphabetic}' '\P{XPosixAlpha}' | less
1128    unichars '\P{Alphabetic}' '\p{XPosixAlpha}' | less
1129    unichars '\p{alpha}' '\p{CI}' | less -r
1130    unichars '\p{alpha}' '\p{CI}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r
1131    unichars '\p{alpha}' '\P{XPosixAlpha}' | less
1132    unichars '\P{alpha}' '\p{XPosixAlpha}' | less
1133    unichars '\P{alpha}' '\P{XPosixAlpha}' | less
1134    unichars '\P{ASCII}' '(lc() . uc) =~ /\p{ASCII}/'
1136    unichars '\P{ASCII}' 'lc.uc =~ /\p{ASCII}/'
1137    unichars '\P{ASCII}' 'ord() < 255' '\p{pattern_syntax}' | wc -l
1138    unichars '\P{ASCII}' 'ord() < 255' '\W' | wc -l
1139    unichars '\P{ASCII}' '\p{Common}' '\pP'
1140    unichars '\p{BC=ON}'
1141    unichars '\P{Bidi_Class=NSM}' '\p{Mn}'
1142    unichars '\p{BidiM}' '\pS'
1143    unichars '\p{Block=CombiningDiacriticalMarks}'
1144    unichars '\p{Block=CombiningDiacriticalMarks}' '\PM'
1145    unichars '\p{Block=CombiningDiacriticalMarks}' '\p{Mn}'
1146    unichars '\p{Block=CombiningDiacriticalMarks}' '\P{Mn}'
1147    unichars '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}'
1148    unichars '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' | wc -l
1149    unichars '\p{Cased}' '\P{Changes_When_Casefolded}'
1150    unichars '\p{Cased}' '\p{Changes_When_Casemapped}'
1151    unichars '\p{Cased}' '\P{Changes_When_Casemapped}'
1152    unichars '\P{Cased}' '\p{Changes_When_Casemapped}'
1153    unichars '\p{Cased}' '\p{Changes_When_Casemapped}' | less
1154    unichars '\p{Cased}' '\P{Changes_When_Casemapped}' | less
1155    unichars '\p{Cased}' '\p{Changes_When_Casemapped}' | less -r
1156    unichars '\p{Cased}' '\P{Changes_When_Casemapped}' | less -r
1157    unichars '\P{Cased}' '\p{Changes_When_Casemapped}' | less -r
1158    unichars '\p{Cased}' '\p{CI}'
1159    unichars '\p{Cased}' '\p{CI}' | less
1160    unichars '\p{Cased}' '\p{CI}' | less -r
1161    unichars '\p{cased}' '[^\p{CWU}\p{CWL}\p{CWT}]' | less -r
1163    unichars '\p{cased}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r
1164    unichars '\p{cased}' '\PL'
1165    unichars '\p{Cased}' '\PL'
1166    unichars '\p{cased}' '\PL' | less
1167    unichars '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]'
1168    unichars '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' | wc -l
1169    unichars '\p{Cased}' '\p{Lm}' | wc -l
1170    unichars '\p{Cased}' '\PL' | wc -l
1171    unichars '\p{Cased}' '\pM'
1172    unichars '\p{cased}' '[^\p{upper}\p{lower}]' | less
1173    unichars '\p{cased}' '[^\p{upper}\p{lower}\p{title}]' | less
1174    unichars '\p{cased}' '[^\p{upper}\p{lower}\p{title}]' | less -r
1175    unichars '\p{CC=A}'
1176    unichars '\p{CCC=A}'
1177    unichars '\p{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}'
1178    unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}'
1179    unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' | less
1180    unichars '\p{Changes_When_Casefolded}' '\P{Changes_When_Casemapped}' | less -r
1181    unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' | less -r
1182    unichars '\p{CI}' | less -r
1183    unichars '\p{CI}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r
1184    unichars '\p{Common}' '\pP'
1185    unichars '\p{Control}'
1186    unichars '\p{Control_Pictures}'
1187    unichars '\p{CWL}' 'NAME =~ /LATIN LETTER SMALL CAPITAL/'
1188    unichars '\p{CWTC}' '\PL'
1189    unichars '\p{CWT}' '\PL'
1190    unichars '\p{CWU}' 'NAME =~ /LATIN LETTER SMALL CAPITAL/'
1191    unichars '\p{Dash}'
1192    unichars '\p{di}'
1193    unichars '\p{E
1194    unichars '\p{EA=W}'
1195    unichars '\p{Greek}' '\pP'
1196    unichars '\p{Greek}' '\pS'
1197    unichars '\p{InGreek}' '\P{IsGreek}' | wc -l
1198    unichars '\P{InGreek}' '\p{IsGreek}' | wc -l
1199    unichars '\p{InHalfwidthAndFullwidthForms}' '\p{bidim}'
1200    unichars '\p{InHiragana}' '\P{Hiragana}'
1201    unichars '\p{InHiragana}' '\P{Kana}'
1203    unichars '\p{InKatakana}' '\P{Kana}'
1204    unichars '\P{InKatakana}' '\p{Kana}'
1205    unichars '\P{InKatakana}' '\p{Kana}' | less
1206    unichars '\p{InLatin}' '\P{IsLatin}' | wc -l
1207    unichars '\p{InMiscellaneousSymbolsAnd_Pictographs}'
1208    unichars '\p{InThai}' '\P{IsThai}'
1209    unichars '\p{IsThai}' '\P{InThai}'
    unichars '\p{Latin}'
1211    unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r
1212    unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r
1213    unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r
1214    unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 > /tmp/u1
1215    unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/u4
1216    unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[C-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r
1217    unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' '$$CF{full} =~ / /'
1218    unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /./' '$$CF{full} =~ / /'
1219    unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /F/'
1220    unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /F/' '$$CF{full} =~ / /'
1221    unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /[SF]/' '$$CF{full} =~ / /'
1222    unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,4}$/' 'CF =~ /F/' '$$CF{full} =~ / /'
1223    unichars '\p{Latin}' 'NAME =~ /\b\pL{2,3}$/' 'CF =~ /F/' '$$CF{full} =~ / /'
1224    unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1'
1226    unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort | less
1227    unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d
1228    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | less
1229    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | ucsort | less
1230    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | ucsort | less -r
1231    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | wc -l
1232    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b.*\bWITH\b/' | ucsort | less -r
1233    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b.*\bWITH\b/' | ucsort --level=1 | less -r
1234    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l
1236    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort | less -r
1237    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=1 | less -r
1238    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=1 > /tmp/u1
1239    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=2 > /tmp/u2
1240    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=3 > /tmp/u3
1241    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=4 > /tmp/u4
1242    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --level=1 | less -r
1243    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --level=4 > /tmp/u4
1244    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --preprocess 's/..\K\h+\S+//' --level=1 | less -r
1245    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --preprocess 's/..\K.*//' --level=1 | less -r
1246    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r
1247    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r
1248    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/uu
1249    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[CD]\b.*\bWITH\b/' | wc -l
1250    unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[CD].*\bWITH\b/' | wc -l
1251    unichars '\p{Latin}' '\w'
1252    unichars '\p{Latin}' '\w' | wc -l
1253    unichars '\pL' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/'
1254    unichars '\pL' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l
1255    unichars '\pL' 'length(NFD) == 1' 'NAME =~ /WITH/'
1256    unichars '\p{Lm}' '\p{Cased}'
1257    unichars '\p{Lm}' '\p{Changes_When_Casemapped}'
1258    unichars '\p{Lm}' '\p{upper}'
1259    unichars '\pL' 'NAME =~ /\bSCRIPT/'
1260    unichars '\PL' 'NAME =~ /\bSCRIPT/'
1261    unichars '\pL' 'NAME =~ /SCRIPT/'
1262    unichars '\pL' 'NAME =~ /WITH/' | wc -l
1263    unichars '\p{Lower}' 'length(uc) > 1'
1264    unichars '\p{Lower}' 'NAME =~ /CAPITAL/'
1265    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | less
1266    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6}'
1267    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6.0}'
1268    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age=6.0}'
1269    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6.0.0}'
1270    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s
1272    unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | wc -l
1273    unichars '\p{Lower}' 'NAME !~ /SMALL CAPITAL|CAPITAL LETTER/' | wc -l
1274    unichars '\p{Lower}' 'NAME =~ /SMALL CAPITAL|CAPITAL LETTER/' | wc -l
1275    unichars '\p{lower}' '\P{CWL}' | less -r
1276    unichars '\p{Lower}' '\P{CWU}'
1277    unichars '\p{lower}' '\P{CWU}' | less -r
1278    unichars '\p{Lower}' '\P{CWU}' | wc -l
    unichars '[\p{Lower}\p{Upper}]' '[^\p{CWU}\p{CWL}]' | less -r
1281    unichars '\pL' '\p{Alphabetic}' | wc -l
1282    unichars '\PL' '\p{Alphabetic}' | wc -l
1283    unichars '\pL' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/'
1284    unichars '\pL' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l
1285    unichars '\pL' '\p{Latin}' 'NAME =~ /WITH/' | wc -l
1286    unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/'
1287    unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %1
1288    unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %2
1289    unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %2 > /tmp/a
1290    unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l
1291    unichars '\pL' '\P{Lm}' '\p{Latin}' 'NAME =~ /WITH/' | wc -l
1292    unichars '\PL' 'uc =~ /\p{Lower}/'
1293    unichars '\PL' 'uc =~ /\p{Upper}/'
1294    unichars '\p{Math}'
1295    unichars '\p{Mc}'
1296    unichars '\p{Me}'
1297    unichars '[\p{Miscellaneous_Symbols}\p{Miscellaneous_Symbols_and_Pictographs}]'
1298    unichars '\pM' '\P{Grapheme_Extend}'
1299    unichars '\PM' '\p{Grapheme_Extend}'
1300    unichars '\PM' '\P{Grapheme_Extend}'
1301    unichars '\p{Nchar}'
1302    unichars '\p{Nl}'
1303    unichars '\p{No}' '\w'
1304    unichars '\p{No}' '\W'
1305    unichars '\p{No}' '\W' | head
1306    unichars '\p{No}' '\W' | less
1307    unichars '\pN' '\P{Nd}' | less
1308    unichars '\p{Numeric_Value=NaN}'
    unichars '/\P{NV=NaN}/ && ! (NUM() ~~ [0..10])' | less -r
1312    unichars '\pN' '\W'
1313    unichars '\pN' '\W' | wc -l
1314    unichars '\p{Other_Alphabetic}'
1315    unichars '\p{Other_Alphabetic}' | less
1316    unichars '\p{Other_Alphabetic}' '\PM' | less
1317    unichars '\p{Other_Lowercase}'
1318    unichars '\p{Other_Lowercase}' | less
1319    unichars '\pP'
1320    unichars '\p{Pc}'
1321    unichars '\P{Pd}' '\p{Dash}'
1322    unichars '\p{Pe}'
1323    unichars '\p{Pf}'
1324    unichars '[\p{Pf}\p{Pi}]'
1325    unichars '[\p{Pf}\p{Pi}]' '\p{BidiM}'
1326    unichars '[\p{Pf}\p{Pi}]' '\P{BidiM}'
1327    unichars '[\p{Pi}]'
1328    unichars '\p{Pi}' '\p{bidim}'
1329    unichars '[\p{Pi}]' '\P{BidiM}'
1330    unichars '[\p{Pi}\p{Ps}]' 'NAME =~ /VERTICAL/'
1331    unichars '\p{Po}'
1332    unichars '\p{Po}' '\p{bidim}'
1333    unichars '[\p{Po}\p{Pe}]' '\P{BidiM}'
1334    unichars '\pP' '\P{QMark}' 'NAME =~ /QUOT/'
1335    unichars '\p{Ps}' 'NAME =~ /VERTICAL/'
1336    unichars '\p{Ps}' '\p{bidim}'
1337    unichars '\p{Ps}' '\P{bidim}'
1338    unichars '\p{Ps}' '\P{bidim}' | less
1339    unichars '\p{Ps}' '\p{bidim}' | wc -l
1340    unichars '\p{Ps}' '\P{bidim}' | wc -l
1341    unichars '[\p{Ps}\p{Pe}]' '\P{BidiM}'
1342    unichars '\p{Qmark}'
1343    unichars '\p{QMark}'
1344    unichars '\pS'
1345    unichars '\p{Sk}'
1346    unichars '\pS' 'NAME =~ /BALL/'
1347    unichars '\p{Surrogate}'
1348    unichars '\p{Title}' 'lc !~ /\p{Lower}/'
1349    unichars '\p{Title}' '[^\p{CWL}\p{CWU}]'
1350    unichars '\p{Title}' '[^\p{CWL}\p{CWU}]' | wc -l
    unichars '\p{Title}' '\P{Lt}'
1353    unichars '\p{Title}' 'uc !~ /\p{Upper}/'
1354    unichars '\p{Upper}' 'NAME !~ /CAPITAL/'
1355    unichars '\p{UPPER}' 'NAME =~ /SMALL/'
1356    unichars '\p{Upper}' '\P{CWL}'
1357    unichars '\p{Upper}' '\P{CWL}' | wc -l
1358    unichars '\p{WhiteSpace}' '\PZ'
1359    unichars '\p{XPosixAlnum}' | less
1360    unichars '\p{XPosixAlnum}' '\P{XPosixAlpha}' | less
1361    unichars '\R'
1362    unichars '\R' | field %2
1363    unichars -sag Comp_Ex
1364    unichars -sag '\p{Lower}' '\P{CWU}'
1365    unichars -sag '\PL' '\p{Lower}' '\p{CWU}'
1366    unichars -sag '\PL' '\p{Lower}' '\P{CWU}'
1367    unichars -sag '\PL' '\p{Upper}' '\p{CWL}'
1368    unichars -sag '\PL' '\p{Upper}' '\P{CWL}'
1369    unichars -sag '\p{Upper}' '\P{CWL}'
1370    unichars -sag Singleton
1371    unichars /s/i
1372    unichars -ua '\p{Assigned}'
1373    unichars -ua '\p{Assigned}' | wc -l
1374    unichars 'UCA1 eq UCA1("a")'
1375    unichars 'UCA1 eq UCA1("a")' | cat -n
1376    unichars 'UCA1 eq UCA1("ae")'
1377    unichars 'UCA1 eq UCA1("d")'
1378    unichars 'UCA1 eq UCA1("d")' | cat -n
1379    unichars 'UCA1 eq UCA1("ij")'
1380    unichars 'UCA1 =~ UCA1("d")'
1381    unichars 'UCA2 eq UCA2("ij")'
1382    unichars 'UCA3 eq UCA3("ij")'
1383    unichars 'UCA eq UCA "%"'
1384    unichars 'UCA eq UCA("a")'
1385    unichars 'UCA eq UCA("ae")'
    unichars 'UCA eq UCA("d")'
1388    unichars 'UCA eq UCA "\N{PRIME}"'
1389    unichars 'UCA eq UCA("p")'
1390    unichars 'UCA eq UCA("p")' | wc -l
1391    unichars 'UCA eq UCA("s")' | wc -l
1392    unichars 'UCA(NFKD) =~ UCA("a")'
1393    unichars 'UCA(NFKD) =~ UCA("ae")'
1394    unichars 'UCA(NFKD) =~ UCA("a")' 'NFKD !~ /a/i'
1395    unichars 'UCA(NFKD) =~ UCA("d")'
1396    unichars 'UCA(NFKD) =~ UCA("d")' 'UCA ne UCA("d")'
1397    unichars 'UCA(NFKD) =~ UCA("i")' 'NFKD !~ /i/i'
1398    unichars 'UCA(NFKD) =~ UCA("m")'
1399    unichars 'UCA(NFKD) =~ UCA("o")'
1400    unichars 'UCA(NFKD) =~ UCA("oe")'
1401    unichars 'UCA(NFKD) =~ UCA("o")' 'NFKD !~ /o/i'
1402    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))'
1403    unichars '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r
    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' 'NFKD !~ /o/i'
1406    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' | ucsort | less
1407    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' | ucsort | less -r
1408    unichars 'UCA(NFKD) =~ UCA("s")'
1409    unichars 'UCA(NFKD) =~ UCA("z")'
1410    unichars 'UCA(NFKD) =~ UCA("z")' 'NFKD !~ /z/i'
1411    unichars 'UCA(NFKD) =~ UCA("z")' 'UCA ne UCA("z")'
1412    unichars 'UCA =~ UCA("d")' 'UCA ne UCA("d")'
1413    unichars 'UCA =~ UCA("i") && UCA =~ UCA("j")'
1414    unichars 'UCA =~ UCA("p")'
1415    unichars 'UCA =~ UCA("p")' | wc -l
1416    unichars 'UCA =~ UCA("s")' 'UCA ne UCA("s")'
1417    unichars 'UCA =~ UCA("s")' 'UCA ne UCA("s")' | wc -l
1418    unichars 'UCA =~ UCA("s")' | wc -l
1419    unichars 'uc ne ucfirst'
1420    unichars -v
1421    unichars '\w' '[^_\p{Alphabetic}\p{Nd}]'
1422    unichars '\w' '[^_\p{Alphabetic}\p{Nd}]' | wc -l
1423    unichars '\w' '[^\pL\pN\pM]'
1424    unichars '\w' '[^\pL\pN\pM\p{Pc}]'
    unichars '\w' '[^\pL\pN\pM\p{Pc}]' | less
1427    unichars '\w' '\P{word}'
1428    unichars '\W' '\p{word}'
1429    unichars '/\w/ == /\W/'
1430    unichars '\w' '\W'
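
    # For comparison, a rough stand-in for the "unichars '\p{Lower}' 'length(uc) > 1'"
    # query above, written as a plain loop over the BMP. This is only a sketch of
    # the idea (test every code point against each criterion); it is not how
    # unichars itself is implemented, and the output format here is made up.
    perl -CS -E 'for my $cp (0 .. 0xFFFF) { my $c = chr $cp; printf "U+%04X %s %s\n", $cp, $c, uc $c if $c =~ /\p{Lower}/ && length(uc $c) > 1 }'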
1431
=head2 Demo of ucsort
1433
1434    ucsort ../CRAFT-dumps/lc-not-unique | less
1435    ucsort --level=1 --upper-before-lower --preprocess="s/\s.*//" fing-2 | less
1436    ucsort --locale=ca /tmp/cat
1437    ucsort --locale=es /tmp/cat
1438    ucsort --locale=es__traditional /tmp/cat
1439    ucsort --locale=es_traditional /tmp/cat
1440    ucsort --locale=ru /tmp/cyril > /tmp/cyril.ru
1441    ucsort overlapping-obos | less
    ucsort --pre '/.*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --pre='/.*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --preprocess='/.*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
1446    ucsort --preprocess='s/(.*)([\[\]].*[\[\]])(.*)/$2 $1 $3/' --reverse-input stem-fail-tally | less
1447    ucsort --preprocess='s/(.*)([\[\]].*[\[\]])(.*)/$2 $1 $3/' --reverse-input stem-fail-tally > stem-fail-sort
1448    ucsort --preprocess='s/([\[\]].*[\[\]])(.*)/$2 $1/' --reverse-input stem-fail-tally | less
1449    ucsort --preprocess='s/(\[.*\])(.*)/$2 $1/' --reverse-input stem-fail-tally | less
1450    ucsort --preprocess='s/(\].*\[)(.*)/$2 $1/' --reverse-input stem-fail-tally | less
1451    ucsort --preprocess='s/(\[FAIL: .*\])(.*)/$2 $1/' --reverse-input stem-fail-tally | less
    ucsort --preprocess='s/.*\t//' go-greek
1453    ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//' ../CRAFT-dumps/lc-not-unique | & less
1454    ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//' ../CRAFT-dumps/lc-not-unique | less
1455    ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
1456    ucsort --preprocess='s/.*\t//' go-greek | less
1457    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/lc-not-unique | & less
1458    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/lc-not-unique | tac > ../CRAFT-dumps/lcnu-sort
1459    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/not-unique | tac > ../CRAFT-dumps/nu-sort
1460    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' lc-not-unique | tac > lc-nu-sort
1461    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' not-unique | tac > nu-sort
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $n = $1; s/^/sprintf("%06d", $n)/e' ../CRAFT-dumps/lc-not-unique | & less
1464    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | & less
1465    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
1466    ucsort --pre='s/.*] //' gene* | less
1467    ucsort --pre='s/.*=> //' gene* | less
    ucsort --pre 's/.*\t//' go-greek | less
1469    ucsort --pre='s/^\S+\h+//' pmc-weirds | less
1470    ucsort --reverse-input go-uglies | perl -nle ' printf "%50s\n", $_' > flip-go-uglies
1471    ucsort --reverse-input stem-fail-tally | less
1472    ucsort --reverse-input stem-fail-tally > stem-fail-sort2
    ucsort --reverse-input ugly-tally > flip-ugly-tally
1475    ucsort --reverse-input uuglies > flip-uuglies
1476    ucsort /tmp/emoji | less
1477    ucsort /tmp/emoji | unifmt -180 | less
1478    ucsort < /tmp/u | uniq > /tmp/uu
1479    ucsort /tmp/uw > /tmp/u
1480    ucsort --upper-before-lower --preprocess="s/\h.*//" fing-2 > fa
1481    ucsort --upper-before-lower --preprocess="s/\s.*//" fing-2
1482    ucsort --upper --pre='s/^\S+\h+//' pmc-weirds | less
1483    ucsort --upper --pre='s/^\S+\h+//' pmc-weirds > pmc-wsort
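
    # What the --upper-before-lower and --preprocess options ask for can be
    # tried without ucsort by calling the Unicode::Collate module directly. A
    # minimal sketch with made-up input; that ucsort hands its options to the
    # collator this way is an assumption, and s///r needs perl 5.14 or later.
    printf 'b two\nB one\na three\n' | perl -MUnicode::Collate -e 'print Unicode::Collate->new(upper_before_lower => 1, preprocess => sub { $_[0] =~ s/\s.*//sr })->sort(<STDIN>)'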
1484
1485    unichars -a '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d
1486    unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' | ucsort
1487    unichars -a '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r
    unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper}' | ucsort | cat -n | less -r
1490    unichars -fgns '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r
1491    unichars -fgns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r
1492    unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' | ucsort | cat -n | less -r
1493    unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r
1494    unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort --upper-before-lower | less -r
1495    unichars -gas '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r
1496    unichars -gas '\p{Lower}' '\P{Ll}' | ucsort | less
1497    unichars -gfns '/\p{Lower}/ && /\p{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r
1498    unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r
1499    unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort --upper | less -r
1500    unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r
1501    unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | less -r
1502    unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less
1503    unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r
    unichars -gs '/(?=\P{Ll})\p{Lower}|(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r
1508    unichars -gs '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r
1509    unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r
1511    unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less
1512    unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less -r
1513    unichars -gs '\p{Lower}' '\P{Ll}' | ucsort | less -r
1514    unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower
1515    unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r
1517    unichars -gs '/\p{Upper}/ && /\P{CWL}/' | ucsort | less
1518    unichars -gs '/\p{Upper}/ && /\P{CWT}/' | ucsort | less
1519    unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort | less
1520    unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d
1521
1522    uninames CYRIL | ucsort | less
1523
1524    cat *-top | field %2 | tally | ucsort --reverse-input | less
1525    cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %13s\n", @F[0,1]' | less
1526    cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %15s\n", @F[0,1]' | less
1527    cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %18s\n", @F[0,1]' > hapax50
1528    cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %18s\n", @F[0,1]' | less
1529    cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %25s\n", @F[0,1]'
1530    cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %25s\n", @F[0,1]' | less
1531
1532    egrep '^GO:(0005024|0005025|0005026|0007179|0015052)' ../CRAFT-dumps/CRAFT-go | sort -u | ucsort | less
1533    egrep '^GO:(0005024|0005025|0005026|0007179|0015052)' ../CRAFT-dumps/CRAFT-go | ucsort | less
1534
1535    ls *.obo* | ucsort
    ls *.obo* | ucsort | perl -lne 'printf "%-40s\n", $_'
1538    ls *.obo* | ucsort | perl -lne 'printf "%40s\n", $_'
1539
1540    perl5.12.0 -S -CLA unichars '/s/i' | ucsort
1541    perl -CS -E 'say for split " ", "cat ca\x{308}t czt c\x{e4}t bat dat"' | ucsort --locale=sv
1542    perl fingerprint $cat | ucsort > fing-all2
1543    perl fingerprint $cat | ucsort --upper-before-lower --preprocess="s/\h.*//" > fing-all2
1544    perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep 'ABBREV' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/ab
1545    perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep 'GEN' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/ge
1546    perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/......//'
1547    perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//'
1548    perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/d
1549    perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/......//; s/\h.*//'
1550    perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='$_ = fixstring($_)' ovl | & less
1551    perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='$_ = fixstring($_)' ovl > ovl-sort
1552    perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='\&fixstring' ovl | & less
1553    perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='\&fixstring' ovl | less
1554
1555    repeat 200 randline /tmp/u6 | ucsort | uniq > /tmp/u
1556
1557    tcgrep '^0' ../../new-output/results-go-gene-stemwords-GOOD | ucsort --reverse-input | less
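
    # The Swedish example above can also be reproduced without ucsort, assuming
    # Unicode::Collate::Locale (bundled with recent perls) is installed. This is a
    # sketch of the comparison, not ucsort's own code; under the sv tailoring the
    # a-with-diaeresis spellings should sort after "czt" rather than next to "cat".
    perl -CS -MUnicode::Collate::Locale -E 'say for Unicode::Collate::Locale->new(locale => "sv")->sort(split " ", "cat ca\x{308}t czt c\x{e4}t bat dat")'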
1558
=head2 Demo of unilook
1560
1561    unilook activation
1562    unilook adi
1563    unilook adieu
1564    unilook 'alis\b'
1565    unilook angina
1566    unilook arthrit
1567    unilook ascite
1568    unilook betab
1569    unilook '\boverexertion\b'
1570    unilook capitali
1571    unilook catheterization
1572    unilook defib
1573    unilook delineate
1574    unilook digitalis
1575    unilook dofetilide
1576    unilook dysauto
1577    unilook dyssyn
1578    unilook dysyn
1579    unilook edema
1580    unilook estiv
1581    unilook etouf
1582    unilook euphon
1583    unilook fentan
1584    unilook fentanyl
1585    unilook /glob
1586    unilook gw
1587    unilook gwen
1588    unilook gyneco
1589    unilook hemochr
1590    unilook hemodialysis
1591    unilook hibern
1592    unilook hippodam
1593    unilook hippo | wc -l
1594    unilook hypogl
1595    unilook idyl
1596    unilook '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL'
1597    unilook inscrut
1598    unilook ischemia
1599    unilook leucocy
1600    unilook leuc | wc -l
1601    unilook leukocy
1602    unilook leuk | wc -l
1603    unilook lighthead
1604    unilook lymp
1605    unilook lympho
1606    unilook lymphom
1607    unilook meningitis
1608    unilook '\N{oslash}'
1609    unilook oligu
1610    unilook oto
1611    unilook otot
1612    unilook overeat
1613    unilook overexert
1614    unilook overexertion
1615    unilook pacem
    unilook '(?=\P{ASCII})\pL'
1618    unilook pectoris
1619    unilook pheo
1620    unilook phonious
1621    unilook 'phonious\b'
1622    unilook '\pM'
1623    unilook -ppro
1624    unilook -ppro .
1625    unilook -Ppro .
1626    unilook -ppronoun .
1627    unilook -ppronoun wh
1628    unilook primum
1629    unilook pseudonormal
1630    unilook pulmon
1631    unilook rale
1632    unilook rheto
1633    unilook rosuv
1634    unilook secund
1635    unilook sputum
1636    unilook sum
1637    unilook tachyph
1638    unilook thiazol
1639    unilook thyrotox
1640    unilook tracheitis
1641    unilook uephon
1642    unilook uremia
1643    unilook -v .
1644    unilook -v activation
1645    unilook -v angina
1646    unilook -v ascite
1647    unilook vascul
1648    unilook -v '\boverexertion\b'
1649    unilook verna
1650    unilook vesnar
1651    unilook -v holter
1652    unilook -v '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL'
1653    unilook -V '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL'
1654    unilook -V '(?i)(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL'
1655    unilook -V '(?i)(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}])\pL'
1656    unilook -v meningitis
1657    unilook -v 'overexertion'
1658    unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}])\pL'
    unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}])\pL'
1661    unilook -V '(?=\P{ASCII})\pL'
1662    unilook -v pneumoconiosis
1663    unilook -vz 'comeraderie'
1664    unilook wh
1665    unilook who
1666    unilook widespread
1667    unilook -z dofetilide
1668    unilook -z eplerenone
1669    unilook -z pectoris
1670    unilook -z sulfoxide
1671    unilook -zv anterolateral
1672    unilook -zv holter
1673    unilook -zv metformin
1674