=encoding utf8

=head1 OSCON scripts

Because I misplaced the first of these descriptions after writing it, I ended up writing them twice.

=head2 OSCON scripts description #1/2

    HOLY TRIO OF INDISPENSABLE TOOLS FOR UNDERSTANDING THE UCD & UNICODE IN GENERAL
        unichars - show which code points match arbitrary criteria
        uniprops - show which props a code point has (by number or name, etc)
        uninames - intelligrep the now-excised NamesList.txt (included)

    REWRITES OF CRITICAL UNIX PROGRAMS:
        uniquote - replacement for od(1) or the -v option to cat(1), but for Unicode
        tcgrep - very ancient grep(1) replacement, needs rewrite but now supports named characters
        unilook - look(1) rewrite but with grep and agrep support; requires included words.utf8 file
        ucsort - sort(1) rewrite using the UCA, includes Unicode locales, and intelligent --pre stuff
        unifmt - fmt(1) rewrite, using the ULA; both smarter and dumber than Damian's
        rename - ancient rewrite of Larry's old rename(1) rewrite; might help Unicode filesyssues
        uniwc - wc(1) rewrite for Unicode, includes \R support, graphemes, etc; needs refactoring

    PROGRAMS FOR NORMALIZATION FILTERS, CHECKER
        nfd, nfc, nfkd, nfkc - Unicode normalization filters
        nfcheck - report which of NF{,K}[DC] apply to any given file
            % nfcheck leo hantest nunez tc macroman
            leo: NFC NFD
            hantest: NFC
            nunez: NFC NFKC
            tc: NFC NFKC NFD NFKD

    (RE)CASING FILTER PROGRAMS:
        lc - filter to do the Unicode toLower casemapping
            % echo "Filter to Convert a Title's Words to the Right Case" | lc
            filter to convert a title's words to the right case
        tc - filter to do the Unicode toTitle casemapping (intelligently)
            % echo "filter to convert a title's words to the right case" | tc
            Filter To Convert A Title's Words To The Right Case
        titulate - \u\L-converts string args to English **HEADLINE** case (NB: headline != titlecase)
            % titulate "filter to CONVERT a title's words to the right case"
            Filter to Convert a Title's Words to the Right Case
        uc - filter to do the Unicode toUpper casemapping
            % echo "filter to convert a title's words to the right case" | uc
            FILTER TO CONVERT A TITLE'S WORDS TO THE RIGHT CASE

    FONT GAME PROGRAMS:
        leo - uʍopəpᴉsdn sƃuᴉɥʇ əʇᴉɹʍ oʇ ɹəʇlᴉɟ
        unifont - filter for showing all Unicode "alternate font" letters
            % echo hic sunt data unicodica | unifont
            Double-Struck:
            Monospace:
            Sans-Serif:
            Sans-Serif Italic:
            Sans-Serif Bold:
            Sans-Serif Bold Italic:
            Script: ℴ
            Italic: h
            Bold:
            Bold Italic:
            Fraktur:
            Bold Fraktur:
        unicaps - Fɪʟᴛᴇʀ ᴛᴏ ᴄᴏɴᴠᴇʀᴛ ᴛᴏ sᴍᴀʟʟ ᴄᴀᴘs
        unisubs, unisupers - filter to show subscripted₁₉₈₇ and ˢᵘᵖᵉʳˢᶜʳⁱᵖᵗᵉᵈ versions
        unititle - prototype to over/underline things (real version in progress)
        uniwide, uninarrow - reversible filters for converting to FULLWIDTH equivs

    TEST AND DEMO PROGRAMS:
        macroman - show mapping between MacRoman and Unicode
        byte2uni - early prototype of general-purpose version of the macroman
            DEMO: byte2uni -a -ecp1252
        es-sort - how to do fancy UCA sorts, using Iberian city-names
        hantest - demo of Unihan stuff and Unicode::{LineBreak, GCString}
        havshpx - vs lbh unir gb nfx, lbh qb abg jnag gb xabj
        hypertest - demo of support for trans-Unicode code points
        nunez - demo accent-insensitive searches; ¡MUY BIEN COMENTADO!
        vowel-sigs - show how to create your own properties; also, regex subroutines

    MODULES
        ForbidUnderscore.pm - "no Underscore;" forbids unlocalized $_ access
        FixString.pm - tries to sort text items with numbers, including Roman, intelligently,
            includes support for Unicode Romans, and for Romans written in Latin
            script, but requires Roman.pm module for the latter. Falls back to the UCA.
        tchrist-unicode-charclasses__alpha.java - EGAD! I talked them into making most of
            this functionality part of JDK7.

    LIBRARIES:
        unicore/{all,html,uwords}_alias.pl - a forgotten charnames facility

    FILES:
        words.utf8 - dictionary list needed for unilook

=head2 OSCON scripts description #2/2

    Modules:
        FixString.pm - program & module to do "logical" sorting w/numbers
        ForbidUnderscore.pm - forbid unlocalized $_ with no Underscore

    Libraries:
        unicore/html_alias.pl - allows for custom character names like \N{egrave} etc
        unicore/uwords_alias.pl - ditto with specials for unilook, like \N{spu}
        unicore/all_alias.pl - both the above

    Programs for probing the UCD:
        unichars - list characters for one or more properties
        uniprops - list regex properties of one or more characters
        uninames - search the current Unicode NamesList

    Encoding Demos
        macroman - how the MacRoman encoding maps to Unicode
        byte2uni - generalized `macroman` program; try `byte2uni -a -ecp1252`

    Unix Tool Rewrites
        (not fmt but) unifmt - like `fmt` but uses the Unicode Linebreaking Algorithm (ULA)
        (not grep but) tcgrep - like `grep`, but groks unicode patterns and data
        (not look but) unilook - improved `look` + `grep` + `agrep` on included `words.utf8`
        (not mv but) rename - a better version of rename, takes a perl pattern
        (not od but) uniquote - like `cat -v` or `od`, but better
        (not sort but) ucsort - `sort` input records according to the Unicode Collation Algorithm (UCA)
        (not wc but) uniwc - Unicode rewrite of `wc` (needs nonslurpy rewrite)

    Casing Filters
        lc - filter into Unicode lowercase
        uc - filter into Unicode uppercase
        tc - filter into Unicode titlecase+lowercase
        titulate - like tc but uses English headline rules

    Normalization Filters
        nfc - convert to NFC
        nfd - convert to NFD
        nfkc - convert to NFKC
        nfkd - convert to NFKD
        nfcheck - report which NF forms file(s) are in

    Font Games
        unifont - display equivalent Math/fonted versions
        leo - write like Leonardo
        unicaps - convert lowercase to Unicode small caps
        unisubs - show equivalent subscripts where available
        unisupers - show equivalent superscripts where available
        uniwide - convert regular text to full-width if possible
        uninarrow - convert full-width to regular width if possible
        unititle - prototype to use combining underlines

    Demos and Test Programs
        es-sort - demo how to use a custom UCA on Spanish cities
        hantest - demo various Unihan bits, including the ULA
        havshpx - you have to figure this one out yourself
        hypertest - demo forbidden Unicode chars, like supers and hypers
        vowel-sigs - get the CVCVVC signatures for words
        nunez - demo how to use the UCA for accent-insensitive searching

=head1 Uses of Unicode in Perl identifiers in OSCON scripts

A few of the scripts use Unicode not just in literals but in identifiers, too:

    % tcgrep '((^\h*sub\h+)|[\$\@%&])\p{ASCII}*\P{ASCII}' *
    byte2uni: $display_char = "\N{SYMBOL FOR SUBSTITUTE FORM TWO}", # ␦
    hantest:$path = "婴儿服饰";
    hypertest:my @ὑπέρμεγας = (
    hypertest: ὑπέρμεγας => \@ὑπέρμεγας,
    leo: my $ʇndʇno = uʍopəpᴉƨdn($input);
    leo: say $ʇndʇno;
    leo:sub uʍopəpᴉƨdn($) {
    leo: tr [-¯_#&'"“”‘’!¡?¿,.]
    mismaps:my @ɪsᴏ = map { "iso-$_" } ratsort qw{
    mismaps:my @μsoft = map { "cp$_"} ratsort qw{
    mismaps:my @鯉 = ratsort <koi8-{f,u,r}>;
    mismaps:my @all_tests = (@μsoft, @ɪsᴏ, @apple, @鯉, @etc);
    mismaps: dos => \@μsoft,
    mismaps: microsoft => \@μsoft,
    mismaps: ms => \@μsoft,
    mismaps: windows => \@μsoft,
    mismaps: win => \@μsoft,
    mismaps: posix => \@ɪsᴏ,
    mismaps: iso => \@ɪsᴏ,
    mismaps: standard => \@ɪsᴏ,
    mismaps: std => \@ɪsᴏ,
    mismaps: koi => \@鯉,
    nunez:my $INCLUÍR_NINGUNOS = 0;
    nunez:my $SI_IMPORTAN_MARCAS_DIACRÍTICAS = 0;
    nunez:sub sí_ó_no(_) { $_[0] ? "sí" : "no" }
    nunez:my @ciudades_españolas = ordenar_a_la_española(<<'LA_ÚLTIMA' =~ /\S.*\S/g);
    nunez:my $cmáx = -(2 + max map { length } @ciudades_españolas);
    nunez:my @búsquedas = < {A,E,I,O,U}N AL >;
    nunez:my $bmáx = -(2 + max map { length } @búsquedas);
    nunez:for my $aldea (@ciudades_españolas) {
    nunez: my $déjà_imprimée; # Mais oui! C’est en français celle‐ci!
    nunez: for my $búsqueda (@búsquedas) {
    nunez: my @resultados = $ordenador->gmatch($aldea, $búsqueda);
    nunez: next unless @resultados || $INCLUÍR_NINGUNOS;
    nunez: $cmáx => !$déjà_imprimée++ && encomillar($aldea),
    nunez: $bmáx => "/$búsqueda/",
    nunez:sub cuántos_sitios {
    nunez:sub ordenar_a_la_española {
    nunez: state $ordenador_a_la_española = new Unicode::Collate::
    nunez: return $ordenador_a_la_española->sort(@lista);
    ucsort: ($OFS, $IFS) if /\x{FFFF}/; # déjà vu
    uniquote: my $fh = $file; # is *so* a lexical filehandle! ☺
    uniquote:sub commaʼd_list {

=head1 Demos of OSCON Unicode scripts

There's absolutely nothing like examples, so here are five sets.

=head2 Demo of uniprops

    uniprops '['
    uniprops '[' '{' ')'
    uniprops '[' '{' '<'
    uniprops '[' '{' '>'
    uniprops ']'
    uniprops 08
    uniprops a8
    uniprops 00ff
    uniprops 180B
    uniprops 180B 303E
    uniprops 180B 303E FE01
    uniprops 180B 303E FE01 E0101
    uniprops 2026
    uniprops 202e
    uniprops 2058
    uniprops 2060
    uniprops 2062
    uniprops 20a8
    uniprops 20e0
    uniprops 2163
    uniprops 2241
    uniprops 2421
    uniprops 2461
    uniprops 26bd
    uniprops 2e2c
    uniprops 3000
    uniprops fb01
    uniprops feff
    uniprops ffef
    uniprops FFFD
    uniprops 1011c
    uniprops 1101c
    uniprops 12000
    uniprops 13000
    uniprops 1F42A
    uniprops 1F42A '$'
    uniprops 1F42A '$' % @
    uniprops 1F4A9
    uniprops 1F608
    uniprops 4
    uniprops -a 03 08
    uniprops -a 1F42a
    uniprops -a 2062
    uniprops -a 20e0
    uniprops -a 3350
    uniprops -a 3 8
    uniprops -a 4
    uniprops -a FFFD
    uniprops -a 'MATHEMATICAL BOLD FRAKTUR CAPITAL T'
    uniprops -ga 4
    uniprops -gl | less -r
    uniprops HYPHEN
    uniprops -l
    uniprops 'LADY BEETLE'
    uniprops -l | less -r
    uniprops -taw75 20e0
    uniprops -w75 -a 20e0

=head2 Demo of uninames

    uninames ancient
    uninames ankh
    uninames arrow
    uninames ass
    uninames AT
    uninames atom
    uninames AT SIGN
    uninames '\bAA\b'
    uninames ball
    uninames BALL
    uninames balls
    uninames '\bALPHA\b'
    uninames '\band\b'
    uninames '\bAND\b'
    uninames '\bAT\b'
    uninames beetle
    uninames '\b[IJ]\b' -WITH
    uninames bird
    uninames black letter
    uninames BLACK LETTER
    uninames BLACK-LETTER
    uninames '\bNO\b'
    uninames BOLD TWO
    uninames book
    uninames brac
    uninames brace
    uninames brok
    uninames '\bSCRIPT\b'
    uninames '\bSCRIPT\b' -MATHEMATICAL
    uninames '\bTAU\b'
    uninames '\bT\b' WITH
    uninames bug
    uninames bullet
    uninames burro
    uninames '\bY\b' | tcgrep -v '^\t' | ucsort | less -r
    uninames '\bz\b'
    uninames '\bZ\b'
    uninames '\bz\b' | tcgrep -v '^\t' | ucsort | less -r
    uninames '\bZ\b' | tcgrep -v '^\t' | ucsort | less -r
    uninames camel
    uninames CAMEL
    uninames care
    uninames CARE
    uninames caution
    uninames chi
    uninames CHI
    uninames CIRCL
    uninames circled
    uninames circle k
    uninames circle 'one|two'
    uninames clown
    uninames colon
    uninames COMB 'HOOK|TAIL|CURV'
    uninames COMBIN
    uninames combin ferm
    uninames combin hacek
    uninames combining
    uninames COMBINING
    uninames COMBINING DOTS
    uninames combining enclosing
    uninames combining enclosing prohib
    uninames COMBIN REV SOL
    uninames COMB LINE
    uninames COMB 'MACRO|LINE'
    uninames commer
    uninames commonly abbreviated
    uninames coptic
    uninames cross
    uninames crying
    uninames CUEN | tcgrep -v '^\t'
    uninames cun
    uninames CUN
    uninames CUNE
    uninames CUNEI | tcgrep -v '^\t'
    uninames CUNI
    uninames CUNIE
    uninames curren
    uninames currenc
    uninames d7
    uninames dagger
    uninames dash
    uninames dead
    uninames desert
    uninames destr
    uninames diaer
    uninames DIAER
    uninames diag
    uninames divis
    uninames does not prevent
    uninames does not prevent | fmt
    uninames dog
    uninames donkey
    uninames DOT ABOVE
    uninames DOT CIRC
    uninames dots
    uninames double
    uninames DOUBLE
    uninames DOUBLE HY
    uninames DOUBLE MATH
    uninames DOUBLE QUOT
    uninames DOUBLE STR
    uninames DOUBLE STR CAPITAL -MATH
    uninames DOUBLE STR CAPITAL -MATH | tail
    uninames double struc
    uninames DOUBLE STRUCK
    uninames double struct
    uninames DOUBL ITALI
    uninames earth
    uninames edit
    uninames EIGHTEEN
    uninames ellip
    uninames em
    uninames ENC CIR
    uninames EQUAL
    uninames EQUIV
    uninames evil
    uninames example
    uninames exclam
    uninames EYE
    uninames face
    uninames FACE
    uninames fair
    uninames farthing
    uninames feather
    uninames fem
    uninames fermata
    uninames ff lig
    uninames flash
    uninames flip
    uninames fl lig
    uninames four
    uninames fractu
    uninames FRAKT
    uninames fraktu
    uninames fraktur
    uninames fullwidth
    uninames gothic
    uninames GOTHIC
    uninames GREEK LETTER WITH
    uninames GREEK LETTER WITH | tcgrep -v '^\t' | ucsort | less -r
    uninames GREEK PHI
    uninames GREEK SUBSCRIPT
    uninames greek yp
    uninames GREEK YP
    uninames gun
    uninames hallo
    uninames hazard
    uninames head
    uninames HEAD
    uninames heart
    uninames HEART
    uninames HIERO
    uninames horse
    uninames hurt
    uninames hyphen
    uninames HYPHEN
    uninames ideo stop
    uninames insect
    uninames insters
    uninames INSULAR
    uninames INSULAR | lc
    uninames INSULAR | lc | tcgrep -v '^\t'
    uninames INSULAR | uc
    uninames INSULAR | uc | tcgrep -v '^\t'
    uninames intro
    uninames invis
    uninames invisible
    uninames iota sub
    uninames iso
    uninames jackol
    uninames jong
    uninames lake
    uninames left bracket
    uninames left single quot
    uninames LESS
    uninames LESS THAN
    uninames LIG
    uninames liga ff
    uninames liga gg
    uninames ligat fi
    uninames ligature
    uninames ligature -arabic
    uninames light
    uninames magic
    uninames mah jong
    uninames mah jong | tcgrep -v '^\t'
    uninames male
    uninames MATH CAPIT FRAK
    uninames MATH DIGIT
    uninames MATHE
    uninames MATHEM
    uninames MATHEM '\bA\b'
    uninames MATHEM CAPITA '\bA\b'
    uninames MATHEM CAPITA '\bA\b' | grep font
    uninames MATHEM CAPITA '\bA\b' | grep -v font
    uninames MATH FRACTU BOLD
    uninames MATH SCRIPT CAPITAL
    uninames -MATH SCRIPT E
    uninames MATH SCRIPT E
    uninames MATH SCRIPT SMALL
    uninames -MATH -SUB -SUPER SCRIPT
    uninames -MATH -SUB -SUPER SCRIPT E
    uninames 'MODIFIER|(?i:superscript)'
    uninames MODIFIER -LETTER
    uninames moon
    uninames mountain
    uninames multi
    uninames music
    uninames music -combin
    uninames music -combin | tcgrep -v '^\t'
    uninames MUSIC SHARP
    uninames NL
    uninames no one under
    uninames numeral
    uninames oasis
    uninames one
    uninames one way
    uninames ordina
    uninames pain
    uninames pen
    uninames people
    uninames person
    uninames PG
    uninames phi
    uninames PILE POO
    uninames '\pL' '\p{Latin}' | less -r
    uninames plum
    uninames PLUS
    uninames poo
    uninames POO
    uninames power
    uninames pumpkin
    uninames punct
    uninames quill
    uninames quot
    uninames radio
    uninames right bracket
    uninames right left
    uninames RIGHT LEFT
    uninames roman
    uninames ROMAN NUM
    uninames roman numeral
    uninames round
    uninames rx
    uninames Rx
    uninames SAME
    uninames santa
    uninames script
    uninames SCRIPT
    uninames set
    uninames SET
    uninames sex
    uninames Sigma
    uninames skull
    uninames slash
    uninames soccer
    uninames space
    uninames SPACE
    uninames spanish
    uninames sphere
    uninames square
    uninames st
    uninames start
    uninames st lig
    uninames stlig
    uninames subscript -SUBSCRIPT
    uninames sun
    uninames SUN
    uninames 'SU(PER|B)SCRIPT|MODIFIER' '\b[AT]'
    uninames 'SU(PER|B)SCRIPT|MODIFIER LETTER' '\b[AT]'
    uninames superscript
    uninames switch
    uninames SYM DEL
    uninames teste
    uninames testi
    uninames thin
    uninames tilde
    uninames times
    uninames tongue
    uninames traffic
    uninames TWO DOT LEAD
    uninames VARIATION
    uninames VARIATION | grep -c VARIATION
    uninames wand
    uninames warn
    uninames WIDTH
    uninames -WITH '\bAND\b'
    uninames WITH BAR
    uninames WITH SLASH
    uninames WITH STROKE
    uninames WITH STROKE '\b[BROKENNESS]\b'
    uninames wiz
    uninames writ
    uninames wrong
    uninames yuan
    uninames zero
    uninames ZERO

=head2 Demo of unichars

unichars is the most important and useful program, so here are 861 invocations of it, ucsorted, of course.

    unichars -aBbs '\p{Age=6}'
    unichars -aBbs '\p{Age=6}' '\P{Miscellaneous_Symbols_And_Pictographs}' > /tmp/u6
    unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' > /tmp/na
    unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 - > /tmp/ua
    unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua
    unichars -ac 'checkFCC' | less -r
    unichars -ac 'checkFCC(NFD)'
    unichars -ac 'checkFCC NFD' | less -r
    unichars -ac 'checkFCC(NFD)' | less -r
    unichars -ac 'checkFCD' | less -r
    unichars -ac 'checkNFD'
    unichars -ac '! checkNFD' | less
    unichars -ac 'checkNFD' | less
    unichars -ac '! checkNFD' | less -r
    unichars -ac 'Comp_Ex' | less -r
    unichars -ac 'Exclusion' | less -r
    unichars -ac 'isExclusion' | less -r
    unichars -ac 'isSingleton'
    unichars -ac 'isSingleton()'
    unichars -ac 'isSingleton()' | less -r
    unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' > /tmp/n1
    unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --level=1 --upper-before-lower --preprocess='s/..\K.*//' > /tmp/u2
    unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | wc -l
    unichars -ac 'NFC_NO' | less -r
    unichars -ac 'NFD_NO' | less -r
    unichars -ac 'NFD =~ /\pM/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -ac 'NFD =~ /\pM/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less -r
    unichars -ac 'NFD =~ /\pM/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less -r
    unichars -ac 'NFD =~ /^\PM\pM*\z/ && NFD !~ /^(?:\p{Grapheme_Base}\p{Grapheme_Extend}*|\p{Grapheme_Extend})\z/' | less -r
    unichars -ac 'NFD =~ /^\PM\pM*\z/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*\z/' | less -r
    unichars -ac 'NFD =~ /^\X$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -ac 'NFD =~ /^\X$/ && NFD !~ /^\PM\pM*\z/' | less -r
    unichars -ac 'NFD =~ /^\X$/ && NFD =~ /^\PM\pM*\z/' | less -r
    unichars -ac 'NFKC_NO' | less -r
    unichars -ac 'NonStDecomp' | less -r
    unichars -ac 'not checkFCC' | less -r
    unichars -ac 'not checkFCD' | less -r
    unichars -ac 'not checkNFD' | less
    unichars -ac 'ord>0xffff && /\p{Latin}/'
    unichars -ac '\p{cased}' '\PL' | less
    unichars -ac '\p{cased}' '\P{upper}' | less
    unichars -ac '\p{cased}' '\P{upper}' '\P{Lower}' | less
    unichars -ac '\p{Greek}' | less
    unichars -ac '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s
    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --normalization=NFKD > /tmp/uk
    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADP]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
    unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
    unichars -ac '/\pL/ && '[\p{Latin}\p{Common}]' && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
    unichars -ac '/\pL/ && [\p{Latin}\p{Common}] && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc
    unichars -ac '/[\pM\pL]/ && NAME =~ /\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua
    unichars -ac '/[\pM\pL]/ && NAME =~ /\b(MATHEMATICAL|COMBINING|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua
    unichars -ac '\p{Upper}' 'NAME !~ /CAPITAL/'
    unichars -acsbBCnf '\p{Cased}' '[^\p{Ll}\p{Lu}]'
    unichars -acsbBCnf '/\p{CWCF}/ != /p{CWCM}/'
    unichars -acsbBCnf '/\p{CWCF}/ != \p{CWCM}'
    unichars -a -dgfs '\p{Cased}' '\PL'
    unichars -a -gfs '\p{Cased}' '\PL'
    unichars -a -gfs '\p{Cased}' '[^\p{Upper}\p{Title}]'
    unichars -ags 'length NFKD > 5'
    unichars -a -gs 'length(uc) > 1'
    unichars -a -gs 'length(ucfirst) > 1' | wc -l
    unichars -agsn NUM
    unichars -ags '\p{lowercase}' '\P{Ll}'
    unichars -ags '\p{lowercase}' '\P{Ll}' | wc -l
    unichars -ags '\p{uppercase}' '\P{Lu}'
    unichars -ags '\p{uppercase}' '\P{Lu}' | wc -l
    unichars -a 'NAME =~ /BALL/'
    unichars -a 'NAME =~ /EARTH GLOBE/'
    unichars -anc 'NUM && (10*NUM) !~ /0/'
    unichars -anc 'UCA =~ UCA("d")'
    unichars -a 'NFKD =~ /\[/'
    unichars -a -ngfs 'ord > 0xFFFF' '\p{Cased}'
    unichars -a -ngfs 'ord > 0xFFFF' '\p{Cased}' '\PL'
    unichars -a -ngfs '\p{Cased}' '\PL'
    unichars -a 'ord > 0xffff' 'NAME =~ /FACE/'
    unichars -a '\p{Age:6.0}' '\P{Numeric_Value=NaN}'
    unichars -a '\P{Alnum}' '\w' | wc -l
    unichars -a '\P{Bidi_Class=NSM}' '\p{Mn}'
    unichars -a '\P{Bidi_Class=NSM}' '\p{Mn}' | wc -l
    unichars -a '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' | wc -l
    unichars -a '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' | wc -l
    unichars -a '\p{Cased}' '\p{Lm}' | wc -l
    unichars -a '\p{Cased}' '\PL' | wc -l
    unichars -a '\p{InMiscellaneousSymbolsAnd_Pictographs}'
    unichars -a '\p{InMiscellaneousSymbolsAnd_Pictographs}' > /tmp/emoji
    unichars -a '\p{IsThai}' '\P{InThai}'
    unichars -a '\P{IsThai}' '\p{InThai}'
    unichars -a '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d
    unichars -a '\p{Latin}' '\w' | wc -l
    unichars -a '\p{Lower}' '\P{CWU}' | wc -l
    unichars -a '\PL' '\p{Alphabetic}' | wc -l
    unichars -a '\p{Nchar}'
    unichars -a '\pN' '\W' | wc -l
    unichars -a '\p{Other_Alphabetic}' '\PM' | less
    unichars -a '\p{Other_Alphabetic}' '\PM' | M
    unichars -a '[\p{Pf}\p{Pi}]'
    unichars -a '[\p{Pi}]'
    unichars -a '\p{Po}'
    unichars -a '\p{Title}' '[^\p{CWL}\p{CWU}]' | wc -l
    unichars -a '\p{Upper}' '\P{CWL}' | wc -l
    unichars -a 'UCA1 eq UCA1("a")'
    unichars -a 'UCA1 eq UCA1("a")' | cat -n
    unichars -a 'UCA1 eq UCA1("a")' | less
    unichars -a 'UCA1 eq UCA1("d")' | cat -n
    unichars -a 'UCA1 eq UCA1("e")' | cat -n
    unichars -a 'UCA1 eq UCA1("g")' | cat -n
    unichars -a 'UCA1 eq UCA1("m")' | cat -n
    unichars -a 'UCA1 eq UCA1("p")' | cat -n
    unichars -a 'UCA eq UCA("d")'
    unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i'
    unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' | ucsort
    unichars -a 'UCA eq UCA("d")' > /tmp/d
    unichars -a '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r
    unichars -a '\w' '[^_\p{Alphabetic}\p{Nd}]' | wc -l
    unichars -a '\w' '\PM' 'ord > 0xffff' '\PN' | less
    unichars -a '\w' '\PM' '\PL'
    unichars -a '\w' '\PM' '\PL' '\PN' | less
    unichars -Bbs '\p{Age=6}'
    unichars -Bbs '\p{Age=6}'o
    unichars -BCgsa '[\p{CCC=224}\p{CCC=226}'
    unichars -BCgsa '[\p{CCC=224}\p{CCC=226}]'
    unichars -BCgsa '[\p{CCC=Left}\p{CCC=Right}]'
    unichars -BCgsa '\p{Mn}'
    unichars --bmp --smp 'UCA1 eq UCA1("d")' | cat -n >> /tmp/lets
    unichars --bmp --smp 'UCA1 eq UCA1("e")' | cat -n
    unichars --bmp --smp 'UCA1 eq UCA1("e")' | cat -n >> /tmp/lets
    unichars --bmp --smp 'UCA1(NFKD) eq UCA1("d")' | cat -n >> /tmp/lets2
    unichars --bmp --smp 'UCA1(NFKD) eq UCA1("e")' | cat -n >> /tmp/lets2
    unichars -bs .
    unichars -bs 1
    unichars '\bSCRIPT\b' '[CEFHILMRego]'
    unichars -bs '\p{Age=6}'
    unichars -Bs '\p{Bidiclass:M}'
    unichars -Bs '\p{BidiM}'
    unichars -Bs '\p{BI:M}'
    unichars -B '\w'
    unichars -B /\w'
    unichars -c
    unichars -c '\D' NUM
    unichars -Cgas '\pM'
    unichars -Cgas '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k1,1 | less -r
    unichars -Cgas '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|SOLIDUS|STROKE|LINE/' | sort -t= -k4,4n -k2,2
    unichars -Cgas '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2
    unichars -Cgsa '\p{Mc}'
    unichars -Cgsa '\p{Mn}'
    unichars -Cgs '\p{Me}' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less
    unichars -Cgs '\pM' 'NAME =~ /above/' | sort -t= -k4,4n -k2,2
    unichars -Cgs '\pM' 'NAME =~ /ABOVE/' | sort -t= -k4,4n -k2,2
    unichars -Cgs '\pM' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2
    unichars -Cgs '\pM' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less
    unichars -Cgs '\pM' 'NAME =~ /SLASH/' | sort -t= -k4,4n -k2,2
    unichars -Cgs '\pM' 'NAME =~ /TILDE/' | sort -t= -k4,4n -k2,2
    unichars -Cgs '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k2,2 | less -r
    unichars -Cgs '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2
    unichars -Cgs '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less
    unichars -Cgs '\pM' | sort -t= -k4,4n -k2,2 | grep -i TILDE
    unichars -Cgs '\pM' | sort -t= -k4,4n -k2,2 | less -r
    unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/'
    unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | less
    unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | wc -l
    unichars -c 'NAME =~ /LATIN LETTER SMALL CAPITAL/' | less -r
    unichars -c 'NAME =~ /ORD/'
    unichars -c 'NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/'
    unichars -c 'NFD =~ /^\PM\pM*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -c 'NFD =~ /^\X*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -c 'NFD =~ /^\X+$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -c 'NFD =~ /^\X$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -c 'NFD =~ /^\X$/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less
    unichars -c NUM
    unichars -c 'NUM && (10*NUM) !~ /0/'
    unichars -c 'NUM && 10*NUM !~ /0/'
    unichars -c 'ord>0xffff && /\p{Latin}/'
    unichars -c 'ord == 640'
    unichars -c '\p{Alphabetic}'
    unichars -c '\p{Alphabetic}' | head -1000 | tail
    unichars -c '\p{Alphabetic}' | head -3000 | tail
    unichars -c '\p{Alphabetic}' | less -r
    unichars -c '\p{Alphabetic}' '\pM'
    unichars -c '\p{Alphabetic}' '\pM' | less
    unichars -c '\p{Alphabetic}' '\pM' | less -r
    unichars -c '\p{cased}' '[^\p{CWU}\p{CWL}\p{CWT}]' | less -r
    unichars -c '\p{cased}' '\PL'
    unichars -c '\p{cased}' '\PL' | less
    unichars -c '\p{Dash}' '\P{Pd}'
    unichars -c '\p{Greek}'
    unichars -c '\p{Greek}' | less
    unichars -c '\p{Greek}' '\p{Lower}' 'ord <= ord("\N{greek:alpha}")' 'ord >= ord "\N{omega}"'
    unichars -c '\p{Greek}' '\p{Lower}' 'ord() < ord("\N{greek:alpha}") || ord > ord "\N{omega}"'
    unichars -c '\p{Greek}' '\p{Lower}' 'ord() <= ord("\N{greek:alpha}") || ord >= ord "\N{omega}"'
    unichars -c '\p{Greek}' '\p{Lower}' 'ord() <= ord("\N{greek:alpha}")' 'ord >= ord "\N{omega}"'
    unichars -c '\p{IDC}' '\W'
    unichars -c '\p{IDC}' '\W' | cat -n
    unichars -c '\p{IDC}' '\W' | wc -l
    unichars -c '\p{InEnclosed_Alphanumerics}'
    unichars -c '\p{InEnclosed_Alphanumerics}' '\p{lower}'
    unichars -c '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/uu
    unichars -c '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s
    unichars -c '\p{lower}' '\P{CWU}'
    unichars -c '\p{lower}' '\P{CWU}' | less
    unichars -c '\p{lower}' '\p{Lm}' | less
    unichars -c '\p{lower}' '\p{Lm}' | less -r
    unichars -c '\p{lower}' '\p{Lm}' | | perl -pe 's/.//' | ucsort --reverse-fields | less
    unichars -c '\p{lower}' '\p{Lm}' | perl -pe 's/.//' | ucsort --reverse-fields | less
    unichars -c '\p{lower}' '\p{Lm}' | perl -pe 's/.//' | ucsort --reverse-fields | less -r
    unichars -c '\p{Mc}'
    unichars -c '\pM' '\P{Diacritic}'
    unichars -c '\PM' '\p{Diacritic}'
    unichars -c '\p{No}'
    unichars -c '\p{No}' | head
    unichars -c '\p{No}' | less
    unichars -c '\p{No}' '\w'
    unichars -c '\p{No}' '\W'
    unichars -c '\p{No}' '\w' | head
    unichars -c '\p{No}' '\W' | head
    unichars -c '\p{No}' '\W' | less
    unichars -c '[\p{Pi}\p{Ps}]' 'NAME =~ /VERTICAL/'
    unichars -c '\pP' '\P{QMark}' 'NAME =~ /QUOT/'
    unichars -c '\p{Upper}' 'NAME !~ /CAPITAL/'
    unichars -cs /k/i
    unichars -cs 'NFD \!~ /d/i' 'NFKD \!~ /d/i' 'UCA eq UCA("d")'
    unichars -cs 'NFD \!~ /d/i' 'NFKD =~ /d/i' 'UCA eq UCA("d")'
    unichars -cs 'NFD \!~ /d/i' 'NFKD =~ /d/' 'UCA eq UCA("d")'
    unichars -cs 'NFD =~ /d/i' 'NFKD =~ /d/' 'UCA eq UCA("d")'
    unichars -cs 'NFD \!~ /o/i' 'NFKD \!~ /o/i' 'UCA eq UCA("o")'
    unichars -cs 'NFKD !~ /a/i' 'UCA eq UCA("a")'
    unichars -cs 'NFKD \!~ /a/i' 'UCA eq UCA("a")'
    unichars -cs 'NFKD \!~ /a/i' 'UCA =~ UCA("a")'
    unichars -cs 'NFKD \!~ /a/i' 'UCA =~ UCA("ae")'
    unichars -cs 'NFKD \!~ /b/i' 'UCA eq UCA("b")'
    unichars -cs 'NFKD \!~ /c/i' 'UCA eq UCA("c")'
    unichars -cs 'NFKD \!~ /d/i' 'UCA eq UCA("d")'
    unichars -cs 'NFKD \!~ /e/i' 'UCA eq UCA("e")'
    unichars -cs 'NFKD \!~ /f/i' 'UCA eq UCA("f")'
    unichars -cs 'NFKD \!~ /f/i' 'UCA =~ UCA("f")'
    unichars -cs 'NFKD \!~ /g/i' 'UCA eq UCA("g")'
    unichars -cs 'NFKD \!~ /h/i' 'UCA eq UCA("h")'
    unichars -cs 'NFKD \!~ /o/i' 'UCA eq UCA("o")'
    unichars -cs 'NFKD \!~ /o/i' 'UCA =~ UCA("oe")'
    unichars -cs '\P{ASCII}' '(lc() . uc) =~ /\p{ASCII}/'
    unichars -cs '\P{ASCII}' 'NFD \!~ /\p{ASCII}/' 'NFKD =~ /\p{ASCII}'
    unichars -cs '\P{ASCII}' 'NFD \!~ /\p{ASCII}/' 'NFKD =~ /\p{ASCII}/'
    unichars -cs /s/i
    unichars -cs 'UCA eq UCA("o")'
    unichars -cs 'UCA =~ UCA ( "a" ) '
    unichars -cs ' 'UCA =~ UCA ( "ae" ) '
    unichars -cs 'UCA =~ UCA ( "ae" ) '
    unichars -c '\w' '\W'
    unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W'
    unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' | head
    unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' | less
    unichars --debug '\p{No}' '\W' | head
    unichars --debug '\p{No}' '\W' | less
    unichars --debug '\p{No}' '\W' 'ord < 0xFF || die' | head
    unichars --debug '\p{No}' '\W' 'ord > 0xFF && die' | head
    unichars --debug '\p{No}' '\W' 'ord() < 0xFF || die' | head
    unichars 'defined(NUM) && () ~~ [1..10]' | less -r
    unichars 'defined(NUM) && [1..10] ~~ NUM' | less -r
    unichars 'defined(NUM) && [1..1] ~~ NUM' | less -r
    unichars 'defined(NUM) && ! (NUM() ~~ [0..10])' | less -r
    unichars 'defined(NUM) && ! NUM() ~~ [0..10]' | less -r
    unichars 'defined(NUM) && NUM <= 10'
    unichars 'defined(NUM) && NUM <= 10' | less
    unichars 'defined(NUM) && NUM <= 10' | less -r
    unichars 'defined(NUM) && ! NUM() ~~ [1..10]' | less -r
    unichars 'defined(NUM) && NUM() ~~ [1..10]' | less -r
    unichars '\d' '\p{common}'
    unichars '\d' '\p{Latin}
    unichars '\d' '\p{Latin}'
    unichars -fgas 'CF =~ "."'
    unichars -fgas 'CF eq "C"'
    unichars -fgas 'CF eq "F"'
    unichars -fgas 'length(uc . ucfirst . lc) != 3'
    unichars -fgas 'length(uc . ucfirst . lc) != length NFKD * 3'
    unichars -fgas 'length(uc . ucfirst . lc) != length(NFKD) * 3'
    unichars -fgas 'not /\U\Q$_/i'
    unichars -fgas '/\U$_/i'
    unichars -fgas '/\U\Q$_/i'
    unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper}' | ucsort | cat -n | less -r
    unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper})' | ucsort | cat -n | less -r
    unichars -fgns '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r
    unichars -fgns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r
    unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}'
    unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' | ucsort | cat -n | less -r
    unichars -gacs '\p{Cased}' '\P{CWCM}' | cat -n | less -r
    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '^[\p{Ll}\p{Lu}]'
    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}]'
    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]'
    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l
    unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l
    unichars -ga '\P{ASCII}' '\p{Common}' '\pM'
    unichars -ga '\P{ASCII}' '\p{Common}' '[\pP\pS]'
    unichars -ga '\P{ASCII}' '\p{Inherited}' '\PL'
    unichars -ga '\P{ASCII}' '\p{Inherited}' '\pM'
    unichars -ga '\P{ASCII}' '[\pP\pS]'
    unichars -ga '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/'
    unichars -ga '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l
    unichars -gas 'length(uc) > 1'
    unichars -gas 'NFKD eq "."'
    unichars -gasn 'not /\d/' 'NFKD =~ /\d/'
    unichars -gasn 'not /\d/' 'NFKD =~ /\d/' | wc -l
    unichars -gasn 'not /\pN/' 'NFKD =~ /^(?=\D*$)\pN/'
    unichars -gasn 'not /\pN/' 'NFKD =~ /\pN/'
    unichars -gasn 'not /\pN/' 'NFKD =~ /\pN/' | wc -l
    unichars -gasn NUM
    unichars -gasn 'NUM && NUM < 0'
    unichars -gasn 'NUM || (/\pN/ && /\p{Enclosed_Alphanumerics}/)'
unichars -gasn 'NUM || /\p{pokey(tchrist)% ls -d1F uni* 879 unichars -gasn NUM > /tmp/num 880 unichars -gasn NUM | wc -l 881 unichars -gasn '\pN' 'not NUM' 882 unichars -gasn '\PN' 'NUM' 883 unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r 884 unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort --upper-before-lower | less -r 885 unichars -gas '\p{di}' 886 unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BK}]' 887 unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BK}]' '\V' 888 unichars -gas '[\P{LB=CR}\P{LB=LF}\P{LB=NL}\P{LB=BK}]' '\v' 889 unichars -gas '[\P{LB=CR}\P{LB=LF}\P{LB=NL}\P{LB=BK}]' '\V' 890 unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BR}' 891 unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BR}]' 892 unichars -gas '\p{LB=LF}' 893 unichars -gas '\p{LB=NL}' 894 unichars -gas '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r 895 unichars -gas '\pL' 'NAME =~ /\bSCRIPT/' 896 unichars -gas '\PL' 'NAME =~ /\bSCRIPT/' 897 unichars -gas '\p{Lower}' '\P{Ll}' 898 unichars -gas '\p{Lower}' '\P{Ll}' | ucsort | less 899 unichars -gas '\PL' 'uc =~ /\p{Upper}/' 900 unichars -gas '\pM' 901 unichars -gas '\p{Me}' 902 unichars -gas '\p{Other_Lowercase}' 903 unichars -gas '\p{Other_Lowercase}' | wc -l 904 unichars -gas '\p{SB=AT}' 905 unichars -gas '\p{SB=ST}' 906 unichars -gas '\p{sc=greek}' '\P{blk=greek}' 907 unichars -gas '\P{Upper}' '\PL' 'uc =~ /\p{Upper}/' 908 unichars -gas '\R' 909 unichars -gas 'UCA eq UCA("d")' 910 unichars -gas 'UCA eq UCA("d")' 'NFKD !~ /d/i' 911 unichars -gas 'UCA eq UCA("o")' 'NFKD !~ /o/i' 912 unichars -gbas '\p{sc=greek}' '\P{blk=greek}' 913 unichars -gbas '\P{sc=greek}' '\p{blk=greek}' 914 unichars -gcas '\pM' 915 unichars -gCas '\pM' 916 unichars -gc '\p{Control}' 917 unichars -gcs '\p{Cased}' '\P{CWCF}' | cat -n | less -r 918 unichars -gcs '\p{Cased}' '\P{CWCM}' | cat -n | less -r 919 unichars -gcs '\p{Cased}' '[^\p{CWU}\p{CWD' | cat -n | less -r 
920 unichars -gcs '\p{Cased}' '\PL' | cat -n | less -r 921 unichars -gCs '\pM' '\P{CCC=0}' | sort -k5.3,5n | less -r 922 unichars -gCs '\pM' '\P{CCC=0}' | sort -k5.4,5n | less -r 923 unichars -gCs '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k1,1 | less -r 924 unichars -gCs '\pM' | sort -k5.4,5n | less -r 925 unichars -gCs '\pM' | sort -k5.4n | less -r 926 unichars -gCs '\pM' | sort -k5.5n | less 927 unichars -gCs '\pM' | sort -k5.5n | less -r 928 unichars -gcs '\p{Titlecase}' 929 unichars -gcs '\p{Titlecase}' | wc -l 930 unichars -gfns '/\p{Lower}/ && /\p{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r 931 unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r 932 unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort --upper | less -r 933 unichars -gfs '\p{Cased}' 934 unichars -gfs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' 935 unichars -gfs '\p{Cased}' '[^\p{Upper}\p{Lower}]' 936 unichars -gfs '\p{Cased}' '[^\p{Upper}\p{Title}]' 937 unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r 938 unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | less -r 939 unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l 940 unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l 941 unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFKD !~ /\p{ASCII}/' | wc -l 942 unichars -g '\P{ASCII}' '\p{Common}' '[\pP\pS]' 943 unichars -g '\p{Cased}' '\P{Alphabetic}' 944 unichars -g '\p{Cased}' '\PL' | cat -n | less -r 945 unichars -g '\p{InHalfwidthAndFullwidthForms}' '\p{bidim}' 946 unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{alpha}+$ 947 unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{alpha}+\z/' 948 unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{perlword}+\z/' 949 unichars -gs '/\A\p{alpha}+\z/ && NFD !~ /\A\p{perlword}+\z/' 950 unichars -gs '/\A\p{alpha}+\z/ && NFKC !~ /\A\p{perlword}+\z/' 951 unichars -gs '/\A\p{alpha}+\z/ 
&& NFKD =~ /\W/' 952 unichars -gs '/\A\p{alpha}+\z/ && ! /\w/' 953 unichars -gsCB 'ord ~~ (0x345,0x37A)' 954 unichars -gsCB 'ord ~~ [0x345,0x37A]' 955 unichars -gsCB 'ord == 0x345 || ord == 0x37A' 956 unichars -gs '\d' 957 unichars -gsfB 'ord == 0x345 || ord == 0x37A' 958 unichars -gsf '/\p{Upper}/ && /\P{CWL}/' 959 unichars -gs 'length(lc) > 1' | wc -l 960 unichars -gs 'length NFKD == 2' 961 unichars -gs 'length NFKD > 4' 962 unichars -gs 'length NFKD > 5' 963 unichars -gs 'length NFKD > 6' 964 unichars -gs 'length NFKD > 7' 965 unichars -gs 'length NFL 966 unichars -gs 'length(uc) > 1' 967 unichars -gs 'length(uc) > 1' 'length(ucfirst) == 1' 968 unichars -gs 'length(uc) > 1' | wc -l 969 unichars -gs 'length(ucfirst) > 1' 970 unichars -gs --locale=de_phonebook ''UCA1 eq UCA1 ( "ae" ) ' 971 unichars -gs --locale=de_phonebook 'UCA1 eq UCA1 ( "ae" ) ' 972 unichars -gs ' NFKD !~ /d/i && UCA1 eq UCA1("d")' 973 unichars -gs 'NFKD !~ /d/i' 'UCA1 eq UCA1("d")' 974 unichars -gs 'NFKD !~ /f/i && UCA1 eq UCA1("f")' 975 unichars -gs 'NFKD !~ /h/i && UCA1 eq UCA1("h")' 976 unichars -gs 'NFKD !~ /,/i' 'UCA1 eq UCA1(",")' 977 unichars -gs 'NFKD !~ /;/i' 'UCA1 eq UCA1(";")' 978 unichars -gs 'NFKD !~ /\?/i' 'UCA1 eq UCA1("?")' 979 unichars -gs 'NFKD !~ /\./i' 'UCA1 eq UCA1(".")' 980 unichars -gs 'NFKD !~ /o/i && UCA1 eq UCA1("o")' 981 unichars -gs ' NFKD(string) !~ /d/i && UCA1 eq UCA1("d")' 982 unichars -gs 'NFKD(string) !~ /d/i' 'UCA1 eq UCA1("d")' 983 unichars -gsn NUM 984 unichars -gsn 'NUM && NUM < 0' 985 unichars -gs '\P{ASCII}' '\p{Common}' '\pP' 986 unichars -gs '\p{Bidi_Class=NSM}' '\P{Mn}' 987 unichars -gs '\P{Bidi_Class=NSM}' '\p{Mn}' 988 unichars -gs '\p{bidim}' 989 unichars -gs '[\p{bidim}\p{Ps}]' 990 unichars -gs '[\p{bidim}\p{Ps}\p{Pe}]' 991 unichars -gs '\p{Cased}' 'Comp_Ex()' 992 unichars -gs '\p{Cased}' 'Exclusion()' 993 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' 994 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL} NFKD =~ /\W/' 995 unichars -gs 
'\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less 996 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r 997 unichars -gs '\p{Cased}' 'Singleton()' 998 unichars -gs '\p{Inherited}' 999 unichars -gs '/(?=\P{Ll})\p{Lower}|(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r 1000 unichars -gs '/(?=\P{Ll})\p{Lower}|/(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r 1001 unichars -gs '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r 1002 unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r 1003 unichars -gs '\p{Lower}' 1004 unichars -gs '\p{lowercase}' '\P{Ll}' 1005 unichars -gs '\p{Lower}' '\P{CWCM}' 1006 unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less 1007 unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less -r 1008 unichars -gs '\p{Lower}' '\p{CWU}' | wc -l 1009 unichars -gs '\p{Lower}' '\P{Ll}' | ucsort | less -r 1010 unichars -gs '\PL' '\p{Lower}' '\p{CWCF}' 1011 unichars -gs '\PL' '\p{Lower}' '\p{CWCM}' 1012 unichars -gs '\PL' '\p{Lower}' '\P{CWCM}' 1013 unichars -gs '\PL' '\p{Lower}' '\p{CWU}' 1014 unichars -gs '\pL' '\p{Lower}' '\p{CWU}' | wc -l 1015 unichars -gs '\PL' '\p{Lower}' '\p{CWU}' | wc -l 1016 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' 1017 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' '\p{CWU}' | wc -l 1018 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower 1019 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r 1020 unichars -gs '\pS' 'NFKD !~ /\pS/' 1021 unichars -gs '[\pS\pP]' 'NFKD !~ /[\pS\pP]/' 1022 unichars -gs '\p{Symbol}' 1023 unichars -gs '\p{uppercase}' '\P{Lu}' | wc -l 1024 unichars -gs '/\p{Upper}/ && /\P{CWL}/' 1025 unichars -gs '/\p{Upper}/ && /\P{CWL}/' | ucsort | less 1026 unichars -gs '/\p{Upper}/ && /\P{CWT}/' | ucsort | less 1027 unichars -gsS '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' 1028 unichars -gs 'UCA1 eq 
UCA1(";")' 1029 unichars -gs 'UCA eq UCA("&")' 1030 unichars -gs 'UCA eq UCA("d")' 1031 unichars -gua '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l 1032 unichars -gua '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l 1033 unichars --help 1034 unichars '/ij/i' 1035 unichars 'length(lc) > 1 1036 unichars 'length(lc) > 1' 1037 unichars 'length(lcfirst) != length' 1038 unichars 'length(lcfirst) != length(uc)' 1039 unichars 'length(lc) < length(uc)' 1040 unichars 'length(NFD) == 1 && length(NFC) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less -r 1041 unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less -r 1042 unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | ucsort 1043 unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | wc -l 1044 unichars 'length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' > /tmp/ndc & 1045 unichars 'length(uc) > 1' 1046 unichars 'length(ucfirst) > 1' 1047 unichars --locale=de__phonebook 'NFD =~ /a/i/' 1048 unichars --locale=de__phonebook 'NFD() =~ /a/i' 1049 unichars --locale=de__phonebook 'NFD() =~ /a/i' | less -r 1050 unichars --locale=de__phonebook 'UCA eq UCA("ae")' 'NFKD !~ /d/i' | ucsort 1051 unichars --locale=de__phonebook 'UCA eq UCA("ae")' 'NFKD !~ /d/i' | ucsort --locale=de__phonebook 1052 unichars --locale=de__phonebook 'UCA(NFKD) =~ UCA("a WITH DIAERESIS")' 1053 unichars --locale=de__phonebook 'UCA() =~ UCA("a")' 1054 unichars --locale=de__phonebook 'UCA() =~ UCA("a")' | less -r 1055 unichars --locale=de__phonebook 'UCA =~ UCA("a WITH DIAERESIS")' 1056 unichars --locale=en 'UCA eq UCA("ae")' 1057 unichars --locale=en "UCA eq UCA("ae")' 1058 unichars 'NAME =~ /BALL/' 1059 unichars 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' 1060 unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' 1061 unichars 'NAME =~ 
/LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l 1062 unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH\b.*\b[ABCD]\b/' 1063 unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH\b.*\b[ABCD]\b/' | less -r 1064 unichars 'NAME =~ /PRIME/' 1065 unichars -nc 'NUM && (10*NUM) !~ /0/' 1066 unichars -nc 'UCA eq UCA("d")' 1067 unichars 'NFD =~ /ij/i' 1068 unichars 'NFD ne NFKD' 1069 unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' 1070 unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less 1071 unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' > /tmp/ndc & 1072 unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | wc -l 1073 unichars 'NFD !~ /\pM/ && NAME =~ /LATIN.*LETTER WITH/' 1074 unichars 'NFD =~ /^\PM\pM*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' 1075 unichars 'NFKD =~ /\[/' 1076 unichars 'NFKD eq ","' 1077 unichars 'NFKD eq ":"' 1078 unichars 'NFKD eq ".."' 1079 unichars 'NFKD eq "*"' 1080 unichars 'NFKD eq 'comma' 1081 unichars 'NFKD eq "\N{PRIME}"' 1082 unichars 'NFKD =~ /ij/i' 1083 unichars 'NFKD \!~ /s/i and UCA =~ UCA "s"' 1084 unichars 'NFKD \!~ /s/i || UCA =~ UCA "s"' 1085 unichars 'NFKD =~ /s/i || UCA =~ UCA "s"' 1086 unichars -ngas 'NUM && not NUM ~~ [ 0..10 ]' 1087 unichars -ngas 'NUM && not NUM ~~ [ 1..10 ]' 1088 unichars --nopager -gaBsn 'NUM && int(NUM) != NUM' 1089 unichars --nopager -gaBsn 'NUM && NUM == 100' 1090 unichars --nopager -gasn 'NUM && NUM == 100' 1091 unichars --nopager -gsn 'NUM && int(NUM) != NUM' 1092 unichars --nopager -gsn 'NUM && NUM < 0' 1093 unichars --nopager --locale=de__phonebook 'UCA eq UCA("ae")' 1094 unichars --nopager --locale=en 'UCA eq UCA("ae")' 1095 unichars --nopager --locale=is 'UCA eq UCA("ae")' 1096 unichars --nopager 'UCA eq UCA("ae")' 1097 unichars --nopager 'UCA eq UCA("ae")' | ucsort 1098 unichars --nopager 'UCA eq UCA("ae")' | ucsort --upper 1099 unichars 'not /\d/' 'NFKD =~ /\d/' 1100 unichars 'not /\w/' 1101 unichars 'not /\w/' 'not /\W/' 1102 
unichars -nsag '\p{Cased}' 'NUM' 1103 unichars -nsag '\p{Lower}' '\P{CWU}' 1104 unichars -ns 'UCA eq UCA("d")' 1105 unichars NUM 1106 unichars 'ord ~~ [ 0x2622, 0x26bd]' 1107 unichars 'ord ~~ [ 0x2622, 0x2bbd]' 1108 unichars 'ord == 0x2622 || ord == 0x26bd' 1109 unichars 'ord>0xffff' '\p{Po}' 1110 unichars 'ord < 255' '\p{pattern_syntax}' 1111 unichars 'ord() < 255' '\p{pattern_syntax}' 1112 unichars 'ord() < 255' '\p{pattern_syntax}' | less 1113 unichars 'ord() < 255' '\p{pattern_syntax}' | wc -l 1114 unichars '\p{Age:6.0}' 1115 unichars '\p{Age:6.0}' '\p{Numeric_Value=NaN}' 1116 unichars '\p{Age:6.0}' '\P{Numeric_Value=NaN}' 1117 unichars '\p{alnum}' '\P{word}' 1118 unichars '\P{alnum}' '\p{word}' 1119 unichars '\p{alnum}' '\W' 1120 unichars '\p{Alnum}' '\W' 1121 unichars '\P{Alnum}' '\w' 1122 unichars '\P{Alnum}' '\w' | less 1123 unichars '\p{alnum}' '\W' | wc -l 1124 unichars '\P{alnum}' '\w' | wc -l 1125 unichars '\P{alnum}' '\W' | wc -l 1126 unichars '\P{Alnum}' '\w' | wc -l 1127 unichars '\p{Alphabetic}' '\P{XPosixAlpha}' | less 1128 unichars '\P{Alphabetic}' '\p{XPosixAlpha}' | less 1129 unichars '\p{alpha}' '\p{CI}' | less -r 1130 unichars '\p{alpha}' '\p{CI}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r 1131 unichars '\p{alpha}' '\P{XPosixAlpha}' | less 1132 unichars '\P{alpha}' '\p{XPosixAlpha}' | less 1133 unichars '\P{alpha}' '\P{XPosixAlpha}' | less 1134 unichars '\P{ASCII}' '(lc() . 
uc) =~ /\p{ASCII}/' 1135 unichars '\P{ASCII}' 'lc.uc =~ /\p{ASCII}/ 1136 unichars '\P{ASCII}' 'lc.uc =~ /\p{ASCII}/' 1137 unichars '\P{ASCII}' 'ord() < 255' '\p{pattern_syntax}' | wc -l 1138 unichars '\P{ASCII}' 'ord() < 255' '\W' | wc -l 1139 unichars '\P{ASCII}' '\p{Common}' '\pP' 1140 unichars '\p{BC=ON}' 1141 unichars '\P{Bidi_Class=NSM}' '\p{Mn}' 1142 unichars '\p{BidiM}' '\pS' 1143 unichars '\p{Block=CombiningDiacriticalMarks}' 1144 unichars '\p{Block=CombiningDiacriticalMarks}' '\PM' 1145 unichars '\p{Block=CombiningDiacriticalMarks}' '\p{Mn}' 1146 unichars '\p{Block=CombiningDiacriticalMarks}' '\P{Mn}' 1147 unichars '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' 1148 unichars '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' | wc -l 1149 unichars '\p{Cased}' '\P{Changes_When_Casefolded}' 1150 unichars '\p{Cased}' '\p{Changes_When_Casemapped}' 1151 unichars '\p{Cased}' '\P{Changes_When_Casemapped}' 1152 unichars '\P{Cased}' '\p{Changes_When_Casemapped}' 1153 unichars '\p{Cased}' '\p{Changes_When_Casemapped}' | less 1154 unichars '\p{Cased}' '\P{Changes_When_Casemapped}' | less 1155 unichars '\p{Cased}' '\p{Changes_When_Casemapped}' | less -r 1156 unichars '\p{Cased}' '\P{Changes_When_Casemapped}' | less -r 1157 unichars '\P{Cased}' '\p{Changes_When_Casemapped}' | less -r 1158 unichars '\p{Cased}' '\p{CI}' 1159 unichars '\p{Cased}' '\p{CI}' | less 1160 unichars '\p{Cased}' '\p{CI}' | less -r 1161 unichars '\p{cased}' '[^\p{CWU}\p{CWL}\p{CWT}]' | less -r 1162 unichars '\p{cased}' '[\^p{CWU}\p{CWL}\p{CWT}]' | less -r 1163 unichars '\p{cased}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r 1164 unichars '\p{cased}' '\PL' 1165 unichars '\p{Cased}' '\PL' 1166 unichars '\p{cased}' '\PL' | less 1167 unichars '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' 1168 unichars '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' | wc -l 1169 unichars '\p{Cased}' '\p{Lm}' | wc -l 1170 unichars '\p{Cased}' '\PL' | wc -l 1171 unichars '\p{Cased}' '\pM' 1172 unichars '\p{cased}' '[^\p{upper}\p{lower}]' | less 1173 
unichars '\p{cased}' '[^\p{upper}\p{lower}\p{title}]' | less 1174 unichars '\p{cased}' '[^\p{upper}\p{lower}\p{title}]' | less -r 1175 unichars '\p{CC=A}' 1176 unichars '\p{CCC=A}' 1177 unichars '\p{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' 1178 unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' 1179 unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' | less 1180 unichars '\p{Changes_When_Casefolded}' '\P{Changes_When_Casemapped}' | less -r 1181 unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' | less -r 1182 unichars '\p{CI}' | less -r 1183 unichars '\p{CI}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r 1184 unichars '\p{Common}' '\pP' 1185 unichars '\p{Control}' 1186 unichars '\p{Control_Pictures}' 1187 unichars '\p{CWL}' 'NAME =~ /LATIN LETTER SMALL CAPITAL/' 1188 unichars '\p{CWTC}' '\PL' 1189 unichars '\p{CWT}' '\PL' 1190 unichars '\p{CWU}' 'NAME =~ /LATIN LETTER SMALL CAPITAL/' 1191 unichars '\p{Dash}' 1192 unichars '\p{di}' 1193 unichars '\p{E 1194 unichars '\p{EA=W}' 1195 unichars '\p{Greek}' '\pP' 1196 unichars '\p{Greek}' '\pS' 1197 unichars '\p{InGreek}' '\P{IsGreek}' | wc -l 1198 unichars '\P{InGreek}' '\p{IsGreek}' | wc -l 1199 unichars '\p{InHalfwidthAndFullwidthForms}' '\p{bidim}' 1200 unichars '\p{InHiragana}' '\P{Hiragana}' 1201 unichars '\p{InHiragana}' '\P{Kana}' 1202 unichars '\p{InHirakana}' '\P{Kana}' 1203 unichars '\p{InKatakana}' '\P{Kana}' 1204 unichars '\P{InKatakana}' '\p{Kana}' 1205 unichars '\P{InKatakana}' '\p{Kana}' | less 1206 unichars '\p{InLatin}' '\P{IsLatin}' | wc -l 1207 unichars '\p{InMiscellaneousSymbolsAnd_Pictographs}' 1208 unichars '\p{InThai}' '\P{IsThai}' 1209 unichars '\p{IsThai}' '\P{InThai}' 1210 unichars '\p{Latin} 1211 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r 1212 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ 
/LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r 1213 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r 1214 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 > /tmp/u1 1215 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/u4 1216 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[C-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r 1217 unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' '$$CF{full} =~ / /' 1218 unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /./' '$$CF{full} =~ / /' 1219 unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /F/' 1220 unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /F/' '$$CF{full} =~ / /' 1221 unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /[SF]/' '$$CF{full} =~ / /' 1222 unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,4}$/' 'CF =~ /F/' '$$CF{full} =~ / /' 1223 unichars '\p{Latin}' 'NAME =~ /\b\pL{2,3}$/' 'CF =~ /F/' '$$CF{full} =~ / /' 1224 unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' 1225 unichars '\p{Latin}''NAME =~ /\bWITH\b/' 'length(NFKD) == 1' 1226 unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort | less 1227 unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d 1228 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | less 1229 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | ucsort | less 1230 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | ucsort | less -r 1231 unichars '/\p{Latin}/ && NAME =~ 
/LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | wc -l 1232 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b.*\bWITH\b/' | ucsort | less -r 1233 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b.*\bWITH\b/' | ucsort --level=1 | less -r 1234 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l 1235 unichars '\p{Latin} && NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l 1236 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort | less -r 1237 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=1 | less -r 1238 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=1 > /tmp/u1 1239 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=2 > /tmp/u2 1240 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=3 > /tmp/u3 1241 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=4 > /tmp/u4 1242 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --level=1 | less -r 1243 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --level=4 > /tmp/u4 1244 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --preprocess 's/..\K\h+\S+//' --level=1 | less -r 1245 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --preprocess 's/..\K.*//' --level=1 | less -r 1246 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r 1247 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r 1248 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' 
| ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/uu 1249 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[CD]\b.*\bWITH\b/' | wc -l 1250 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[CD].*\bWITH\b/' | wc -l 1251 unichars '\p{Latin}' '\w' 1252 unichars '\p{Latin}' '\w' | wc -l 1253 unichars '\pL' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' 1254 unichars '\pL' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l 1255 unichars '\pL' 'length(NFD) == 1' 'NAME =~ /WITH/' 1256 unichars '\p{Lm}' '\p{Cased}' 1257 unichars '\p{Lm}' '\p{Changes_When_Casemapped}' 1258 unichars '\p{Lm}' '\p{upper}' 1259 unichars '\pL' 'NAME =~ /\bSCRIPT/' 1260 unichars '\PL' 'NAME =~ /\bSCRIPT/' 1261 unichars '\pL' 'NAME =~ /SCRIPT/' 1262 unichars '\pL' 'NAME =~ /WITH/' | wc -l 1263 unichars '\p{Lower}' 'length(uc) > 1' 1264 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' 1265 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | less 1266 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6}' 1267 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6.0}' 1268 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age=6.0}' 1269 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6.0.0}' 1270 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s 1271 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | wc -=l 1272 unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | wc -l 1273 unichars '\p{Lower}' 'NAME !~ /SMALL CAPITAL|CAPITAL LETTER/' | wc -l 1274 unichars '\p{Lower}' 'NAME =~ /SMALL CAPITAL|CAPITAL LETTER/' | wc -l 1275 unichars '\p{lower}' '\P{CWL}' | less -r 1276 unichars '\p{Lower}' '\P{CWU}' 1277 unichars '\p{lower}' '\P{CWU}' | less -r 1278 unichars '\p{Lower}' '\P{CWU}' | wc -l 1279 unichars '[\p{Lower}\p{Upper}' '[^\p{CWU}\p{CWL}]' | less -r 1280 unichars '[\p{Lower}\p{Upper}]' '[^\p{CWU}\p{CWL}]' | less -r 1281 unichars '\pL' '\p{Alphabetic}' | wc -l 1282 unichars '\PL' '\p{Alphabetic}' | wc -l 1283 unichars '\pL' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ 
/WITH/' 1284 unichars '\pL' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l 1285 unichars '\pL' '\p{Latin}' 'NAME =~ /WITH/' | wc -l 1286 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' 1287 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %1 1288 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %2 1289 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %2 > /tmp/a 1290 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l 1291 unichars '\pL' '\P{Lm}' '\p{Latin}' 'NAME =~ /WITH/' | wc -l 1292 unichars '\PL' 'uc =~ /\p{Lower}/' 1293 unichars '\PL' 'uc =~ /\p{Upper}/' 1294 unichars '\p{Math}' 1295 unichars '\p{Mc}' 1296 unichars '\p{Me}' 1297 unichars '[\p{Miscellaneous_Symbols}\p{Miscellaneous_Symbols_and_Pictographs}]' 1298 unichars '\pM' '\P{Grapheme_Extend}' 1299 unichars '\PM' '\p{Grapheme_Extend}' 1300 unichars '\PM' '\P{Grapheme_Extend}' 1301 unichars '\p{Nchar}' 1302 unichars '\p{Nl}' 1303 unichars '\p{No}' '\w' 1304 unichars '\p{No}' '\W' 1305 unichars '\p{No}' '\W' | head 1306 unichars '\p{No}' '\W' | less 1307 unichars '\pN' '\P{Nd}' | less 1308 unichars '\p{Numeric_Value=NaN}' 1309 unichars '/\P{NV=NaN}/ && ! (NUM() ~~ [0..10])' | less -r 1310 unichars '/\P{NV=NaN/ && ! (NUM() ~~ [0..10])' | less -r 1311 unichars '/\P{NV=NAN/ && ! 
(NUM() ~~ [0..10])' | less -r
    unichars '\pN' '\W'
    unichars '\pN' '\W' | wc -l
    unichars '\p{Other_Alphabetic}'
    unichars '\p{Other_Alphabetic}' | less
    unichars '\p{Other_Alphabetic}' '\PM' | less
    unichars '\p{Other_Lowercase}'
    unichars '\p{Other_Lowercase}' | less
    unichars '\pP'
    unichars '\p{Pc}'
    unichars '\P{Pd}' '\p{Dash}'
    unichars '\p{Pe}'
    unichars '\p{Pf}'
    unichars '[\p{Pf}\p{Pi}]'
    unichars '[\p{Pf}\p{Pi}]' '\p{BidiM}'
    unichars '[\p{Pf}\p{Pi}]' '\P{BidiM}'
    unichars '[\p{Pi}]'
    unichars '\p{Pi}' '\p{bidim}'
    unichars '[\p{Pi}]' '\P{BidiM}'
    unichars '[\p{Pi}\p{Ps}]' 'NAME =~ /VERTICAL/'
    unichars '\p{Po}'
    unichars '\p{Po}' '\p{bidim}'
    unichars '[\p{Po}\p{Pe}]' '\P{BidiM}'
    unichars '\pP' '\P{QMark}' 'NAME =~ /QUOT/'
    unichars '\p{Ps}' 'NAME =~ /VERTICAL/'
    unichars '\p{Ps}' '\p{bidim}'
    unichars '\p{Ps}' '\P{bidim}'
    unichars '\p{Ps}' '\P{bidim}' | less
    unichars '\p{Ps}' '\p{bidim}' | wc -l
    unichars '\p{Ps}' '\P{bidim}' | wc -l
    unichars '[\p{Ps}\p{Pe}]' '\P{BidiM}'
    unichars '\p{Qmark}'
    unichars '\p{QMark}'
    unichars '\pS'
    unichars '\p{Sk}'
    unichars '\pS' 'NAME =~ /BALL/'
    unichars '\p{Surrogate}'
    unichars '\p{Title}' 'lc !~ /\p{Lower}/'
    unichars '\p{Title}' '[^\p{CWL}\p{CWU}]'
    unichars '\p{Title}' '[^\p{CWL}\p{CWU}]' | wc -l
    unichars '\p{Title}' '\P{Lt}'
    unichars '\p{Title}' '\P{Lt|}
    unichars '\p{Title}' 'uc !~ /\p{Upper}/'
    unichars '\p{Upper}' 'NAME !~ /CAPITAL/'
    unichars '\p{UPPER}' 'NAME =~ /SMALL/'
    unichars '\p{Upper}' '\P{CWL}'
    unichars '\p{Upper}' '\P{CWL}' | wc -l
    unichars '\p{WhiteSpace}' '\PZ'
    unichars '\p{XPosixAlnum}' | less
    unichars '\p{XPosixAlnum}' '\P{XPosixAlpha}' | less
    unichars '\R'
    unichars '\R' | field %2
    unichars -sag Comp_Ex
    unichars -sag '\p{Lower}' '\P{CWU}'
    unichars -sag '\PL' '\p{Lower}' '\p{CWU}'
    unichars -sag '\PL' '\p{Lower}' '\P{CWU}'
    unichars -sag '\PL' '\p{Upper}' '\p{CWL}'
    unichars -sag '\PL' '\p{Upper}' '\P{CWL}'
    unichars -sag '\p{Upper}' '\P{CWL}'
    unichars -sag Singleton
    unichars /s/i
    unichars -ua '\p{Assigned}'
    unichars -ua '\p{Assigned}' | wc -l
    unichars 'UCA1 eq UCA1("a")'
    unichars 'UCA1 eq UCA1("a")' | cat -n
    unichars 'UCA1 eq UCA1("ae")'
    unichars 'UCA1 eq UCA1("d")'
    unichars 'UCA1 eq UCA1("d")' | cat -n
    unichars 'UCA1 eq UCA1("ij")'
    unichars 'UCA1 =~ UCA1("d")'
    unichars 'UCA2 eq UCA2("ij")'
    unichars 'UCA3 eq UCA3("ij")'
    unichars 'UCA eq UCA "%"'
    unichars 'UCA eq UCA("a")'
    unichars 'UCA eq UCA("ae")'
    unichars 'UCA eq UCA("d")'
    unichars 'UCA eq UCA("d")'df
    unichars 'UCA eq UCA "\N{PRIME}"'
    unichars 'UCA eq UCA("p")'
    unichars 'UCA eq UCA("p")' | wc -l
    unichars 'UCA eq UCA("s")' | wc -l
    unichars 'UCA(NFKD) =~ UCA("a")'
    unichars 'UCA(NFKD) =~ UCA("ae")'
    unichars 'UCA(NFKD) =~ UCA("a")' 'NFKD !~ /a/i'
    unichars 'UCA(NFKD) =~ UCA("d")'
    unichars 'UCA(NFKD) =~ UCA("d")' 'UCA ne UCA("d")'
    unichars 'UCA(NFKD) =~ UCA("i")' 'NFKD !~ /i/i'
    unichars 'UCA(NFKD) =~ UCA("m")'
    unichars 'UCA(NFKD) =~ UCA("o")'
    unichars 'UCA(NFKD) =~ UCA("oe")'
    unichars 'UCA(NFKD) =~ UCA("o")' 'NFKD !~ /o/i'
    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))'
    unichars '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r
    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' 'NFKD !~ /o/i'
    unichars 'UCA(NFKD) =~ UCA("o")."|".UCA("a")' 'NFKD !~ /o/i'
    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' | ucsort | less
    unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' | ucsort | less -r
    unichars 'UCA(NFKD) =~ UCA("s")'
    unichars 'UCA(NFKD) =~ UCA("z")'
    unichars 'UCA(NFKD) =~ UCA("z")' 'NFKD !~ /z/i'
    unichars 'UCA(NFKD) =~ UCA("z")' 'UCA ne UCA("z")'
    unichars 'UCA =~ UCA("d")' 'UCA ne UCA("d")'
    unichars 'UCA =~ UCA("i") && UCA =~ UCA("j")'
    unichars 'UCA =~ UCA("p")'
    unichars 'UCA =~ UCA("p")' | wc -l
    unichars 'UCA =~ UCA("s")' 'UCA ne UCA("s")'
    unichars 'UCA =~ UCA("s")' 'UCA ne UCA("s")' | wc -l
    unichars 'UCA =~ UCA("s")' | wc -l
    unichars 'uc ne ucfirst'
    unichars -v
    unichars '\w' '[^_\p{Alphabetic}\p{Nd}]'
    unichars '\w' '[^_\p{Alphabetic}\p{Nd}]' | wc -l
    unichars '\w' '[^\pL\pN\pM]'
    unichars '\w' '[^\pL\pN\pM\p{Pc}]'
    unichars '\w' '[^\pL\pN\pM\p{Pc}]' | less
    unichars '\w' '[^\pL\p\pM]'
    unichars '\w' '\P{word}'
    unichars '\W' '\p{word}'
    unichars '/\w/ == /\W/'
    unichars '\w' '\W'

=head2 Demo of ucsort

    ucsort ../CRAFT-dumps/lc-not-unique | less
    ucsort --level=1 --upper-before-lower --preprocess="s/\s.*//" fing-2 | less
    ucsort --locale=ca /tmp/cat
    ucsort --locale=es /tmp/cat
    ucsort --locale=es__traditional /tmp/cat
    ucsort --locale=es_traditional /tmp/cat
    ucsort --locale=ru /tmp/cyril > /tmp/cyril.ru
    ucsort overlapping-obos | less
    ucsort --pre '/*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --pre='/*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --preprocess='/.*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --preprocess='/*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --preprocess='s/(.*)([\[\]].*[\[\]])(.*)/$2 $1 $3/' --reverse-input stem-fail-tally | less
    ucsort --preprocess='s/(.*)([\[\]].*[\[\]])(.*)/$2 $1 $3/' --reverse-input stem-fail-tally > stem-fail-sort
    ucsort --preprocess='s/([\[\]].*[\[\]])(.*)/$2 $1/' --reverse-input stem-fail-tally | less
    ucsort --preprocess='s/(\[.*\])(.*)/$2 $1/' --reverse-input stem-fail-tally | less
    ucsort --preprocess='s/(\].*\[)(.*)/$2 $1/' --reverse-input stem-fail-tally | less
    ucsort --preprocess='s/(\[FAIL: .*\])(.*)/$2 $1/' --reverse-input stem-fail-tally | less
    ucsort --preprocess='s/.*\gt//' go-greek
    ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//' ../CRAFT-dumps/lc-not-unique | & less
    ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//' ../CRAFT-dumps/lc-not-unique | less
    ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --preprocess='s/.*\t//' go-greek | less
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/lc-not-unique | & less
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/lc-not-unique | tac > ../CRAFT-dumps/lcnu-sort
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/not-unique | tac > ../CRAFT-dumps/nu-sort
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' lc-not-unique | tac > lc-nu-sort
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' not-unique | tac > nu-sort
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $n)/e' ../CRAFT-dumps/lc-not-unique | & less
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $n = $1; s/^/sprintf("%06d", $n)/e' ../CRAFT-dumps/lc-not-unique | & less
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | & less
    ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less
    ucsort --pre='s/.*] //' gene* | less
    ucsort --pre='s/.*=> //' gene* | less
    ucsort --pre 's/.*\gt//' go-greek | less
    ucsort --pre='s/^\S+\h+//' pmc-weirds | less
    ucsort --reverse-input go-uglies | perl -nle ' printf "%50s\n", $_' > flip-go-uglies
    ucsort --reverse-input stem-fail-tally | less
    ucsort --reverse-input stem-fail-tally > stem-fail-sort2
    ucsort --reverse-input ugly-tally > flip-ugly-tally
    ucsort -reverse-input ugly-tally > flip-ugly-tally
    ucsort --reverse-input uuglies > flip-uuglies
    ucsort /tmp/emoji | less
    ucsort /tmp/emoji | unifmt -180 | less
    ucsort < /tmp/u | uniq > /tmp/uu
    ucsort /tmp/uw > /tmp/u
    ucsort --upper-before-lower --preprocess="s/\h.*//" fing-2 > fa
    ucsort --upper-before-lower --preprocess="s/\s.*//" fing-2
    ucsort --upper --pre='s/^\S+\h+//' pmc-weirds | less
    ucsort --upper --pre='s/^\S+\h+//' pmc-weirds > pmc-wsort

    unichars -a '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d
    unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' | ucsort
    unichars -a '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r
    unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper}' | ucsort | cat -n | less -r
    unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper})' | ucsort | cat -n | less -r
    unichars -fgns '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r
    unichars -fgns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r
    unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' | ucsort | cat -n | less -r
    unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r
    unichars -gas
'\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort --upper-before-lower | less -r 1495 unichars -gas '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r 1496 unichars -gas '\p{Lower}' '\P{Ll}' | ucsort | less 1497 unichars -gfns '/\p{Lower}/ && /\p{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r 1498 unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r 1499 unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort --upper | less -r 1500 unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r 1501 unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | less -r 1502 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less 1503 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r 1504 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r 1505 unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r 1506 unichars -gs '/(?=\P{Ll})\p{Lower}|(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r 1507 unichars -gs '/(?=\P{Ll})\p{Lower}|/(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r 1508 unichars -gs '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r 1509 unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r 1510 unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r 1511 unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less 1512 unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less -r 1513 unichars -gs '\p{Lower}' '\P{Ll}' | ucsort | less -r 1514 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower 1515 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r 1516 unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r 1517 unichars 
-gs '/\p{Upper}/ && /\P{CWL}/' | ucsort | less 1518 unichars -gs '/\p{Upper}/ && /\P{CWT}/' | ucsort | less 1519 unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort | less 1520 unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d 1521 1522 uninames CYRIL | ucsort | less 1523 1524 cat *-top | field %2 | tally | ucsort --reverse-input | less 1525 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %13s\n", @F[0,1]' | less 1526 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %15s\n", @F[0,1]' | less 1527 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %18s\n", @F[0,1]' > hapax50 1528 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %18s\n", @F[0,1]' | less 1529 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %25s\n", @F[0,1]' 1530 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %25s\n", @F[0,1]' | less 1531 1532 egrep '^GO:(0005024|0005025|0005026|0007179|0015052)' ../CRAFT-dumps/CRAFT-go | sort -u | ucsort | less 1533 egrep '^GO:(0005024|0005025|0005026|0007179|0015052)' ../CRAFT-dumps/CRAFT-go | ucsort | less 1534 1535 ls *.obo* | ucsort 1536 ls *.obo* | ucsort | perl -le 'printf "%-40s", $_' 1537 ls *.obo* | ucsort | perl -lne 'printf "%-40s\n", $_' 1538 ls *.obo* | ucsort | perl -lne 'printf "%40s\n", $_' 1539 1540 perl5.12.0 -S -CLA unichars '/s/i' | ucsort 1541 perl -CS -E 'say for split " ", "cat ca\x{308}t czt c\x{e4}t bat dat"' | ucsort --locale=sv 1542 perl fingerprint $cat | ucsort > fing-all2 1543 perl fingerprint $cat | ucsort --upper-before-lower --preprocess="s/\h.*//" > fing-all2 1544 perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep 'ABBREV' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/ab 1545 perl -F'\t' 
-lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep 'GEN' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/ge 1546 perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/......//' 1547 perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' 1548 perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/d 1549 perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/......//; s/\h.*//' 1550 perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='$_ = fixstring($_)' ovl | & less 1551 perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='$_ = fixstring($_)' ovl > ovl-sort 1552 perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='\&fixstring' ovl | & less 1553 perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='\&fixstring' ovl | less 1554 1555 repeat 200 randline /tmp/u6 | ucsort | uniq > /tmp/u 1556 1557 tcgrep '^0' ../../new-output/results-go-gene-stemwords-GOOD | ucsort --reverse-input | less 1558 1559=head2 Demo of unilook 1560 1561 unilook activation 1562 unilook adi 1563 unilook adieu 1564 unilook 'alis\b' 1565 unilook angina 1566 unilook arthrit 1567 unilook ascite 1568 unilook betab 1569 unilook '\boverexertion\b' 1570 unilook capitali 1571 unilook catheterization 1572 unilook defib 1573 unilook delineate 1574 unilook digitalis 1575 unilook dofetilide 1576 unilook 
dysauto 1577 unilook dyssyn 1578 unilook dysyn 1579 unilook edema 1580 unilook estiv 1581 unilook etouf 1582 unilook euphon 1583 unilook fentan 1584 unilook fentanyl 1585 unilook /glob 1586 unilook gw 1587 unilook gwen 1588 unilook gyneco 1589 unilook hemochr 1590 unilook hemodialysis 1591 unilook hibern 1592 unilook hippodam 1593 unilook hippo | wc -l 1594 unilook hypogl 1595 unilook idyl 1596 unilook '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' 1597 unilook inscrut 1598 unilook ischemia 1599 unilook leucocy 1600 unilook leuc | wc -l 1601 unilook leukocy 1602 unilook leuk | wc -l 1603 unilook lighthead 1604 unilook lymp 1605 unilook lympho 1606 unilook lymphom 1607 unilook meningitis 1608 unilook '\N{oslash}' 1609 unilook oligu 1610 unilook oto 1611 unilook otot 1612 unilook overeat 1613 unilook overexert 1614 unilook overexertion 1615 unilook pacem 1616 unilook '(?\P{ASCII})\pL' 1617 unilook '(?=\P{ASCII})\pL' 1618 unilook pectoris 1619 unilook pheo 1620 unilook phonious 1621 unilook 'phonious\b' 1622 unilook '\pM' 1623 unilook -ppro 1624 unilook -ppro . 1625 unilook -Ppro . 1626 unilook -ppronoun . 1627 unilook -ppronoun wh 1628 unilook primum 1629 unilook pseudonormal 1630 unilook pulmon 1631 unilook rale 1632 unilook rheto 1633 unilook rosuv 1634 unilook secund 1635 unilook sputum 1636 unilook sum 1637 unilook tachyph 1638 unilook thiazol 1639 unilook thyrotox 1640 unilook tracheitis 1641 unilook uephon 1642 unilook uremia 1643 unilook -v . 
1644 unilook -v activation 1645 unilook -v angina 1646 unilook -v ascite 1647 unilook vascul 1648 unilook -v '\boverexertion\b' 1649 unilook verna 1650 unilook vesnar 1651 unilook -v holter 1652 unilook -v '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' 1653 unilook -V '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' 1654 unilook -V '(?i)(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' 1655 unilook -V '(?i)(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}])\pL' 1656 unilook -v meningitis 1657 unilook -v 'overexertion' 1658 unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}])\pL' 1659 unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}])\pL' 1660 unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}]\pL' 1661 unilook -V '(?=\P{ASCII})\pL' 1662 unilook -v pneumoconiosis 1663 unilook -vz 'comeraderie' 1664 unilook wh 1665 unilook who 1666 unilook widespread 1667 unilook -z dofetilide 1668 unilook -z eplerenone 1669 unilook -z pectoris 1670 unilook -z sulfoxide 1671 unilook -zv anterolateral 1672 unilook -zv holter 1673 unilook -zv metformin 1674