% This is for checking how the underlying Lisp copes with
% printing (and exploding) strings and symbols that contain
% multi-byte characters, i.e. utf-8 sequences for characters with
% codes over U+007f.

% The output is a little tedious to decode, but this is intended to
% illustrate a collection of cases both as regards the actual
% output generated and the calculation of "output columns" and hence
% the way in which lines get wrapped.

% At present this is only expected to give even partly sensible
% results with CSL.

lisp;
on echo;

% test line overflow

<<
% This test displays a sequence of characters in clumps of 7 interleaved
% with numbers showing the output column that has been reached. There are
% four instances. The first uses the letter "a", which provides a simple
% reference case. Then there is a "pi", a "forall" symbol and a
% double-struck capital B: those use two, three and four bytes
% respectively. Note that #Bopf; may not be available in the font you use
% unless it is somewhat specialised.
% Note that #Zopf;, #Qopf; and #Ropf; are often used to denote the integers,
% rationals and reals, and that #Bopf; is a similar font effect.
%
% If things work well then the display should be similar in all cases,
% both in terms of the column values printed and the positions
% where line-breaks are inserted. If (e.g.) a sequence of utf-8 bytes ends up
% counted as multiple "columns" that could lead to differences.
%
% First try printing strings.
  linelength 72;
  terpri(); terpri();
  prin2 "Check linelength effect with strings";
  terpri();
  prin2 ".. each of the following 4 blocks should show the same layout";
  foreach x in list("a", "#pi;", "#ForAll;", "#Bopf;") do <<
    terpri();
    for i := 1:11 do <<
      for j := 1:7 do prin2 x;
      prin2 posn() >> >>;
  terpri(); terpri();
% Now the same but printing symbols (using prin2).
  prin2 "Check linelength effect with symbols";
  terpri();
  prin2 ".. each of the following 4 blocks should show the same layout";
  foreach x in list('a, '#pi;, '#ForAll;, '#Bopf;) do <<
    terpri();
    for i := 1:11 do <<
      for j := 1:7 do prin2 x;
      prin2 posn() >> >>;

% This section uses prin1 and variations on explode to process first strings
% and then symbols with various contents. For prin1 the requirement is that
% the output be re-inputable.
% The string here is intended to contain a jolly mix of potential issues.
  w1 := "2AbCd #pi; #ForAll; #Bopf; #hash;pi; #quot; #gamma; #Gamma;";
  foreach x in list(w1, intern w1) do <<
    terpri();
    prin2 "Test using ";
    if stringp x then prin2 "strings" else prin2 "symbols";
    terpri();
% prin2 is used just to display the information "naturally" (at least
% if you have a utf-8 capable terminal with enough fonts installed).
    prin2 "Raw: "; prin2 x; print posn();

% prin1 should generate re-inputable material, and to ensure that it
% renders extended characters as hex sequences such as "#1234;". Within a
% string, if such a sequence occurred literally then the initial "#" is
% expanded to "#hash;". In strings any double quote mark is doubled, while
% in symbols special characters are preceded by an exclamation mark.
    prin2 "prin1: "; prin1 x; print posn();

% explode2 should be rather like prin2 except that it generates a list of
% characters. Note that this means that multi-byte sequences in the data will
% need to be rendered as single multi-byte character objects. E.g.
%    explode2 "#alpha;" => (#alpha;), a list of length 1.
    prin2 "explode2: "; prin1 explode2 x; print posn();

% explode is like prin1 except that it can end up with extended characters...
% thus
%    explode "#alpha;" => (!" !#alpha; !"), a list of length 3. The only joker
% here is that if the string contains a literal sequence "# w o r d ;" (without
% the spaces) then that has to end up as (!" !# h a s h !; w o r d !; !")
% so it can be re-inputable.
    prin2 "explode: "; prin1 explode x; print posn();
% explodecn is like explodec but returns a list of the numeric codes of
% the characters involved. E.g.
%    explodecn "#alpha;" => (945)
    princ "explodecn: "; prin1 explodecn x; print posn();
% exploden is like explode but returns a list of integer codes.
% Note that some codes can be bigger than 0xff.
    princ "exploden: "; prin1 exploden x; print posn();
% explode2uc (and explode2lc, explode2ucn, explode2lcn) are like
% explode2 except that they fold characters to upper or lower case.
% There are two issues here. The first is whether #alpha; will change to
% #Alpha; (and similarly for all other non-Latin letters). The second
% is that the names for special characters will need to retain their
% regular case, so for instance #Alpha; must appear, not #ALPHA;, even
% after conversion to upper case. If in fact extended characters are
% printed in hex rather than using names, much of that worry evaporates.
% In some (perhaps all) locales only a-z and A-Z will be changed
% by case folding...
    princ "explode2uc: "; prin1 explode2uc x; print posn();
    princ "explode2lc: "; prin1 explode2lc x; print posn() >>;
  terpri() >>;

end;