%
% Affixes get stripped off the left and right sides of words,
% i.e. spaces are inserted between the affix and the word itself.
%
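% Illustrative example (hypothetical input, not taken from a test corpus):
% "(" is listed in LPUNC and ")" and "," are listed in RPUNC below, so
% the input word
%     (hello),
% would be expected to be split into the four tokens
%     ( hello ) ,
%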
% Some of the funky UTF-8 parentheses are used in Asian texts.
% In order to allow single straight quote ' and double straight quote ''
% to be stripped off from both the left and the right, they are
% distinguished by the suffix .x and .y (as in Mr.x Mrs.x or Jr.y Sr.y)
%
% 。 is an end-of-sentence marker used in Japanese texts.

% Punctuation appearing on the right side of words.
% Note: the ellipsis ....y must appear *before* the dot ".", else the
% splitting won't work right.
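% Sketch of why the order matters, assuming affixes are tried in the
% order listed: for the hypothetical input
%     wait....
% listing ....y first lets the whole ellipsis come off as one token,
%     wait ....
% whereas with "." first the dots would come off one at a time.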
")" "}" "]" ">" "".y" » 〉 ) 〕 》 】 ] 』 」 "’’" "’" “ ''.y '.y `.y
"%" "," ....y "." 。.y ‧ ":" ";" "?" "!" ‽ ؟ ? ! ….y "”" ━.y –.y ー.y ‐.y 、.y
~ ¢ ₵ ™ ℠
  : RPUNC+;

% Punctuation appearing on the left side of words.
"(" "{" "[" "<" "".x" « 〈 ( 〔 《 【 [ 『 「 、.x `.x `` „ ‘ ''.x '.x ….x ....x
¿ ¡ "$" US$ USD C$
£ ₤ € ¤ ₳ ฿ ₡ ₢ ₠ ₫ ৳ ƒ ₣ ₲ ₴ ₭ ₺  ℳ  ₥ ₦ ₧ ₱ ₰ ₹ ₨ ₪ ﷼ ₸ ₮ ₩ ¥ ៛ 호점
† †† ‡ § ¶ © ® ℗ № "#"
* • ⁂ ❧ ☞ ◊ ※  ○  。.x ゜ ✿ ☆ * ◕ ● ∇ □ ◇ @ ◎
–.x ━.x ー.x -- - ‧.x
  : LPUNC+;


% The following is a quoted list, used during tokenization. Do NOT put
% spaces between the quotation marks!
""«»《》【】『』`„“": QUOTES+;

% The following is a quoted list, used during tokenization. Do NOT put
% spaces between the symbols!
"()¿¡†‡§¶©®℗№#*•⁂❧☞◊※○。゜✿☆*◕●∇□◇@◎–━ー---‧": BULLETS+;