The *.txt files were copied from

    ftp://www.unicode.org/Public/UNIDATA

with subdirectories 'extracted' and 'auxiliary'.

The Unihan files were not included due to space considerations. Also NOT
included were any *.html files. It is possible to add the Unihan files and
edit mktables (see the instructions near its beginning) to look at them.

The file 'version' should exist and be a single line with the Unicode version,
like:

    5.2.0

To be 8.3 filesystem friendly, the names of some of the input files have been
changed from the values that are in the Unicode DB. Not all of the Test files
are currently used, so some may not be present and some of the mv's below can
fail. The .html Test files are not touched.

mv PropertyValueAliases.txt PropValueAliases.txt
mv NamedSequencesProv.txt NamedSqProv.txt
mv DerivedAge.txt DAge.txt
mv DerivedCoreProperties.txt DCoreProperties.txt
mv DerivedNormalizationProps.txt DNormalizationProps.txt
mv extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
mv extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
mv extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
mv extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
mv extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
mv extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
mv extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
mv extracted/DerivedJoiningType.txt extracted/DJoinType.txt
mv extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
mv extracted/DerivedNumericType.txt extracted/DNumType.txt
mv extracted/DerivedNumericValues.txt extracted/DNumValues.txt

mv auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
mv auxiliary/LineBreakTest.txt auxiliary/LBTest.txt
mv auxiliary/SentenceBreakTest.txt auxiliary/SBTest.txt
mv auxiliary/WordBreakTest.txt auxiliary/WBTest.txt

If you have the Unihan database (5.2 and above), you should
also do the following:

mv Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
mv Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
mv Unihan_IRGSources.txt UnihanIRGSources.txt
mv Unihan_NumericValues.txt UnihanNumericValues.txt
mv Unihan_OtherMappings.txt UnihanOtherMappings.txt
mv Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
mv Unihan_Readings.txt UnihanReadings.txt
mv Unihan_Variants.txt UnihanVariants.txt

If you download everything, note that the renames above do not touch the
files that mktables doesn't use, so those will not work correctly as-is on 8.3
filesystems.

mktables is used to generate the tables used by the rest of Perl. It will warn
you about any *.txt files in the directory substructure that it doesn't know
about. You should either remove any files so identified, or edit mktables to
add them to its lists of files to process. You can also run

    mktables -globlist

to have it try to process these files generically.

FOR PUMPKINS

The files are inter-related. If you take the latest UnicodeData.txt, for
example, but leave the older versions of other files, there can be subtle
problems. So get everything available from Unicode, and delete the files that
aren't needed.

When moving to a new version of Unicode, you need to update 'version' by hand:

    p4 edit version
    ...

You should look in the Unicode release notes (which are probably towards the
bottom of http://www.unicode.org/reports/tr44/) to see if any properties have
newly become Obsolete, Deprecated, or Stabilized. The full names of these
should be added to the respective lists near the beginning of mktables, using
an 'if' so that they are added only for this Unicode version and later; that
way mktables can still be used for earlier Unicode versions.

When putting out a new Perl release, consider whether any of the Deprecated
properties should be moved to Suppressed.
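Because the files are refreshed together whenever new data is pulled from
Unicode, the 8.3 renames listed earlier in this file have to be redone each
time. The following is a minimal POSIX sh sketch of doing them in one step. The
function name rename_unicore_files is hypothetical (nothing like it ships with
Perl), and the sketch assumes the directory layout described above, with
'extracted' and 'auxiliary' subdirectories. Files that are absent, such as
unused Test files or the optional Unihan files, are reported and skipped
instead of aborting the run.

```shell
#!/bin/sh
# Sketch only: rename_unicore_files is a hypothetical helper, not part of Perl.
# It applies the 8.3-friendly renames from this README to a directory of
# Unicode data files.
# Usage: . ./rename.sh && rename_unicore_files /path/to/unicore

rename_unicore_files() {
    dir=${1:-.}
    # Each line of the here-document is "old-name new-name", relative to $dir.
    while read -r from to; do
        # Ignore blank lines in the list.
        [ -n "$from" ] || continue
        if [ -e "$dir/$from" ]; then
            mv "$dir/$from" "$dir/$to"
        else
            # Not every file is always present (unused Test files, optional
            # Unihan files), so report it and carry on.
            echo "skipping $from: not present" >&2
        fi
    done <<'EOF'
PropertyValueAliases.txt PropValueAliases.txt
NamedSequencesProv.txt NamedSqProv.txt
DerivedAge.txt DAge.txt
DerivedCoreProperties.txt DCoreProperties.txt
DerivedNormalizationProps.txt DNormalizationProps.txt
extracted/DerivedBidiClass.txt extracted/DBidiClass.txt
extracted/DerivedBinaryProperties.txt extracted/DBinaryProperties.txt
extracted/DerivedCombiningClass.txt extracted/DCombiningClass.txt
extracted/DerivedDecompositionType.txt extracted/DDecompositionType.txt
extracted/DerivedEastAsianWidth.txt extracted/DEastAsianWidth.txt
extracted/DerivedGeneralCategory.txt extracted/DGeneralCategory.txt
extracted/DerivedJoiningGroup.txt extracted/DJoinGroup.txt
extracted/DerivedJoiningType.txt extracted/DJoinType.txt
extracted/DerivedLineBreak.txt extracted/DLineBreak.txt
extracted/DerivedNumericType.txt extracted/DNumType.txt
extracted/DerivedNumericValues.txt extracted/DNumValues.txt
auxiliary/GraphemeBreakTest.txt auxiliary/GCBTest.txt
auxiliary/LineBreakTest.txt auxiliary/LBTest.txt
auxiliary/SentenceBreakTest.txt auxiliary/SBTest.txt
auxiliary/WordBreakTest.txt auxiliary/WBTest.txt
Unihan_DictionaryIndices.txt UnihanIndicesDictionary.txt
Unihan_DictionaryLikeData.txt UnihanDataDictionaryLike.txt
Unihan_IRGSources.txt UnihanIRGSources.txt
Unihan_NumericValues.txt UnihanNumericValues.txt
Unihan_OtherMappings.txt UnihanOtherMappings.txt
Unihan_RadicalStrokeCounts.txt UnihanRadicalStrokeCounts.txt
Unihan_Readings.txt UnihanReadings.txt
Unihan_Variants.txt UnihanVariants.txt
EOF
}
```

For example, after sourcing the script, running "rename_unicore_files ." from
the directory containing this README would apply the renames in place. Keeping
the old/new pairs in a single here-document means the list can be kept in step
with the mv lines in this file when files are added or dropped.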
perlrecharclass.pod has a list of all the characters that are white space,
which needs to be updated if there are changes. A quick way to check for
changes is to see whether the number of such characters that perluniprops.pod
(generated by running mktables) lists for the property \p{White_Space} is
still 26. If it is not, further investigation is needed to classify the new
characters as horizontal or vertical.

The code in regexec.c for the \X match construct is intimately tied to the
regular expression in UAX #29 (http://www.unicode.org/reports/tr29/). You
should check whether that expression has changed, and if so, modify regexec.c
accordingly. The current expression is:

    ( CRLF
    | Prepend* ( Hangul-syllable | !Control )
      ( Grapheme_Extend | Spacing_Mark )*
    | . )

mktables has many checks to warn you if there are unexpected or novel things
that it doesn't know how to handle.

Finally:

    p4 submit

--
jhi@iki.fi; updated by nick@ccl4.org, public@khwilliamson.com