It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool. But as far as I know no such
standard exists. Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution. Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format. However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision. The answer is that the head revision contains the
contents of that revision and every other revision contains a bunch of
edits to produce that revision ("a" and "d" lines). The GNU diff
manual (the version I looked at was for GNU diff 2.4) documents this
format somewhat (as the "RCS output format"), but the presentation is
a bit confusing as it is all tangled up with the documentation of
several other output formats. If you just want some source code to
look at, the part of CVS which applies these is RCS_deltas in
src/rcs.c.

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions. The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 -----------------> 1.2
     \---> 1.1.2.1          \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

    admin:  head 1.2;
    deltas:
      1.2 branches 1.2.2.1; next 1.1;
      1.1 branches 1.1.2.1; next;
      1.1.2.1 branches; next;
      1.2.2.1 branches; next;
    deltatexts:
      1.2
      1.2.2.1
      1.1
      1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, or whether programs reading it should expect to
encounter any order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example 1.1 or the
head of a branch will typically have no next).

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches. See cvs.texinfo
for more information on them. CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5.
We won't try to document extensions not used by CVS in any
detail, but we will briefly list them. Each occurrence of a newphrase
begins with an identifier, which is what we list here. Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers. Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations. However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

    Identifier   Used by
    ----------   -------
    namespace    RCS library done at Silicon Graphics Inc. (SGI) in 1996
                 (a modified RCS 5.7--not sure it has any other name).
    dead         A set of RCS patches developed by Rich Pixley at
                 Cygnus about 1992. These were for CVS, and predated
                 the current CVS death support, which uses a state "dead"
                 rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26. The following new keywords are
defined when PreservePermissions=yes:

    owner
    group
    permissions
    special
    symlink
    hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file. The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file.
The `special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number. The
`symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files. The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order. Because filenames often contain
characters special to RCS, like `.', and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings. For example:

    hardlinks README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file. That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.

Newphrases are also used to implement the 'commitid' feature. The
following new keyword is defined:

    commitid

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution. See also the "Keyword
substitution" chapter of cvs.texinfo. The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS. The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page. This distinction is important for
    multibyte characters. For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters. Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character.
    It is an issue only for multibyte encodings
    like JIS and BIG5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file. Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding. This point affects only programs that examine
    the RCS files -- it doesn't affect the external RCS interface, as
    the RCS commands always give you the properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch. While more detailed analyses might be worth doing, we will
note:

  * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

  * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

    I just did a quick test with a small file (on a Sun Ultra 1/170E
    running Solaris 5.5.1), with 1000 revisions on the main branch and
    1000 revisions on a branch that forked at the root (i.e., RCS revisions
    1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
    1.1.1.1000). It took about 0.15 seconds real time to check in the
    first revision, and about 0.6 seconds to check in and 0.3 seconds to
    retrieve revision 1.1.1.1000 (the worst case).

  * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in 1995/06/05 08:28:35 5.6
+++ rcsfile.5in 1996/12/09 17:31:44 5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP ::= {\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP ::= any visible graphic character except \f2special\fP
+\f2idchar\fP ::= any visible graphic character,
+    except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP ::= \f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta
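
As an illustration of the "a" and "d" lines mentioned near the top of
this file, here is a minimal Python sketch of applying one delta text
to the lines of the revision it is based on. This is not the CVS code
(see RCS_deltas in src/rcs.c for that); the name apply_rcs_delta is
made up for this example, and it assumes the text has already been
split into lines with the `@' quoting removed. It relies on my
understanding of the format: "aN M" adds the following M lines after
line N, "dN M" deletes M lines starting at line N, and line numbers
refer to the text being edited rather than to the partially edited
result.

```python
def apply_rcs_delta(base_lines, delta_lines):
    """Apply one RCS edit script to a list of lines; return the new list.

    Hypothetical helper for illustration only, not taken from CVS or RCS.
    """
    result = []
    src = 0  # number of base lines already copied or deleted
    i = 0
    while i < len(delta_lines):
        command = delta_lines[i]
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            # Keep everything before line `start` (1-based), then skip
            # `count` lines of the base text.
            result.extend(base_lines[src:start - 1])
            src = max(src, start - 1 + count)
        elif op == "a":
            # Keep everything up to and including line `start`, then add
            # the next `count` lines taken from the script itself.
            result.extend(base_lines[src:start])
            src = max(src, start)
            result.extend(delta_lines[i:i + count])
            i += count
        else:
            raise ValueError("unexpected edit command: %r" % command)
    # Whatever remains of the base text is unchanged.
    result.extend(base_lines[src:])
    return result


base = ["one", "two", "three", "four"]
delta = ["d2 1", "a2 1", "TWO", "a4 1", "five"]
new_lines = apply_rcs_delta(base, delta)
```

Here new_lines comes out as one, TWO, three, four, five. Because the
added lines are consumed by count, a text line that itself starts with
"a" or "d" is not mistaken for a command.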