It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool. But as far as I know no such
standard exists. Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution. Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format. However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision. The answer is that the head revision contains the
contents of that revision and every other revision contains a set of
edits to produce that revision ("a" and "d" lines). The GNU diff
manual (the version I looked at was for GNU diff 2.4) documents this
format somewhat (as the "RCS output format"), but the presentation is
a bit confusing as it is all tangled up with the documentation of
several other output formats. If you just want some source code to
look at, the part of CVS which applies these is RCS_deltas in
src/rcs.c.

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions. The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 -----------------> 1.2
     \---> 1.1.2.1          \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

    admin: head 1.2;
    deltas:
       1.2 branches 1.2.2.1; next 1.1;
       1.1 branches 1.1.2.1; next;
       1.1.2.1 branches; next;
       1.2.2.1 branches; next;
    deltatexts:
       1.2
       1.2.2.1
       1.1
       1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, and to what extent programs reading it should be
prepared to encounter any order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example 1.1 or the
head of a branch will typically have no next).

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches. See cvs.texinfo
for more information on them. CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5. We won't try to document extensions not used by CVS in any
detail, but we will briefly list them. Each occurrence of a newphrase
begins with an identifier, which is what we list here. Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers. Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations. However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

    Identifier     Used by
    ----------     -------
    namespace      RCS library done at Silicon Graphics Inc. (SGI) in 1996
                   (a modified RCS 5.7--not sure it has any other name).
    dead           A set of RCS patches developed by Rich Pixley at
                   Cygnus about 1992. These were for CVS, and predated
                   the current CVS death support, which uses a state "dead"
                   rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26. The following new keywords are
defined when PreservePermissions=yes:

    owner
    group
    permissions
    special
    symlink
    hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file. The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file. The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number. The
`symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files. The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order. Because filenames often contain
characters special to RCS, like `.', and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings. For example:

    hardlinks README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file. That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.

Newphrases are also used to implement the 'commitid' feature.
The following new keyword is defined:

    commitid

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution. See also the "Keyword
substitution" chapter of cvs.texinfo. The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS. The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page. This distinction is important for
    multibyte characters. For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters. Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character. It is an issue only for multibyte encodings
    like JIS and Big5, which _do_ usurp ASCII bytes.
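
To make the byte-oriented rule above concrete, here is a minimal
sketch in Python. The function names rcs_quote and rcs_unquote are
made up for this illustration and are not taken from RCS or CVS. The
sample input is the byte pair 0xA4 0x40, a lead byte followed by a
trail byte equal to ASCII `@', as can occur in Big5.

```python
def rcs_quote(data: bytes) -> bytes:
    """Wrap raw bytes as an RCS string.

    The string is delimited by '@' and every '@' byte inside it is
    doubled.  The doubling is done on bytes, with no regard to
    character boundaries in whatever encoding the data uses.
    """
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(quoted: bytes) -> bytes:
    """Inverse of rcs_quote.

    Assumes the argument is one complete, well-formed RCS string;
    an odd run of '@' bytes inside it is not detected.
    """
    if len(quoted) < 2 or quoted[:1] != b"@" or quoted[-1:] != b"@":
        raise ValueError("not an RCS string")
    return quoted[1:-1].replace(b"@@", b"@")


# Two bytes forming one multibyte character whose second byte is 0x40.
sample = b"\xa4@"
stored = rcs_quote(sample)      # what would sit in the ,v file
restored = rcs_unquote(stored)  # what a checkout should give back
```

Here `stored' is no longer valid text in the original encoding, since
the extra `@' byte follows the multibyte character, while `restored'
is byte-for-byte identical to `sample'.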

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file. Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding. This point affects only programs that examine
    the RCS files -- it doesn't affect the external RCS interface, as
    the RCS commands always give you the properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch. While more detailed analyses might be worth doing, we will
note:

  * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

  * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

    I just did a quick test with a small file (on a Sun Ultra 1/170E
    running Solaris 5.5.1), with 1000 revisions on the main branch and
    1000 revisions on a branch that forked at the root (i.e., RCS revisions
    1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
    1.1.1.1000). It took about 0.15 seconds real time to check in the
    first revision, and about 0.6 seconds to check in and 0.3 seconds to
    retrieve revision 1.1.1.1000 (the worst case).

  * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in 1995/06/05 08:28:35 5.6
+++ rcsfile.5in 1996/12/09 17:31:44 5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP ::= {\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP ::= any visible graphic character except \f2special\fP
+\f2idchar\fP ::= any visible graphic character,
+ except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP ::= \f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta