It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool.  But as far as I know no such
standard exists.  Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution.  Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format.  However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision.  The answer is that the head revision contains the
full contents of that revision, and every other revision contains a
series of edits ("a" and "d" lines) which produce that revision from
the one before it in the delta chain.  The GNU diff manual (the
version I looked at was for GNU diff 2.4) documents this format
somewhat (as the "RCS output format"), but the presentation is a bit
confusing as it is all tangled up with the documentation of several
other output formats.  If you just want some source code to look at,
the part of CVS which applies these is RCS_deltas in src/rcs.c.
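As a rough illustration of how such an edit script is applied, here
is a minimal Python sketch.  It is not the CVS code, and the name
apply_rcs_edits is made up for this example.  It assumes the
semantics given in the GNU diff manual: "aN M" adds the M lines that
follow it after line N, "dN M" deletes M lines starting at line N,
and all line numbers refer to the text before any edit is applied.
It also assumes the "text" string has already been pulled out of the
RCS file and had its `@@' escapes undone.

```python
def apply_rcs_edits(base, script):
    """Apply an RCS-format edit script to a list of lines.

    `base` and `script` are lists of lines (line terminators kept).
    Hypothetical helper for illustration; assumes a well-formed script
    whose commands appear in increasing line-number order.
    """
    out = []
    pos = 0  # number of base lines already copied or deleted
    i = 0
    while i < len(script):
        command = script[i].strip()
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            # Copy untouched lines before `start`, then skip `count` lines.
            out.extend(base[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # Copy untouched lines up to and including `start`,
            # then take the next `count` script lines as new text.
            out.extend(base[pos:start])
            pos = max(pos, start)
            out.extend(script[i:i + count])
            i += count
        else:
            raise ValueError("unknown edit command: %r" % command)
    # Whatever remains of the base text is unchanged.
    out.extend(base[pos:])
    return out
```

For instance, applying the script "d2 1", "a2 2" (followed by two new
lines) to a four-line text replaces line 2 with the two new lines and
leaves the rest alone.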

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions.  The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 ----------------->  1.2
     \---> 1.1.2.1           \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

  admin:  head 1.2;
  deltas:
    1.2 branches 1.2.2.1; next 1.1;
    1.1 branches 1.1.2.1; next;
    1.1.2.1 branches; next;
    1.2.2.1 branches; next;
  deltatexts:
    1.2
    1.2.2.1
    1.1
    1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, and to what extent programs reading it should be
prepared to encounter an arbitrary order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example, 1.1 or the
head of a branch will typically have no next).
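To make the roles of "branches" and "next" a little more concrete,
here is a small Python sketch which finds the chain of delta nodes
leading from the head to a given revision, using the example above.
The dictionary layout and the name delta_path are assumptions of this
example; this is a plain tree search, not a description of how RCS or
CVS walks the file.

```python
# Delta nodes from the example above: revision -> (branches, next).
# A missing "next" is represented as None.
DELTAS = {
    "1.2": (["1.2.2.1"], "1.1"),
    "1.1": (["1.1.2.1"], None),
    "1.1.2.1": ([], None),
    "1.2.2.1": ([], None),
}


def delta_path(deltas, head, target):
    """Return the list of revisions from `head` to `target`, or None.

    Hypothetical helper: follows "next" and "branches" links until
    the target revision is found.
    """
    pending = [[head]]
    while pending:
        path = pending.pop()
        current = path[-1]
        if current == target:
            return path
        branches, next_rev = deltas[current]
        successors = list(branches)
        if next_rev is not None:
            successors.append(next_rev)
        for rev in successors:
            pending.append(path + [rev])
    return None
```

With the data above, the path to 1.1.2.1 runs 1.2, 1.1, 1.1.2.1, and
the path to 1.2.2.1 runs 1.2, 1.2.2.1.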

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches.  See cvs.texinfo
for more information on them.  CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5.  We won't try to document extensions not used by CVS in any
detail, but we will briefly list them.  Each occurrence of a newphrase
begins with an identifier, which is what we list here.  Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers.  Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations.  However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

   Identifier   Used by
   ----------   -------
   namespace    RCS library done at Silicon Graphics Inc. (SGI) in 1996
                (a modified RCS 5.7--not sure it has any other name).
   dead         A set of RCS patches developed by Rich Pixley at
                Cygnus about 1992.  These were for CVS, and predated
                the current CVS death support, which uses a state "dead"
                rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26.  The following new keywords are
defined when PreservePermissions=yes:

   owner
   group
   permissions
   special
   symlink
   hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file.  The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file.  The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number.  The
`symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files.  The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order.  Because filenames often
contain characters special to RCS, like `.', and sometimes even
contain spaces or eight-bit characters, the filenames in the hardlinks
field will usually be enclosed in RCS strings.  For example:

	hardlinks	README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file.  That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.
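As an illustration of reading such a field, here is a small Python
sketch which splits a hardlinks phrase into filenames, accepting both
bare words and RCS strings.  The function name parse_hardlinks is made
up for this example and this is not the CVS parser.  It works on text
strings for readability; a real reader should work on bytes.

```python
def parse_hardlinks(phrase):
    """Split a `hardlinks ... ;' phrase into a list of filenames.

    Hypothetical helper for illustration.  Bare words end at
    whitespace; RCS strings are delimited by `@', with `@@' standing
    for a literal `@'.
    """
    body = phrase.strip()
    if not body.startswith("hardlinks") or not body.endswith(";"):
        raise ValueError("not a hardlinks phrase")
    body = body[len("hardlinks"):-1]  # drop keyword and final `;'

    names = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
        elif ch == "@":
            # RCS string: read up to the closing, undoubled `@'.
            i += 1
            chars = []
            while True:
                if i >= len(body):
                    raise ValueError("unterminated RCS string")
                if body[i] == "@":
                    if body[i + 1:i + 2] == "@":
                        chars.append("@")
                        i += 2
                    else:
                        i += 1
                        break
                else:
                    chars.append(body[i])
                    i += 1
            names.append("".join(chars))
        else:
            # Bare word: read up to whitespace or the start of a string.
            j = i
            while j < len(body) and not body[j].isspace() and body[j] != "@":
                j += 1
            names.append(body[i:j])
            i = j
    return names
```

Applied to the example line above, this yields the three names
README, install.txt and Installation Notes; a caller reading README,v
could then check that README is among them.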

Newphrases are also used to implement the `commitid' feature.  The
following new keyword is defined:

   commitid

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution.  See also the "Keyword
substitution" chapter of cvs.texinfo.  The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS.  The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page.  This distinction is important for
    multibyte characters.  For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters.  Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character.  It is an issue only for multibyte encodings
    like JIS and Big5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte character, the resulting
    RCS file is not a properly encoded text file.  Instead, it is a
    byte stream that does not use a consistent character encoding that
    can be understood by the usual text tools, since doubling `@'
    messes up the encoding.  This point affects only programs that
    examine the RCS files -- it doesn't affect the external RCS
    interface, as the RCS commands always give you the properly
    encoded text files and logs (assuming that you always check in
    properly encoded text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.
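The byte-oriented rule can be sketched in a few lines of Python.  The
names rcs_quote and rcs_unquote are made up for this example, and the
two-byte sequence used to try it out is simply an arbitrary pair whose
second byte is 0x40 (`@'), the kind of thing that can turn up inside a
multibyte character in an encoding such as Big5.

```python
def rcs_quote(data):
    """Wrap raw bytes as an RCS string, doubling every `@' byte.

    Hypothetical helper.  The doubling is done on bytes, with no
    regard to character boundaries, as described above.
    """
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(string):
    """Undo rcs_quote: strip the delimiters and collapse `@@' to `@'.

    Hypothetical helper.  Assumes the input is one well-formed RCS
    string; it does not check for stray undoubled `@' bytes inside.
    """
    if len(string) < 2 or string[:1] != b"@" or string[-1:] != b"@":
        raise ValueError("not an RCS string")
    return string[1:-1].replace(b"@@", b"@")


# A two-byte sequence whose second byte happens to be `@' (0x40).
SAMPLE = b"\xa4\x40"
QUOTED = rcs_quote(SAMPLE)
```

Here QUOTED is no longer valid text in the original encoding, which is
the point made above, but rcs_unquote(QUOTED) gives back the original
bytes unchanged.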

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch.  While more detailed analyses might be worth doing, we will
note:

    * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

    * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

	I just did a quick test with a small file (on a Sun Ultra 1/170E
	running Solaris 5.5.1), with 1000 revisions on the main branch and
	1000 revisions on a branch that forked at the root (i.e., RCS
	revisions 1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1,
	1.1.1.2, ..., 1.1.1.1000).  It took about 0.15 seconds real time to
	check in the first revision, and about 0.6 seconds to check in and
	0.3 seconds to retrieve revision 1.1.1.1000 (the worst case).

    * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in	1995/06/05 08:28:35	5.6
+++ rcsfile.5in	1996/12/09 17:31:44	5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP	::=	{\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP	::=	any visible graphic character except \f2special\fP
+\f2idchar\fP	::=	any visible graphic character,
+		except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP	::=	\f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta