It would be nice if the RCS file format (which is implemented by a
great many tools, both free and non-free, both by calling GNU RCS and
by reimplementing access to RCS files) were documented in some
standard separate from any one tool.  But as far as I know no such
standard exists.  Hence this file.

The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
distribution.  Then look at the diff at the end of this file (which
contains a few fixes and clarifications to that manpage).

If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
comment about their date format.  However, as far as we know there
isn't really any document describing MKS's changes to the RCS file
format.

The rcsfile.5 manpage does not document what goes in the "text" field
for each revision.  The answer is that the head revision contains the
contents of that revision and every other revision contains a bunch of
edits to produce that revision ("a" and "d" lines).  The GNU diff
manual (the version I looked at was for GNU diff 2.4) documents this
format somewhat (as the "RCS output format"), but the presentation is
a bit confusing as it is all tangled up with the documentation of
several other output formats.  If you just want some source code to
look at, the part of CVS which applies these is RCS_deltas in
src/rcs.c.
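
As a rough illustration of the "a" and "d" lines, here is a minimal
Python sketch (it is not the code CVS uses; see RCS_deltas for that).
The function name apply_rcs_edits is made up for this example.  The
sketch assumes that commands appear in ascending line order, as
"diff -n" emits them, that line numbers refer to the text being edited
rather than to the partly edited result, and it ignores the case of a
file without a final newline.

```python
def apply_rcs_edits(base_lines, script_lines):
    """Apply an RCS-format edit script ("a" and "d" commands) to base_lines.

    Hypothetical helper for illustration only.  Line numbers in the
    script are 1-based and refer to base_lines, so the base text is
    walked once and untouched regions are copied as we go.
    """
    out = []
    pos = 0  # number of base lines already consumed (copied or deleted)
    i = 0
    while i < len(script_lines):
        cmd = script_lines[i]
        i += 1
        op = cmd[0]
        start, count = (int(x) for x in cmd[1:].split())
        if op == "d":
            # "dS C": delete C lines beginning at base line S.
            out.extend(base_lines[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # "aS C": add the next C script lines after base line S.
            out.extend(base_lines[pos:start])
            pos = start
            out.extend(script_lines[i:i + count])
            i += count
        else:
            raise ValueError("unknown edit command: %r" % cmd)
    # Copy whatever follows the last edit.
    out.extend(base_lines[pos:])
    return out
```

For example, applying the script ["d2 1", "a3 2", "new A", "new B"] to
the lines ["one", "two", "three", "four"] deletes "two" and adds two
lines after "three".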

The rcsfile.5 documentation only _very_ briefly touches on the order
of the revisions.  The order _is_ important and CVS relies on it.
Here is an example of what I was able to find, based on the join3
sanity.sh testcase (and the behavior I am documenting here seems to be
the same for RCS 5.7 and CVS 1.9.27):

    1.1 ----------------->  1.2
     \---> 1.1.2.1           \---> 1.2.2.1

Here is how this shows up in the RCS file (omitting irrelevant parts):

  admin:  head 1.2;
  deltas:
    1.2 branches 1.2.2.1; next 1.1;
    1.1 branches 1.1.2.1; next;
    1.1.2.1 branches; next;
    1.2.2.1 branches; next;
  deltatexts:
    1.2
    1.2.2.1
    1.1
    1.1.2.1

Yes, the order seems to differ between the deltas and the deltatexts.
I have no idea how much of this should actually be considered part of
the RCS file format, and how much programs reading it should instead
be prepared to encounter any order.

The rcsfile.5 grammar shows the {num} after "next" as optional; if it
is omitted then there is no next delta node (for example 1.1 or the
head of a branch will typically have no next).

There is one case where CVS uses CVS-specific, non-compatible changes
to the RCS file format, and this is magic branches.  See cvs.texinfo
for more information on them.  CVS also sets the RCS state to "dead"
to indicate that a file does not exist in a given revision (this is
stored just as any other RCS state is).

The RCS file format allows quite a variety of extensions to be added
in a compatible manner by use of the "newphrase" feature documented in
rcsfile.5.  We won't try to document extensions not used by CVS in any
detail, but we will briefly list them.  Each occurrence of a newphrase
begins with an identifier, which is what we list here.  Future
designers of extensions are strongly encouraged to pick
non-conflicting identifiers.  Note that newphrase occurs in several
places in the RCS grammar, and a given extension may not be legal in
all locations.  However, it seems better to reserve a particular
identifier for all locations, to avoid confusion and complicated
rules.

   Identifier   Used by
   ----------   -------
   namespace    RCS library done at Silicon Graphics Inc. (SGI) in 1996
                (a modified RCS 5.7--not sure it has any other name).
   dead         A set of RCS patches developed by Rich Pixley at
                Cygnus about 1992.  These were for CVS, and predated
                the current CVS death support, which uses a state "dead"
                rather than a "dead" newphrase.

CVS does use newphrases to implement the `PreservePermissions'
extension introduced in CVS 1.9.26.  The following new keywords are
defined when PreservePermissions=yes:

   owner
   group
   permissions
   special
   symlink
   hardlinks

The contents of the `owner' and `group' fields should be a numeric uid
and a numeric gid, respectively, representing the user and group who
own the file.  The `permissions' field contains an octal integer,
representing the permissions that should be applied to the file.  The
`special' field contains two words; the first must be either `block'
or `character', and the second is the file's device number.  The
`symlink' field should be present only in files which are symbolic
links to other files, and absent on all regular files.  The
`hardlinks' field contains a list of filenames to which the current
file is linked, in alphabetical order.  Because filenames often contain
characters special to RCS, like `.', and sometimes even contain spaces
or eight-bit characters, the filenames in the hardlinks field will
usually be enclosed in RCS strings.  For example:

	hardlinks	README @install.txt@ @Installation Notes@;

The hardlinks field should always include the name of the current
file.  That is, in the repository file README,v, any hardlinks fields
in the delta nodes should include `README'; CVS will not operate
properly if this is not done.
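
A reader of such a field has to cope with a mix of bare words and RCS
strings.  The following Python sketch shows one way to split the value
of the example above into filenames.  The function name
parse_phrase_words is hypothetical, the input is assumed to be the
bytes between the keyword and the terminating `;', and bare words are
assumed to end at whitespace; this is not how CVS itself parses the
field.

```python
def parse_phrase_words(value):
    """Split a phrase value (bytes, without the trailing ';') into words.

    Hypothetical helper for illustration.  Words are either bare
    tokens or @-delimited RCS strings, in which "@@" stands for a
    single "@" byte.
    """
    words = []
    i = 0
    n = len(value)
    while i < n:
        ch = value[i:i + 1]
        if ch.isspace():
            i += 1
        elif ch == b"@":
            # RCS string: read up to the closing, undoubled "@".
            i += 1
            buf = bytearray()
            while True:
                j = value.find(b"@", i)
                if j < 0:
                    raise ValueError("unterminated RCS string")
                buf += value[i:j]
                if value[j + 1:j + 2] == b"@":
                    buf += b"@"      # "@@" is an escaped "@"
                    i = j + 2
                else:
                    i = j + 1        # closing delimiter
                    break
            words.append(bytes(buf))
        else:
            # Bare word: runs until whitespace or the start of a string.
            j = i
            while j < n and not value[j:j + 1].isspace() and value[j:j + 1] != b"@":
                j += 1
            words.append(value[i:j])
            i = j
    return words
```

Working on bytes rather than decoded text keeps the sketch independent
of whatever encoding the filenames happen to use.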

Newphrases are also used to implement the 'commitid' feature. The
following new keyword is defined:

   commitid

The rules regarding keyword expansion are not documented along with
the rest of the RCS file format; they are documented in the co(1)
manpage in the RCS 5.7 distribution.  See also the "Keyword
substitution" chapter of cvs.texinfo.  The co(1) manpage refers to
special behavior if the log prefix for the $Log keyword is /* or (*.
RCS 5.7 produces a warning whenever it behaves that way, and current
versions of CVS do not handle this case in a special way (CVS 1.9 and
earlier invoke RCS to perform keyword expansion).

Note that if the "expand" keyword is omitted from the RCS file, the
default is "kv".

Note that the "comment {string};" syntax from rcsfile.5 specifies a
comment leader, which affects expansion of the $Log keyword for old
versions of RCS.  The comment leader is not used by RCS 5.7 or current
versions of CVS.

Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
different way if the log message starts with "checked in with -k by ".
I don't think this behavior is documented anywhere.

Here is a clarification regarding characters versus bytes in certain
character sets like JIS and Big5:

    The RCS file format, as described in the rcsfile(5) man page, is
    actually byte-oriented, not character-oriented, despite hints to
    the contrary in the man page.  This distinction is important for
    multibyte characters.  For example, if a multibyte character
    contains a `@' byte, the `@' must be doubled within strings in RCS
    files, since RCS uses `@' bytes as escapes.

    This point is not an issue for encodings like ISO 8859, which do
    not have multibyte characters.  Nor is it an issue for encodings
    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
    multibyte character.  It is an issue only for multibyte encodings
    like JIS and BIG5, which _do_ usurp ASCII bytes.

    If `@' doubling occurs within a multibyte char, the resulting RCS
    file is not a properly encoded text file.  Instead, it is a byte
    stream that does not use a consistent character encoding that can
    be understood by the usual text tools, since doubling `@' messes
    up the encoding.  This point affects only programs that examine
    the RCS files -- it doesn't affect the external RCS interface, as
    the RCS commands always give you the properly encoded text files
    and logs (assuming that you always check in properly encoded
    text).

    CVS 1.10 (and earlier) probably has some bugs in this area on
    systems where a C "char" is signed and where the data contains
    bytes with the eighth bit set.
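
The byte-oriented rule can be illustrated with a small Python sketch.
The names rcs_quote and rcs_unquote are invented for this example, and
the two-byte sequence used in the usage note is only assumed to be a
Big5-style character whose second byte is 0x40 (`@'); nothing here is
taken from the RCS or CVS sources.

```python
def rcs_quote(data: bytes) -> bytes:
    """Wrap raw bytes as an RCS string, doubling every "@" byte.

    Hypothetical helper.  It deliberately works on bytes, not decoded
    characters, so an "@" byte inside a multibyte character is doubled
    like any other.
    """
    return b"@" + data.replace(b"@", b"@@") + b"@"


def rcs_unquote(quoted: bytes) -> bytes:
    """Reverse rcs_quote; assumes `quoted` is one well-formed RCS string."""
    if len(quoted) < 2 or quoted[:1] != b"@" or quoted[-1:] != b"@":
        raise ValueError("not an RCS string")
    return quoted[1:-1].replace(b"@@", b"@")
```

For instance, rcs_quote(b"\xa4@") yields b"@\xa4@@@": the stored bytes
no longer form valid text in the original encoding, but rcs_unquote
recovers the original two bytes exactly.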

One common concern about the RCS file format is the fact that to get
the head of a branch, one must apply deltas from the head of the trunk
to the branchpoint, and then from the branchpoint to the head of the
branch.  While more detailed analyses might be worth doing, we will
note:

    * The performance bottleneck for CVS generally is figuring out which
    files to operate on and that sort of thing, not applying deltas.

    * Here is one quick test (probably not a very good test; a better test
    would use a normally sized file (say 50-200K) instead of a small one):

	I just did a quick test with a small file (on a Sun Ultra 1/170E
	running Solaris 5.5.1), with 1000 revisions on the main branch and
	1000 revisions on a branch that forked at the root (i.e., RCS revisions
	1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
	1.1.1.1000).  It took about 0.15 seconds real time to check in the
	first revision, and about 0.6 seconds to check in and 0.3 seconds to
	retrieve revision 1.1.1.1000 (the worst case).

    * Any attempt to "fix" this problem should be careful not to interfere
    with other features, such as lightweight creation of branches
    (particularly using CVS magic branches).
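
To make the traversal concrete, here is a rough Python sketch that
lists which delta texts would have to be applied, in order, to reach a
given revision.  The dictionary layout, the name delta_path, and the
sample data (modelled on the join3 example earlier in this file) are
all assumptions made for illustration; this is not the algorithm as
implemented in RCS or CVS.

```python
# Hypothetical in-memory form of the "deltas" section of the join3
# example: each revision maps to its branch starts and its "next".
EXAMPLE_DELTAS = {
    "1.2":     {"branches": ["1.2.2.1"], "next": "1.1"},
    "1.1":     {"branches": ["1.1.2.1"], "next": None},
    "1.1.2.1": {"branches": [],          "next": None},
    "1.2.2.1": {"branches": [],          "next": None},
}


def delta_path(deltas, head, target):
    """Return the revisions visited going from head to target.

    Follows "next" along the trunk to the branchpoint, then steps onto
    the matching branch and follows "next" again, repeating for nested
    branches.  Assumes target is a full revision number (an even
    number of components).
    """
    want = target.split(".")
    rev = head
    path = [rev]
    depth = 2  # trunk revisions have two components
    while True:
        goal = ".".join(want[:depth])  # revision to reach on this line
        while rev != goal:
            rev = deltas[rev]["next"]
            if rev is None:
                raise ValueError("revision %s not reachable" % target)
            path.append(rev)
        if depth == len(want):
            return path
        # Step from the branchpoint onto the branch leading to target.
        depth += 2
        prefix = ".".join(want[:depth - 1]) + "."
        starts = [b for b in deltas[rev]["branches"] if b.startswith(prefix)]
        if not starts:
            raise ValueError("no branch %s at %s" % (prefix, rev))
        rev = starts[0]
        path.append(rev)
```

With the sample data, delta_path(EXAMPLE_DELTAS, "1.2", "1.1.2.1")
returns ["1.2", "1.1", "1.1.2.1"]: the full text of 1.2, then the
trunk delta back to 1.1, then the branch delta forward to 1.1.2.1.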

Diff follows:

(Note that in the following diff the old value for the Id keyword was:
    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp
and the new one was:
    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp
but since this file itself might be subject to keyword expansion I
haven't included a diff for that fact).

===================================================================
RCS file: RCS/rcsfile.5in,v
retrieving revision 5.6
retrieving revision 5.7
diff -u -r5.6 -r5.7
--- rcsfile.5in	1995/06/05 08:28:35	5.6
+++ rcsfile.5in	1996/12/09 17:31:44	5.7
@@ -85,7 +85,8 @@
 .LP
 \f2sym\fP	::=	{\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
 .LP
-\f2idchar\fP	::=	any visible graphic character except \f2special\fP
+\f2idchar\fP	::=	any visible graphic character,
+		except \f2digit\fP or \f2special\fP
 .LP
 \f2special\fP	::=	\f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
 .LP
@@ -119,12 +120,23 @@
 the minute (00\-59),
 and
 .I ss
-the second (00\-60).
+the second (00\-59).
+If
 .I Y
-contains just the last two digits of the year
-for years from 1900 through 1999,
-and all the digits of years thereafter.
-Dates use the Gregorian calendar; times use UTC.
+contains exactly two digits,
+they are the last two digits of a year from 1900 through 1999;
+otherwise,
+.I Y
+contains all the digits of the year.
+Dates use the Gregorian calendar.
+Times use UTC, except that for portability's sake leap seconds are not allowed;
+implementations that support leap seconds should output
+.B 59
+for
+.I ss
+during an inserted leap second, and should accept
+.B 59
+for a deleted leap second.
 .PP
 The
 .I newphrase
@@ -144,16 +156,23 @@
 field in order of decreasing numbers.
 The
 .B head
-field in the
-.I admin
-node points to the head of that sequence (i.e., contains
+field points to the head of that sequence (i.e., contains
 the highest pair).
 The
 .B branch
-node in the admin node indicates the default
+field indicates the default
 branch (or revision) for most \*r operations.
 If empty, the default
 branch is the highest branch on the trunk.
+The
+.B symbols
+field associates symbolic names with revisions.
+For example, if the file contains
+.B "symbols rr:1.1;"
+then
+.B rr
+is a name for revision
+.BR 1.1 .
 .PP
 All
 .I delta