Return-Path: <kendall%saber@harvard.harvard.edu>
Date: Thu, 19 Jan 89 12:06:02 EST
From: kendall%saber@harvard.harvard.edu
To: rms@wheaties.ai.mit.edu
Cc: kendall@harvard.harvard.edu
Subject: Here you go

The papers I signed should arrive tomorrow if they haven't already.

There are three files here: TAGS.README, a file of random comments in
lieu of real documentation (I didn't get around to modifying the info
tree); etags.c; and tags.el.

Testing is needed for languages other than elisp, C, and C++.

I'll be back next Thursday.  Hope you like the stuff; talk to you then.

--Sam

------------------------------- TAGS.README ----------------------------
       Changes and suggested enhancements to etags.c and tags.el
			      Sam Kendall
				1/19/89

etags options added (all the options should be documented in the info
tree, but I didn't get around to doing that):

	-D	C/C++ only: Do not find macro constants.  For many
		programs, this option halves the size of the TAGS file.
		Note a bug: even without -D, macro constants defined to
		nothing (e.g. `#define FOO') are not found.
	-S	C/C++ only: Do not assume that a '}' at the beginning of
		a line means the end of a function.  -S is helpful for
		some poorly indented programs, such as "cfront".  Do not
		use -S if your braces are unbalanced (as can happen with
		conditional compilation).

There was a -C option, meaning assume that .h and .c files are C++
rather than C, but I made it the default.  It seems to produce very few
spurious C++ tags in C programs.

I've made it illegal to pass most ctags options to etags.  (To use the
program as ctags, it must be compiled with -DCTAGS, or with neither
-DETAGS nor -DCTAGS.)

I haven't documented the ctags-only options here.

tags.el modified as follows:

 1. find-tag modified to handle the new "tag" field in a tag
    line.  A tag line can either be in the old format:

		definition\177lineno,charno

    or can have the additional "tag" field:

		definition\177lineno,charno,\001tag

    This "tag" field is necessary for C++, where the entire tag is NOT
    necessarily found on the definition line.  The "tag" field also
    helps with exact tag matching.  There may be more than one tag,
    separated by \001 characters.

    The theory here is that you provide the tag field when the
    definition field does not contain the correct tag.  This theory could
    be made more rigorous, to help with, for instance, exact tag matching
    in mixed Lisp/C programs.

    The old tags.el will fail with new-format lines in TAGS files, but
    new-format lines are common only in TAGS files for C++ programs.

    This tags.el will work fine with old-format TAGS files.

 2. All tags functions modified to ignore the character count that
    follows each filename in the TAGS file.  Behavior is not changed,
    but this makes it much easier to write a program that edits the
    TAGS file.

 3. When finding multiple matches for the same tag (via a prefix arg
    to find-tag, usually invoked with "ESC ,"), exact matches
    are found first; then word matches; then other matches.
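
The two tag-line formats and the match ordering described above can be
sketched roughly as follows.  This is an illustration in Python, not the
code in tags.el: the names TagLine, parse_tag_line and match_rank are
made up for the example, the handling of several \001-separated tags is
an assumption about how they are laid out, and "word match" is taken
here to mean that the wanted name appears delimited by non-word
characters.

```python
import re
from typing import List, NamedTuple, Optional

DEL = "\x7f"  # \177: separates the definition text from the position
SOH = "\x01"  # \001: introduces each explicit tag (assumed layout)


class TagLine(NamedTuple):
    definition: str
    lineno: int
    charno: int
    tags: List[str]  # empty for an old-format line


def parse_tag_line(line: str) -> TagLine:
    """Parse one tag line in either the old or the new format."""
    definition, _, rest = line.rstrip("\n").partition(DEL)
    position, *tags = rest.split(SOH)
    # New-format lines have a comma after charno: "5,200,\001tag".
    lineno, charno = position.rstrip(",").split(",")
    return TagLine(definition, int(lineno), int(charno), tags)


def match_rank(candidate: str, wanted: str) -> Optional[int]:
    """Return 0 for an exact match, 1 for a word match, 2 for any other
    substring match, and None when there is no match at all."""
    if candidate == wanted:
        return 0
    if re.search(r"(?<!\w)" + re.escape(wanted) + r"(?!\w)", candidate):
        return 1
    if wanted in candidate:
        return 2
    return None


# Hypothetical usage: order candidate tags the way item 3 describes.
candidates = ["foobar", "struct foo", "foo", "bar"]
hits = [c for c in candidates if match_rank(c, "foo") is not None]
ordered = sorted(hits, key=lambda c: match_rank(c, "foo"))
# ordered is ["foo", "struct foo", "foobar"]
```

An old-format line simply yields an empty tag list, which is one way a
reader of the file could stay compatible with both formats.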

tags.el enhancements needed:

	doesn't know how to handle included TAGS files yet
	per-mode and/or local versions of find-tag-default,
		for easy customization
	eliminate duplicates in alist when prompting

Enhancements to make:

   * Generalize file suffix stuff -- clumsy and verbose now.  Should
     be config-file- or environment variable- or option-driven.
   * Treat C++ "operator token" as a single identifier -- canonicalize
     to leave one space between "operator" and token when token is
     an identifier, no space when token is an operator.
   * Make a tag for a struct tag in plain C as `struct name' (likewise
     for union etc.).  In canonical form, exactly one space in between.
     Currently we make `name' into a tag in etags only; we can't in
     ctags because `name' may not be unique, which breaks ctags.
   * Symbol table should be implemented as a hash table instead of as a
     linked list.
   * The TeX stuff should use the Stab abstraction.
   * Needs profiling and performance enhancements.
   * Can -C be eliminated?  It is (nearly) unnecessary complexity.  Perhaps
     merge -C and -t into a single option that indicates "well-formed
     programs".  This option should probably be the default.  In etags
     only, not ctags.  (ctags must not find struct tags by default in
     plain C.)
   * Is it possible to match C global variables?  It would be real nice.
   * The C line "struct foo { ... };" will create a tag line that
     searches for "struct foo ".  If the search included the "{" it
     would be more specific.  This point is even more true if "foo" and
     "{" are separated by a newline rather than a space, but having the
     search pattern include a newline would require changes in the TAGS
     file format.
   * Needs config file to handle typical heavy C++ macrification.  Config file
     must allow the definition of macros that expand to types (e.g.
     `Seq(T)' expands to `Seq_T'), symbol-defining macros (e.g.
     `DefineSeq(T)' should be interpreted as a definition of `Seq_T'),
     macros that expand to symbols (e.g. `M(Foo,Bar)' expands to
     `Class_Foo::Bar'), and macros that cause unbalanced braces or
     parens (e.g. `LOOP' (whose C definition is "while(1){") leaves one
     unmatched left brace).  Probably others.
   * Config file should also handle definition of the "DEF" convention
     of the GNU Emacs source, and definition of the C* language
     ("domain" keyword), so these can be taken out of the etags source.
   * For C++, "::" is only understood if it occurs between identifiers with no
     intervening whitespace.  This style of C++ function definition:
	   returntype classname::
	   functionname(formals) {
	        ...
	   }
     is not understood.
   * Is "::" handling needed for Common Lisp?
   * Nroff/troff handling would be easy to write and useful.  It could
     be one of the default file modes, along with Pascal, Fortran and C
     -- if a source file starts with "'" or ".", it's nroff/troff
     source.
   * There should be a defined interface between the
     language-independent part of etags and the language modules.  This
     would make it easier to add a module.
   * Macros defined inside braces aren't found, e.g.,
	struct foo {
	#define FOO() BAR
	... };
   * '#' is only recognized to start a preprocessor line if it is in
     column 0.  According to ANSI it can be preceded by whitespace.
   * A constant macro with no definition, e.g. `#define FOO', will not
     be tagged, because `definedef' is prematurely reset by CNL.
   * `typedef unsigned char TOK;' will get `char' defined as well as `TOK'.
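
As an illustration of the "operator token" canonicalization suggested in
the list above, here is a rough Python sketch.  It is not taken from
etags.c; the function name canonicalize_operator is made up, and
deciding "identifier vs. operator" by looking at the first character of
the token is a simplifying assumption.

```python
import re


def canonicalize_operator(name: str) -> str:
    """Normalize the spacing after the C++ keyword "operator".

    One space is left when the token is an identifier (operator new),
    none when it is an operator symbol (operator+, operator()).
    """
    match = re.search(r"\boperator\b", name)
    if match is None:
        return name  # not an operator function name; leave untouched
    head = name[:match.end()]
    token = name[match.end():].strip()
    if not token:
        return head
    if token[0].isalnum() or token[0] == "_":
        # Identifier-like token: collapse whitespace runs to one space.
        return head + " " + " ".join(token.split())
    # Operator symbol: drop all whitespace, so "( )" becomes "()".
    return head + "".join(token.split())
```

With this sketch, `Foo::operator   ==` becomes `Foo::operator==` and
`operator    new` becomes `operator new`, so both spellings of the same
function would compare equal as tags.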

Items about the format of the TAGS file:

   * Should make some special indication in a tags line that means both
     (1) the pattern is an exact match for the definition line, and (2)
     the ^? immediately follows the tag.  In absence of this indication,
     the pattern is only the beginning of the definition line, and the
     last character before the ^? is not part of the tag.  Helps in the
     following situation:
	struct foo *p;
	struct foo
	{
	 ...
	};
     Currently find-tag finds the first line instead of the correct
     second one.
   * Need some way to tell whether the line `char *foo^?5,200' is the tag
     `foo' or `*foo'.  Since there are no modes in the TAGS file, we
     can't (easily) say that since this is C, it can't be `*foo'.  So
     this line should be `char *foo^?5,200,^Afoo'.  We need to decide on
     the characters that are not in identifiers for any language.  How
     about space, TAB, FF, `(', `)', and `;'?
   * Along the same lines, both etags and tags.el should canonicalize
     tags to lowercase for case-independent languages.  So `(defun
     Foo^?5,200' in Lisp (but not Elisp) should be `(defun
     Foo^?5,200,^Afoo'.
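
The last two items can be made concrete with a small Python sketch.  It
assumes, as in the examples above, that the definition text ends exactly
where the tag ends, and it uses the proposed set of non-identifier
characters (space, TAB, FF, `(', `)', `;').  The names implicit_tag and
make_tag_line are hypothetical and do not exist in etags.c or tags.el.

```python
DEL = "\x7f"  # written ^? above
SOH = "\x01"  # written ^A above

# Proposed characters that are not in identifiers for any language.
NON_IDENTIFIER_CHARS = " \t\f();"


def implicit_tag(definition: str) -> str:
    """Return the trailing run of identifier characters, i.e. the tag a
    reader would infer when no explicit tag field is present."""
    start = len(definition)
    while start > 0 and definition[start - 1] not in NON_IDENTIFIER_CHARS:
        start -= 1
    return definition[start:]


def make_tag_line(definition: str, lineno: int, charno: int, tag: str,
                  case_insensitive: bool = False) -> str:
    """Build a tag line, adding the explicit tag field only when the
    implicit tag would be wrong."""
    if case_insensitive:
        tag = tag.lower()  # canonical form for case-independent languages
    line = f"{definition}{DEL}{lineno},{charno}"
    if implicit_tag(definition) != tag:
        line += f",{SOH}{tag}"
    return line
```

Under these assumptions `char *foo' has the implicit tag `*foo', so the
sketch emits the explicit `foo' field for it, while a line such as
`int foo' needs no extra field.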