Return-Path: <kendall%saber@harvard.harvard.edu>
Date: Thu, 19 Jan 89 12:06:02 EST
From: kendall%saber@harvard.harvard.edu
To: rms@wheaties.ai.mit.edu
Cc: kendall@harvard.harvard.edu
Subject: Here you go

The papers I signed should arrive tomorrow if they haven't already.

There are three files here: TAGS.README, a file of random comments in
lieu of real documentation (I didn't get around to modifying the info
tree); etags.c; and tags.el.

Testing is needed for languages other than elisp, C, and C++.

I'll be back next Thursday. Hope you like the stuff; talk to you then.

--Sam

------------------------------- TAGS.README ----------------------------
        Changes and suggested enhancements to etags.c and tags.el
                              Sam Kendall
                                1/19/89

etags options added (all the options should be documented in the info
tree, but I didn't get around to doing that):

    -D      C/C++ only: Do not find macro constants. For many
            programs, this option halves the size of the TAGS file.
            Note a bug: macro constants defined to nothing are not
            found.
    -S      C/C++ only: Do not assume that a '}' at the beginning of
            a line means the end of a function. -S is helpful for
            some poorly indented programs, such as "cfront". Do not
            use -S if your braces are unbalanced (as can happen with
            conditional compilation).

There was a -C option, meaning assume that .h and .c files are C++
rather than C, but I made it the default. It seems to produce very few
spurious C++ tags in C programs.

I've made it illegal to pass most ctags options to etags. (To use the
program as ctags, it must be compiled with -DCTAGS, or with neither
-DETAGS nor -DCTAGS.)

I haven't documented the ctags-only options here.

tags.el modified as follows:

  1. find-tag modified to handle the new "tag" field in a tag line.
     A tag line can either be in the old format:

        definition\177lineno,charno

     or can have the additional "tag" field:

        definition\177lineno,charno,\001tag

     This "tag" field is necessary for C++, where the entire tag is NOT
     necessarily found on the definition line. The "tag" field also
     helps with exact tag matching. There may be more than one tag,
     separated by \001 characters.

     The theory here is that you provide the tag field when the
     definition field does not contain the correct tag. This theory could
     be made more rigorous, to help with, for instance, exact tag matching
     in mixed Lisp/C programs.

     The old tags.el will fail on new-format lines in TAGS files, but
     such lines are numerous only for C++ programs.

     This tags.el will work fine with old-format TAGS files.

  2. All tags functions modified to ignore the character count that
     follows each filename in the TAGS file. Behavior is not changed,
     but this makes it much easier to write a program that edits the
     TAGS file.

  3. When finding multiple matches for the same tag (via a prefix arg
     to find-tag, usually invoked with "ESC ,"), exact matches
     are found first; then word matches; then other matches.

tags.el enhancements needed:

        doesn't know how to handle included TAGS files yet
        per-mode and/or local versions of find-tag-default,
                for easy customization
        eliminate duplicates in alist when prompting

Enhancements to make:

  * Generalize file suffix stuff -- clumsy and verbose now. Should
    be config-file- or environment variable- or option-driven.
  * Treat C++ "operator token" as a single identifier -- canonicalize
    to leave one space between "operator" and token when token is
    an identifier, no space when token is an operator.
  * Make a tag for a struct tag in plain C as `struct name' (likewise
    for union etc.). In canonical form, exactly one space in between.
    Currently we make `name' into a tag in etags only; we can't in
    ctags because `name' may not be unique, which breaks ctags.
  * Symbol table should be implemented as a hash table instead of as a
    linked list.
  * The TeX stuff should use the Stab abstraction.
  * Needs profiling and performance enhancements.
  * Can -C be eliminated? It is (nearly) unnecessary complexity. Perhaps
    merge -C and -t into a single option that indicates "well-formed
    programs". This option should probably be the default. In etags
    only, not ctags. (ctags must not find struct tags by default in
    plain C.)
  * Is it possible to match C global variables? It would be real nice.
  * The C line "struct foo { ... };" will create a tag line that
    searches for "struct foo ". If the search included the "{" it
    would be more specific. This point is even more true if "foo" and
    "{" are separated by a newline rather than a space, but having the
    search pattern include a newline would require changes in the TAGS
    file format.
  * Needs config file to handle typical heavy C++ macrification. Config file
    must allow the definition of macros that expand to types (e.g.
    `Seq(T)' expands to `Seq_T'), symbol-defining macros (e.g.
    `DefineSeq(T)' should be interpreted as a definition of `Seq_T'),
    macros that expand to symbols (e.g. `M(Foo,Bar)' expands to
    `Class_Foo::Bar'), and macros that cause unbalanced braces or
    parens (e.g. `LOOP' (whose C definition is "while(1){") leaves one
    left brace). Probably others.
  * Config file should also handle definition of the "DEF" convention
    of the Gnu Emacs source, and definition of the C* language
    ("domain" keyword), so these can be taken out of the etags source.
  * For C++, "::" is only understood if it occurs between identifiers with no
    intervening whitespace. This style of C++ function definition:
        returntype classname::
        functionname(formals) {
          ...
        }
    is not understood.
  * Is there "::" handling needed for Common Lisp?
  * Nroff/troff handling would be easy to write and useful. It could
    be one of the default file modes, along with Pascal, Fortran and C
    -- if a source file starts with "'" or ".", it's nroff/troff
    source.
  * There should be a defined interface between the
    language-independent part of etags and the language modules. This
    would make it easier to add a module.
  * Macros defined inside braces aren't found, e.g.,
        struct foo {
        #define FOO() BAR
        ... };
  * '#' is only recognized to start a preprocessor line if it is in
    column 0. According to ANSI it can be preceded by whitespace.
  * A constant macro with no definition, e.g. `#define FOO', will not
    be tagged, because `definedef' is prematurely reset by CNL.
  * `typedef unsigned char TOK;' will get `char' defined as well as `TOK'.

Items about the format of the TAGS file:

  * Should make some special indication in a tags line that means both
    (1) the pattern is an exact match for the definition line, and (2)
    the ^? immediately follows the tag. In absence of this indication,
    the pattern is only the beginning of the definition line, and the
    last character before the ^? is not part of the tag. Helps in the
    following situation:
        struct foo *p;
        struct foo
        {
          ...
        };
    Currently find-tag finds the first line instead of the correct
    second one.
  * Some way to tell whether the line `char *foo^?5,200' is the tag
    `foo' or `*foo'. Since there are no modes in the TAGS file, we
    can't (easily) say that since this is C, it can't be `*foo'. So
    this line should be `char *foo^?5,200,^Afoo'. We need to decide on
    the characters that are not in identifiers for any language. How
    about space, TAB, FF, `(', `)', and `;'?
  * Along the same lines, both etags and tags.el should canonicalize
    tags to lowercase for case-independent languages. So `(defun
    Foo^?5,200' in Lisp (but not Elisp) should be `(defun
    Foo^?5,200,^Afoo'.
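
Illustration of the tag line format:

The two tag line formats described under the tags.el changes (old:
definition\177lineno,charno; new: the same followed by ,\001tag, with
several tags separated by \001) can be sketched as a small parser. This
is a rough Python illustration only, not code taken from etags.c or
tags.el. The function name parse_tag_line and the returned tuple layout
are hypothetical, and the sketch assumes that lineno and charno are
plain decimal integers.

```python
def parse_tag_line(line):
    """Split one tag line into (definition, lineno, charno, tags).

    Hypothetical helper: accepts both the old format
        definition\\177lineno,charno
    and the new format with explicit tags
        definition\\177lineno,charno,\\001tag[\\001tag...]
    """
    # \177 (DEL, shown as ^? in the notes) ends the definition text.
    definition, sep, rest = line.rstrip("\n").partition("\x7f")
    if not sep:
        raise ValueError("tag line has no \\177 separator")

    # Split off lineno and charno; anything after the second comma is
    # the optional tag field, which may itself contain commas.
    fields = rest.split(",", 2)
    if len(fields) < 2:
        raise ValueError("tag line lacks lineno,charno")
    lineno = int(fields[0])
    charno = int(fields[1])

    # Explicit tags are each introduced by \001 (shown as ^A).
    tags = []
    if len(fields) == 3:
        tags = [t for t in fields[2].split("\x01") if t]

    return definition, lineno, charno, tags


if __name__ == "__main__":
    print(parse_tag_line("char *foo\x7f5,200"))
    print(parse_tag_line("char *foo\x7f5,200,\x01foo"))
```

An empty tag list marks an old-format line, so a caller would fall back
to extracting the tag from the definition text, as the old tags.el had
to.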