1.. _changes_tags_file:
2
3Changes to the tags file format
4---------------------------------------------------------------------
5
6``F`` kind usage
7~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8
You cannot use the ``F`` (``file``) kind in your ``.ctags`` file because
Universal Ctags reserves it. See :ref:`ctags-incompatibilities(7) <ctags-incompatibilities(7)>`.
11
12Reference tags
13~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14
15Traditionally ctags collects the information for locating where a
16language object is DEFINED.
17
18In addition Universal Ctags supports reference tags. If the extra-tag
19``r`` is enabled, Universal Ctags also collects the information for
20locating where a language object is REFERENCED. This feature was
21proposed by @shigio in `#569
22<https://github.com/universal-ctags/ctags/issues/569>`_ for GNU GLOBAL.
23
Here are some examples. The target input file is named ``reftag.c``:
25
26.. code-block:: c
27
28    #include <stdio.h>
29    #include "foo.h"
30    #define TYPE point
31    struct TYPE { int x, y; };
32    TYPE p;
33    #undef TYPE
34
35
36Traditional output:
37
38.. code-block:: console
39
40    $ ctags -o - reftag.c
41    TYPE	reftag.c	/^#define TYPE /;"	d	file:
42    TYPE	reftag.c	/^struct TYPE { int x, y; };$/;"	s	file:
43    p	reftag.c	/^TYPE p;$/;"	v	typeref:typename:TYPE
44    x	reftag.c	/^struct TYPE { int x, y; };$/;"	m	struct:TYPE	typeref:typename:int	file:
45    y	reftag.c	/^struct TYPE { int x, y; };$/;"	m	struct:TYPE	typeref:typename:int	file:
46
47Output with the extra-tag ``r`` enabled:
48
49.. code-block:: console
50
51    $ ctags --list-extras | grep ^r
52    r	Include reference tags	off
53    $ ctags -o - --extras=+r reftag.c
54    TYPE	reftag.c	/^#define TYPE /;"	d	file:
55    TYPE	reftag.c	/^#undef TYPE$/;"	d	file:
56    TYPE	reftag.c	/^struct TYPE { int x, y; };$/;"	s	file:
57    foo.h	reftag.c	/^#include "foo.h"/;"	h
58    p	reftag.c	/^TYPE p;$/;"	v	typeref:typename:TYPE
59    stdio.h	reftag.c	/^#include <stdio.h>/;"	h
60    x	reftag.c	/^struct TYPE { int x, y; };$/;"	m	struct:TYPE	typeref:typename:int	file:
61    y	reftag.c	/^struct TYPE { int x, y; };$/;"	m	struct:TYPE	typeref:typename:int	file:
62
The ``#undef TYPE`` line and the two ``#include`` lines are newly collected.
64
``roles`` is a newly introduced field in Universal Ctags. The field
records how a tag is referenced. If a tag is a definition
tag, the roles field has ``def`` as its value.

Universal Ctags prints the role information when the ``r``
field is enabled with ``--fields=+r``.
71
72.. code-block:: console
73
74    $ ctags -o - --extras=+r --fields=+r reftag.c
    TYPE	reftag.c	/^#define TYPE /;"	d	file:	roles:def
76    TYPE	reftag.c	/^#undef TYPE$/;"	d	file:	roles:undef
77    TYPE	reftag.c	/^struct TYPE { int x, y; };$/;"	s	file:	roles:def
78    foo.h	reftag.c	/^#include "foo.h"/;"	h	roles:local
79    p	reftag.c	/^TYPE p;$/;"	v	typeref:typename:TYPE	roles:def
80    stdio.h	reftag.c	/^#include <stdio.h>/;"	h	roles:system
81    x	reftag.c	/^struct TYPE { int x, y; };$/;"	m	struct:TYPE	typeref:typename:int	file:	roles:def
82    y	reftag.c	/^struct TYPE { int x, y; };$/;"	m	struct:TYPE	typeref:typename:int	file:	roles:def
83
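A client tool can tell definition tags from reference tags by reading the
``roles`` field. The following is a minimal Python sketch, not part of
ctags; the helper names ``parse_tag_line`` and ``is_reference`` are
hypothetical, and the parser assumes the pattern itself does not contain
the two characters ``;"``.

```python
def parse_tag_line(line):
    """Split one tags-file line into its name, input file, pattern, and fields."""
    # The pattern is terminated by the two characters ;" and the extension
    # fields follow, separated by tabs.
    head, _, tail = line.rstrip("\n").partition(';"')
    name, path, pattern = head.split("\t", 2)
    fields = {}
    for item in tail.split("\t"):
        if not item:
            continue
        key, sep, value = item.partition(":")
        if sep:
            fields[key] = value
        else:
            # A bare letter such as "d" is the kind of the tag.
            fields["kind"] = item
    return {"name": name, "file": path, "pattern": pattern, "fields": fields}


def is_reference(tag):
    """Hypothetical helper: a tag is a reference if its role is not "def"."""
    return tag["fields"].get("roles", "def") != "def"


sample = 'TYPE\treftag.c\t/^#undef TYPE$/;"\td\tfile:\troles:undef'
tag = parse_tag_line(sample)
print(tag["name"], tag["fields"]["roles"], is_reference(tag))
```

Treating a missing ``roles`` field as ``def`` keeps the helper usable on
output generated without ``--fields=+r``, where every tag is a definition.
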
The `Reference tag marker` field, ``R``, exists specifically for GNU
GLOBAL; ``D`` marks the traditional definition tags, and ``R`` marks
the new reference tags. The field can be used only with
``--_xformat``.
88
89.. code-block:: console
90
91    $ ctags -x --_xformat="%R %-16N %4n %-16F %C" --extras=+r reftag.c
92    D TYPE                3 reftag.c         #define TYPE point
93    D TYPE                4 reftag.c         struct TYPE { int x, y; };
94    D p                   5 reftag.c         TYPE p;
95    D x                   4 reftag.c         struct TYPE { int x, y; };
96    D y                   4 reftag.c         struct TYPE { int x, y; };
97    R TYPE                6 reftag.c         #undef TYPE
98    R foo.h               2 reftag.c         #include "foo.h"
99    R stdio.h             1 reftag.c         #include <stdio.h>
100
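The ``D``/``R`` marker makes the xref output easy to post-process. The
sketch below is an illustration in Python rather than anything shipped
with ctags; ``split_xref`` is a hypothetical name, and the code assumes
the exact ``%R %-16N %4n %-16F %C`` format used above, with tag names and
file names that contain no spaces.

```python
def split_xref(lines):
    """Group `%R %-16N %4n %-16F %C` xref lines by their D/R marker."""
    definitions, references = [], []
    for line in lines:
        # Marker, name, line number, and file name contain no spaces in this
        # sample, so four whitespace splits leave the source text intact.
        marker, name, lineno, path, text = line.split(None, 4)
        entry = (name, int(lineno), path, text)
        (definitions if marker == "D" else references).append(entry)
    return definitions, references


xref = [
    "D TYPE                3 reftag.c         #define TYPE point",
    "D p                   5 reftag.c         TYPE p;",
    "R TYPE                6 reftag.c         #undef TYPE",
    'R foo.h               2 reftag.c         #include "foo.h"',
]
defs, refs = split_xref(xref)
print([name for name, *_ in refs])
```
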
101See :ref:`Customizing xref output <xformat>` for more details about
102``--_xformat``.
103
104Although the facility for collecting reference tags is implemented,
105only a few parsers currently utilize it. All available roles can be
106listed with ``--list-roles``:
107
108.. code-block:: console
109
110    $ ctags --list-roles
111    #LANGUAGE      KIND(L/N)         NAME                ENABLED DESCRIPTION
112    SystemdUnit    u/unit            Requires            on      referred in Requires key
113    SystemdUnit    u/unit            Wants               on      referred in Wants key
114    SystemdUnit    u/unit            After               on      referred in After key
115    SystemdUnit    u/unit            Before              on      referred in Before key
116    SystemdUnit    u/unit            RequiredBy          on      referred in RequiredBy key
117    SystemdUnit    u/unit            WantedBy            on      referred in WantedBy key
118    Yaml           a/anchor          alias               on      alias
119    DTD            e/element         attOwner            on      attributes owner
120    Automake       c/condition       branched            on      used for branching
121    Cobol          S/sourcefile      copied              on      copied in source file
122    Maven2         g/groupId         dependency          on      dependency
123    DTD            p/parameterEntity elementName         on      element names
124    DTD            p/parameterEntity condition           on      conditions
125    LdScript       s/symbol          entrypoint          on      entry points
126    LdScript       i/inputSection    discarded           on      discarded when linking
127    ...
128
129.. NOTE: --xformat is the only way to extract referenced tag
130
131The first column shows the name of the parser.
132The second column shows the letter/name of the kind.
133The third column shows the name of the role.
134The fourth column shows whether the role is enabled or not.
135The fifth column shows the description of the role.
136
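The five columns described above can be read mechanically. This is a
rough Python sketch under the assumption that the first four columns
never contain whitespace; ``Role`` and ``parse_list_roles`` are
hypothetical names introduced only for this illustration.

```python
from collections import namedtuple

Role = namedtuple("Role", "language kind_letter kind_name name enabled description")


def parse_list_roles(output):
    """Parse the table printed by `ctags --list-roles` into Role records."""
    roles = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header line, blank lines, and the "..." elision marker.
        if not line or line.startswith("#") or line == "...":
            continue
        language, kind, name, enabled, description = line.split(None, 4)
        letter, _, kind_name = kind.partition("/")
        roles.append(Role(language, letter, kind_name, name, enabled == "on", description))
    return roles


sample = """\
#LANGUAGE      KIND(L/N)         NAME                ENABLED DESCRIPTION
SystemdUnit    u/unit            Requires            on      referred in Requires key
Yaml           a/anchor          alias               on      alias
"""
for role in parse_list_roles(sample):
    print(role.language, role.kind_name, role.name, role.enabled)
```
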
137You can define a role in an optlib parser for capturing reference
138tags. See :ref:`Capturing reference tags <roles>` for more
139details.
140
141``--roles-<LANG>.<KIND>`` is the option for enabling/disabling
142specified roles.
143
144Pseudo-tags
145~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
146
147.. IN MAN PAGE
148
149See :ref:`ctags-client-tools(7) <ctags-client-tools(7)>` about the
150concept of the pseudo-tags.
151
152.. TODO move the following contents to ctags-client-tools(7).
153
154``TAG_KIND_DESCRIPTION``
155.........................................................................
156
157This is a newly introduced pseudo-tag. It is not emitted by default.
158It is emitted only when ``--pseudo-tags=+TAG_KIND_DESCRIPTION`` is
159given.
160
161This is for describing kinds; their letter, name, and description are
162enumerated in the tag.
163
ctags emits ``TAG_KIND_DESCRIPTION`` with the following format::

	!_TAG_KIND_DESCRIPTION!{parser}	{letter},{name}	/{description}/

A backslash or a slash in {description} is escaped with a backslash.
169
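A reader of the tags file has to undo that escaping. The following
Python sketch shows one way to do it; ``parse_kind_description`` is a
hypothetical helper, and the sample line is written by hand for
illustration rather than captured from a real run.

```python
import re


def parse_kind_description(line):
    """Parse one TAG_KIND_DESCRIPTION pseudo-tag line."""
    tag, letter_name, description = line.rstrip("\n").split("\t", 2)
    prefix = "!_TAG_KIND_DESCRIPTION!"
    if not tag.startswith(prefix):
        raise ValueError("not a TAG_KIND_DESCRIPTION pseudo-tag")
    letter, _, name = letter_name.partition(",")
    # Drop the enclosing slashes, then undo the backslash escaping of
    # backslashes and slashes inside the description.
    description = re.sub(r"\\(.)", r"\1", description[1:-1])
    return tag[len(prefix):], letter, name, description


# Illustrative line, not captured from a real run.
sample = "!_TAG_KIND_DESCRIPTION!C\td,macro\t/macro definitions/"
print(parse_kind_description(sample))
```
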
170
171``TAG_KIND_SEPARATOR``
172.........................................................................
173
174This is a newly introduced pseudo-tag. It is not emitted by default.
175It is emitted only when ``--pseudo-tags=+TAG_KIND_SEPARATOR`` is
176given.
177
178This is for describing separators placed between two kinds in a
179language.
180
181Tag entries including the separators are emitted when ``--extras=+q``
182is given; fully qualified tags contain the separators. The separators
183are used in scope information, too.
184
ctags emits ``TAG_KIND_SEPARATOR`` with the following format::
186
187	!_TAG_KIND_SEPARATOR!{parser}	{sep}	/{upper}{lower}/
188
189or ::
190
191	!_TAG_KIND_SEPARATOR!{parser}	{sep}	/{lower}/
192
Here {parser} is the name of the language, e.g. PHP.
194{lower} is the letter representing the kind of the lower item.
195{upper} is the letter representing the kind of the upper item.
196{sep} is the separator placed between the upper item and the lower
197item.
198
199The format without {upper} is for representing a root separator. The
200root separator is used as prefix for an item which has no upper scope.
201
202`*` given as {upper} is a fallback wild card; if it is given, the
203{sep} is used in combination with any upper item and the item
204specified with {lower}.
205
206Each backslash character used in {sep} is escaped with an extra
207backslash character.
208
209Example output:
210
211.. code-block:: console
212
213    $ ctags -o - --extras=+p --pseudo-tags=  --pseudo-tags=+TAG_KIND_SEPARATOR input.php
214    !_TAG_KIND_SEPARATOR!PHP	::	/*c/
215    ...
216    !_TAG_KIND_SEPARATOR!PHP	\\	/c/
217    ...
218    !_TAG_KIND_SEPARATOR!PHP	\\	/nc/
219    ...
220
221The first line means ``::`` is used when combining something with an
222item of the class kind.
223
224The second line means ``\\`` is used when a class item is at the top
225level; no upper item is specified.
226
The third line means ``\\`` is used when combining a namespace item
(upper) and a class item (lower).
229
230Of course, ctags uses the more specific line when choosing a
231separator; the third line has higher priority than the first.
232
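The selection rule can be sketched in Python as follows. This is only an
illustration of the priority described above, not the code ctags uses;
``load_separators`` and ``separator`` are hypothetical names, and the
sketch assumes every kind is written as a single letter.

```python
def load_separators(lines):
    """Build {language: {(upper, lower): separator}} from pseudo-tag lines."""
    prefix = "!_TAG_KIND_SEPARATOR!"
    table = {}
    for line in lines:
        if not line.startswith(prefix):
            continue
        tag, sep, kinds = line.rstrip("\n").split("\t", 2)
        kinds = kinds.strip("/")
        # One letter means a root separator; two letters mean upper + lower.
        upper, lower = (None, kinds) if len(kinds) == 1 else (kinds[0], kinds[1])
        # Each backslash in the separator is written as two backslashes.
        table.setdefault(tag[len(prefix):], {})[(upper, lower)] = sep.replace("\\\\", "\\")
    return table


def separator(table, language, upper, lower):
    """Prefer the exact (upper, lower) entry, then fall back to the wild card."""
    seps = table[language]
    if (upper, lower) in seps:
        return seps[(upper, lower)]
    if upper is not None:
        return seps.get(("*", lower))
    return None


lines = [
    "!_TAG_KIND_SEPARATOR!PHP\t::\t/*c/",
    "!_TAG_KIND_SEPARATOR!PHP\t\\\\\t/c/",
    "!_TAG_KIND_SEPARATOR!PHP\t\\\\\t/nc/",
]
table = load_separators(lines)
print(separator(table, "PHP", "n", "c"))  # namespace + class
print(separator(table, "PHP", "c", "c"))  # falls back to the wild card
```

Passing ``None`` as the upper kind looks up the root separator; the wild
card is deliberately not consulted in that case because it stands for
"any upper item", not for "no upper item".
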
233``TAG_OUTPUT_FILESEP``
234.........................................................................
235
This pseudo-tag represents the separator used in file names: slash or
backslash. This is always 'slash' on Unix-like environments.
This is also 'slash' by default on Windows; however, when
``--output-format=e-ctags`` or ``--use-slash-as-filename-separator=no``
is specified, it becomes 'backslash'.
241
242
243``TAG_OUTPUT_MODE``
244.........................................................................
245
246.. NOT REVIEWED YET
247
248This pseudo-tag represents output mode: u-ctags or e-ctags.
249This is controlled by ``--output-format`` option.
250
251See also :ref:`Compatible output and weakness <compat-output>`.
252
253Truncating the pattern for long input lines
254~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
255
256See ``--pattern-length-limit=N`` option in :ref:`ctags(1) <ctags(1)>`.
257
258.. _parser-specific-fields:
259
260Parser specific fields
261~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
262
263A tag has a `name`, an `input` file name, and a `pattern` as basic
264information. Some fields like `language:`, `signature:`, etc are
265attached to the tag as optional information.
266
267In Exuberant Ctags, fields are common to all languages.
268Universal Ctags extends the concept of fields; a parser can define
269its specific field. This extension was proposed by @pragmaware in
270`#857 <https://github.com/universal-ctags/ctags/issues/857>`_.
271
272For implementing the parser specific fields, the options for listing and
273enabling/disabling fields are also extended.
274
275In the output of ``--list-fields``, the owner of the field is printed
276in the `LANGUAGE` column:
277
278.. code-block:: console
279
280	$ ctags --list-fields
281	#LETTER NAME            ENABLED LANGUAGE         XFMT  DESCRIPTION
282	...
283	-       end             off     C                TRUE   end lines of various constructs
284	-       properties      off     C                TRUE   properties (static, inline, mutable,...)
285	-       end             off     C++              TRUE   end lines of various constructs
286	-       template        off     C++              TRUE   template parameters
287	-       captures        off     C++              TRUE   lambda capture list
288	-       properties      off     C++              TRUE   properties (static, virtual, inline, mutable,...)
289	-       sectionMarker   off     reStructuredText TRUE   character used for declaring section
290	-       version         off     Maven2           TRUE   version of artifact
291
292e.g. reStructuredText is the owner of the sectionMarker field and
293both C and C++ own the end field.
294
295``--list-fields`` takes one optional argument, `LANGUAGE`. If it is
296given, ``--list-fields`` prints only the fields for that parser:
297
298.. code-block:: console
299
300	$ ctags --list-fields=Maven2
301	#LETTER NAME            ENABLED LANGUAGE        XFMT  DESCRIPTION
302	-       version         off     Maven2          TRUE  version of artifact
303
304A parser specific field only has a long name, no letter. For
305enabling/disabling such fields, the name must be passed to
306``--fields-<LANG>``.
307
308e.g. for enabling the `sectionMarker` field owned by the
309`reStructuredText` parser, use the following command line:
310
311.. code-block:: console
312
313	$ ctags --fields-reStructuredText=+{sectionMarker} ...
314
315The wild card notation can be used for enabling/disabling parser specific
316fields, too. The following example enables all fields owned by the
317`C++` parser.
318
319.. code-block:: console
320
321	$ ctags --fields-C++='*' ...
322
323`*` can also be used for specifying languages.
324
325The next example is for enabling `end` fields for all languages which
326have such a field.
327
328.. code-block:: console
329
330	$ ctags --fields-'*'=+'{end}' ...
331	...
332
333In this case, using wild card notation to specify the language, not
334only fields owned by parsers but also common fields having the name
335specified (`end` in this example) are enabled/disabled.
336
337Using the wild card notation to specify the language is helpful to
338avoid incompatibilities between versions of Universal Ctags itself
339(SELF INCOMPATIBLY).
340
341In Universal Ctags development, a parser developer may add a new
342parser specific field for a certain language.  Sometimes other developers
343then recognize it is meaningful not only for the original language
344but also other languages. In this case the field may be promoted to a
345common field. Such a promotion will break the command line
346compatibility for ``--fields-<LANG>`` usage. The wild card for
347`<LANG>` will help in avoiding this unwanted effect of the promotion.
348
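A script that drives ctags can build these arguments programmatically
and default to the wild card. The Python sketch below only assembles
strings in the notation shown in this section; ``fields_option`` is a
hypothetical helper, and chaining several ``{name}`` items in one
argument is an assumption carried over from how ``--fields`` accepts
long names.

```python
def fields_option(names, language="*", enable=True):
    """Build a --fields-<LANG> argument for the given long field names."""
    sign = "+" if enable else "-"
    # Parser specific fields have no letter, so each long name is wrapped
    # in braces.
    return "--fields-{}={}{}".format(language, sign, "".join("{%s}" % n for n in names))


print(fields_option(["sectionMarker"], language="reStructuredText"))
print(fields_option(["end"]))
```

When the result is passed as one element of an argument list (for
example to ``subprocess.run``), no shell quoting of ``*`` or the braces
is needed.
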
349With respect to the tags file format, nothing is changed when
350introducing parser specific fields; `<fieldname>`:`<value>` is used as
351before and the name of field owner is never prefixed. The `language:`
352field of the tag identifies the owner.
353
354
355Parser specific extras
356~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
357
358.. NOT REVIEWED YET
359
As the man page of Exuberant Ctags says, the ``--extras`` option
(spelled ``--extra`` in Exuberant Ctags) specifies whether to include
extra tag entries for certain kinds of information.
This option is available in Universal Ctags, too.

In Universal Ctags the option is extended; a parser can define its own
extra flags. They can be controlled with ``--extras-<LANG>=[+|-]{...}``.
366
367See some examples:
368
369.. code-block:: console
370
371	$ ctags --list-extras
372	#LETTER NAME                   ENABLED LANGUAGE         DESCRIPTION
373	F       fileScope              TRUE    NONE             Include tags ...
374	f       inputFile              FALSE   NONE             Include an entry ...
375	p       pseudo                 FALSE   NONE             Include pseudo tags
376	q       qualified              FALSE   NONE             Include an extra ...
377	r       reference              FALSE   NONE             Include reference tags
378	g       guest                  FALSE   NONE             Include tags ...
379	-       whitespaceSwapped      TRUE    Robot            Include tags swapping ...
380
See the `LANGUAGE` column. NONE means the extra flags are language
independent (common). They can be enabled or disabled with ``--extras=`` as before.

Look at `whitespaceSwapped`. Its language is `Robot`. This flag is enabled
by default but can be disabled with ``--extras-Robot=-{whitespaceSwapped}``.
386
387.. code-block:: console
388
389    $ cat input.robot
390    *** Keywords ***
391    it's ok to be correct
392	Python_keyword_2
393
394    $ ctags -o - input.robot
395    it's ok to be correct	input.robot	/^it's ok to be correct$/;"	k
396    it's_ok_to_be_correct	input.robot	/^it's ok to be correct$/;"	k
397
398    $ ctags -o - --extras-Robot=-'{whitespaceSwapped}' input.robot
399    it's ok to be correct	input.robot	/^it's ok to be correct$/;"	k
400
When the flag is disabled, the name `it's_ok_to_be_correct` is not
included in the tags output. In other words, the name
`it's_ok_to_be_correct` is derived from the name
`it's ok to be correct` when the extra flag is enabled.
405
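The derivation can be pictured with a few lines of Python. This is a
guess at the behavior based only on the example above and on the flag's
name: it assumes spaces and underscores are exchanged one for one, and
``whitespace_swapped`` is a hypothetical helper, not the Robot parser's
code.

```python
# Assumption: spaces and underscores are exchanged one for one.
_SWAP = str.maketrans({" ": "_", "_": " "})


def whitespace_swapped(name):
    """Return the derived name, or None if swapping changes nothing."""
    swapped = name.translate(_SWAP)
    return swapped if swapped != name else None


print(whitespace_swapped("it's ok to be correct"))
```
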
406Discussion
407.........................................................................
408
409.. NOT REVIEWED YET
410
411(This subsection should move to somewhere for developers.)
412
The question is what extra tag entries are. As far as I know, nobody
has answered it explicitly. I have two ideas for Universal Ctags. I
write "ideas", not "definitions", because existing parsers don't
follow them; they are kept as is for various reasons. Still, the
ideas may be a good guide for people who want to write a new parser
or extend an existing parser.
419
The first idea is that a tag entry whose name appears in the input
file as is, is NOT an extra. (If you want to control the
inclusion of such entries, the classical ``--kinds-<LANG>=[+|-]...`` is
what you want.)

Qualified tags, whose inclusion is controlled by ``--extras=+q``, are
explained well by this idea.
Let's see an example:
428
429.. code-block:: console
430
431    $ cat input.py
432    class Foo:
433	def func (self):
434	    pass
435
436    $ ctags -o - --extras=+q --fields=+E input.py
437    Foo	input.py	/^class Foo:$/;"	c
438    Foo.func	input.py	/^    def func (self):$/;"	m	class:Foo	extra:qualified
439    func	input.py	/^    def func (self):$/;"	m	class:Foo
440
`Foo` and `func` are in `input.py`, so they are not extra tags. On the
other hand, `Foo.func` is not in `input.py` as is. The name is
generated by ctags as a qualified extra tag entry.
The `whitespaceSwapped` extra flag of the `Robot` parser also fits
this idea.

Not all parsers follow this idea.
448
449.. code-block:: console
450
451    $ cat input.cc
452    class A
453    {
454      A operator+ (int);
455    };
456
457    $ ctags --kinds-all='*' --fields= -o - input.cc
458    A	input.cc	/^class A$/
459    operator +	input.cc	/^  A operator+ (int);$/
460
In this example `operator+` is in `input.cc`.
On the other hand, `operator +` is in the ctags output as a non-extra tag entry.
Note the whitespace between the keyword `operator` and the `+` operator.
This is an exception to the first idea.

The second idea is that if the *inclusion* of a tag cannot be
controlled well with ``--kinds-<LANG>=[+|-]...``, the tag may be an
extra.
469
470.. code-block:: console
471
472    $ cat input.c
473    static int foo (void)
474    {
475	    return 0;
476    }
477    int bar (void)
478    {
479	    return 1;
480    }
481
482    $ ctags --sort=no -o - --extras=+F input.c
483    foo	input.c	/^static int foo (void)$/;"	f	typeref:typename:int	file:
484    bar	input.c	/^int bar (void)$/;"	f	typeref:typename:int
485
486    $ ctags -o - --extras=-F input.c
    bar	input.c	/^int bar (void)$/;"	f	typeref:typename:int
488
489    $
490
Function `foo` of the C language is included only when the `F` extra
flag is enabled. Both `foo` and `bar` are functions. Their inclusion
can be controlled with the `f` kind of the C language: ``--kinds-C=[+|-]f``.

The difference between a function definition with the static modifier
and one with the implicit extern modifier is handled by the `F` extra flag.
497
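The effect of the `F` extra flag can be imitated on existing output by
looking at the ``file:`` field. The Python sketch below is a
client-side approximation for illustration only; ``drop_file_scoped`` is
a hypothetical name, and the code assumes the pattern does not itself
contain the two characters ``;"``.

```python
def drop_file_scoped(tag_lines):
    """Keep only tags without a `file:` field, mimicking --extras=-F."""
    kept = []
    for line in tag_lines:
        # Extension fields follow the ;" that terminates the pattern.
        _, _, tail = line.partition(';"')
        if "file:" not in tail.split("\t"):
            kept.append(line)
    return kept


tags = [
    'foo\tinput.c\t/^static int foo (void)$/;"\tf\ttyperef:typename:int\tfile:',
    'bar\tinput.c\t/^int bar (void)$/;"\tf\ttyperef:typename:int',
]
for line in drop_file_scoped(tags):
    print(line.split("\t")[0])
```

Both tags have the kind ``f``, so a kind-based filter could not separate
them; the ``file:`` field is what distinguishes the two.
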
Basically the concept of kind is for handling the kinds of language
objects: functions, variables, macros, types, etc. The concept of extra
can handle other aspects such as scope (static or extern).

However, a parser developer can take another approach instead of
introducing a parser specific extra; one can prepare `staticFunction` and
`exportedFunction` as kinds of one's parser. The second idea is
just a guide; the parser developer must decide on a suitable approach
for the target language.
507
508Anyway, in the second idea, ``--extras`` is for controlling inclusion
509of tags. If what you want is not about inclusion, ``--param-<LANG>``
510can be used as the last resort.
511
512
513Parser specific parameter
514~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
515
516.. NOT REVIEWED YET
517
To control the details of a parser, the ``--param-<LANG>`` option is introduced.
``--kinds-<LANG>``, ``--fields-<LANG>``, and ``--extras-<LANG>``
can be used for customizing the behavior of a parser specified with ``<LANG>``.

``--param-<LANG>`` should be used for aspects of the parser that
these options (kinds, fields, extras) cannot handle well.

A parser defines a set of parameters. Each parameter has a name and
takes an argument. A user can set a parameter with the following notation
527::
528
529   --param-<LANG>.name=arg
530
531An example of specifying a parameter
532::
533
534   --param-CPreProcessor.if0=true
535
Here `if0` is the name of a parameter of the CPreProcessor parser and
`true` is its value.
538
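The three parts of the notation can be pulled apart with a regular
expression. This Python sketch is an illustration for wrapper scripts,
not the option parser of ctags; ``parse_param`` is a hypothetical name,
and the code assumes that neither the language name nor the parameter
name contains ``=``.

```python
import re

_PARAM = re.compile(r"--param-(?P<lang>[^.=]+)\.(?P<name>[^=]+)=(?P<arg>.*)")


def parse_param(option):
    """Split --param-<LANG>.name=arg into (language, name, argument)."""
    match = _PARAM.fullmatch(option)
    if match is None:
        raise ValueError("not a --param-<LANG>.name=arg option: %r" % option)
    return match.group("lang"), match.group("name"), match.group("arg")


print(parse_param("--param-CPreProcessor.if0=true"))
```
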
539All available parameters can be listed with ``--list-params`` option.
540
541.. code-block:: console
542
543    $ ctags --list-params
544    #PARSER         NAME     DESCRIPTION
545    CPreProcessor   if0      examine code within "#if 0" branch (true or [false])
546    CPreProcessor   ignore   a token to be specially handled
547
548(At this time only CPreProcessor parser has parameters.)
549