1.. _ctags-client-tools(7):
2
3==============================================================
4ctags-client-tools
5==============================================================
6---------------------------------------------------------------------------------
7Hints for developing a tool using @CTAGS_NAME_EXECUTABLE@ command and tags output
8---------------------------------------------------------------------------------
9:Version: @VERSION@
10:Manual group: Universal Ctags
11:Manual section: 7
12
13SYNOPSIS
14--------
15|	**@CTAGS_NAME_EXECUTABLE@** [options] [file(s)]
16|	**@ETAGS_NAME_EXECUTABLE@** [options] [file(s)]
17
18
19DESCRIPTION
20-----------
21**Client tool** means a tool running the @CTAGS_NAME_EXECUTABLE@ command
22and/or reading a tags file generated by @CTAGS_NAME_EXECUTABLE@ command.
23This man page gathers hints for people who develop client tools.
24
25
26PSEUDO-TAGS
27-----------
28**Pseudo-tags**, stored in a tag file, indicate how
29@CTAGS_NAME_EXECUTABLE@ generated the tags file: whether the
30tags file is sorted or not, which version of tags file format is used,
31the name of tags generator, and so on. The opposite term for
32pseudo-tags is **regular-tags**. A regular-tag is for a language
33object in an input file. A pseudo-tag is for the tags file
34itself. Client tools may use pseudo-tags as reference for processing
35regular-tags.
36
37A pseudo-tag is stored in a tags file in the same format as
38regular-tags as described in tags(5), except that pseudo-tag names
39are prefixed with "!_". For the general information about
40pseudo-tags, see "TAG FILE INFORMATION" in tags(5).
41
42An example of a pseudo tag::
43
44	!_TAG_PROGRAM_NAME	Universal Ctags	/Derived from Exuberant Ctags/
45
The value, "Universal Ctags", associated with the pseudo tag
"TAG_PROGRAM_NAME", is stored in the field for the input file. The
description, "Derived from Exuberant Ctags", is stored in the field for
the pattern.
49
50Universal Ctags extends the naming scheme of the classical pseudo-tags
51available in Exuberant Ctags for emitting language specific
52information as pseudo tags::
53
54	!_{pseudo-tag-name}!{language-name}	{associated-value}	/{description}/
55
56The language-name is appended to the pseudo-tag name with a separator, "!".
57
58An example of pseudo tag with a language suffix::
59
60	!_TAG_KIND_DESCRIPTION!C	f,function	/function definitions/
61
62This pseudo-tag says "the function kind of C language is enabled
63when generating this tags file." ``--pseudo-tags`` is the option for
64enabling/disabling individual pseudo-tags. When enabling/disabling a
pseudo-tag with the option, specify only the tag name,
"TAG_KIND_DESCRIPTION", without the prefix ("!_") or the suffix ("!C").
67
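The naming scheme above can be taken apart mechanically. The following
is a minimal sketch, not part of ctags itself; ``PseudoTag`` and
``parse_pseudo_tag`` are hypothetical names chosen for illustration, and
the sketch assumes the value and description contain no tab characters.

.. code-block:: python

    from typing import NamedTuple, Optional


    class PseudoTag(NamedTuple):
        name: str                # e.g. "TAG_KIND_DESCRIPTION"
        language: Optional[str]  # e.g. "C"; None when there is no suffix
        value: str               # taken from the input-file field
        description: str         # taken from the pattern field


    def parse_pseudo_tag(line: str) -> Optional[PseudoTag]:
        """Parse one tags-file line; return None unless it starts with "!_"."""
        if not line.startswith("!_"):
            return None
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 3:
            return None
        # Drop the "!_" prefix, then split off the "!{language-name}" suffix.
        # Any further "!"-separated part is ignored in this sketch.
        parts = columns[0][2:].split("!")
        language = parts[1] if len(parts) > 1 else None
        description = columns[2]
        if len(description) >= 2 and description[0] == description[-1] == "/":
            description = description[1:-1]
        return PseudoTag(parts[0], language, columns[1], description)

Note that a regular tag whose name happens to start with "!_" would also
be accepted by this sketch; the ``extras:pseudo`` field enabled with
``--fields=+E`` is the reliable way to tell the two apart.
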
68
69Options for Pseudo-tags
70~~~~~~~~~~~~~~~~~~~~~~~
71``--extras=+p`` (or ``--extras=+{pseudo}``)
72	Forces writing pseudo-tags.
73
74	@CTAGS_NAME_EXECUTABLE@ emits pseudo-tags by default when writing tags
	to a regular file (e.g. "tags"). However, when ``-o -`` or ``-f -``
	is specified for writing tags to standard output,
	@CTAGS_NAME_EXECUTABLE@ doesn't emit pseudo-tags. ``--extras=+p`` or
	``--extras=+{pseudo}`` forces pseudo-tags to be written.
79
80``--list-pseudo-tags``
81	Lists available types of pseudo-tags and shows whether they are enabled or disabled.
82
	Running @CTAGS_NAME_EXECUTABLE@ with the ``--list-pseudo-tags`` option
	lists the available pseudo-tags. Some of the pseudo-tags newly introduced
	in the Universal Ctags project are disabled by default. Use
	``--pseudo-tags=...`` to enable them.
87
88``--pseudo-tags=[+|-]names|*``
89	Specifies a list of pseudo-tag types to include in the output.
90
	The parameters are a set of pseudo-tag names. Valid pseudo-tag names
	can be listed with ``--list-pseudo-tags``. Surround each name in the set
	with braces, like "{TAG_PROGRAM_AUTHOR}". You don't have to include the "!_"
	pseudo-tag prefix when specifying a name in the argument of the
	``--pseudo-tags=`` option.

	Pseudo-tags don't have a notation using one-letter flags.

	If a name is preceded by either the '+' or '-' character, that
	pseudo-tag is added to or removed from the current settings. Otherwise
	the names replace any current settings. All entries are included if
	'*' is given.
102
103``--fields=+E`` (or ``--fields=+{extras}``)
104	Attach "extras:pseudo" field to pseudo-tags.
105
106	An example of pseudo tags with the field::
107
108		!_TAG_PROGRAM_NAME	Universal Ctags	/Derived from Exuberant Ctags/	extras:pseudo
109
110	If the name of a normal tag in a tag file starts with "!_", a
111	client tool cannot distinguish whether the tag is a regular-tag or
112	pseudo-tag.  The fields attached with this option help the tool
113	distinguish them.
114
115
116List of notable pseudo-tags
117~~~~~~~~~~~~~~~~~~~~~~~~~~~
118Running ctags with ``--list-pseudo-tags`` option lists available types
119of pseudo-tags with short descriptions. This subsection shows hints
120for using notable ones.
121
122``TAG_EXTRA_DESCRIPTION``  (new in Universal Ctags)
123	Indicates the names and descriptions of enabled extras::
124
125	  !_TAG_EXTRA_DESCRIPTION	{extra-name}	/description/
126	  !_TAG_EXTRA_DESCRIPTION!{language-name}	{extra-name}	/description/
127
128	If your tool relies on some extra tags (extras), refer to
129	the pseudo-tags of this type. A tool can reject the tags file that
130	doesn't include expected extras, and raise an error in an early
131	stage of processing.
132
133	An example of the pseudo-tags::
134
135	  $ @CTAGS_NAME_EXECUTABLE@ --extras=+p --pseudo-tags='{TAG_EXTRA_DESCRIPTION}' -o - input.c
136	  !_TAG_EXTRA_DESCRIPTION	anonymous	/Include tags for non-named objects like lambda/
137	  !_TAG_EXTRA_DESCRIPTION	fileScope	/Include tags of file scope/
138	  !_TAG_EXTRA_DESCRIPTION	pseudo	/Include pseudo tags/
139	  !_TAG_EXTRA_DESCRIPTION	subparser	/Include tags generated by subparsers/
140	  ...
141
	From the output, a client tool can know that the "{anonymous}",
	"{fileScope}", "{pseudo}", and "{subparser}" extras are enabled.
144
145``TAG_FIELD_DESCRIPTION``  (new in Universal Ctags)
146	Indicates the names and descriptions of enabled fields::
147
148	  !_TAG_FIELD_DESCRIPTION	{field-name}	/description/
149	  !_TAG_FIELD_DESCRIPTION!{language-name}	{field-name}	/description/
150
151	If your tool relies on some fields, refer to the pseudo-tags of
152	this type.  A tool can reject a tags file that doesn't include
153	expected fields, and raise an error in an early stage of
154	processing.
155
156	An example of the pseudo-tags::
157
158	  $ @CTAGS_NAME_EXECUTABLE@ --fields-C=+'{macrodef}' --extras=+p --pseudo-tags='{TAG_FIELD_DESCRIPTION}' -o - input.c
159	  !_TAG_FIELD_DESCRIPTION	file	/File-restricted scoping/
160	  !_TAG_FIELD_DESCRIPTION	input	/input file/
161	  !_TAG_FIELD_DESCRIPTION	name	/tag name/
162	  !_TAG_FIELD_DESCRIPTION	pattern	/pattern/
163	  !_TAG_FIELD_DESCRIPTION	typeref	/Type and name of a variable or typedef/
164	  !_TAG_FIELD_DESCRIPTION!C	macrodef	/macro definition/
165	  ...
166
	From the output, a client tool can know that the "{file}", "{input}",
	"{name}", "{pattern}", and "{typeref}" fields are enabled.
	These fields are common to all languages. In addition to the common
	fields, the tool can know that the "{macrodef}" field of the C language
	is also enabled.
171
172``TAG_FILE_ENCODING``  (new in Universal Ctags)
173	TBW
174
175``TAG_FILE_FORMAT``
176	See also tags(5).
177
178``TAG_FILE_SORTED``
179	See also tags(5).
180
181``TAG_KIND_DESCRIPTION`` (new in Universal Ctags)
182	Indicates the names and descriptions of enabled kinds::
183
184	  !_TAG_KIND_DESCRIPTION!{language-name}	{kind-letter},{kind-name}	/description/
185
186	If your tool relies on some kinds, refer to the pseudo-tags of
187	this type.  A tool can reject the tags file that doesn't include
188	expected kinds, and raise an error in an early stage of
189	processing.
190
	Kinds are language specific, so a language name is always
	appended to the tag name as a suffix.
193
194	An example of the pseudo-tags::
195
196	  $ @CTAGS_NAME_EXECUTABLE@ --extras=+p --kinds-C=vfm --pseudo-tags='{TAG_KIND_DESCRIPTION}' -o - input.c
197	  !_TAG_KIND_DESCRIPTION!C	f,function	/function definitions/
198	  !_TAG_KIND_DESCRIPTION!C	m,member	/struct, and union members/
199	  !_TAG_KIND_DESCRIPTION!C	v,variable	/variable definitions/
200	  ...
201
	From the output, a client tool can know that the "{function}",
	"{member}", and "{variable}" kinds of the C language are enabled.
204
205``TAG_KIND_SEPARATOR`` (new in Universal Ctags)
206	TBW
207
208``TAG_OUTPUT_EXCMD`` (new in Universal Ctags)
209	Indicates the specified type of EX command with ``--excmd`` option.
210
211``TAG_OUTPUT_FILESEP`` (new in Universal Ctags)
212	TBW
213
214``TAG_OUTPUT_MODE`` (new in Universal Ctags)
215	TBW
216
217``TAG_PATTERN_LENGTH_LIMIT`` (new in Universal Ctags)
218	TBW
219
220``TAG_PROC_CWD`` (new in Universal Ctags)
221	Indicates the working directory of @CTAGS_NAME_EXECUTABLE@ during processing.
222
	This pseudo-tag helps a client tool resolve the absolute paths of
	the input files for tag entries even when they are tagged with
	relative paths.
226
227	An example of the pseudo-tags::
228
229	  $ cat tags
230	  !_TAG_PROC_CWD	/tmp/	//
231	  main	input.c	/^int main (void) { return 0; }$/;"	f	typeref:typename:int
232	  ...
233
	From the regular tag for "main", the client tool can know that
	"main" is at "input.c".  However, it is a relative path. So if the
	directory where @CTAGS_NAME_EXECUTABLE@ ran and the directory
	where the client tool runs are different, the client tool cannot
	find "input.c" in the file system. In that case,
	``TAG_PROC_CWD`` gives the tool a hint: "input.c" may be at "/tmp".
240
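	A minimal sketch of using the hint; ``resolve_input_path`` is a
	hypothetical helper, and the result is only a candidate path, since
	the source tree may have moved after the tags file was generated.

	.. code-block:: python

	    import os.path
	    from typing import Optional


	    def resolve_input_path(input_field: str, proc_cwd: Optional[str]) -> str:
	        """Return a candidate absolute path for a tag's input field."""
	        if os.path.isabs(input_field) or not proc_cwd:
	            # Already absolute, or no TAG_PROC_CWD pseudo-tag was found.
	            return input_field
	        return os.path.normpath(os.path.join(proc_cwd, input_field))
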
241``TAG_PROGRAM_NAME``
242	TBW
243
244``TAG_ROLE_DESCRIPTION`` (new in Universal Ctags)
245	Indicates the names and descriptions of enabled roles::
246
247	  !_TAG_ROLE_DESCRIPTION!{language-name}!{kind-name}	{role-name}	/description/
248
249	If your tool relies on some roles, refer to the pseudo-tags of
250	this type. Note that a role owned by a disabled kind is not listed
251	even if the role itself is enabled.
252
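The early check suggested for the description pseudo-tags could look
roughly like the following for kinds. This is a sketch under the
assumption that ``TAG_KIND_DESCRIPTION`` pseudo-tags were written to the
tags file; ``enabled_kinds`` and ``check_required_kinds`` are
hypothetical names. Extras and fields can be checked the same way, using
the value field as the name directly.

.. code-block:: python

    from typing import Iterable, Set


    def enabled_kinds(lines: Iterable[str], language: str) -> Set[str]:
        """Return the kind names listed by TAG_KIND_DESCRIPTION!{language}."""
        prefix = "!_TAG_KIND_DESCRIPTION!" + language + "\t"
        kinds = set()
        for line in lines:
            if line.startswith(prefix):
                # The value field looks like "{kind-letter},{kind-name}".
                value = line.rstrip("\n").split("\t")[1]
                kinds.add(value.partition(",")[2])
        return kinds


    def check_required_kinds(lines, language, required):
        """Raise ValueError when a kind the tool relies on is not listed."""
        missing = set(required) - enabled_kinds(lines, language)
        if missing:
            raise ValueError("tags file lacks %s kinds: %s"
                             % (language, ", ".join(sorted(missing))))
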
253REDUNDANT-KINDS
254---------------
255TBW
256
257MULTIPLE-LANGUAGES FOR AN INPUT FILE
258------------------------------------
Universal Ctags can run multiple parsers on a single input file.
That means tags for different languages may be emitted for one input
file.  The ``language``/``l`` field can be used to show the language
for each tag.
263
264.. code-block:: console
265
266	$ cat /tmp/foo.html
267	<html>
268	<script>var x = 1</script>
269	<h1>title</h1>
270	</html>
271	$ ./ctags -o - --extras=+g /tmp/foo.html
272	title	/tmp/foo.html	/^  <h1>title<\/h1>$/;"	h
273	x	/tmp/foo.html	/var x = 1/;"	v
274	$ ./ctags -o - --extras=+g --fields=+l /tmp/foo.html
275	title	/tmp/foo.html	/^  <h1>title<\/h1>$/;"	h	language:HTML
276	x	/tmp/foo.html	/var x = 1/;"	v	language:JavaScript
277
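A client tool could use the field to group tags by language. The
following is a rough sketch; ``tags_by_language`` is a hypothetical
name, and the naive tab splitting assumes that no pattern contains a tab
(see "Parse Readtags Output" for a more careful approach).

.. code-block:: python

    from collections import defaultdict


    def tags_by_language(lines):
        """Map each language name to the names of the tags carrying it."""
        groups = defaultdict(list)
        for line in lines:
            if line.startswith("!_"):
                continue  # skip pseudo-tags
            columns = line.rstrip("\n").split("\t")
            # Extension fields follow the name, input, and pattern columns.
            for column in columns[3:]:
                if column.startswith("language:"):
                    groups[column[len("language:"):]].append(columns[0])
                    break
        return dict(groups)
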
278UTILIZING READTAGS
279-----------------------------------
280See readtags(1) to know how to use readtags. This section is for discussing
281some notable topics for client tools.
282
283Build Filter/Sorter Expressions
284~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
285Certain escape sequences in expressions are recognized by readtags. For
286example, when searching for a tag that matches ``a\?b``, if using a filter
287expression like ``'(eq? $name "a\?b")'``, since ``\?`` is translated into a
288single ``?`` by readtags, it actually searches for ``a?b``.
289
Another problem is that if a single quote appears in a filter expression (which
is itself wrapped in single quotes), it terminates the expression, producing a
broken expression, and may even cause unintended shell injection. Single quotes
can be escaped using ``'"'"'``.
294
295So, client tools need to:
296
297* Replace ``\`` by ``\\``
298* Replace ``'`` by ``'"'"'``
299
300inside the expressions. If the expression also contains strings, ``"`` in the
301strings needs to be replaced by ``\"``.
302
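The replacements above can be sketched as two small helpers, one for
string literals inside an expression and one for the shell quoting
around the whole expression. The function names are hypothetical, and a
POSIX shell is assumed.

.. code-block:: python

    def readtags_string(text: str) -> str:
        """Quote text as a string literal inside a readtags expression."""
        # Double the backslashes first, then escape the double quotes.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


    def shell_single_quote(expression: str) -> str:
        """Wrap an expression in single quotes for a POSIX shell."""
        return "'" + expression.replace("'", "'\"'\"'") + "'"


    # Build a filter that searches for the literal tag name a\?b.
    expression = "(eq? $name " + readtags_string("a\\?b") + ")"
    argument = shell_single_quote(expression)

When the client tool runs readtags without a shell, for example by
passing an argument list to ``subprocess.run``, the shell quoting step
is not needed, but the escaping inside the expression still is.
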
303Client tools written in Lisp could build the expression using lists. ``prin1``
304(in Common Lisp style Lisps) and ``write`` (in Scheme style Lisps) can
305translate the list into a string that can be directly used. For example, in
306EmacsLisp:
307
308.. code-block:: EmacsLisp
309
310   (let ((name "hi"))
311     (prin1 `(eq? $name ,name)))
312   => "(eq\\? $name "hi")"
313
314The "?" is escaped, and readtags can handle it. Scheme style Lisps should do
315proper escaping so the expression readtags gets is just the expression passed
into ``write``. Common Lisp style Lisps may produce escape sequences that
readtags doesn't recognize, like ``\#``. Readtags provides some aliases for
these Lisps:
319
320* Use ``true`` for ``#t``.
321* Use ``false`` for ``#f``.
322* Use ``nil`` or ``()`` for ``()``.
323* Use ``(string->regexp "PATTERN")`` for ``#/PATTERN/``. Use
324  ``(string->regexp "PATTERN" :case-fold true)`` for ``#/PATTERN/i``. Notice
325  that ``string->regexp`` doesn't require escaping "/" in the pattern.
326
327Notice that even when the client tool uses this method, ``'`` still needs to be
328replaced by ``'"'"'`` to prevent broken expressions and shell injection.
329
330Another thing to notice is that missing fields are represented by ``#f``, and
331applying string operators to them will produce an error. You should always
332check if a field is missing before applying string operators. See the
333"Filtering" section in readtags(1) to know how to do this. Run "readtags -H
334filter" to see which operators take string arguments.
335
336Parse Readtags Output
337~~~~~~~~~~~~~~~~~~~~~
338In the output of readtags, tabs can appear in all field values (e.g., the tag
339name itself could contain tabs), which makes it hard to split the line into
340fields. Client tools should use the ``-E`` option, which keeps the escape
341sequences in the tags file, so the only field that could contain tabs is the
342pattern field.
343
344The pattern field could:
345
346- Use a line number. It will look like ``number;"`` (e.g. ``10;"``).
347- Use a search pattern. It will look like ``/pattern/;"`` or ``?pattern?;"``.
348  Notice that the search pattern could contain tabs.
349- Combine these two, like ``number;/pattern/;"`` or ``number;?pattern?;"``.
350
351These are true for tags files using extended format, which is the default one.
352The legacy format (i.e. ``--format=1``) doesn't include the semicolons. It's
353old and barely used, so we won't discuss it here.
354
355Client tools could split the line using the following steps:
356
357* Find the first 2 tabs in the line, so we get the name and input field.
358* From the 2nd tab:
359
360  * If a ``/`` follows, then the pattern delimiter is ``/``.
361  * If a ``?`` follows, then the pattern delimiter is ``?``.
362  * If a number follows, then:
363
364    * If a ``;/`` follows the number, then the delimiter is ``/``.
365    * If a ``;?`` follows the number, then the delimiter is ``?``.
366    * If a ``;"`` follows the number, then the field uses only line number, and
367      there's no pattern delimiter (since there's no regex pattern). In this
368      case the pattern field ends at the 3rd tab.
369
370* After the opening delimiter, find the next unescaped pattern delimiter, and
371  that's the closing delimiter. It will be followed by ``;"`` and then a tab.
372  That's the end of the pattern field. By "unescaped pattern delimiter", we
373  mean there's an even number (including 0) of backslashes before it.
374* From here, split the rest of the line into fields by tabs.
375
376Then, the escape sequences in fields other than the pattern field should be
377translated. See "Proposal" in tags(5) to know about all the escape sequences.
378
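The splitting steps above could be implemented roughly as follows. This
is a sketch for the extended format only; ``TagLine`` and
``split_tag_line`` are hypothetical names, and translating the escape
sequences in the resulting fields is left out.

.. code-block:: python

    from typing import List, NamedTuple


    class TagLine(NamedTuple):
        name: str
        input: str
        pattern: str       # whole pattern field, including the trailing ;"
        fields: List[str]  # remaining fields, escape sequences untranslated


    def _closing_delimiter(text: str, start: int, delimiter: str) -> int:
        """Return the index of the next unescaped delimiter from start."""
        i = start
        while i < len(text):
            if text[i] == "\\":
                i += 2  # a backslash escapes the character after it
            elif text[i] == delimiter:
                return i
            else:
                i += 1
        raise ValueError("unterminated search pattern: %r" % text)


    def split_tag_line(line: str) -> TagLine:
        """Split one line of ``readtags -E`` output."""
        name, input_file, rest = line.rstrip("\n").split("\t", 2)
        pos = 0
        while pos < len(rest) and rest[pos] in "0123456789":
            pos += 1
        if pos > 0 and rest.startswith(';"', pos):
            end = pos + 2  # line number only; there is no search pattern
        else:
            if pos > 0:
                if rest[pos:pos + 1] != ";":
                    raise ValueError("unexpected pattern field: %r" % rest)
                pos += 1  # skip the ";" between the number and the pattern
            delimiter = rest[pos:pos + 1]
            if delimiter not in ("/", "?"):
                raise ValueError("unexpected pattern field: %r" % rest)
            end = _closing_delimiter(rest, pos + 1, delimiter) + 1
            if not rest.startswith(';"', end):
                raise ValueError("unexpected pattern field: %r" % rest)
            end += 2
        remainder = rest[end:]
        fields = remainder[1:].split("\t") if remainder else []
        return TagLine(name, input_file, rest[:end], fields)

Skipping two characters at each backslash is equivalent to requiring an
even number of backslashes before the closing delimiter.
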
379Make Use of the Pattern Field
380~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
381
382The pattern field specifies how to find a tag in its source file. The code
383generating this field seems to have a long history, so there are some pitfalls
384and it's a bit hard to handle. A client tool could simply require the ``line:``
385field and jump to the line it specifies, to avoid using the pattern field. But
386anyway, we'll discuss how to make the best use of it here.
387
388You should take the words here merely as suggestions, and not standards. A
389client tool could definitely develop better (or simpler) ways to use the
390pattern field.
391
392From the last section, we know the pattern field could contain a line number
393and a search pattern. When it only contains the line number, handling it is
394easy: you simply go to that line.
395
The search pattern resembles an EX command, but as we'll see later, it's
actually not a valid one, so some manual work is required to process it.
398
399The search pattern could look like ``/pat/``, called "forward search pattern",
400or ``?pat?``, called "backward search pattern". Using a search pattern means
401even if the source file is updated, as long as the part containing the tag
402doesn't change, we could still locate the tag correctly by searching.
403
404When the pattern field only contains the search pattern, you just search for
405it. The search direction (forward/backward) doesn't matter, as it's decided
406solely by whether the ``-B`` option is enabled, and not the actual context. You
407could always start the search from say the beginning of the file.
408
When both the search pattern and the line number are present, you could make
good use of the line number, by going to the line first, then searching for the
nearest occurrence of the pattern. A way to do this is to search both forward
and backward for the pattern, and when there is an occurrence on both sides, go
to the nearer one.
414
415What's good about this is when there are multiple identical lines in the source
416file (e.g. the COMMON block in Fortran), this could help us find the correct
417one, even after the source file is updated and the tag position is shifted by a
418few lines.
419
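One simple way to get the "nearest occurrence" behavior is to collect
every matching line and pick the one closest to the recorded line
number. The sketch below does that; ``find_nearest_line`` is a
hypothetical name, and it assumes the search pattern has already been
converted to a compiled regular expression that is applied line by line.

.. code-block:: python

    import re
    from typing import List, Optional


    def find_nearest_line(lines: List[str], regex: "re.Pattern",
                          hint: int) -> Optional[int]:
        """Return the 1-based number of the matching line nearest to hint."""
        matches = [number for number, text in enumerate(lines, start=1)
                   if regex.search(text)]
        if not matches:
            return None
        # On a tie, min() keeps the first candidate, i.e. the earlier line.
        return min(matches, key=lambda number: abs(number - hint))
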
420Now let's discuss how to search for the pattern. After you trim the ``/`` or
421``?`` around it, the pattern resembles a regex pattern. It should be a regex
422pattern, as required by being a valid EX command, but it's actually not, as
423you'll see below.
424
425It could begin with a ``^``, which means the pattern starts from the beginning
426of a line. It could also end with an *unescaped* ``$`` which means the pattern
427ends at the end of a line. Let's keep this information, and trim them too.
428
429Now the remaining part is the actual string containing the tag. Some characters
430are escaped:
431
432* ``\``.
433* ``$``, but only at the end of the string.
434* ``/``, but only in forward search patterns.
435* ``?``, but only in backward search patterns.
436
437You need to unescape these to get the literal string. Now you could convert
438this literal string to a regexp that matches it (by escaping, like
439``re.escape`` in Python or ``regexp-quote`` in Elisp), and assemble it with
440``^`` or ``$`` if the pattern originally has it, and finally search for the tag
441using this regexp.
442
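The conversion described above could be sketched as follows. The
function name is hypothetical; its argument is only the ``/pat/`` or
``?pat?`` part of the pattern field, without a leading line number or
the trailing ``;"``, and the resulting regexp is meant to be matched
against one source line at a time.

.. code-block:: python

    import re


    def search_pattern_to_regex(pattern: str) -> "re.Pattern":
        """Convert ``/pat/`` or ``?pat?`` into a compiled regexp."""
        if (len(pattern) < 2 or pattern[0] not in ("/", "?")
                or pattern[-1] != pattern[0]):
            raise ValueError("not a search pattern: %r" % pattern)
        delimiter = pattern[0]
        body = pattern[1:-1]

        anchored_start = body.startswith("^")
        if anchored_start:
            body = body[1:]

        # A trailing "$" is an anchor only when it is unescaped, i.e. when
        # an even number of backslashes precedes it.
        anchored_end = False
        if body.endswith("$"):
            backslashes = len(body) - 1 - len(body[:-1].rstrip("\\"))
            if backslashes % 2 == 0:
                anchored_end = True
                body = body[:-1]

        # Unescape "\\", the delimiter, and a "$" at the very end.
        literal = []
        i = 0
        while i < len(body):
            following = body[i + 1:i + 2]
            if body[i] == "\\" and (
                    following in ("\\", delimiter)
                    or (following == "$" and i + 2 == len(body))):
                literal.append(following)
                i += 2
            else:
                literal.append(body[i])
                i += 1

        return re.compile(("^" if anchored_start else "")
                          + re.escape("".join(literal))
                          + ("$" if anchored_end else ""))
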
443Remark: About a Previous Format of the Pattern Field
444~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
445
In some earlier versions of Universal Ctags, the line number in the pattern
field was the actual line number minus one, for forward search patterns; or plus
one, for backward search patterns. The idea was to resemble an EX command: you
go to the line, then search forward/backward for the pattern, and you can
always find the correct one. But this defeats the purpose of using a search
pattern: to tolerate file updates. For example, if the tag is at line 50,
then according to this scheme, the pattern field should be::
453
454	49;/pat/;"
455
Then let's assume that some code above is removed, and the tag is now at
line 45. Now you can't find it if you search forward from line 49.
458
For this reason, Universal Ctags now uses the actual line number. A
client tool could distinguish the two schemes by the ``TAG_OUTPUT_EXCMD``
pseudo-tag: it's "combine" for the old scheme, and "combineV2" for the present
scheme. But probably there's no need to treat them differently, since "search
for the nearest occurrence from the line" gives good results with both schemes.
464
465JSON OUTPUT
466-----------
467Universal Ctags supports `JSON <https://www.json.org/>`_ (strictly
468speaking `JSON Lines <https://jsonlines.org/>`_) output format if the
469ctags executable is built with ``libjansson``.  JSON output goes to
470standard output by default.
471
472Format
473~~~~~~
474Each JSON line represents a tag.
475
476.. code-block:: console
477
478	$ ctags --extras=+p --output-format=json --fields=-s input.py
479	{"_type": "ptag", "name": "JSON_OUTPUT_VERSION", "path": "0.0", "pattern": "in development"}
480	{"_type": "ptag", "name": "TAG_FILE_SORTED", "path": "1", "pattern": "0=unsorted, 1=sorted, 2=foldcase"}
481	...
482	{"_type": "tag", "name": "Klass", "path": "/tmp/input.py", "pattern": "/^class Klass:$/", "language": "Python", "kind": "class"}
483	{"_type": "tag", "name": "method", "path": "/tmp/input.py", "pattern": "/^    def method(self):$/", "language": "Python", "kind": "member", "scope": "Klass", "scopeKind": "class"}
484	...
485
486A key not starting with ``_`` is mapped to a field of ctags.
487"``--output-format=json --list-fields``" options list the fields.
488
489A key starting with ``_`` represents meta information of the JSON
490line.  Currently only ``_type`` key is used. If the value for the key
491is ``tag``, the JSON line represents a normal tag. If the value is
492``ptag``, the line represents a pseudo-tag.
493
The output format may change in the future. The ``JSON_OUTPUT_VERSION``
pseudo-tag gives client tools a chance to handle such changes. The
current version is "0.0". A client tool can extract the version from
the ``path`` key of the pseudo-tag.
499
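A client tool might read the output along these lines. This is a minimal
sketch with hypothetical function names; it relies only on the ``_type``,
``name``, and ``path`` keys described above.

.. code-block:: python

    import json


    def read_json_tags(lines):
        """Split JSON Lines output into (pseudo_tags, tags)."""
        pseudo_tags, tags = [], []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if entry.get("_type") == "ptag":
                pseudo_tags.append(entry)
            elif entry.get("_type") == "tag":
                tags.append(entry)
        return pseudo_tags, tags


    def json_output_version(pseudo_tags):
        """Return the JSON_OUTPUT_VERSION value, or None when absent."""
        for ptag in pseudo_tags:
            if ptag.get("name") == "JSON_OUTPUT_VERSION":
                return ptag.get("path")
        return None

The version is available only when pseudo-tags are emitted, as with
``--extras=+p`` in the example above.
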
The JSON output format is newly designed and does not have the
limitations found in the default tags file format.

* The values for the ``kind`` key are represented as long names.
  One-letter flags are not used.

* Scope names and scope kinds have distinct keys: ``scope`` and ``scopeKind``.
  They are combined in the default tags file format.
508
509Data type used in a field
510~~~~~~~~~~~~~~~~~~~~~~~~~
Values for most keys are represented in the JSON string type.
However, some of them are represented in the string, integer, and/or boolean type.

"``--output-format=json --list-fields``" options show which data types
are used for each field in JSON output.
516
517.. code-block:: console
518
519	$ ctags --output-format=json --list-fields
520	#LETTER NAME           ENABLED LANGUAGE         JSTYPE FIXED DESCRIPTION
521	F       input          yes     NONE             s--    no    input file
522	...
523	P       pattern        yes     NONE             s-b    no    pattern
524	...
525	f       file           yes     NONE             --b    no    File-restricted scoping
526	...
527	e       end            no      NONE             -i-    no    end lines of various items
528	...
529
530``JSTYPE`` column shows the data types.
531
532'``s``'
533	string
534
535'``i``'
536	integer
537
538'``b``'
539	boolean (true or false)
540
For example, the value for the ``pattern`` field of ctags takes a string or a boolean value.
542
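A client tool could turn the ``JSTYPE`` column into a lookup table. The
sketch below is an illustration only; ``jstypes_from_list_fields`` is a
hypothetical name, and it assumes that the columns up to ``JSTYPE``
contain no embedded whitespace, as in the listing above.

.. code-block:: python

    JSTYPE_FLAGS = {"s": str, "i": int, "b": bool}


    def jstypes_from_list_fields(output):
        """Map each field name to the Python types named by JSTYPE."""
        name_index = jstype_index = None
        types = {}
        for line in output.splitlines():
            columns = line.split()
            if line.startswith("#"):
                # Header line: locate the columns by their titles.
                name_index = columns.index("NAME")
                jstype_index = columns.index("JSTYPE")
                continue
            if jstype_index is None or len(columns) <= jstype_index:
                continue  # before the header, or a short line
            flags = columns[jstype_index]
            types[columns[name_index]] = {
                JSTYPE_FLAGS[flag] for flag in flags if flag in JSTYPE_FLAGS
            }
        return types
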
543SEE ALSO
544--------
545ctags(1), ctags-lang-python(7), ctags-incompatibilities(7), tags(5), readtags(1)
546