1\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename find-maint.info
4@include versionmaint.texi
5
6@settitle Maintaining GNU Findutils @value{VERSION}
7@c For double-sided printing, uncomment:
8@c @setchapternewpage odd
9@c %**end of header
10
11@iftex
12@finalout
13@end iftex
14
15@dircategory GNU organization
16@direntry
17* Maintaining Findutils: (find-maint).        Maintaining GNU findutils
18@end direntry
19
20@copying
21This manual explains how GNU findutils is maintained, how changes should
22be made and tested, and what resources exist to help developers.
23
24This document corresponds to version @value{VERSION} of the GNU findutils.
25
26Copyright @copyright{} 2007--2021 Free Software Foundation, Inc.
27
28@quotation
29Permission is granted to copy, distribute and/or modify this document
30under the terms of the GNU Free Documentation License, Version 1.3 or
31any later version published by the Free Software Foundation; with no
32Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
33A copy of the license is included in the section entitled
34``GNU Free Documentation License''.
35@end quotation
36@end copying
37
38@titlepage
39@title Maintaining GNU Findutils
40@subtitle version @value{VERSION}, @value{UPDATED}
41@author by James Youngman
42
43@page
44@vskip 0pt plus 1filll
45@insertcopying
46@end titlepage
47
48@contents
49
50@ifnottex
51@node Top, Introduction, (dir), (dir)
52@top Maintaining GNU Findutils
53
54@insertcopying
55@end ifnottex
56
57@menu
58* Introduction::
59* Maintaining GNU Programs::
60* Design Issues::
61* Coding Conventions::
62* Tools::
63* Using the GNU Portability Library::
64* Documentation::
65* Testing::
66* Bugs::
67* Distributions::
68* Internationalisation::
69* Security::
70* Making Releases::
71* GNU Free Documentation License::
72@end menu
73
74
75
76
77
78@node Introduction
79@chapter Introduction
80
This document explains how to contribute to and maintain GNU
Findutils.  It concentrates on developer-specific issues.  For
information about how to use the software please refer to
@ref{Introduction, ,Introduction,find,The Findutils manual}.
85
This manual aims to be useful without necessarily being verbose.  It's
also a recent document, so there will be many areas in which
improvements can be made.  If you find that the document misses out
important information or that any part of it is so terse as to be
unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
mailing list.  We'll try to improve this document too.
92
93
94@node Maintaining GNU Programs
95@chapter Maintaining GNU Programs
96
97GNU Findutils is part of the GNU Project and so there are a number of
98documents which set out standards for the maintenance of GNU
99software.
100
101@table @file
102@item standards.texi
103GNU Project Coding Standards.  All changes to findutils should comply
104with these standards.  In some areas we go somewhat beyond the
105requirements of the standards, but these cases are explained in this
106manual.
107@item maintain.texi
108Information for Maintainers of GNU Software.  This document provides
109guidance for GNU maintainers.  Everybody with commit access should
110read this document.   Everybody else is welcome to do so too, of
111course.
112@end table
113
114
115
116@node Design Issues
117@chapter Design Issues
118
The findutils package is installed on a great many systems, usually as a
fundamental component.  The programs in the package are often used in
order to successfully boot or fix the system.

This fact means that for findutils we bear in mind considerations that
may not apply to the same extent to other packages.  For example, the fact
that findutils is often a base component motivates us to
126@itemize
127@item Limit dependencies on libraries
128@item Avoid dependencies on other large packages (for example, interpreters)
129@item Be conservative when making changes to the 'stable' release branch
130@end itemize
131
132All those considerations come before functionality.  Functional
133enhancements are still made to findutils, but these are almost
134exclusively introduced in the 'development' release branch, to allow
135extensive testing and proving.
136
137Sometimes it is useful to have a priority list to provide guidance
138when making design trade-offs.   For findutils, that priority list is:
139
140@enumerate
141@item Correctness
142@item Standards compliance
143@item Security
144@item Backward compatibility
145@item Performance
146@item Functionality
147@end enumerate
148
149For example, we support the @code{-exec} action because POSIX
150compliance requires this, even though there are security problems with
151it and we would otherwise prefer people to use @code{-execdir}.  There
152are also cases where some performance is sacrificed in the name of
153security.  For example, the sanity checks that @code{find} performs
154while traversing a directory tree may slow it down.   We adopt
155functional changes, and functional changes are allowed to make
156@code{find} slower, but only if there is no detectable impact on users
157who don't use the feature.
158
159Backward-incompatible changes do get made in order to comply with
160standards (for example the behaviour of @code{-perm -...} changed in
161order to comply with POSIX).  However, they don't get made in order to
162provide better ease of use; for example the semantics of @code{-size
163-2G} are almost always unexpected by users, but we retain the current
164behaviour because of backward compatibility and for its similarity to
165the block-rounding behaviour of @code{-size -30}.  We might introduce
166a change which does not have the unfortunate rounding behaviour, but
167we would choose another syntax (for example @code{-size '<2G'}) for
168this.
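
The rounding can be illustrated with a small C sketch.  It assumes, as
the user manual describes, that @code{-size} rounds a file's size up to
a whole number of units before comparing.  The function names
@code{size_in_units} and @code{size_less_than} are invented for this
illustration and are not taken from the findutils sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round SIZE up to a whole number of UNIT-byte units.  This is the
   rounding assumed in this sketch for -size comparisons.  */
uint64_t
size_in_units (uint64_t size, uint64_t unit)
{
  /* A zero unit would be a bug in the caller, not a user error.  */
  assert (unit > 0);
  return size / unit + (size % unit != 0);
}

/* Hypothetical helper: would "-size -N<unit>" match a file of SIZE
   bytes, given the rounding above?  */
bool
size_less_than (uint64_t size, uint64_t n, uint64_t unit)
{
  return size_in_units (size, unit) < n;
}
```

With a unit of 1 GiB, a file of exactly 1073741824 bytes counts as one
unit and so would match @code{-size -2G}, while a file just one byte
larger counts as two units and would not.  This is the behaviour that
users tend to find surprising.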
169
170In a general sense, we try to do test-driven development of the
171findutils code; that is, we try to implement test cases for new
172features and bug fixes before modifying the code to make the test
173pass.  Some features of the code are tested well, but the test
174coverage for other features is less good.  If you are about to modify
175the code for a predicate and aren't sure about the test coverage, use
176@code{grep} on the test directories and measure the coverage with
177@code{lcov} or another test coverage tool.
178
You should be able to use the @code{coverage} Makefile target (it's
defined in @file{maint.mk}) to generate a test coverage report for
findutils.   Due to limitations in @code{lcov}, this only works if
your build directory is the same as the source directory (that is,
you're not using a VPATH build configuration).
184
185Lastly, we try not to depend on having a ``working system''.  The
186findutils suite is used for diagnosis of problems, and this applies
187especially to @code{find}.  We should ensure that @code{find} still
188works on relatively broken systems, for example systems with damaged
@file{/etc/passwd} or @file{/etc/fstab} files.  Another interesting
190example is the case where a system is a client of one or more
191unresponsive NFS servers.  On such a system, if you try to stat all
192mount points, your program will hang indefinitely, waiting for the
193remote NFS server to respond.
194
195Another interesting but unusual case is broken NFS servers and corrupt
196filesystems; sometimes they return `impossible' file modes.  It's
197important that find does not entirely fail when encountering such a
198file.
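
One way to cope with such files is to give every classification a
fall-through case rather than assuming that the mode is one of the
known types.  The following sketch is illustrative only:
@code{file_type_letter} is a hypothetical name, and the use of
@samp{U} for an unrecognised type is an assumption of this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: map the file type recorded in ST to a single
   letter.  A mode that matches no known type yields 'U' instead of
   being treated as a fatal error.  */
char
file_type_letter (const struct stat *st)
{
  /* A null pointer here is a programming error, so assert.  */
  assert (st != NULL);

  if (S_ISREG (st->st_mode))
    return 'f';
  if (S_ISDIR (st->st_mode))
    return 'd';
  if (S_ISLNK (st->st_mode))
    return 'l';
  if (S_ISCHR (st->st_mode))
    return 'c';
  if (S_ISBLK (st->st_mode))
    return 'b';
  if (S_ISFIFO (st->st_mode))
    return 'p';
  if (S_ISSOCK (st->st_mode))
    return 's';

  /* An `impossible' mode: report it as unknown and carry on.  */
  return 'U';
}
```

The point of the design is that the unknown case is an ordinary return
value, so the caller can print a diagnostic for that one file and
continue with the rest of the traversal.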
199
200
201@node Coding Conventions
202@chapter Coding Conventions
203
204Coding style documents which set out to establish a uniform look and
205feel to source code have worthy goals, for example greater ease of
206maintenance and readability.  However, I do not believe that in
207general coding style guide authors can envisage every situation, and
208it is always possible that it might on occasion be necessary to break
209the letter of the style guide in order to honour its spirit, or to
210better achieve the style guide's goals.
211
212I've certainly seen many style guides outside the free software world
213which make bald statements such as ``functions shall have exactly one
214return statement''.  The desire to ensure consistency and obviousness
215of control flow is laudable, but it is all too common for such bald
216requirements to be followed unthinkingly.  Certainly I've seen such
217coding standards result in unmaintainable code with terrible
218infelicities such as functions containing @code{if} statements nested
219nine levels deep.  I suppose such coding standards don't survive in
220free software projects because they tend to drive away potential
221contributors or tend to generate heated discussions on mailing lists.
222Equally, a nine-level-deep function in a free software program would
223quickly get refactored, assuming it is obvious what the function is
224supposed to do...
225
226Be that as it may, the approach I will take for this document is to
227explain some idioms and practices in use in the findutils source code,
228and leave it up to the reader's engineering judgement to decide which
229considerations apply to the code they are working on, and whether or
230not there is sufficient reason to ignore the guidance in current
231circumstances.
232
233
234@menu
235* Make the Compiler Find the Bugs::
236* Factor Out Repeated Code::
237* Debugging is For Users Too::
238* Don't Trust the File System Contents::
239* The File System Is Being Modified::
240@end menu
241
242@node    Make the Compiler Find the Bugs
243@section Make the Compiler Find the Bugs
244
Finding bugs is tedious.  If you have a filesystem containing two
million files, and a find command line should print one million of
them but in fact misses out 1%, you can tell that the program is
printing the wrong result only if you know the right answer for that
filesystem at that time.  If you don't know this, you may just not
find out about that bug.  For this reason it is important to have a
comprehensive test suite.
252
The test suite is of course not the only way to find the bugs.  The
findutils source code makes liberal use of the @code{assert} macro.
While these assertions might in principle be a performance drain, the
performance impact of most of them is negligible compared to the time
taken to fetch even one sector from a disk drive.
258
259Assertions should not be used to check the results of operations which
260may be affected by the program's external environment.  For example,
261never assert that a file could be opened successfully.  Errors
262relating to problems with the program's execution environment should
263be diagnosed with a user-oriented error message.  An assertion failure
264should always denote a bug in the program.
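
The distinction can be sketched in C as follows.  The helper
@code{open_for_reading} is hypothetical, and it uses plain
@code{fprintf} to keep the example self-contained; real findutils code
would report the problem through its usual error-reporting functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open NAME for reading.  Returns NULL after
   printing a diagnostic if the file cannot be opened.  */
FILE *
open_for_reading (const char *name)
{
  FILE *fp;

  /* A null NAME can only result from a mistake in the calling code,
     so an assertion is appropriate here.  */
  assert (name != NULL);

  fp = fopen (name, "r");
  if (fp == NULL)
    {
      /* Failure to open a file depends on the environment, so we
         diagnose it for the user rather than asserting.  */
      fprintf (stderr, "cannot open %s: %s\n", name, strerror (errno));
    }
  return fp;
}
```

A caller is expected to check for a null return and carry on or exit
as appropriate; it should never wrap the call in an assertion.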
265
266Avoid using @code{assert} to mark not-fully-implemented features of
267your code as such.  Finish the implementation, disable the code, or
268leave the unfinished version on a local branch.
269
270Several programs in the findutils suite perform self-checks.  See for
271example the function @code{pred_sanity_check} in @file{find/pred.c}.
272This is generally desirable.
273
274There are also a number of small ways in which we can help the
275compiler to find the bugs for us.
276
277@subsection Constants in Equality Testing
278
279It's a common error to write @code{=} when @code{==} is meant.
280Sometimes this happens in new code and is simply due to finger
281trouble.  Sometimes it is the result of the inadvertent deletion of a
282character.  In any case, there is a subset of cases where we can
283persuade the compiler to generate an error message when we make this
284mistake; this is where the equality test is with a constant.
285
286This is an example of a vulnerable piece of code.
287
288@example
289if (x == 2)
290 ...
291@end example
292
293A simple typo converts the above into
294
295@example
296if (x = 2)
297 ...
298@end example
299
300We've introduced a bug; the condition is always true, and the value of
301@code{x} has been changed.  However, a simple change to our practice
302would have made us immune to this problem:
303
304@example
305if (2 == x)
306 ...
307@end example
308
309Usually, the Emacs keystroke @kbd{M-t} can be used to swap the operands.
310
311
312@subsection Spelling of ASCII NUL
313
314Strings in C are just sequences of characters terminated by a NUL.
315The ASCII NUL character has the numerical value zero.  It is normally
316represented in C code as @samp{\0}.  Here is a typical piece of C
317code:
318
319@example
320*p = '\0';
321@end example
322
323Consider what happens if there is an unfortunate typo:
324
325@example
326*p = '0';
327@end example
328
329We have changed the meaning of our program and the compiler cannot
330diagnose this as an error.  Our string is no longer terminated.  Bad
331things will probably happen.  It would be better if the compiler could
332help us diagnose this problem.
333
334In C, the type of @code{'\0'} is in fact int, not char.  This provides
335us with a simple way to avoid this error.  The constant @code{0} has
336the same value and type as the constant @code{'\0'}.  However, it is
337not as vulnerable to typos.    For this reason I normally prefer to
338use this code:
339
340@example
341*p = 0;
342@end example
343
344
345@node    Factor Out Repeated Code
346@section Factor Out Repeated Code
347
348Repeated code imposes a greater maintenance burden and increases the
349exposure to bugs.  For example, if you discover that something you
350want to implement has some similarity with an existing piece of code,
351don't cut and paste it.  Instead, factor the code out.  The risk of
352cutting and pasting the code, particularly if you do this several
353times, is that you end up with several copies of the same code.
354
If the original code had a bug, you now have N places where this needs
to be fixed.  It's all too easy to miss some out when trying to fix the
bug.  Equally, it's quite possible that when pasting the code into
some function, the pasted code was not quite adapted correctly to its
new environment.  To pick a contrived example, perhaps the pasted code
modifies a global variable which it shouldn't be touching
in its new home.  Worse, perhaps it makes some unstated assumption about
the nature of the input arguments which is in fact not true for the
context of the now duplicated code.
364
365A good example of the use of refactoring in findutils is the
366@code{collect_arg} function in @file{find/parser.c}.  A less clear-cut
367but larger example is the factoring out of code which would otherwise
368have been duplicated between @file{find/oldfind.c} and
@file{find/ftsfind.c}.
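
To show the kind of helper this advice leads to, here is a simplified
sketch of a function which fetches the next command-line argument on
behalf of many option parsers.  The name @code{take_arg} and its exact
interface are assumptions made for this example; it is not a copy of
the code in @file{find/parser.c}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: fetch the argument at ARGV[*ARG_PTR].  On
   success, store it in *COLLECTED, advance *ARG_PTR and return true.
   If there are no more arguments, store NULL in *COLLECTED, leave
   *ARG_PTR alone and return false.  */
bool
take_arg (char **argv, int *arg_ptr, const char **collected)
{
  /* Null output pointers would be bugs in the caller.  */
  assert (arg_ptr != NULL);
  assert (collected != NULL);

  if (argv == NULL || argv[*arg_ptr] == NULL)
    {
      *collected = NULL;
      return false;
    }

  *collected = argv[*arg_ptr];
  (*arg_ptr)++;
  return true;
}
```

Each parser which needs an argument can then call the one helper
instead of repeating the end-of-list check, so a fix to that check
needs to be made in only one place.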
370
371The findutils test suite is comprehensive enough that refactoring code
372should not generally be a daunting prospect from a testing point of
373view.  Nevertheless there are some areas which are only
374lightly-tested:
375
376@enumerate
377@item Tests on the ages of files
378@item Code which deals with the values returned by operating system calls (for example handling of ENOENT)
379@item Code dealing with OS limits (for example, limits on path length
380or exec arguments)
381@item Code relating to features not all systems have (for example
382Solaris Doors)
383@end enumerate
384
385Please exercise caution when working in those areas.
386
387
388@node    Debugging is For Users Too
389@section Debugging is For Users Too
390
391Debug and diagnostic code is often used to verify that a program is
392working in the way its author thinks it should be.  But users are
often uncertain about what a program is doing, too.  Exposing a
little more diagnostic information to them can help.  Much of the diagnostic
395code in @code{find}, for example, is controlled by the @samp{-D} flag,
396as opposed to C preprocessor directives.
397
398Making diagnostic messages available to users also means that the
399phrasing of the diagnostic messages becomes important, too.
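
A minimal sketch of diagnostics selected at run time, as opposed to
at compile time, might look like this.  The enumeration, the
@code{set_debug} function and the @code{debug_message} function are
all hypothetical names chosen for this example and do not describe the
actual implementation of @samp{-D}.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical categories of diagnostic output.  */
enum debug_option
{
  DEBUG_NONE = 0,
  DEBUG_STAT = 1 << 0,
  DEBUG_TREE = 1 << 1
};

/* Categories currently enabled; set while parsing the command line.  */
static unsigned int debug_mask = DEBUG_NONE;

void
set_debug (unsigned int mask)
{
  debug_mask = mask;
}

/* Print a diagnostic on stderr if any category in WHICH is enabled.
   Returns 1 if the message was printed and 0 otherwise.  */
int
debug_message (unsigned int which, const char *fmt, ...)
{
  va_list ap;

  assert (fmt != NULL);
  if ((debug_mask & which) == 0)
    return 0;

  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  return 1;
}
```

Because the test is made at run time, the same installed binary can
produce the extra output for a user who asks for it, with no rebuild.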
400
401
402@node    Don't Trust the File System Contents
403@section Don't Trust the File System Contents
404
405People use @code{find} to search in directories created by other
people.  Sometimes they do this to check for suspicious activity (for
407example to look for new setuid binaries).  This means that it would be
408bad if @code{find} were vulnerable to, say, a security problem
409exploitable by constructing a specially-crafted filename.  The same
410consideration would apply to @code{locate} and @code{updatedb}.
411
412Henry Spencer said this well in his fifth commandment:
413@quotation
414Thou shalt check the array bounds of all strings (indeed, all arrays),
415for surely where thou typest @samp{foo} someone someday shall type
416@samp{supercalifragilisticexpialidocious}.
417@end quotation
418
419Symbolic links can often be a problem.  If @code{find} calls
420@code{lstat} on something and discovers that it is a directory, it's
421normal for @code{find} to recurse into it.  Even if the @code{chdir}
422system call is used immediately, there is still a window of
423opportunity between the @code{lstat} and the @code{chdir} in which a
424malicious person could rename the directory and substitute a symbolic
425link to some other directory.
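
One common way to narrow this window is to open the directory without
following symbolic links, confirm that the open file is the same
object that @code{lstat} examined, and only then change into it.  The
sketch below shows that technique under stated assumptions: the
function name @code{checked_chdir} is hypothetical, the code needs read
permission on the directory, and it is not claimed to be what
@code{find} itself does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change into directory NAME only if it is still
   the directory that lstat saw.  Returns 0 on success, or -1 with
   errno set; on failure the working directory is unchanged.  */
int
checked_chdir (const char *name)
{
  struct stat before, after;
  int fd, saved_errno, result = -1;

  assert (name != NULL);

  if (lstat (name, &before) != 0)
    return -1;
  if (!S_ISDIR (before.st_mode))
    {
      errno = ENOTDIR;
      return -1;
    }

  /* O_NOFOLLOW makes the open fail if NAME is now a symbolic link.  */
  fd = open (name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
  if (fd < 0)
    return -1;

  if (fstat (fd, &after) != 0)
    saved_errno = errno;
  else if (after.st_dev != before.st_dev || after.st_ino != before.st_ino)
    saved_errno = ELOOP;        /* NAME was replaced after the lstat.  */
  else if (fchdir (fd) != 0)
    saved_errno = errno;
  else
    {
      saved_errno = 0;
      result = 0;
    }

  close (fd);
  if (result != 0)
    errno = saved_errno;
  return result;
}
```

Comparing the device and inode numbers catches the case where the
directory was renamed and a different real directory was put in its
place, which @code{O_NOFOLLOW} alone would not detect.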
426
427@node    The File System Is Being Modified
428@section The File System Is Being Modified
429
The filesystem gets modified while you are traversing it.  For
example, it's normal for files to get deleted while @code{find} is
432traversing a directory.  Issuing an error message seems helpful when a
433file is deleted from the one directory you are interested in, but if
434@code{find} is searching 15000 directories, such a message becomes
435less helpful.
436
437Bear in mind also that it is possible for the directory @code{find} is
438searching to be concurrently moved elsewhere in the file system,
439and that the directory in which @code{find} was started could be
440deleted.
441
442Henry Spencer's sixth commandment is also apposite here:
443@quotation
444If a function be advertised to return an error code in the event of
445difficulties, thou shalt check for that code, yea, even though the
446checks triple the size of thy code and produce aches in thy typing
447fingers, for if thou thinkest ``it cannot happen to me'', the gods
448shall surely punish thee for thy arrogance.
449@end quotation
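
In the spirit of that commandment, here is a small sketch which checks
the result of @code{lstat} and separates a file which has vanished
from other failures.  The names @code{visit_file} and
@code{visit_result} are hypothetical.  Whether a vanished file
deserves a message is a matter of policy which the sketch leaves to
the caller (@code{find} has the @code{-ignore_readdir_race} option for
this purpose).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum visit_result { VISIT_OK, VISIT_VANISHED, VISIT_ERROR };

/* Hypothetical helper: examine NAME, filling in *ST on success.  */
enum visit_result
visit_file (const char *name, struct stat *st)
{
  /* Null arguments would be bugs in the caller.  */
  assert (name != NULL);
  assert (st != NULL);

  if (lstat (name, st) == 0)
    return VISIT_OK;

  /* The file was probably removed after the directory was read.
     Let the caller decide whether that is worth a message.  */
  if (errno == ENOENT)
    return VISIT_VANISHED;

  /* Any other failure is reported to the user straight away.  */
  fprintf (stderr, "cannot examine %s: %s\n", name, strerror (errno));
  return VISIT_ERROR;
}
```

Returning a distinct value for the vanished case lets a caller which
is searching thousands of directories stay quiet about it, while a
caller examining one named file can still complain.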
450
451There are a lot of files out there.  They come in all dates and
452sizes.  There is a condition out there in the real world to exercise
453every bit of the code base.  So we try to test that code base before
454someone falls over a bug.
455
456
457@node Tools
458@chapter Tools
459Most of the tools required to build findutils are mentioned in the
460file @file{README-hacking}.  We also use some other tools:
461
462@table @asis
463@item System call traces
464Much of the execution time of find is spent waiting for filesystem
465operations.  A system call trace (for example, that provided by
466@code{strace}) shows what system calls are being made.   Using this
467information we can work to remove unnecessary file system operations.
468
469@item Valgrind
470Valgrind is a tool which dynamically verifies the memory accesses a
471program makes to ensure that they are valid (for example, that the
472behaviour of the program does not in any way depend on the contents of
473uninitialized memory).
474
475@item DejaGnu
476DejaGnu is the test framework used to run the findutils test suite
477(the @code{runtest} program is part of DejaGnu).  It would be ideal if
478everybody building @code{findutils} also ran the test suite, but many
479people don't have DejaGnu installed.  When changes are made to
480findutils, DejaGnu is invoked a lot. @xref{Testing}, for more
481information.
482@end table
483
484@node Using the GNU Portability Library
485@chapter Using the GNU Portability Library
486The Gnulib library (@url{https://www.gnu.org/software/gnulib/}) makes a
487variety of systems look more like a GNU/Linux system and also applies
488a bunch of automatic bug fixes and workarounds.  Some of these also
489apply to GNU/Linux systems too.  For example, the Gnulib regex
490implementation is used when we determine that we are building on a
491GNU libc system with a bug in the regex implementation.
492
493
494@section How and Why we Import the Gnulib Code
495Gnulib does not have a release process which results in a source
496tarball you can download.  Instead, the code is simply made available
through Git, so we import Gnulib via the submodule feature.  The bootstrap
498script performs the necessary steps.
499
500Findutils does not use all the Gnulib code.  The modules we need are
501listed in the file @file{bootstrap.conf}.
502
503The upshot of all this is that we can use the findutils git repository
504to track which version of Gnulib every findutils release uses.
505
506A small number of files are installed by automake and will therefore
507vary according to which version of automake was used to generate a
508release.  This includes for example boiler-plate GNU files such as
509@file{ABOUT-NLS}, @file{INSTALL} and @file{COPYING}.
510
511
512@section How We Fix Gnulib Bugs
513Gnulib is used by quite a number of GNU projects, and this means that
514it gets plenty of testing.  Therefore there are relatively few bugs in
515the Gnulib code, but it does happen from time to time.
516
517However, since there is no waiting around for a Gnulib source release
518tarball, Gnulib bugs are generally fixed quickly.  Here is an outline
519of the way we would contribute a fix to Gnulib (assuming you know it
520is not already fixed in the current Gnulib git tree):
521
522@table @asis
523@item Check you already completed a copyright assignment for Gnulib
524@item Begin with a vanilla git tree
525Download the Findutils source code from git (or use the tree you have
526already)
527@item Run the bootstrap script
528@item Run configure
529@item Build findutils
530Build findutils and run the test suite, which should pass.  In our
531example we assume you have just noticed a bug in Gnulib, not that
532recent Gnulib changes broke the findutils regression tests.
533@item Write a test case
534If in fact Gnulib did break the findutils regression tests, you can probably
535skip this step, since you already have a test case demonstrating the problem.
536Otherwise, write a findutils test case for the bug and/or a Gnulib test case.
537@item Fix the Gnulib bug
538Make sure your editor follows symbolic links so that your changes to
539@file{gnulib/...} actually affect the files in the git working
540directory you checked out earlier.   Observe that your test now passes.
541@item Prepare a Gnulib patch
542In the gnulib subdirectory, use @code{git format-patch} to prepare the
543patch.  Follow the normal usage for checkin comments (take a look at
544the output of @code{git log}).  Check that the patch conforms with the
545GNU coding standards, and email it to the Gnulib mailing list.
546@item Wait for the patch to be applied
547Once your bug fix has been applied, you can update your gnulib
548directory from git, and then check in the change to the submodule as
549normal (you can check @code{git help submodule} for details).
550@end table
551
552There is an alternative to the method above; it is possible to store
local diffs to be patched into gnulib beneath the
@file{gnulib-local} directory.  Normally however, there is no need for this,
555since gnulib updates are very prompt.
556
557@section How to update Gnulib to latest
With a non-dirty working tree, the command @code{make update-gnulib-to-latest}
(or the shorter alias @code{make gnulib-sync}) updates the
gnulib submodule.  In detail, it performs these steps:
@enumerate
@item Fetching the latest upstream gnulib reference.
@item Copying the files which should stay in sync, such as
@file{bootstrap}, from gnulib into the findutils working tree.
@item Finally, showing the @code{git status} for the gnulib submodule
and the files copied above.
@end enumerate
After that, the maintainer checks that everything is correct and that
findutils builds and runs correctly, and finally commits the new gnulib
version, e.g. via @code{git gui}.

The @code{gnulib-sync} target can be run at any time after a
@code{configure} run, and refuses to run only if the working tree is dirty.
574
575@node Documentation
576@chapter Documentation
577
578The findutils git tree includes several different types of
579documentation.
580
581@section git change log
582The git change log for the source tree contains check-in messages
583which describe each check-in.   These have a standard format:
584
585@smallexample
586Summary of the change.
587
588(ChangeLog-style detail)
589@end smallexample
590
591Here, the format of the detail part follows the standard GNU ChangeLog
592style, but without whitespace in the left margin and without
593author/date headers.   Take a look at the output of @code{git log} to
594see some examples.   The README-hacking file also contains an example
595with an explanation.
596
597@section User Documentation
598User-oriented documentation is provided as manual pages and in
599Texinfo.  See
600@ref{Introduction,,Introduction,find,The Findutils manual}.
601
Please make sure both sets of documentation are updated if you make a
change to the code.  The GNU coding standards do not normally call for
maintaining manual pages on the grounds of effort duplication.
However, the manual page format is more convenient for quick
reference, and so it's worth maintaining both types of documentation.
The manual pages are normally rather more terse than the
Texinfo documentation; they are suitable for reference
use, but the Texinfo manual should also include introductory and
tutorial material.
611
612We make the user documentation available on the web, on the GNU
613project web site.  These web pages are source-controlled via CVS
614(still!).  If you are a member of the @samp{findutils} project on
615Savannah you should be able to check the web pages out like this
616(@samp{$USER} is a placeholder for your Savannah username):
617
618@smallexample
619cvs -d  :ext:$USER@@cvs.savannah.gnu.org:/web/findutils checkout findutils/manual
620@end smallexample
621
622You can automatically update the documentation in this repository
623using the script @samp{build-aux/update-online-manual.sh} in the
624findutils Git repository.
625
626@section Build Guidance
627
628@table @file
629@item ABOUT-NLS
630Describes the Free Translation Project, the translation status of
631various GNU projects, and how to participate by translating an
632application.
633@item AUTHORS
634Lists the authors of findutils.
635@item COPYING
636The copyright license covering findutils; currently, the GNU GPL,
637version 3.
638@item INSTALL
639Generic installation instructions for installing GNU programs.
640@item README
Information about how to compile findutils in particular.
642@item README-hacking
643Describes how to build findutils from the code in git.
644@item THANKS
Thanks to people who contributed to findutils.  Generally, if
someone's contribution was significant enough to need a copyright
assignment, their name should go in here.
648@item TODO
649Mainly obsolete.  Please add bugs to the Savannah bug tracker instead
650of adding entries to this file.
651@end table
652
653
654@section Release Information
655@table @file
656@item NEWS
Enumerates the user-visible changes in each release.  Typical changes
are fixed bugs, functionality changes and documentation changes.
Include the date when a release is made.
660@item ChangeLog
661This file enumerates all changes to the findutils source code (with
the possible exception of @file{.cvsignore} and @file{.gitignore}
663changes).  The level of detail used for this file should be sufficient
664to answer the questions ``what changed?'' and ``why was it changed?''.
665The file is generated from the git commit messages during @code{make dist}.
666If a change fixes a bug, always give the bug reference number in the
667@file{NEWS} file and of course also in the checkin message.
668In general, it should be possible to enumerate all
669material changes to a function by searching for its name in
670@file{ChangeLog}.  Mention when each release is made.
671@end table
672
673@node Testing
674@chapter Testing
675This chapter will explain the general procedures for adding tests to
676the test suite, and the functions defined in the findutils-specific
677DejaGnu configuration.  Where appropriate references will be made to
678the DejaGnu documentation.
679
680@node Bugs
681@chapter Bugs
682
683Bugs are logged in the Savannah bug tracker
684@url{https://savannah.gnu.org/bugs/?group=findutils}.  The tracker
685offers several fields but their use is largely obvious.  The
686life-cycle of a bug is like this:
687
688
689@table @asis
690@item Open
691Someone, usually a maintainer, a distribution maintainer or a user,
692creates a bug by filling in the form.   They fill in field values as
693they see fit.  This will generate an email to
694@email{bug-findutils@@gnu.org}.
695
696@item Triage
697The bug hangs around with @samp{Status=None} until someone begins to
698work on it.  At that point they set the ``Assigned To'' field and will
699sometimes set the status to @samp{In Progress}, especially if the bug
700will take a while to fix.
701
702@item Non-bugs
703Quite a lot of reports are not actually bugs; for these the usual
704procedure is to explain why the problem is not a bug, set the status
705to @samp{Invalid} and close the bug.   Make sure you set the
706@samp{Assigned to} field to yourself before closing the bug.
707
708@item Fixing
709When you commit a bug fix into git (or in the case of a contributed
710patch, commit the change), mark the bug as @samp{Fixed}.  Make sure
711you include a new test case where this is relevant.  If you can figure
712out which releases are affected, please also set the @samp{Release}
713field to the earliest release which is affected by the bug.
714Indicate which source branch the fix is included in (for example,
7154.2.x or 4.3.x).  Don't close the bug yet.
716
717@item Release
718When a release is made which includes the bug fix, make sure the bug
719is listed in the NEWS file.  Once the release is made, fill in the
720@samp{Fixed Release} field and close the bug.
721@end table
722
723
724@node Distributions
725@chapter Distributions
726Almost all GNU/Linux distributions include findutils, but only some of
727them have a package maintainer who is a member of the mailing list.
728Distributions don't often feed back patches to the
729@email{bug-findutils@@gnu.org} list, but on the other hand many of
730their patches relate only to standards for file locations and so
731forth, and are therefore distribution specific.  On an irregular basis
732I check the current patches being used by one or two distributions,
733but the total number of GNU/Linux distributions is large enough that
734we could not hope to cover them all.
735
736Often, bugs are raised against a distribution's bug tracker instead of
737GNU's.    Periodically (about every six months) I take a look at some
738of the more accessible bug trackers to indicate which bugs have been
739fixed upstream.
740
741Many distributions include both findutils and the slocate package,
742which provides a replacement @code{locate}.
743
744
745@node Internationalisation
746@chapter Internationalisation
747Translation is essentially automated from the maintainer's point of
748view.  The TP mails the maintainer when a new PO file is available,
749and we just download it and check it in.  The @file{bootstrap} script
750copies @file{.po} files into the working tree.  For more information,
751please see
752@url{https://translationproject.org/domain/findutils.html}.


@node Security
@chapter Security

See @ref{Security Considerations, ,Security Considerations,find,The
Findutils manual}, for a full description of the findutils approach to
security considerations and discussion of particular tools.

If someone reports a security bug publicly, we should fix this as
rapidly as possible.  If necessary, this can mean issuing a fixed
release containing just the one bug fix.  We try to avoid issuing
releases which include both significant security fixes and functional
changes.

Where someone reports a security problem privately, we generally try
to construct and test a patch without pushing the intermediate code to
the public repository.

Once everything has been tested, we can make a release and push the
patch.  The advantage of doing things this way is that we avoid
situations where people watching for git commits can figure out and
exploit a security problem before a fixed release is available.

It's important that security problems be fixed promptly, but don't
rush so much that things go wrong.  Make sure the new release really
fixes the problem.  It's usually best not to include functional
changes in your security-fix release.

If the security problem is serious, send an alert to
@email{vendor-sec@@lst.de}.  The members of the list include most
GNU/Linux distributions.  The point of doing this is to allow them to
prepare to release your security fix to their customers, once the fix
becomes available.  Here is an example alert:

@smallexample
GNU findutils heap buffer overrun (potential privilege escalation)



I. BACKGROUND
=============

GNU findutils is a set of programs which search for files on Unix-like
systems.  It is maintained by the GNU Project of the Free Software
Foundation.  For more information, see
@url{https://www.gnu.org/software/findutils}.


II. DESCRIPTION
===============

When GNU locate reads filenames from an old-format locate database,
they are read into a fixed-length buffer allocated on the heap.
Filenames longer than the 1026-byte buffer can cause a buffer overrun.
The overrunning data can be chosen by any person able to control the
names of files created on the local system.  This will normally
include all local users, but in many cases also remote users (for
example in the case of FTP servers allowing uploads).

III. ANALYSIS
=============

Findutils supports three different formats of locate database: its
native format "LOCATE02", the slocate variant of LOCATE02, and a
traditional ("old") format that locate uses on other Unix systems.

When locate reads filenames from a LOCATE02 database (the default
format), the buffer into which data is read is automatically extended
to accommodate the length of the filenames.

This automatic buffer extension does not happen for old-format
databases.  Instead a 1026-byte buffer is used.  When a longer
pathname appears in the locate database, the end of this buffer is
overrun.  The buffer is allocated on the heap (not the stack).

If the locate database is in the default LOCATE02 format, the locate
program does perform automatic buffer extension, and the program is
not vulnerable to this problem.  The software used to build the
old-format locate database is not itself vulnerable to the same
attack.

Most installations of GNU findutils do not use the old database
format, and so will not be vulnerable.


IV. DETECTION
=============

Software
--------
All existing releases of findutils are affected.


Installations
-------------

To discover the longest path name on a given system, you can use the
following command (requires GNU findutils and GNU coreutils):

@verbatim
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
@end verbatim

V. EXAMPLE
==========

This section includes a shell script which determines which of a list
of locate binaries is vulnerable to the problem.  The shell script has
been tested only on glibc-based systems having a mktemp binary.

NOTE: This script deliberately overruns the buffer in order to
determine if a binary is affected.  Therefore running it on your
system may have undesirable effects.  We recommend that you read the
script before running it.

@verbatim
#! /bin/sh
set +m
if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
    if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
	--output="${vanilla_db}" ; then
	true
    else
	rm -f "${vanilla_db}"
	vanilla_db=""
	echo "Failed to create old-format locate database; skipping the sanity checks" >&2
    fi
fi

make_overrun_db() {
    # Start with a valid database
    cat "${vanilla_db}"
    # Make the final entry really long
    dd if=/dev/zero  bs=1 count=1500 2>/dev/null | tr '\000' 'x'
}



ulimit -c 0

usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
[ $# -eq 0 ] && usage 1

bad=""
good=""
ugly=""
if dbfile="$(mktemp nasty.XXXXXX)"
then
    make_overrun_db > "$dbfile"
    for locate ; do
      ver="$locate = $("$locate"  --version | head -1)"
      if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
	  "$locate" -d "$dbfile" "" >/dev/null
	  if [ $? -gt 128 ] ; then
	      bad="$bad
vulnerable: $ver"
	  else
	      good="$good
good: $ver"
	  fi
       else
	  # the regular locate failed
	  ugly="$ugly
buggy, may or may not be vulnerable: $ver"
       fi
    done
    rm -f "${dbfile}" "${vanilla_db}"
    # good: unaffected.  bad: affected (vulnerable).
    # ugly: doesn't even work for a normal old-format database.
    echo "$good"
    echo "$bad"
    echo "$ugly"
else
  exit 1
fi
@end verbatim


VI. VENDOR RESPONSE
===================

The GNU project discovered the problem while 'locate' was being worked
on; this is the first public announcement of the problem.

The GNU findutils maintainer has issued a patch as part of this
announcement.  The patch appears below.

A source release of findutils-4.2.31 will be issued on 2007-05-30.
That release will of course include the patch.  The patch will be
committed to the public CVS repository at the same time.  Public
announcements of the release, including a description of the bug, will
be made at the same time as the release.

A release of findutils-4.3.x will follow and will also include the
patch.


VII. PATCH
==========

This patch should apply to findutils-4.2.23 and later.
Findutils-4.2.23 was released almost two years ago.
@verbatim
Index: locate/locate.c
===================================================================
RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
retrieving revision 1.58.2.2
diff -u -p -r1.58.2.2 locate.c
--- locate/locate.c	22 Apr 2007 16:57:42 -0000	1.58.2.2
+++ locate/locate.c	28 May 2007 10:18:16 -0000
@@ -124,9 +124,9 @@ extern int errno;

 #include "locatedb.h"
 #include <getline.h>
-#include "../gnulib/lib/xalloc.h"
-#include "../gnulib/lib/error.h"
-#include "../gnulib/lib/human.h"
+#include "xalloc.h"
+#include "error.h"
+#include "human.h"
 #include "dirname.h"
 #include "closeout.h"
 #include "nextelem.h"
@@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
   return VISIT_CONTINUE;
 }

+static void
+toolong (struct process_data *procdata)
+{
+  error (EXIT_FAILURE, 0,
+	 _("locate database %s contains a "
+	   "filename longer than locate can handle"),
+	 procdata->dbfile);
+}
+
+static void
+extend (struct process_data *procdata, size_t siz1, size_t siz2)
+{
+  /* Figure out if the addition operation is safe before performing it. */
+  if (SIZE_MAX - siz1 < siz2)
+    {
+      toolong (procdata);
+    }
+  else if (procdata->pathsize < (siz1+siz2))
+    {
+      procdata->pathsize = siz1+siz2;
+      procdata->original_filename = x2nrealloc (procdata->original_filename,
+						&procdata->pathsize,
+						1);
+    }
+}
+
 static int
 visit_old_format(struct process_data *procdata, void *context)
 {
-  register char *s;
+  register size_t i;
   (void) context;

   /* Get the offset in the path where this path info starts.  */
@@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
     procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
   else
     procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
+  assert(procdata->count > 0);

-  /* Overlay the old path with the remainder of the new.  */
-  for (s = procdata->original_filename + procdata->count;
+  /* Overlay the old path with the remainder of the new.  Read
+   * more data until we get to the next filename.
+   */
+  for (i=procdata->count;
        (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
-    if (procdata->c < 0200)
-      *s++ = procdata->c;		/* An ordinary character.  */
-    else
-      {
-	/* Bigram markers have the high bit set. */
-	procdata->c &= 0177;
-	*s++ = procdata->bigram1[procdata->c];
-	*s++ = procdata->bigram2[procdata->c];
-      }
-  *s-- = '\0';
+    {
+      if (procdata->c < 0200)
+	{
+	  /* An ordinary character. */
+	  extend (procdata, i, 1u);
+	  procdata->original_filename[i++] = procdata->c;
+	}
+      else
+	{
+	  /* Bigram markers have the high bit set. */
+	  extend (procdata, i, 2u);
+	  procdata->c &= 0177;
+	  procdata->original_filename[i++] = procdata->bigram1[procdata->c];
+	  procdata->original_filename[i++] = procdata->bigram2[procdata->c];
+	}
+    }
+
+  /* Consider the case where we executed the loop body zero times; we
+   * still need space for the terminating null byte.
+   */
+  extend (procdata, i, 1u);
+  procdata->original_filename[i] = 0;

   procdata->munged_filename = procdata->original_filename;
@end verbatim


VIII. THANKS
============

Thanks to Rob Holland <rob@@inversepath.com> and Tavis Ormandy.


IX. CVE INFORMATION
===================

No CVE candidate number has yet been assigned for this vulnerability.
If someone provides one, I will include it in the public announcement
and change logs.
@end smallexample

The original announcement above was sent out with a cleartext PGP
signature, of course, but that has been omitted from the example.

Once a fixed release is available, announce the new release using the
normal channels.  Any CVE number assigned for the problem should be
included in the @file{ChangeLog} and @file{NEWS} entries.  See
@url{https://cve.mitre.org/} for an explanation of CVE numbers.
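
Returning to the example alert, two of its shell idioms are worth
spelling out.  The sketch below is illustrative only: the function names
@code{longest_path_length} and @code{killed_by_signal} are invented
here.  Like the alert, it assumes GNU find and GNU wc (for @code{-L}).
It also relies on the shell convention that a command terminated by a
signal is reported with an exit status greater than 128.

```shell
# Print the length in bytes of the longest path name below directory $1.
# Every byte of a name becomes 'x' and every NUL terminator becomes a
# newline, so names that contain newlines are still measured correctly.
longest_path_length() (
    find "$1" -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
)

# Succeed if exit status $1 indicates termination by a signal, which is
# how the test script decides that a locate binary crashed.
killed_by_signal() (
    [ "$1" -gt 128 ]
)
```

A result from @code{longest_path_length /} can be compared with the
1026-byte buffer size quoted in the alert.  A segmentation fault
(SIGSEGV, signal 11) is normally reported as status 139, so
@code{killed_by_signal 139} succeeds while @code{killed_by_signal 1}
fails.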



@node Making Releases
@chapter Making Releases
This chapter explains how to make a findutils release.  For the
time being, here is a terse description of the main steps:

@set RELEASE X.Y.Z
@set RELTAG v@value{RELEASE}

@enumerate
@item Commit changes; make sure your working directory has no
uncommitted changes.
@item Update translation files; re-run bootstrap to download the
newest @samp{.po} files.
@item Make sure compiler warnings would block the release; re-run
@samp{configure} with the options
@code{--enable-compiler-warnings --enable-compiler-warnings-are-errors}.
@item Test; make sure that all changes you have made have tests, and
that the tests pass.
Verify this with @code{env RUN_EXPENSIVE_TESTS=yes make distcheck}.
@c The RUN_EXPENSIVE_TESTS environment variable is checked in init.cfg.
@item Bugs; make sure all Savannah bug entries fixed in this release
are marked as fixed in Savannah.  Optionally close them too to save
duplicate work (otherwise, close them after the release is uploaded).
@item Add the new release to the Savannah field values; see the
@code{Bugs > Edit Field Values} menu item.  Add a field value for the
release you are about to make so that users can report bugs in it.
@item Update version; make sure that the NEWS file
is updated with the new release number (and checked in).
@c There is no longer any need to update configure.ac, since it no
@c longer contains version information.
@item Tag the release; findutils release tags look like
@samp{v4.5.5}.  You can create a tag with a command like this:
@c we use @example here because @value will not work within @code or @samp.
@example
git tag -s -m "Findutils release @value{RELEASE}" @value{RELTAG}
@end example
@noindent
@item Build the release tarball; do this with @code{make distcheck}.
Copy the tarball somewhere safe.
@item Merge; if the release (and signed tag) was made on a
local branch, merge the branch to your local master.
@item Push; push your master to origin/master.
@item Push the new release tag; assuming that the name of your remote is
@samp{origin}, this is:
@example
git push origin tag @value{RELTAG}
@end example
@item Prepare the upload and upload it.
You can do this with
@c we use @example here because @value will not work within @code or @samp.
@example
build-aux/gnupload --to ftp.gnu.org:findutils findutils-@value{RELEASE}.tar.xz
@end example
@noindent
Use @code{alpha.gnu.org:findutils} for an alpha or beta release.
@xref{Automated FTP Uploads, ,Automated FTP
Uploads, maintain, Information for Maintainers of GNU Software},
for detailed upload instructions.
@item Check the FTP upload worked; you can look for an email from the
robot or check the contents of the actual FTP site.
@item Make a release announcement; include an extract from the NEWS
file which explains what's changed.  Announcements for test releases
should just go to @email{bug-findutils@@gnu.org}.  Announcements for
stable releases should go to @email{info-gnu@@gnu.org} as well.
@item Post-release administrivia: add a new dummy release header in NEWS:

@code{* Major changes in release ?.?.?, YYYY-MM-DD}

and update the @code{old_NEWS_hash} in @file{cfg.mk} with
@code{make update-NEWS-hash}.
Commit both changes.
@c make update-NEWS-hash supports make news-check but we normally
@c don't do that (and I'm not sure that the current NEWS file would
@c pass the check anyway).
@item Close bugs; any bugs recorded on Savannah which were fixed in this
release should now be marked as closed if they were not already.
Update the @samp{Fixed Release} field of these bugs appropriately and
make sure the @samp{Assigned to} field is populated.
@end enumerate
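
Some of the mechanical checks in the list above can be scripted.  The
following is a minimal sketch and not part of the findutils tree; all
four function names are hypothetical.  It assumes the tag and tarball
naming shown above and the NEWS header format
@code{* Major changes in release X.Y.Z, YYYY-MM-DD}.

```shell
# Hypothetical release helpers; each body is a ( ) subshell so that no
# variables leak into the caller.

# Print the git tag name for release version $1.
release_tag() (
    printf 'v%s\n' "$1"
)

# Print the tarball name for release version $1.
release_tarball() (
    printf 'findutils-%s.tar.xz\n' "$1"
)

# Succeed if NEWS file $1 has a header for release version $2.
# -F treats the header as a fixed string, so the dots in the version
# number are not regular-expression wildcards.
news_has_release() (
    grep -qF -- "* Major changes in release $2," "$1"
)

# Succeed only if git reports no modified, staged or untracked files.
tree_is_clean() (
    out=$(git status --porcelain) && [ -z "$out" ]
)
```

For example, taking 4.5.5 (the example tag used above) as the version,
@code{news_has_release NEWS 4.5.5 && tree_is_clean} succeeds only when
@file{NEWS} carries the matching header and nothing is left
uncommitted, and @code{release_tag 4.5.5} prints @samp{v4.5.5}.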


@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi

@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX