\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename find-maint.info
@include versionmaint.texi

@settitle Maintaining GNU Findutils @value{VERSION}
@c For double-sided printing, uncomment:
@c @setchapternewpage odd
@c %**end of header

@iftex
@finalout
@end iftex

@dircategory GNU organization
@direntry
* Maintaining Findutils: (find-maint). Maintaining GNU findutils
@end direntry

@copying
This manual explains how GNU findutils is maintained, how changes should
be made and tested, and what resources exist to help developers.

This document corresponds to version @value{VERSION} of the GNU findutils.

Copyright @copyright{} 2007--2021 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying

@titlepage
@title Maintaining GNU Findutils
@subtitle version @value{VERSION}, @value{UPDATED}
@author by James Youngman

@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top, Introduction, (dir), (dir)
@top Maintaining GNU Findutils

@insertcopying
@end ifnottex

@menu
* Introduction::
* Maintaining GNU Programs::
* Design Issues::
* Coding Conventions::
* Tools::
* Using the GNU Portability Library::
* Documentation::
* Testing::
* Bugs::
* Distributions::
* Internationalisation::
* Security::
* Making Releases::
* GNU Free Documentation License::
@end menu



@node Introduction
@chapter Introduction

This document explains how to contribute to and maintain GNU
Findutils.
It concentrates on developer-specific issues. For
information about how to use the software please refer to
@ref{Introduction, ,Introduction,find,The Findutils manual}.

This manual aims to be useful without necessarily being verbose. It's
also a recent document, so there will be many areas in which
improvements can be made. If you find that the document misses out
important information or that any part of it is so terse as to
be unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
mailing list. We'll try to improve this document too.


@node Maintaining GNU Programs
@chapter Maintaining GNU Programs

GNU Findutils is part of the GNU Project and so there are a number of
documents which set out standards for the maintenance of GNU
software.

@table @file
@item standards.texi
GNU Project Coding Standards. All changes to findutils should comply
with these standards. In some areas we go somewhat beyond the
requirements of the standards, but these cases are explained in this
manual.
@item maintain.texi
Information for Maintainers of GNU Software. This document provides
guidance for GNU maintainers. Everybody with commit access should
read this document. Everybody else is welcome to do so too, of
course.
@end table



@node Design Issues
@chapter Design Issues

The findutils package is installed on a great many systems, usually as a
fundamental component. The programs in the package are often used in
order to successfully boot or fix the system.

This fact means that for findutils we bear in mind considerations that
may not apply as much to other packages.
For example, the fact
that findutils is often a base component motivates us to
@itemize
@item Limit dependencies on libraries
@item Avoid dependencies on other large packages (for example, interpreters)
@item Be conservative when making changes to the 'stable' release branch
@end itemize

All those considerations come before functionality. Functional
enhancements are still made to findutils, but these are almost
exclusively introduced in the 'development' release branch, to allow
extensive testing and proving.

Sometimes it is useful to have a priority list to provide guidance
when making design trade-offs. For findutils, that priority list is:

@enumerate
@item Correctness
@item Standards compliance
@item Security
@item Backward compatibility
@item Performance
@item Functionality
@end enumerate

For example, we support the @code{-exec} action because POSIX
compliance requires this, even though there are security problems with
it and we would otherwise prefer people to use @code{-execdir}. There
are also cases where some performance is sacrificed in the name of
security. For example, the sanity checks that @code{find} performs
while traversing a directory tree may slow it down. We adopt
functional changes, and functional changes are allowed to make
@code{find} slower, but only if there is no detectable impact on users
who don't use the feature.

Backward-incompatible changes do get made in order to comply with
standards (for example the behaviour of @code{-perm -...} changed in
order to comply with POSIX). However, they don't get made in order to
provide better ease of use; for example the semantics of
@code{-size -2G} are almost always unexpected by users, but we retain
the current behaviour because of backward compatibility and for its
similarity to the block-rounding behaviour of @code{-size -30}.
We might introduce
a change which does not have the unfortunate rounding behaviour, but
we would choose another syntax (for example @code{-size '<2G'}) for
this.

In a general sense, we try to do test-driven development of the
findutils code; that is, we try to implement test cases for new
features and bug fixes before modifying the code to make the test
pass. Some features of the code are tested well, but the test
coverage for other features is less good. If you are about to modify
the code for a predicate and aren't sure about the test coverage, use
@code{grep} on the test directories and measure the coverage with
@code{lcov} or another test coverage tool.

You should be able to use the @code{coverage} Makefile target (it's
defined in @file{maint.mk}) to generate a test coverage report for
findutils. Due to limitations in @code{lcov}, this only works if
your build directory is the same as the source directory (that is,
you're not using a VPATH build configuration).

Lastly, we try not to depend on having a ``working system''. The
findutils suite is used for diagnosis of problems, and this applies
especially to @code{find}. We should ensure that @code{find} still
works on relatively broken systems, for example systems with damaged
@file{/etc/passwd} or @file{/etc/fstab} files. Another interesting
example is the case where a system is a client of one or more
unresponsive NFS servers. On such a system, if you try to stat all
mount points, your program will hang indefinitely, waiting for the
remote NFS server to respond.

Another interesting but unusual case is broken NFS servers and corrupt
filesystems; sometimes they return ``impossible'' file modes. It's
important that @code{find} does not entirely fail when encountering
such a file.
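
The point about ``impossible'' file modes can be illustrated with a
minimal sketch. The function @code{file_type_letter} below is a
hypothetical helper written for this manual; it is not taken from the
findutils sources, and the one-letter codes are only an illustrative
choice. It shows one way to degrade gracefully: a mode that matches no
known file type is reported as @samp{?} rather than treated as a
fatal condition.

@verbatim
```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700  /* Request the S_IS* and S_IF* macros.  */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of findutils.  Store a one-letter
   type code for MODE in *OUT and return 1.  If MODE matches no file
   type we know about, store '?' and return 0 instead.  */
int
file_type_letter (mode_t mode, char *out)
{
  /* A null OUT can only be a bug in the caller, so assert is fine.  */
  assert (NULL != out);

  if (S_ISREG (mode))
    *out = 'f';
  else if (S_ISDIR (mode))
    *out = 'd';
  else if (S_ISLNK (mode))
    *out = 'l';
  else if (S_ISCHR (mode))
    *out = 'c';
  else if (S_ISBLK (mode))
    *out = 'b';
  else if (S_ISFIFO (mode))
    *out = 'p';
  else if (S_ISSOCK (mode))
    *out = 's';
  else
    {
      /* The mode came from the file system, not from our own code,
         so an odd value is not a program bug.  Report it and let
         the caller carry on with the next file.  */
      *out = '?';
      return 0;
    }
  return 1;
}
```
@end verbatim

A caller could use the zero return value to issue a warning for that
one file and then continue the traversal.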


@node Coding Conventions
@chapter Coding Conventions

Coding style documents which set out to establish a uniform look and
feel to source code have worthy goals, for example greater ease of
maintenance and readability. However, I do not believe that in
general coding style guide authors can envisage every situation, and
it is always possible that it might on occasion be necessary to break
the letter of the style guide in order to honour its spirit, or to
better achieve the style guide's goals.

I've certainly seen many style guides outside the free software world
which make bald statements such as ``functions shall have exactly one
return statement''. The desire to ensure consistency and obviousness
of control flow is laudable, but it is all too common for such bald
requirements to be followed unthinkingly. Certainly I've seen such
coding standards result in unmaintainable code with terrible
infelicities such as functions containing @code{if} statements nested
nine levels deep. I suppose such coding standards don't survive in
free software projects because they tend to drive away potential
contributors or tend to generate heated discussions on mailing lists.
Equally, a nine-level-deep function in a free software program would
quickly get refactored, assuming it is obvious what the function is
supposed to do...

Be that as it may, the approach I will take for this document is to
explain some idioms and practices in use in the findutils source code,
and leave it up to the reader's engineering judgement to decide which
considerations apply to the code they are working on, and whether or
not there is sufficient reason to ignore the guidance in current
circumstances.


@menu
* Make the Compiler Find the Bugs::
* Factor Out Repeated Code::
* Debugging is For Users Too::
* Don't Trust the File System Contents::
* The File System Is Being Modified::
@end menu

@node Make the Compiler Find the Bugs
@section Make the Compiler Find the Bugs

Finding bugs is tedious. If I have a filesystem containing two
million files, and a find command line should print one million of
them, but in fact it misses out 1%, you can tell the program is
printing the wrong result only if you know the right answer for that
filesystem at that time. If you don't know this, you may just not
find out about that bug. For this reason it is important to have a
comprehensive test suite.

The test suite is of course not the only way to find the bugs. The
findutils source code makes liberal use of the @code{assert} macro.
While on the one hand these might be a performance drain, the
performance impact of most of these is negligible compared to the time
taken to fetch even one sector from a disk drive.

Assertions should not be used to check the results of operations which
may be affected by the program's external environment. For example,
never assert that a file could be opened successfully. Errors
relating to problems with the program's execution environment should
be diagnosed with a user-oriented error message. An assertion failure
should always denote a bug in the program.

Avoid using @code{assert} to mark not-fully-implemented features of
your code as such. Finish the implementation, disable the code, or
leave the unfinished version on a local branch.

Several programs in the findutils suite perform self-checks. See for
example the function @code{pred_sanity_check} in @file{find/pred.c}.
This is generally desirable.

There are also a number of small ways in which we can help the
compiler to find the bugs for us.
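
Before turning to those, here is a minimal sketch of the distinction
drawn above between program bugs and environment problems. The
function @code{count_lines} is a hypothetical helper invented for this
illustration, and it reports errors with plain @code{fprintf} rather
than with whatever error-reporting routine the real findutils code
uses. A null argument can only come from a bug in the caller, so it is
asserted; a file that cannot be opened is an environment problem, so
it gets a user-oriented message.

@verbatim
```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of findutils.  Return the number of
   newline characters in the file called NAME, or -1 after printing
   a diagnostic if the file cannot be opened.  */
long
count_lines (const char *name)
{
  FILE *fp;
  long lines = 0;
  int c;

  /* A null NAME denotes a bug in the calling code.  */
  assert (NULL != name);

  fp = fopen (name, "r");
  if (NULL == fp)
    {
      /* Failure to open a file depends on the environment, so
         diagnose it for the user instead of asserting.  */
      fprintf (stderr, "cannot open %s: %s\n", name, strerror (errno));
      return -1;
    }

  while ((c = getc (fp)) != EOF)
    if ('\n' == c)
      ++lines;
  fclose (fp);

  /* Internal invariant: we only ever count upwards from zero.  */
  assert (0 <= lines);
  return lines;
}
```
@end verbatim

Calling @code{count_lines} on a path that does not exist prints a
message on standard error and returns @minus{}1; the program keeps
running and the caller decides what to do next.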


@subsection Constants in Equality Testing

It's a common error to write @code{=} when @code{==} is meant.
Sometimes this happens in new code and is simply due to finger
trouble. Sometimes it is the result of the inadvertent deletion of a
character. In any case, there is a subset of cases where we can
persuade the compiler to generate an error message when we make this
mistake; this is where the equality test is with a constant.

This is an example of a vulnerable piece of code.

@example
if (x == 2)
  ...
@end example

A simple typo converts the above into

@example
if (x = 2)
  ...
@end example

We've introduced a bug; the condition is always true, and the value of
@code{x} has been changed. However, a simple change to our practice
would have made us immune to this problem:

@example
if (2 == x)
  ...
@end example

Usually, the Emacs keystroke @kbd{M-t} can be used to swap the operands.


@subsection Spelling of ASCII NUL

Strings in C are just sequences of characters terminated by a NUL.
The ASCII NUL character has the numerical value zero. It is normally
represented in C code as @samp{\0}. Here is a typical piece of C
code:

@example
*p = '\0';
@end example

Consider what happens if there is an unfortunate typo:

@example
*p = '0';
@end example

We have changed the meaning of our program and the compiler cannot
diagnose this as an error. Our string is no longer terminated. Bad
things will probably happen. It would be better if the compiler could
help us diagnose this problem.

In C, the type of @code{'\0'} is in fact @code{int}, not @code{char}.
This provides us with a simple way to avoid this error. The constant
@code{0} has the same value and type as the constant @code{'\0'}.
However, it is not as vulnerable to typos.
For this reason I normally prefer to
use this code:

@example
*p = 0;
@end example


@node Factor Out Repeated Code
@section Factor Out Repeated Code

Repeated code imposes a greater maintenance burden and increases the
exposure to bugs. For example, if you discover that something you
want to implement has some similarity with an existing piece of code,
don't cut and paste it. Instead, factor the code out. The risk of
cutting and pasting the code, particularly if you do this several
times, is that you end up with several copies of the same code.

If the original code had a bug, you now have N places where this needs
to be fixed. It's all too easy to miss some out when trying to fix the
bug. Equally, it's quite possible that when pasting the code into
some function, the pasted code was not quite adapted correctly to its
new environment. To pick a contrived example, perhaps it modifies a
global variable which it (that is, the pasted code) shouldn't be
touching in its new home. Worse, perhaps it makes some unstated
assumption about the nature of the input arguments which is in fact
not true for the context of the now duplicated code.

A good example of the use of refactoring in findutils is the
@code{collect_arg} function in @file{find/parser.c}. A less clear-cut
but larger example is the factoring out of code which would otherwise
have been duplicated between @file{find/oldfind.c} and
@file{find/ftsfind.c}.

The findutils test suite is comprehensive enough that refactoring code
should not generally be a daunting prospect from a testing point of
view.
Nevertheless there are some areas which are only
lightly tested:

@enumerate
@item Tests on the ages of files
@item Code which deals with the values returned by operating system
calls (for example handling of @code{ENOENT})
@item Code dealing with OS limits (for example, limits on path length
or exec arguments)
@item Code relating to features not all systems have (for example
Solaris Doors)
@end enumerate

Please exercise caution when working in those areas.


@node Debugging is For Users Too
@section Debugging is For Users Too

Debug and diagnostic code is often used to verify that a program is
working in the way its author thinks it should be. But users are
often uncertain about what a program is doing, too. Exposing a
little more diagnostic information to them can help. Much of the
diagnostic code in @code{find}, for example, is controlled by the
@samp{-D} flag, as opposed to C preprocessor directives.

Making diagnostic messages available to users also means that the
phrasing of the diagnostic messages becomes important, too.


@node Don't Trust the File System Contents
@section Don't Trust the File System Contents

People use @code{find} to search in directories created by other
people. Sometimes they do this to check for suspicious activity (for
example to look for new setuid binaries). This means that it would be
bad if @code{find} were vulnerable to, say, a security problem
exploitable by constructing a specially-crafted filename. The same
consideration would apply to @code{locate} and @code{updatedb}.

Henry Spencer said this well in his fifth commandment:
@quotation
Thou shalt check the array bounds of all strings (indeed, all arrays),
for surely where thou typest @samp{foo} someone someday shall type
@samp{supercalifragilisticexpialidocious}.
@end quotation

Symbolic links can often be a problem.
If @code{find} calls
@code{lstat} on something and discovers that it is a directory, it's
normal for @code{find} to recurse into it. Even if the @code{chdir}
system call is used immediately, there is still a window of
opportunity between the @code{lstat} and the @code{chdir} in which a
malicious person could rename the directory and substitute a symbolic
link to some other directory.

@node The File System Is Being Modified
@section The File System Is Being Modified

The filesystem gets modified while you are traversing it. For
example, it's normal for files to get deleted while @code{find} is
traversing a directory. Issuing an error message seems helpful when a
file is deleted from the one directory you are interested in, but if
@code{find} is searching 15000 directories, such a message becomes
less helpful.

Bear in mind also that it is possible for the directory @code{find} is
searching to be concurrently moved elsewhere in the file system,
and that the directory in which @code{find} was started could be
deleted.

Henry Spencer's sixth commandment is also apposite here:
@quotation
If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code and produce aches in thy typing
fingers, for if thou thinkest ``it cannot happen to me'', the gods
shall surely punish thee for thy arrogance.
@end quotation

There are a lot of files out there. They come in all dates and
sizes. There is a condition out there in the real world to exercise
every bit of the code base. So we try to test that code base before
someone falls over a bug.


@node Tools
@chapter Tools
Most of the tools required to build findutils are mentioned in the
file @file{README-hacking}.
We also use some other tools:

@table @asis
@item System call traces
Much of the execution time of @code{find} is spent waiting for
filesystem operations. A system call trace (for example, that provided
by @code{strace}) shows what system calls are being made. Using this
information we can work to remove unnecessary file system operations.

@item Valgrind
Valgrind is a tool which dynamically verifies the memory accesses a
program makes to ensure that they are valid (for example, that the
behaviour of the program does not in any way depend on the contents of
uninitialized memory).

@item DejaGnu
DejaGnu is the test framework used to run the findutils test suite
(the @code{runtest} program is part of DejaGnu). It would be ideal if
everybody building @code{findutils} also ran the test suite, but many
people don't have DejaGnu installed. When changes are made to
findutils, DejaGnu is invoked a lot. @xref{Testing}, for more
information.
@end table

@node Using the GNU Portability Library
@chapter Using the GNU Portability Library
The Gnulib library (@url{https://www.gnu.org/software/gnulib/}) makes a
variety of systems look more like a GNU/Linux system and also applies
a bunch of automatic bug fixes and workarounds. Some of these
apply to GNU/Linux systems too. For example, the Gnulib regex
implementation is used when we determine that we are building on a
GNU libc system with a bug in the regex implementation.


@section How and Why we Import the Gnulib Code
Gnulib does not have a release process which results in a source
tarball you can download. Instead, the code is simply made available
via Git, so we import gnulib via the submodule feature. The bootstrap
script performs the necessary steps.

Findutils does not use all the Gnulib code. The modules we need are
listed in the file @file{bootstrap.conf}.

The upshot of all this is that we can use the findutils git repository
to track which version of Gnulib every findutils release uses.

A small number of files are installed by automake and will therefore
vary according to which version of automake was used to generate a
release. This includes for example boiler-plate GNU files such as
@file{ABOUT-NLS}, @file{INSTALL} and @file{COPYING}.


@section How We Fix Gnulib Bugs
Gnulib is used by quite a number of GNU projects, and this means that
it gets plenty of testing. Therefore there are relatively few bugs in
the Gnulib code, but it does happen from time to time.

However, since there is no waiting around for a Gnulib source release
tarball, Gnulib bugs are generally fixed quickly. Here is an outline
of the way we would contribute a fix to Gnulib (assuming you know it
is not already fixed in the current Gnulib git tree):

@table @asis
@item Check that you have already completed a copyright assignment for Gnulib
@item Begin with a vanilla git tree
Download the Findutils source code from git (or use the tree you have
already)
@item Run the bootstrap script
@item Run configure
@item Build findutils
Build findutils and run the test suite, which should pass. In our
example we assume you have just noticed a bug in Gnulib, not that
recent Gnulib changes broke the findutils regression tests.
@item Write a test case
If in fact Gnulib did break the findutils regression tests, you can probably
skip this step, since you already have a test case demonstrating the problem.
Otherwise, write a findutils test case for the bug and/or a Gnulib test case.
@item Fix the Gnulib bug
Make sure your editor follows symbolic links so that your changes to
@file{gnulib/...} actually affect the files in the git working
directory you checked out earlier. Observe that your test now passes.
@item Prepare a Gnulib patch
In the gnulib subdirectory, use @code{git format-patch} to prepare the
patch. Follow the normal usage for checkin comments (take a look at
the output of @code{git log}). Check that the patch conforms with the
GNU coding standards, and email it to the Gnulib mailing list.
@item Wait for the patch to be applied
Once your bug fix has been applied, you can update your gnulib
directory from git, and then check in the change to the submodule as
normal (you can check @code{git help submodule} for details).
@end table

There is an alternative to the method above; it is possible to store
local diffs to be patched into gnulib beneath the
@file{gnulib-local} directory. Normally however, there is no need for
this, since gnulib updates are very prompt.

@section How to update Gnulib to latest
With a non-dirty working tree, the command @code{make update-gnulib-to-latest}
(or the shorter alias @code{make gnulib-sync}) updates the
gnulib submodule. In detail, it performs these steps:
@enumerate
@item Fetching the latest upstream gnulib reference.
@item Copying the files which should stay in sync like
@file{bootstrap} from gnulib into the findutils working tree.
@item And finally showing the @code{git status} for the gnulib submodule
and the above copied files.
@end enumerate
After that, the maintainer checks that everything is correct and that
findutils builds and runs correctly, and finally commits the new gnulib
version, e.g. via @code{git gui}.

The @code{gnulib-sync} target can be run at any time after a
@code{configure} run, and refuses to run only if the working tree is
dirty.

@node Documentation
@chapter Documentation

The findutils git tree includes several different types of
documentation.

@section git change log
The git change log for the source tree contains check-in messages
which describe each check-in.
These have a standard format:

@smallexample
Summary of the change.

(ChangeLog-style detail)
@end smallexample

Here, the format of the detail part follows the standard GNU ChangeLog
style, but without whitespace in the left margin and without
author/date headers. Take a look at the output of @code{git log} to
see some examples. The @file{README-hacking} file also contains an
example with an explanation.

@section User Documentation
User-oriented documentation is provided as manual pages and in
Texinfo. See
@ref{Introduction,,Introduction,find,The Findutils manual}.

Please make sure both sets of documentation are updated if you make a
change to the code. The GNU coding standards do not normally call for
maintaining manual pages on the grounds of effort duplication.
However, the manual page format is more convenient for quick
reference, and so it's worth maintaining both types of documentation.
The manual pages are normally rather more terse than the
Texinfo documentation. They are suitable for reference
use, but the Texinfo manual should also include introductory and
tutorial material.

We make the user documentation available on the web, on the GNU
project web site. These web pages are source-controlled via CVS
(still!). If you are a member of the @samp{findutils} project on
Savannah you should be able to check the web pages out like this
(@samp{$USER} is a placeholder for your Savannah username):

@smallexample
cvs -d :ext:$USER@@cvs.savannah.gnu.org:/web/findutils checkout findutils/manual
@end smallexample

You can automatically update the documentation in this repository
using the script @samp{build-aux/update-online-manual.sh} in the
findutils Git repository.

@section Build Guidance

@table @file
@item ABOUT-NLS
Describes the Free Translation Project, the translation status of
various GNU projects, and how to participate by translating an
application.
@item AUTHORS
Lists the authors of findutils.
@item COPYING
The copyright license covering findutils; currently, the GNU GPL,
version 3.
@item INSTALL
Generic installation instructions for installing GNU programs.
@item README
Information about how to compile findutils in particular.
@item README-hacking
Describes how to build findutils from the code in git.
@item THANKS
Thanks to people who contributed to findutils. Generally, if
someone's contribution was significant enough to need a copyright
assignment, their name should go in here.
@item TODO
Mainly obsolete. Please add bugs to the Savannah bug tracker instead
of adding entries to this file.
@end table


@section Release Information
@table @file
@item NEWS
Enumerates the user-visible changes in each release. Typical changes
are fixed bugs, functionality changes and documentation changes.
Include the date when a release is made.
@item ChangeLog
This file enumerates all changes to the findutils source code (with
the possible exception of @file{.cvsignore} and @file{.gitignore}
changes). The level of detail used for this file should be sufficient
to answer the questions ``what changed?'' and ``why was it changed?''.
The file is generated from the git commit messages during @code{make dist}.
If a change fixes a bug, always give the bug reference number in the
@file{NEWS} file and of course also in the checkin message.
In general, it should be possible to enumerate all
material changes to a function by searching for its name in
@file{ChangeLog}. Mention when each release is made.
@end table

@node Testing
@chapter Testing
This chapter will explain the general procedures for adding tests to
the test suite, and the functions defined in the findutils-specific
DejaGnu configuration. Where appropriate, references will be made to
the DejaGnu documentation.

@node Bugs
@chapter Bugs

Bugs are logged in the Savannah bug tracker
@url{https://savannah.gnu.org/bugs/?group=findutils}. The tracker
offers several fields but their use is largely obvious. The
life-cycle of a bug is like this:


@table @asis
@item Open
Someone, usually a maintainer, a distribution maintainer or a user,
creates a bug by filling in the form. They fill in field values as
they see fit. This will generate an email to
@email{bug-findutils@@gnu.org}.

@item Triage
The bug hangs around with @samp{Status=None} until someone begins to
work on it. At that point they set the ``Assigned To'' field and will
sometimes set the status to @samp{In Progress}, especially if the bug
will take a while to fix.

@item Non-bugs
Quite a lot of reports are not actually bugs; for these the usual
procedure is to explain why the problem is not a bug, set the status
to @samp{Invalid} and close the bug. Make sure you set the
@samp{Assigned to} field to yourself before closing the bug.

@item Fixing
When you commit a bug fix into git (or in the case of a contributed
patch, commit the change), mark the bug as @samp{Fixed}. Make sure
you include a new test case where this is relevant. If you can figure
out which releases are affected, please also set the @samp{Release}
field to the earliest release which is affected by the bug.
Indicate which source branch the fix is included in (for example,
4.2.x or 4.3.x). Don't close the bug yet.

@item Release
When a release is made which includes the bug fix, make sure the bug
is listed in the @file{NEWS} file.
Once the release is made, fill in the
@samp{Fixed Release} field and close the bug.
@end table


@node Distributions
@chapter Distributions
Almost all GNU/Linux distributions include findutils, but only some of
them have a package maintainer who is a member of the mailing list.
Distributions don't often feed back patches to the
@email{bug-findutils@@gnu.org} list, but on the other hand many of
their patches relate only to standards for file locations and so
forth, and are therefore distribution specific. On an irregular basis
I check the current patches being used by one or two distributions,
but the total number of GNU/Linux distributions is large enough that
we could not hope to cover them all.

Often, bugs are raised against a distribution's bug tracker instead of
GNU's. Periodically (about every six months) I take a look at some
of the more accessible bug trackers to indicate which bugs have been
fixed upstream.

Many distributions include both findutils and the slocate package,
which provides a replacement @code{locate}.


@node Internationalisation
@chapter Internationalisation
Translation is essentially automated from the maintainer's point of
view. The TP mails the maintainer when a new PO file is available,
and we just download it and check it in. The @file{bootstrap} script
copies @file{.po} files into the working tree. For more information,
please see
@url{https://translationproject.org/domain/findutils.html}.


@node Security
@chapter Security

See @ref{Security Considerations, ,Security Considerations,find,The
Findutils manual}, for a full description of the findutils approach to
security considerations and discussion of particular tools.

If someone reports a security bug publicly, we should fix this as
rapidly as possible. If necessary, this can mean issuing a fixed
release containing just the one bug fix.
We try to avoid issuing
releases which include both significant security fixes and functional
changes.

Where someone reports a security problem privately, we generally try
to construct and test a patch without pushing the intermediate code to
the public repository.

Once everything has been tested, we can make a release and push the
patch together. The advantage of doing things this way is that we
avoid situations where people watching for git commits can figure out
and exploit a security problem before a fixed release is available.

It's important that security problems be fixed promptly, but don't
rush so much that things go wrong. Make sure the new release really
fixes the problem. It's usually best not to include functional
changes in your security-fix release.

If the security problem is serious, send an alert to
@email{vendor-sec@@lst.de}. The members of the list include most
GNU/Linux distributions. The point of doing this is to allow them to
prepare to release your security fix to their customers, once the fix
becomes available. Here is an example alert:

@smallexample
GNU findutils heap buffer overrun (potential privilege escalation)



I. BACKGROUND
=============

GNU findutils is a set of programs which search for files on Unix-like
systems. It is maintained by the GNU Project of the Free Software
Foundation. For more information, see
@url{https://www.gnu.org/software/findutils}.


II. DESCRIPTION
===============

When GNU locate reads filenames from an old-format locate database,
they are read into a fixed-length buffer allocated on the heap.
Filenames longer than the 1026-byte buffer can cause a buffer overrun.
The overrunning data can be chosen by any person able to control the
names of files created on the local system.
This will normally
include all local users, but in many cases also remote users (for
example in the case of FTP servers allowing uploads).

III. ANALYSIS
=============

Findutils supports three different formats of locate database: its
native format "LOCATE02", the slocate variant of LOCATE02, and a
traditional ("old") format that locate uses on other Unix systems.

When locate reads filenames from a LOCATE02 database (the default
format), the buffer into which data is read is automatically extended
to accommodate the length of the filenames.

This automatic buffer extension does not happen for old-format
databases. Instead a 1026-byte buffer is used. When a longer
pathname appears in the locate database, the end of this buffer is
overrun. The buffer is allocated on the heap (not the stack).

If the locate database is in the default LOCATE02 format, the locate
program does perform automatic buffer extension, and the program is
not vulnerable to this problem. The software used to build the
old-format locate database is not itself vulnerable to the same
attack.

Most installations of GNU findutils do not use the old database
format, and so will not be vulnerable.


IV. DETECTION
=============

Software
--------
All existing releases of findutils are affected.


Installations
-------------

To discover the longest path name on a given system, you can use the
following command (requires GNU findutils and GNU coreutils):

@verbatim
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
@end verbatim

V. EXAMPLE
==========

This section includes a shell script which determines which of a list
of locate binaries is vulnerable to the problem. The shell script has
been tested only on glibc-based systems having a mktemp binary.

NOTE: This script deliberately overruns the buffer in order to
determine if a binary is affected. Therefore running it on your
system may have undesirable effects. We recommend that you read the
script before running it.

@verbatim
#! /bin/sh
set +m
if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
    if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
        --output="${vanilla_db}" ; then
        true
    else
        rm -f "${vanilla_db}"
        vanilla_db=""
        echo "Failed to create old-format locate database; skipping the sanity checks" >&2
    fi
fi

make_overrun_db() {
    # Start with a valid database
    cat "${vanilla_db}"
    # Make the final entry really long
    dd if=/dev/zero bs=1 count=1500 2>/dev/null | tr '\000' 'x'
}



ulimit -c 0

usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
[ $# -eq 0 ] && usage 1

bad=""
good=""
ugly=""
if dbfile="$(mktemp nasty.XXXXXX)"
then
    make_overrun_db > "$dbfile"
    for locate ; do
        ver="$locate = $("$locate" --version | head -1)"
        if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
            "$locate" -d "$dbfile" "" >/dev/null
            if [ $? -gt 128 ] ; then
                bad="$bad
vulnerable: $ver"
            else
                good="$good
good: $ver"
            fi
        else
            # the regular locate failed
            ugly="$ugly
buggy, may or may not be vulnerable: $ver"
        fi
    done
    rm -f "${dbfile}" "${vanilla_db}"
    # good: unaffected. bad: affected (vulnerable).
    # ugly: doesn't even work for a normal old-format database.
    echo "$good"
    echo "$bad"
    echo "$ugly"
else
    exit 1
fi
@end verbatim




VI. VENDOR RESPONSE
===================

The GNU project discovered the problem while 'locate' was being worked
on; this is the first public announcement of the problem.

The GNU findutils maintainer has issued a patch as part of this
announcement.
The patch appears below.

A source release of findutils-4.2.31 will be issued on 2007-05-30.
That release will of course include the patch. The patch will be
committed to the public CVS repository at the same time. Public
announcements of the release, including a description of the bug, will
be made at the same time as the release.

A release of findutils-4.3.x will follow and will also include the
patch.


VII. PATCH
==========

This patch should apply to findutils-4.2.23 and later.
Findutils-4.2.23 was released almost two years ago.
@verbatim
Index: locate/locate.c
===================================================================
RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
retrieving revision 1.58.2.2
diff -u -p -r1.58.2.2 locate.c
--- locate/locate.c     22 Apr 2007 16:57:42 -0000      1.58.2.2
+++ locate/locate.c     28 May 2007 10:18:16 -0000
@@ -124,9 +124,9 @@ extern int errno;

 #include "locatedb.h"
 #include <getline.h>
-#include "../gnulib/lib/xalloc.h"
-#include "../gnulib/lib/error.h"
-#include "../gnulib/lib/human.h"
+#include "xalloc.h"
+#include "error.h"
+#include "human.h"
 #include "dirname.h"
 #include "closeout.h"
 #include "nextelem.h"
@@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
   return VISIT_CONTINUE;
 }

+static void
+toolong (struct process_data *procdata)
+{
+  error (EXIT_FAILURE, 0,
+         _("locate database %s contains a "
+           "filename longer than locate can handle"),
+         procdata->dbfile);
+}
+
+static void
+extend (struct process_data *procdata, size_t siz1, size_t siz2)
+{
+  /* Figure out if the addition operation is safe before performing it.
*/
+  if (SIZE_MAX - siz1 < siz2)
+    {
+      toolong (procdata);
+    }
+  else if (procdata->pathsize < (siz1+siz2))
+    {
+      procdata->pathsize = siz1+siz2;
+      procdata->original_filename = x2nrealloc (procdata->original_filename,
+                                                &procdata->pathsize,
+                                                1);
+    }
+}
+
 static int
 visit_old_format(struct process_data *procdata, void *context)
 {
-  register char *s;
+  register size_t i;
   (void) context;

   /* Get the offset in the path where this path info starts. */
@@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
     procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
   else
     procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
+  assert(procdata->count > 0);

-  /* Overlay the old path with the remainder of the new. */
-  for (s = procdata->original_filename + procdata->count;
+  /* Overlay the old path with the remainder of the new. Read
+   * more data until we get to the next filename.
+   */
+  for (i=procdata->count;
        (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
-    if (procdata->c < 0200)
-      *s++ = procdata->c; /* An ordinary character. */
-    else
-      {
-        /* Bigram markers have the high bit set. */
-        procdata->c &= 0177;
-        *s++ = procdata->bigram1[procdata->c];
-        *s++ = procdata->bigram2[procdata->c];
-      }
-  *s-- = '\0';
+    {
+      if (procdata->c < 0200)
+        {
+          /* An ordinary character. */
+          extend (procdata, i, 1u);
+          procdata->original_filename[i++] = procdata->c;
+        }
+      else
+        {
+          /* Bigram markers have the high bit set.
*/
+          extend (procdata, i, 2u);
+          procdata->c &= 0177;
+          procdata->original_filename[i++] = procdata->bigram1[procdata->c];
+          procdata->original_filename[i++] = procdata->bigram2[procdata->c];
+        }
+    }
+
+  /* Consider the case where we executed the loop body zero times; we
+   * still need space for the terminating null byte.
+   */
+  extend (procdata, i, 1u);
+  procdata->original_filename[i] = 0;

   procdata->munged_filename = procdata->original_filename;
@end verbatim


VIII. THANKS
============

Thanks to Rob Holland <rob@@inversepath.com> and Tavis Ormandy.


IX. CVE INFORMATION
===================

No CVE candidate number has yet been assigned for this vulnerability.
If someone provides one, I will include it in the public announcement
and change logs.
@end smallexample

The original announcement above was sent out with a cleartext PGP
signature, of course, but that has been omitted from the example.

Once a fixed release is available, announce the new release using the
normal channels. Any CVE number assigned for the problem should be
included in the @file{ChangeLog} and @file{NEWS} entries. See
@url{https://cve.mitre.org/} for an explanation of CVE numbers.



@node Making Releases
@chapter Making Releases
This chapter will explain how to make a findutils release. For the
time being, here is a terse description of the main steps:

@set RELEASE X.Y.Z
@set RELTAG v@value{RELEASE}

@enumerate
@item Commit changes; make sure your working directory has no
uncommitted changes.
@item Update translation files; re-run bootstrap to download the
newest @samp{.po} files.
@item Make sure compiler warnings would block the release; re-run
@samp{configure} with the options
@code{--enable-compiler-warnings --enable-compiler-warnings-are-errors}.
@item Test; make sure that all changes you have made have tests, and
that the tests pass.
Verify this with @code{env RUN_EXPENSIVE_TESTS=yes make distcheck}.
@c The RUN_EXPENSIVE_TESTS environment variable is checked in init.cfg.
@item Bugs; make sure all Savannah bug entries fixed in this release
are marked as fixed in Savannah. Optionally close them too to save
duplicate work (otherwise, close them after the release is uploaded).
@item Add new release in Savannah field values; see the @code{Bugs >
Edit Field Values} menu item. Add a field value for the release you
are about to make so that users can report bugs in it.
@item Update version; make sure that the NEWS file
is updated with the new release number (and checked in).
@c There is no longer any need to update configure.ac, since it no
@c longer contains version information.
@item Tag the release; findutils releases are tagged like this, for
example: v4.5.5. You can create a tag with a command like this:
@c we use @example here because @value will not work within @code or @samp.
@example
git tag -s -m "Findutils release @value{RELEASE}" @value{RELTAG}
@end example
@noindent
@item Build the release tarball; do this with @code{make distcheck}.
Copy the tarball somewhere safe.
@item Merge; if the release (and signed tag) were made on a
local branch, merge the branch to your local master.
@item Push; push your master to origin/master.
@item Push the new release tag; assuming that the name of your remote is
@samp{origin}, this is:
@example
git push origin tag @value{RELTAG}
@end example
@item Prepare the upload and upload it.
You can do this with
@c we use @example here because @value will not work within @code or @samp.
@example
build-aux/gnupload --to ftp.gnu.org:findutils findutils-@value{RELEASE}.tar.xz
@end example
@noindent
Use @code{alpha.gnu.org:findutils} for an alpha or beta release.
@xref{Automated FTP Uploads, ,Automated FTP
Uploads, maintain, Information for Maintainers of GNU Software},
for detailed upload instructions.
@item Check that the FTP upload worked; you can look for an email from the
robot or check the contents of the actual FTP site.
@item Make a release announcement; include an extract from the NEWS
file which explains what's changed. Announcements for test releases
should just go to @email{bug-findutils@@gnu.org}. Announcements for
stable releases should go to @email{info-gnu@@gnu.org} as well.
@item Post-release administrivia: add a new dummy release header in NEWS:

@code{* Major changes in release ?.?.?, YYYY-MM-DD}

and update the @code{old_NEWS_hash} in @file{cfg.mk} with
@code{make update-NEWS-hash}.
Commit both changes.
@c make update-NEWS-hash supports make news-check but we normally
@c don't do that (and I'm not sure that the current NEWS file would
@c pass the check anyway).
@item Close bugs; any bugs recorded on Savannah which were fixed in this
release should now be marked as closed if they were not already.
Update the @samp{Fixed Release} field of these bugs appropriately and
make sure the @samp{Assigned to} field is populated.
@end enumerate


@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi

@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX