1;;;; ftw.scm --- file system tree walk 2 3;;;; Copyright (C) 2002, 2003, 2006, 2011, 2012, 2014, 2016, 2018 Free Software Foundation, Inc. 4;;;; 5;;;; This library is free software; you can redistribute it and/or 6;;;; modify it under the terms of the GNU Lesser General Public 7;;;; License as published by the Free Software Foundation; either 8;;;; version 3 of the License, or (at your option) any later version. 9;;;; 10;;;; This library is distributed in the hope that it will be useful, 11;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of 12;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 13;;;; Lesser General Public License for more details. 14;;;; 15;;;; You should have received a copy of the GNU Lesser General Public 16;;;; License along with this library; if not, write to the Free Software 17;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 18 19;;; Author: Thien-Thi Nguyen <ttn@gnu.org> 20 21;;; Commentary: 22 23;; Two procedures are provided: `ftw' and `nftw'. 24 25;; NOTE: The following description was adapted from the GNU libc info page, w/ 26;; significant modifications for a more "Schemey" interface. Most noticible 27;; are the inlining of `struct FTW *' parameters `base' and `level' and the 28;; omission of `descriptors' parameters. 29 30;; * Types 31;; 32;; The X/Open specification defines two procedures to process whole 33;; hierarchies of directories and the contained files. Both procedures 34;; of this `ftw' family take as one of the arguments a callback procedure 35;; which must be of these types. 36;; 37;; - Data Type: __ftw_proc_t 38;; (lambda (filename statinfo flag) ...) => status 39;; 40;; Type for callback procedures given to the `ftw' procedure. The 41;; first parameter is a filename, the second parameter is the 42;; vector value as returned by calling `stat' on FILENAME. 43;; 44;; The last parameter is a symbol giving more information about 45;; FILENAM. It can have one of the following values: 46;; 47;; `regular' 48;; The current item is a normal file or files which do not fit 49;; into one of the following categories. This means 50;; especially special files, sockets etc. 51;; 52;; `directory' 53;; The current item is a directory. 54;; 55;; `invalid-stat' 56;; The `stat' call to fill the object pointed to by the second 57;; parameter failed and so the information is invalid. 58;; 59;; `directory-not-readable' 60;; The item is a directory which cannot be read. 61;; 62;; `symlink' 63;; The item is a symbolic link. Since symbolic links are 64;; normally followed seeing this value in a `ftw' callback 65;; procedure means the referenced file does not exist. The 66;; situation for `nftw' is different. 67;; 68;; - Data Type: __nftw_proc_t 69;; (lambda (filename statinfo flag base level) ...) => status 70;; 71;; The first three arguments have the same as for the 72;; `__ftw_proc_t' type. A difference is that for the third 73;; argument some additional values are defined to allow finer 74;; differentiation: 75;; 76;; `directory-processed' 77;; The current item is a directory and all subdirectories have 78;; already been visited and reported. This flag is returned 79;; instead of `directory' if the `depth' flag is given to 80;; `nftw' (see below). 81;; 82;; `stale-symlink' 83;; The current item is a stale symbolic link. The file it 84;; points to does not exist. 85;; 86;; The last two parameters are described below. They contain 87;; information to help interpret FILENAME and give some information 88;; about current state of the traversal of the directory hierarchy. 89;; 90;; `base' 91;; The value specifies which part of the filename argument 92;; given in the first parameter to the callback procedure is 93;; the name of the file. The rest of the string is the path 94;; to locate the file. This information is especially 95;; important if the `chdir' flag for `nftw' was set since then 96;; the current directory is the one the current item is found 97;; in. 98;; 99;; `level' 100;; While processing the directory the procedures tracks how 101;; many directories have been examined to find the current 102;; item. This nesting level is 0 for the item given starting 103;; item (file or directory) and is incremented by one for each 104;; entered directory. 105;; 106;; * Procedure: (ftw filename proc . options) 107;; Do a file system tree walk starting at FILENAME using PROC. 108;; 109;; The `ftw' procedure calls the callback procedure given in the 110;; parameter PROC for every item which is found in the directory 111;; specified by FILENAME and all directories below. The procedure 112;; follows symbolic links if necessary but does not process an item 113;; twice. If FILENAME names no directory this item is the only 114;; object reported by calling the callback procedure. 115;; 116;; The filename given to the callback procedure is constructed by 117;; taking the FILENAME parameter and appending the names of all 118;; passed directories and then the local file name. So the 119;; callback procedure can use this parameter to access the file. 120;; Before the callback procedure is called `ftw' calls `stat' for 121;; this file and passes the information up to the callback 122;; procedure. If this `stat' call was not successful the failure is 123;; indicated by setting the flag argument of the callback procedure 124;; to `invalid-stat'. Otherwise the flag is set according to the 125;; description given in the description of `__ftw_proc_t' above. 126;; 127;; The callback procedure is expected to return non-#f to indicate 128;; that no error occurred and the processing should be continued. 129;; If an error occurred in the callback procedure or the call to 130;; `ftw' shall return immediately the callback procedure can return 131;; #f. This is the only correct way to stop the procedure. The 132;; program must not use `throw' or similar techniques to continue 133;; the program in another place. [Can we relax this? --ttn] 134;; 135;; The return value of the `ftw' procedure is #t if all callback 136;; procedure calls returned #t and all actions performed by the 137;; `ftw' succeeded. If some procedure call failed (other than 138;; calling `stat' on an item) the procedure returns #f. If a 139;; callback procedure returns a value other than #t this value is 140;; returned as the return value of `ftw'. 141;; 142;; * Procedure: (nftw filename proc . control-flags) 143;; Do a new-style file system tree walk starting at FILENAME using PROC. 144;; Various optional CONTROL-FLAGS alter the default behavior. 145;; 146;; The `nftw' procedures works like the `ftw' procedures. It calls 147;; the callback procedure PROC for all items it finds in the 148;; directory FILENAME and below. 149;; 150;; The differences are that for one the callback procedure is of a 151;; different type. It takes also `base' and `level' parameters as 152;; described above. 153;; 154;; The second difference is that `nftw' takes additional optional 155;; arguments which are zero or more of the following symbols: 156;; 157;; physical' 158;; While traversing the directory symbolic links are not 159;; followed. I.e., if this flag is given symbolic links are 160;; reported using the `symlink' value for the type parameter 161;; to the callback procedure. Please note that if this flag is 162;; used the appearance of `symlink' in a callback procedure 163;; does not mean the referenced file does not exist. To 164;; indicate this the extra value `stale-symlink' exists. 165;; 166;; mount' 167;; The callback procedure is only called for items which are on 168;; the same mounted file system as the directory given as the 169;; FILENAME parameter to `nftw'. 170;; 171;; chdir' 172;; If this flag is given the current working directory is 173;; changed to the directory containing the reported object 174;; before the callback procedure is called. 175;; 176;; depth' 177;; If this option is given the procedure visits first all files 178;; and subdirectories before the callback procedure is called 179;; for the directory itself (depth-first processing). This 180;; also means the type flag given to the callback procedure is 181;; `directory-processed' and not `directory'. 182;; 183;; The return value is computed in the same way as for `ftw'. 184;; `nftw' returns #t if no failure occurred in `nftw' and all 185;; callback procedure call return values are also #t. For internal 186;; errors such as memory problems the error `ftw-error' is thrown. 187;; If the return value of a callback invocation is not #t this 188;; very same value is returned. 189 190;;; Code: 191 192(define-module (ice-9 ftw) 193 #:use-module (ice-9 match) 194 #:use-module (ice-9 vlist) 195 #:use-module (srfi srfi-1) 196 #:autoload (ice-9 i18n) (string-locale<?) 197 #:export (ftw nftw 198 file-system-fold 199 file-system-tree 200 scandir)) 201 202(define-macro (getuid-or-false) 203 (if (defined? 'getuid) 204 (getuid) 205 #f)) 206 207(define-macro (getgid-or-false) 208 (if (defined? 'getgid) 209 (getgid) 210 #f)) 211 212(define (directory-files dir) 213 (let ((dir-stream (opendir dir))) 214 (let loop ((new (readdir dir-stream)) 215 (acc '())) 216 (if (eof-object? new) 217 (begin 218 (closedir dir-stream) 219 acc) 220 (loop (readdir dir-stream) 221 (if (or (string=? "." new) ;;; ignore 222 (string=? ".." new)) ;;; ignore 223 acc 224 (cons new acc))))))) 225 226(define (pathify . nodes) 227 (let loop ((nodes nodes) 228 (result "")) 229 (if (null? nodes) 230 (or (and (string=? "" result) "") 231 (substring result 1 (string-length result))) 232 (loop (cdr nodes) (string-append result "/" (car nodes)))))) 233 234;; `visited?-proc' returns a test procedure VISITED? which when called as 235;; (VISITED? stat-obj) returns #f the first time a distinct file is seen, 236;; then #t on any subsequent sighting of it. 237;; 238;; stat:dev and stat:ino together uniquely identify a file (see "Attribute 239;; Meanings" in the glibc manual). Often there'll be just one dev, and 240;; usually there's just a handful mounted, so the strategy here is a small 241;; hash table indexed by dev, containing hash tables indexed by ino. 242;; 243;; On some file systems, stat:ino is always zero. In that case, 244;; a string hash of the full file name is used. 245;; 246;; It'd be possible to make a pair (dev . ino) and use that as the key to a 247;; single hash table. It'd use an extra pair for every file visited, but 248;; might be a little faster if it meant less scheme code. 249;; 250(define (visited?-proc size) 251 (let ((dev-hash (make-hash-table 7))) 252 (lambda (s name) 253 (and s 254 (let* ((ino-hash (hashv-ref dev-hash (stat:dev s))) 255 (%ino (stat:ino s)) 256 (ino (if (= 0 %ino) 257 (string-hash name) 258 %ino))) 259 (or ino-hash 260 (begin 261 (set! ino-hash (make-hash-table size)) 262 (hashv-set! dev-hash (stat:dev s) ino-hash))) 263 (or (hashv-ref ino-hash ino) 264 (begin 265 (hashv-set! ino-hash ino #t) 266 #f))))))) 267 268(define (stat-dir-readable?-proc uid gid) 269 (lambda (s) 270 (let* ((perms (stat:perms s)) 271 (perms-bit-set? (lambda (mask) 272 (logtest mask perms)))) 273 (or (equal? uid 0) 274 (and (equal? uid (stat:uid s)) 275 (perms-bit-set? #o400)) 276 (and (equal? gid (stat:gid s)) 277 (perms-bit-set? #o040)) 278 (perms-bit-set? #o004))))) 279 280(define (stat&flag-proc dir-readable? . control-flags) 281 (let* ((directory-flag (if (memq 'depth control-flags) 282 'directory-processed 283 'directory)) 284 (stale-symlink-flag (if (memq 'nftw-style control-flags) 285 'stale-symlink 286 'symlink)) 287 (physical? (memq 'physical control-flags)) 288 (easy-flag (lambda (s) 289 (let ((type (stat:type s))) 290 (if (eq? 'directory type) 291 (if (dir-readable? s) 292 directory-flag 293 'directory-not-readable) 294 'regular))))) 295 (lambda (name) 296 (let ((s (false-if-exception (lstat name)))) 297 (cond ((not s) 298 (values s 'invalid-stat)) 299 ((eq? 'symlink (stat:type s)) 300 (let ((s-follow (false-if-exception (stat name)))) 301 (cond ((not s-follow) 302 (values s stale-symlink-flag)) 303 ((and s-follow physical?) 304 (values s 'symlink)) 305 ((and s-follow (not physical?)) 306 (values s-follow (easy-flag s-follow)))))) 307 (else (values s (easy-flag s)))))))) 308 309(define (clean name) 310 (let ((last-char-index (1- (string-length name)))) 311 (if (char=? #\/ (string-ref name last-char-index)) 312 (substring name 0 last-char-index) 313 name))) 314 315(define (ftw filename proc . options) 316 (let* ((visited? (visited?-proc (cond ((memq 'hash-size options) => cadr) 317 (else 211)))) 318 (stat&flag (stat&flag-proc 319 (stat-dir-readable?-proc (getuid-or-false) 320 (getgid-or-false))))) 321 (letrec ((go (lambda (fullname) 322 (call-with-values (lambda () (stat&flag fullname)) 323 (lambda (s flag) 324 (or (visited? s fullname) 325 (let ((ret (proc fullname s flag))) ; callback 326 (or (eq? #t ret) 327 (throw 'ftw-early-exit ret)) 328 (and (eq? 'directory flag) 329 (for-each 330 (lambda (child) 331 (go (pathify fullname child))) 332 (directory-files fullname))) 333 #t))))))) 334 (catch 'ftw-early-exit 335 (lambda () (go (clean filename))) 336 (lambda (key val) val))))) 337 338(define (nftw filename proc . control-flags) 339 (let* ((od (getcwd)) ; orig dir 340 (odev (let ((s (false-if-exception (lstat filename)))) 341 (if s (stat:dev s) -1))) 342 (same-dev? (if (memq 'mount control-flags) 343 (lambda (s) (= (stat:dev s) odev)) 344 (lambda (s) #t))) 345 (base-sub (lambda (name base) (substring name 0 base))) 346 (maybe-cd (if (memq 'chdir control-flags) 347 (if (absolute-file-name? filename) 348 (lambda (fullname base) 349 (or (= 0 base) 350 (chdir (base-sub fullname base)))) 351 (lambda (fullname base) 352 (chdir 353 (pathify od (base-sub fullname base))))) 354 (lambda (fullname base) #t))) 355 (maybe-cd-back (if (memq 'chdir control-flags) 356 (lambda () (chdir od)) 357 (lambda () #t))) 358 (depth-first? (memq 'depth control-flags)) 359 (visited? (visited?-proc 360 (cond ((memq 'hash-size control-flags) => cadr) 361 (else 211)))) 362 (has-kids? (if depth-first? 363 (lambda (flag) (eq? flag 'directory-processed)) 364 (lambda (flag) (eq? flag 'directory)))) 365 (stat&flag (apply stat&flag-proc 366 (stat-dir-readable?-proc (getuid-or-false) 367 (getgid-or-false)) 368 (cons 'nftw-style control-flags)))) 369 (letrec ((go (lambda (fullname base level) 370 (call-with-values (lambda () (stat&flag fullname)) 371 (lambda (s flag) 372 (letrec ((self (lambda () 373 (maybe-cd fullname base) 374 ;; the callback 375 (let ((ret (proc fullname s flag 376 base level))) 377 (maybe-cd-back) 378 (or (eq? #t ret) 379 (throw 'nftw-early-exit ret))))) 380 (kids (lambda () 381 (and (has-kids? flag) 382 (for-each 383 (lambda (child) 384 (go (pathify fullname child) 385 (1+ (string-length 386 fullname)) 387 (1+ level))) 388 (directory-files fullname)))))) 389 (or (visited? s fullname) 390 (not (same-dev? s)) 391 (if depth-first? 392 (begin (kids) (self)) 393 (begin (self) (kids))))))) 394 #t))) 395 (let ((ret (catch 'nftw-early-exit 396 (lambda () (go (clean filename) 0 0)) 397 (lambda (key val) val)))) 398 (chdir od) 399 ret)))) 400 401 402;;; 403;;; `file-system-fold' & co. 404;;; 405 406(define-syntax-rule (errno-if-exception expr) 407 (catch 'system-error 408 (lambda () 409 expr) 410 (lambda args 411 (system-error-errno args)))) 412 413(define* (file-system-fold enter? leaf down up skip error init file-name 414 #:optional (stat lstat)) 415 "Traverse the directory at FILE-NAME, recursively. Enter 416sub-directories only when (ENTER? PATH STAT RESULT) returns true. When 417a sub-directory is entered, call (DOWN PATH STAT RESULT), where PATH is 418the path of the sub-directory and STAT the result of (stat PATH); when 419it is left, call (UP PATH STAT RESULT). For each file in a directory, 420call (LEAF PATH STAT RESULT). When ENTER? returns false, call (SKIP 421PATH STAT RESULT). When an `opendir' or STAT call raises an exception, 422call (ERROR PATH STAT ERRNO RESULT), with ERRNO being the operating 423system error number that was raised. 424 425Return the result of these successive applications. 426When FILE-NAME names a flat file, (LEAF PATH STAT INIT) is returned. 427The optional STAT parameter defaults to `lstat'." 428 429 ;; Use drive and inode number as a hash key. If the filesystem 430 ;; doesn't use inodes, fall back to a string hash. 431 (define (mark v s fname) 432 (vhash-cons (cons (stat:dev s) 433 (if (= 0 (stat:ino s)) 434 (string-hash fname) 435 (stat:ino s))) 436 #t v)) 437 438 (define (visited? v s fname) 439 (vhash-assoc (cons (stat:dev s) 440 (if (= 0 (stat:ino s)) 441 (string-hash fname) 442 (stat:ino s))) 443 v)) 444 445 (let loop ((name file-name) 446 (path "") 447 (dir-stat (errno-if-exception (stat file-name))) 448 (result init) 449 (visited vlist-null)) 450 451 (define full-name 452 (if (string=? path "") 453 name 454 (string-append path "/" name))) 455 456 (cond 457 ((integer? dir-stat) 458 ;; FILE-NAME is not readable. 459 (error full-name #f dir-stat result)) 460 ((visited? visited dir-stat full-name) 461 (values result visited)) 462 ((eq? 'directory (stat:type dir-stat)) ; true except perhaps the 1st time 463 (if (enter? full-name dir-stat result) 464 (let ((dir (errno-if-exception (opendir full-name))) 465 (visited (mark visited dir-stat full-name))) 466 (cond 467 ((directory-stream? dir) 468 (let liip ((entry (readdir dir)) 469 (result (down full-name dir-stat result)) 470 (subdirs '())) 471 (cond ((eof-object? entry) 472 (begin 473 (closedir dir) 474 (let ((r+v 475 (fold (lambda (subdir result+visited) 476 (call-with-values 477 (lambda () 478 (loop (car subdir) 479 full-name 480 (cdr subdir) 481 (car result+visited) 482 (cdr result+visited))) 483 cons)) 484 (cons result visited) 485 subdirs))) 486 (values (up full-name dir-stat (car r+v)) 487 (cdr r+v))))) 488 ((or (string=? entry ".") 489 (string=? entry "..")) 490 (liip (readdir dir) 491 result 492 subdirs)) 493 (else 494 (let* ((child (string-append full-name "/" entry)) 495 (st (errno-if-exception (stat child)))) 496 (if (integer? st) ; CHILD is a dangling symlink? 497 (liip (readdir dir) 498 (error child #f st result) 499 subdirs) 500 (if (eq? (stat:type st) 'directory) 501 (liip (readdir dir) 502 result 503 (alist-cons entry st subdirs)) 504 (liip (readdir dir) 505 (leaf child st result) 506 subdirs)))))))) 507 (else 508 ;; Directory FULL-NAME not readable, but it is stat'able. 509 (values (error full-name dir-stat dir result) 510 visited)))) 511 (values (skip full-name dir-stat result) 512 (mark visited dir-stat full-name)))) 513 (else 514 ;; Caller passed a FILE-NAME that names a flat file, not a directory. 515 (leaf full-name dir-stat result))))) 516 517(define* (file-system-tree file-name 518 #:optional (enter? (lambda (n s) #t)) 519 (stat lstat)) 520 "Return a tree of the form (FILE-NAME STAT CHILDREN ...) where STAT is 521the result of (STAT FILE-NAME) and CHILDREN are similar structures for 522each file contained in FILE-NAME when it designates a directory. The 523optional ENTER? predicate is invoked as (ENTER? NAME STAT) and should 524return true to allow recursion into directory NAME; the default value is 525a procedure that always returns #t. When a directory does not match 526ENTER?, it nonetheless appears in the resulting tree, only with zero 527children. The optional STAT parameter defaults to `lstat'. Return #f 528when FILE-NAME is not readable." 529 (define (enter?* name stat result) 530 (enter? name stat)) 531 (define (leaf name stat result) 532 (match result 533 (((siblings ...) rest ...) 534 (cons (alist-cons (basename name) (cons stat '()) siblings) 535 rest)))) 536 (define (down name stat result) 537 (cons '() result)) 538 (define (up name stat result) 539 (match result 540 (((children ...) (siblings ...) rest ...) 541 (cons (alist-cons (basename name) (cons stat children) 542 siblings) 543 rest)))) 544 (define skip ; keep an entry for skipped directories 545 leaf) 546 (define (error name stat errno result) 547 (if (string=? name file-name) 548 result 549 (leaf name stat result))) 550 551 (match (file-system-fold enter?* leaf down up skip error '(()) 552 file-name stat) 553 (((tree)) tree) 554 ((()) #f))) ; FILE-NAME is unreadable 555 556(define* (scandir name #:optional (select? (const #t)) 557 (entry<? string-locale<?)) 558 "Return the list of the names of files contained in directory NAME 559that match predicate SELECT? (by default, all files.) The returned list 560of file names is sorted according to ENTRY<?, which defaults to 561`string-locale<?'. Return #f when NAME is unreadable or is not a 562directory." 563 564 ;; This procedure is implemented in terms of 'readdir' instead of 565 ;; 'file-system-fold' to avoid the extra 'stat' call that the latter 566 ;; makes for each entry. 567 568 (define (opendir* directory) 569 (catch 'system-error 570 (lambda () 571 (opendir directory)) 572 (const #f))) 573 574 (and=> (opendir* name) 575 (lambda (stream) 576 (let loop ((entry (readdir stream)) 577 (files '())) 578 (if (eof-object? entry) 579 (begin 580 (closedir stream) 581 (sort files entry<?)) 582 (loop (readdir stream) 583 (if (select? entry) 584 (cons entry files) 585 files))))))) 586 587;;; ftw.scm ends here 588