
# godirwalk

`godirwalk` is a library for traversing a directory tree on a file
system.

[![GoDoc](https://godoc.org/github.com/karrick/godirwalk?status.svg)](https://godoc.org/github.com/karrick/godirwalk) [![Build Status](https://dev.azure.com/microsoft0235/microsoft/_apis/build/status/karrick.godirwalk?branchName=master)](https://dev.azure.com/microsoft0235/microsoft/_build/latest?definitionId=1&branchName=master)

In short, why do I use this library?

1. It's faster than `filepath.Walk`.
1. It's more correct on Windows than `filepath.Walk`.
1. It's easier to use than `filepath.Walk`.
1. It's more flexible than `filepath.Walk`.

## Usage Example

Additional examples are provided in the `examples/` subdirectory.

This library normalizes the provided top-level directory name to use
the os-specific path separator by calling `filepath.Clean` on its
first argument. It also always invokes the provided callback function
with pathnames built using the correct os-specific path separator.

```Go
    dirname := "some/directory/root"
    err := godirwalk.Walk(dirname, &godirwalk.Options{
        Callback: func(osPathname string, de *godirwalk.Dirent) error {
            fmt.Printf("%s %s\n", de.ModeType(), osPathname)
            return nil
        },
        Unsorted: true, // (optional) set true for faster yet non-deterministic enumeration (see godoc)
    })
    if err != nil {
        log.Fatal(err) // assumes "fmt", "log", and github.com/karrick/godirwalk are imported
    }
```

This library provides functions not only for traversing a file system
directory tree, but also for obtaining a list of the immediate
descendants of a particular directory, typically much more quickly
than using `os.ReadDir` or `(*os.File).Readdirnames`.
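
As a point of comparison, the names of a directory's immediate descendants can be listed with the standard library alone using `(*os.File).Readdirnames`, which returns names without a per-entry stat. This is a minimal sketch that assumes Go 1.16 or later (for `os.MkdirTemp` and `os.WriteFile`). `childNames` and `demoChildNames` are hypothetical helpers and are not part of godirwalk's API. The sort is there only to make the output deterministic.

```Go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// childNames returns the names of the immediate descendants of dirname,
// sorted for deterministic output. Readdirnames reads names only, so no
// per-entry stat system call is made.
func childNames(dirname string) ([]string, error) {
	f, err := os.Open(dirname)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	names, err := f.Readdirnames(-1) // n <= 0: read every entry in one call
	if err != nil {
		return nil, err
	}
	sort.Strings(names)
	return names, nil
}

// demoChildNames lists a throwaway directory holding two files and a
// subdirectory; the subdirectory's own contents are never read.
func demoChildNames() ([]string, error) {
	root, err := os.MkdirTemp("", "children")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"b.txt", "a.txt"} {
		if err := os.WriteFile(filepath.Join(root, name), nil, 0o644); err != nil {
			return nil, err
		}
	}
	if err := os.MkdirAll(filepath.Join(root, "sub", "deeper"), 0o755); err != nil {
		return nil, err
	}
	return childNames(root)
}

func main() {
	names, err := demoChildNames()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(names) // [a.txt b.txt sub]
}
```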

## Description

Here's why I use `godirwalk` in preference to `filepath.Walk`,
`os.ReadDir`, and `(*os.File).Readdirnames`.

### It's faster than `filepath.Walk`

When compared against `filepath.Walk` in benchmarks, it has been
observed to run between five and ten times the speed on darwin, at
speeds comparable to that of the unix `find` utility; about twice the
speed on linux; and about four times the speed on Windows.

How does it obtain this performance boost? It does less work to give
you nearly the same output. This library calls the same `syscall`
functions to do the work, but it makes fewer calls, does not throw
away information that it might need, and creates less memory churn
along the way by reusing the same scratch buffer for reading from a
directory rather than allocating a new buffer every time it reads
file system entry data from the operating system.
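
On unix-like systems, the scratch-buffer technique can be sketched with the standard library's `syscall.ReadDirent` and `syscall.ParseDirent`. One byte slice is allocated up front, and every directory read fills that same slice. This is a simplified illustration, not godirwalk's implementation. It assumes a unix-like OS (these calls are not available on Windows) and Go 1.16 or later. `readNames` and `demoReadNames` are hypothetical names.

```Go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"syscall"
)

// readNames returns the sorted entry names of dirname. Raw directory
// records are read into the caller's scratch buffer, so repeated calls
// reuse one allocation instead of creating a new read buffer each time.
// Unix-like systems only: syscall.ReadDirent does not exist on Windows.
func readNames(dirname string, scratch []byte) ([]string, error) {
	f, err := os.Open(dirname)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	fd := int(f.Fd())
	var names []string
	for {
		n, err := syscall.ReadDirent(fd, scratch)
		if err != nil {
			return nil, err
		}
		if n <= 0 {
			break // end of directory
		}
		// ParseDirent decodes the records, skipping "." and "..".
		_, _, names = syscall.ParseDirent(scratch[:n], -1, names)
	}
	sort.Strings(names)
	return names, nil
}

// demoReadNames lists two directories with a single shared scratch buffer.
func demoReadNames() (top, sub []string, err error) {
	root, err := os.MkdirTemp("", "scratch")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(root)
	for _, dir := range []string{"x", "y"} {
		if err := os.Mkdir(filepath.Join(root, dir), 0o755); err != nil {
			return nil, nil, err
		}
	}
	if err := os.WriteFile(filepath.Join(root, "x", "f1"), nil, 0o644); err != nil {
		return nil, nil, err
	}
	scratch := make([]byte, 16*1024) // allocated once, reused for both reads
	if top, err = readNames(root, scratch); err != nil {
		return nil, nil, err
	}
	if sub, err = readNames(filepath.Join(root, "x"), scratch); err != nil {
		return nil, nil, err
	}
	return top, sub, nil
}

func main() {
	top, sub, err := demoReadNames()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(top, sub) // [x y] [f1]
}
```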

While traversing a file system directory tree, `filepath.Walk` obtains
the list of immediate descendants of a directory, and throws away the
file system node type information provided by the operating system
that comes with the node's name. Then, immediately prior to invoking
the callback function, `filepath.Walk` invokes `os.Lstat` for each
node, and passes the returned `os.FileInfo` information to the
callback.

While the `os.FileInfo` information provided by `os.Lstat` is extremely
helpful (it even includes the `os.FileMode` data), providing it
requires an additional system call for each node.

Because most callbacks only care about what the node type is, this
library does not throw the type information away, but rather provides
that information to the callback function in the form of an
`os.FileMode` value. Note that the `os.FileMode` value this library
provides only has the node type information, and does not have the
permission bits, sticky bits, or other information from the file's
mode. If the callback does care about a particular node's entire
`os.FileInfo` data structure, the callback can easily invoke
`os.Stat` when needed, and only when needed.
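
The same idea can be sketched with the standard library, assuming Go 1.16 or later. `os.ReadDir` returns `fs.DirEntry` values whose `Type` method reports only the type bits. On most platforms those bits come from the directory read itself rather than from a per-entry stat. In the sketch, `Info` (the extra system call) is invoked only for regular files, whose size the callback actually wants. `walkTypes` and its two-argument callback are hypothetical. They are shaped after the callback described in this README, not copied from godirwalk.

```Go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walkTypes visits every node below dirname, depth first and in name order
// (os.ReadDir sorts), passing the os-specific pathname and the directory
// entry to callback. The entry's Type() holds only the type bits, which on
// most platforms come from the directory read itself, so no stat is made
// on the callback's behalf.
func walkTypes(dirname string, callback func(osPathname string, de fs.DirEntry) error) error {
	entries, err := os.ReadDir(dirname)
	if err != nil {
		return err
	}
	for _, de := range entries {
		osPathname := filepath.Join(dirname, de.Name())
		if err := callback(osPathname, de); err != nil {
			return err
		}
		// IsDir is false for a symlink to a directory, so links are not followed.
		if de.IsDir() {
			if err := walkTypes(osPathname, callback); err != nil {
				return err
			}
		}
	}
	return nil
}

// demoWalkTypes counts directories and sums regular-file sizes, paying for
// Info (an extra lstat on most platforms) only on regular files.
func demoWalkTypes() (dirs int, size int64, err error) {
	root, err := os.MkdirTemp("", "walk")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "a", "b"), 0o755); err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(root, "a", "b", "f.txt"), []byte("hello"), 0o644); err != nil {
		return 0, 0, err
	}
	err = walkTypes(root, func(osPathname string, de fs.DirEntry) error {
		switch {
		case de.IsDir():
			dirs++
		case de.Type().IsRegular():
			info, err := de.Info() // the extra system call, made only when needed
			if err != nil {
				return err
			}
			size += info.Size()
		}
		return nil
	})
	return dirs, size, err
}

func main() {
	dirs, size, err := demoWalkTypes()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(dirs, size) // 2 5
}
```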

#### Benchmarks

##### macOS

```Bash
$ go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: github.com/karrick/godirwalk
BenchmarkReadDirnamesStandardLibrary-12   50000       26250  ns/op       10360  B/op       16  allocs/op
BenchmarkReadDirnamesThisLibrary-12       50000       24372  ns/op        5064  B/op       20  allocs/op
BenchmarkFilepathWalk-12                      1  1099524875  ns/op   228415912  B/op   416952  allocs/op
BenchmarkGodirwalk-12                         2   526754589  ns/op   103110464  B/op   451442  allocs/op
BenchmarkGodirwalkUnsorted-12                 3   509219296  ns/op   100751400  B/op   378800  allocs/op
BenchmarkFlameGraphFilepathWalk-12            1  7478618820  ns/op  2284138176  B/op  4169453  allocs/op
BenchmarkFlameGraphGodirwalk-12               1  4977264058  ns/op  1031105328  B/op  4514423  allocs/op
PASS
ok  	github.com/karrick/godirwalk	21.219s
```

##### Linux

```Bash
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/karrick/godirwalk
BenchmarkReadDirnamesStandardLibrary-12  100000       15458  ns/op       10360  B/op       16  allocs/op
BenchmarkReadDirnamesThisLibrary-12      100000       14646  ns/op        5064  B/op       20  allocs/op
BenchmarkFilepathWalk-12                      2   631034745  ns/op   228210216  B/op   416939  allocs/op
BenchmarkGodirwalk-12                         3   358714883  ns/op   102988664  B/op   451437  allocs/op
BenchmarkGodirwalkUnsorted-12                 3   355363915  ns/op   100629234  B/op   378796  allocs/op
BenchmarkFlameGraphFilepathWalk-12            1  6086913991  ns/op  2282104720  B/op  4169417  allocs/op
BenchmarkFlameGraphGodirwalk-12               1  3456398824  ns/op  1029886400  B/op  4514373  allocs/op
PASS
ok      github.com/karrick/godirwalk    19.179s
```

### It's more correct on Windows than `filepath.Walk`

I did not previously care about this either, but humor me. We all love
how we can write once and run everywhere. It is essential for the
language's adoption, growth, and success that the software we create
can run unmodified on all architectures and operating systems
supported by Go.

When the traversed file system has a logical loop caused by symbolic
links to directories, on unix `filepath.Walk` ignores symbolic links
and traverses the entire directory tree without error. On Windows,
however, `filepath.Walk` will continue following directory symbolic
links, even though it is not supposed to. This eventually causes
`filepath.Walk` to terminate early and return an error when the
pathname gets too long from concatenating endless loops of symbolic
links onto the pathname. This error comes from Windows, passes through
`filepath.Walk`, and reaches the upstream client running
`filepath.Walk`.

The takeaway is that the behavior of `filepath.Walk` differs depending
on the platform on which it runs. While this is clearly not
intentional, until it is fixed in the standard library, it presents a
compatibility problem.

This library correctly identifies symbolic links that point to
directories and will only follow them when `FollowSymbolicLinks` is
set to true. Behavior on Windows and other operating systems is
identical.
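
Recognizing a symbolic link to a directory takes two facts. First, the entry's own type bits mark it as a symlink. Second, a `Stat` that follows the link reports a directory. A minimal standard-library sketch of that check follows. It assumes Go 1.16 or later and a platform where an unprivileged process may create symlinks (Windows may require extra privileges or developer mode). `isSymlinkToDir` is a hypothetical helper, not godirwalk's code.

```Go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isSymlinkToDir reports whether de, an entry of dirname, is a symbolic
// link whose target is a directory. Only symlinks are stat'ed (followed).
func isSymlinkToDir(dirname string, de fs.DirEntry) (bool, error) {
	if de.Type()&fs.ModeSymlink == 0 {
		return false, nil // a plain directory or file, not a link
	}
	fi, err := os.Stat(filepath.Join(dirname, de.Name())) // follows the link
	if err != nil {
		return false, err // for example, a dangling link
	}
	return fi.IsDir(), nil
}

// demoSymlinks classifies every entry of a throwaway directory holding a
// directory, a file, and one symlink to each.
func demoSymlinks() (map[string]bool, error) {
	root, err := os.MkdirTemp("", "links")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "dir"), 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(filepath.Join(root, "file"), nil, 0o644); err != nil {
		return nil, err
	}
	// Relative targets are resolved against the directory holding the link.
	if err := os.Symlink("dir", filepath.Join(root, "linkToDir")); err != nil {
		return nil, err
	}
	if err := os.Symlink("file", filepath.Join(root, "linkToFile")); err != nil {
		return nil, err
	}
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	result := make(map[string]bool, len(entries))
	for _, de := range entries {
		ok, err := isSymlinkToDir(root, de)
		if err != nil {
			return nil, err
		}
		result[de.Name()] = ok
	}
	return result, nil
}

func main() {
	result, err := demoSymlinks()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(result) // map[dir:false file:false linkToDir:true linkToFile:false]
}
```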

### It's easier to use than `filepath.Walk`

Since this library does not invoke `os.Lstat` on every file system
node it encounters, there is no per-node stat error for the callback
function to filter on. The third argument in the `filepath.WalkFunc`
function signature, which passes the error from `os.Lstat` to the
callback function, is no longer necessary, and is thus eliminated from
the signature of this library's callback function.
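
For contrast, the sketch below wraps a two-argument callback so that it can be passed to `filepath.Walk`. The wrapper is where every `filepath.WalkFunc` has to deal with the third `err` argument before it may touch `info`. `adapt` and `demoAdapt` are hypothetical names used only for this illustration.

```Go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// adapt wraps a two-argument callback as a filepath.WalkFunc. The third
// argument reports a failure to read the node (for example, os.Lstat
// failing), and info may be nil in that case, so it must be checked first.
func adapt(cb func(pathname string, info os.FileInfo) error) filepath.WalkFunc {
	return func(pathname string, info os.FileInfo, err error) error {
		if err != nil {
			return err // stop the walk; returning nil instead would move past the failing node
		}
		return cb(pathname, info)
	}
}

// demoAdapt walks a small tree (the root, one subdirectory, one file) and
// then a root that does not exist.
func demoAdapt() (visited int, missingErr error, err error) {
	root, err := os.MkdirTemp("", "adapt")
	if err != nil {
		return 0, nil, err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "d"), 0o755); err != nil {
		return 0, nil, err
	}
	if err := os.WriteFile(filepath.Join(root, "d", "g"), nil, 0o644); err != nil {
		return 0, nil, err
	}
	count := func(string, os.FileInfo) error {
		visited++
		return nil
	}
	if err := filepath.Walk(root, adapt(count)); err != nil {
		return 0, nil, err
	}
	missingErr = filepath.Walk(filepath.Join(root, "missing"), adapt(count))
	return visited, missingErr, nil
}

func main() {
	visited, missingErr, err := demoAdapt()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(visited, missingErr != nil) // 3 true
}
```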

Also, `filepath.Walk` invokes the callback function with a
solidus-delimited pathname regardless of the os-specific path
separator. This library invokes the callback function with the
os-specific pathname separator, obviating a call to `filepath.Clean`
in the callback function for each node prior to actually using the
provided pathname.

In other words, even on Windows, `filepath.Walk` will invoke the
callback with `some/path/to/foo.txt`, requiring well-written clients
to perform pathname normalization for every file prior to working with
the specified file. In truth, many clients developed on unix and not
tested on Windows neglect this subtlety, resulting in software bugs
when running on Windows. This library would invoke the callback
function with `some\path\to\foo.txt` for the same file when running on
Windows. This eliminates the need for the client to normalize the
pathname and lessens the likelihood that a client will work on unix
but not on Windows.
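
The per-node normalization described above amounts to a call such as `filepath.Clean`. It removes redundant elements and, as its final step, replaces each slash with `filepath.Separator`. Here is a minimal sketch; `normalize` is a hypothetical wrapper around that standard-library call.

```Go
package main

import (
	"fmt"
	"path/filepath"
)

// normalize rewrites a solidus-delimited pathname into the os-specific
// form: filepath.Clean removes redundant elements and, as its last step,
// replaces each slash with filepath.Separator ('\' on Windows, '/' on unix).
// This is the per-node work a filepath.Walk client would otherwise repeat.
func normalize(slashed string) string {
	return filepath.Clean(slashed)
}

func main() {
	// Prints some\path\to\foo.txt on Windows and some/path/to/foo.txt on unix.
	fmt.Println(normalize("some/path/to/foo.txt"))
}
```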

### It's more flexible than `filepath.Walk`

#### Configurable Handling of Symbolic Links

The default behavior of this library is to ignore symbolic links to
directories when walking a directory tree, just like `filepath.Walk`
does. However, it does invoke the callback function with each node it
finds, including symbolic links. If a particular use case requires
following symbolic links when traversing a directory tree, this
library can be made to do so by setting the `FollowSymbolicLinks`
parameter to true.
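
Following directory symlinks reopens the loop problem described earlier. One common guard, sketched here with the standard library, resolves each directory with `filepath.EvalSymlinks` and refuses to read a resolved directory twice. The `walk` function and its `followSymlinks` flag are hypothetical stand-ins for the idea behind `FollowSymbolicLinks`, not godirwalk's implementation. The sketch assumes Go 1.16 or later and a platform where symlinks can be created.

```Go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walk calls visit for every node below dirname. When followSymlinks is
// true, symlinks that resolve to directories are descended into as well.
// visited records each directory's fully resolved path, so every real
// directory is read at most once and symlink loops cannot recurse forever.
func walk(dirname string, followSymlinks bool, visited map[string]bool, visit func(string)) error {
	resolved, err := filepath.EvalSymlinks(dirname)
	if err != nil {
		return err
	}
	if visited[resolved] {
		return nil // already read via another path, e.g. a looping symlink
	}
	visited[resolved] = true
	entries, err := os.ReadDir(dirname)
	if err != nil {
		return err
	}
	for _, de := range entries {
		p := filepath.Join(dirname, de.Name())
		visit(p)
		descend := de.IsDir() // false for every symlink, even one to a directory
		if !descend && followSymlinks && de.Type()&fs.ModeSymlink != 0 {
			// Stat follows the link; a dangling link is treated as a leaf.
			if fi, err := os.Stat(p); err == nil {
				descend = fi.IsDir()
			}
		}
		if descend {
			if err := walk(p, followSymlinks, visited, visit); err != nil {
				return err
			}
		}
	}
	return nil
}

// demoFollow walks tree/, which holds a link to a sibling directory and a
// link back to itself, once without and once with following symlinks.
func demoFollow() (plain, followed []string, err error) {
	root, err := os.MkdirTemp("", "follow")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(root)
	tree := filepath.Join(root, "tree")
	for _, dir := range []string{tree, filepath.Join(root, "outside")} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			return nil, nil, err
		}
	}
	if err := os.WriteFile(filepath.Join(root, "outside", "file"), nil, 0o644); err != nil {
		return nil, nil, err
	}
	if err := os.Symlink(filepath.Join("..", "outside"), filepath.Join(tree, "link")); err != nil {
		return nil, nil, err
	}
	if err := os.Symlink(".", filepath.Join(tree, "loop")); err != nil {
		return nil, nil, err
	}
	for _, follow := range []bool{false, true} {
		var names []string
		record := func(p string) {
			r, _ := filepath.Rel(tree, p)
			names = append(names, filepath.ToSlash(r))
		}
		if err := walk(tree, follow, map[string]bool{}, record); err != nil {
			return nil, nil, err
		}
		if follow {
			followed = names
		} else {
			plain = names
		}
	}
	return plain, followed, nil
}

func main() {
	plain, followed, err := demoFollow()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(plain)    // [link loop]
	fmt.Println(followed) // [link link/file loop]
}
```

Recording every resolved directory means each real directory is read once, even when it is reachable through several links. Recording only the ancestors of the current directory would instead stop just the loops, while still allowing repeated visits through distinct links.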

#### Configurable Sorting of Directory Children

The default behavior of this library is to always sort the immediate
descendants of a directory prior to visiting each node, just like
`filepath.Walk` does. This is usually the desired behavior. However,
sorting the names does come at a slight performance and memory cost
when a directory node has many entries. If a particular use case does
not require sorting the directory's immediate descendants prior to
visiting its nodes, this library will skip the sorting step when the
`Unsorted` parameter is set to true. Additionally, when the caller
specifies `Unsorted` enumeration, directories are read lazily as the
caller consumes entries.
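
The trade-off can be sketched with the standard library. `(*os.File).ReadDir(n)` returns at most `n` entries per call, in whatever order the operating system supplies them, so a caller can begin work before the directory has been fully read. A sorted listing, by contrast, must read every entry first. This is a minimal sketch assuming Go 1.16 or later. The function names and the batch size of 2 are arbitrary choices for illustration, not godirwalk's API or defaults.

```Go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sort"
)

// forEachUnsorted reads dirname at most batchSize entries at a time and
// hands each name to fn as soon as its batch arrives, in whatever order
// the operating system returns them.
func forEachUnsorted(dirname string, batchSize int, fn func(name string)) error {
	f, err := os.Open(dirname)
	if err != nil {
		return err
	}
	defer f.Close()
	for {
		batch, err := f.ReadDir(batchSize)
		for _, de := range batch {
			fn(de.Name())
		}
		if errors.Is(err, io.EOF) {
			return nil // the whole directory has been consumed
		}
		if err != nil {
			return err
		}
	}
}

// sortedNames must read every entry before it can return the first one.
func sortedNames(dirname string) ([]string, error) {
	var names []string
	if err := forEachUnsorted(dirname, 2, func(name string) { names = append(names, name) }); err != nil {
		return nil, err
	}
	sort.Strings(names)
	return names, nil
}

// demoSorting compares both approaches on a throwaway directory.
func demoSorting() (unsorted, sorted []string, err error) {
	root, err := os.MkdirTemp("", "sorting")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"c", "a", "e", "b", "d"} {
		if err := os.WriteFile(filepath.Join(root, name), nil, 0o644); err != nil {
			return nil, nil, err
		}
	}
	err = forEachUnsorted(root, 2, func(name string) { unsorted = append(unsorted, name) })
	if err != nil {
		return nil, nil, err
	}
	sorted, err = sortedNames(root)
	return unsorted, sorted, err
}

func main() {
	unsorted, sorted, err := demoSorting()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(unsorted) // order depends on the file system
	fmt.Println(sorted)   // [a b c d e]
}
```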

Here's an interesting read on the potential hazards of traversing a
file system hierarchy in a non-deterministic order. If you know the
problem you are solving is not affected by the order in which files
are visited, then I encourage you to use `Unsorted`. Otherwise, skip
setting this option.

[Researchers find bug in Python script may have affected hundreds of studies](https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/)

#### Configurable Post Children Callback

This library provides upstream code with the ability to specify a
callback to be invoked for each directory after its children are
processed. This has been used to recursively delete empty directories
more efficiently after traversing the file system. See the
`examples/clean-empties` directory for an example of this usage.
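
Deleting empty directories needs exactly this post-order visit: a directory can only be judged empty after its children have been processed, and possibly removed. Below is a minimal standard-library sketch, assuming Go 1.16 or later. `removeEmpty` is hypothetical and is not the code in `examples/clean-empties`.

```Go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeEmpty removes every empty directory below dirname, deepest first,
// and reports whether dirname itself is left empty. A directory is judged
// only after all of its children have been processed, which is the job a
// post-children callback does. dirname itself is never removed.
func removeEmpty(dirname string) (bool, error) {
	entries, err := os.ReadDir(dirname)
	if err != nil {
		return false, err
	}
	remaining := len(entries)
	for _, de := range entries {
		if !de.IsDir() {
			continue // files, and symlinks, always keep their parent non-empty
		}
		child := filepath.Join(dirname, de.Name())
		empty, err := removeEmpty(child)
		if err != nil {
			return false, err
		}
		if empty {
			if err := os.Remove(child); err != nil {
				return false, err
			}
			remaining--
		}
	}
	return remaining == 0, nil
}

// demoRemoveEmpty builds keep/file.txt, a/b/c (all empty), and d (empty),
// then returns what remains in the root.
func demoRemoveEmpty() ([]string, error) {
	root, err := os.MkdirTemp("", "empties")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, dir := range []string{"keep", filepath.Join("a", "b", "c"), "d"} {
		if err := os.MkdirAll(filepath.Join(root, dir), 0o755); err != nil {
			return nil, err
		}
	}
	if err := os.WriteFile(filepath.Join(root, "keep", "file.txt"), nil, 0o644); err != nil {
		return nil, err
	}
	if _, err := removeEmpty(root); err != nil {
		return nil, err
	}
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, de := range entries {
		names = append(names, de.Name())
	}
	return names, nil
}

func main() {
	names, err := demoRemoveEmpty()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(names) // [keep]
}
```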

#### Configurable Error Callback

This library provides upstream code with the ability to specify a
callback to be invoked for errors that the operating system returns.
This allows the upstream code to determine the next course of action:
either halt walking the hierarchy, as it would if no error callback
were provided, or skip the node that caused the error. See the
`examples/walk-fast` directory for an example of this usage.
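
The halt-or-skip decision can be sketched with the standard library as follows. The `ErrorAction` type and its `Halt` and `SkipNode` values are defined locally in this sketch to mirror the two choices described above. Treat them as hypothetical, and consult godirwalk's godoc for the names it actually exports.

```Go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ErrorAction is a hypothetical type: what the walker should do next.
type ErrorAction int

const (
	Halt     ErrorAction = iota // stop walking and return the error
	SkipNode                    // ignore the failing node and keep walking
)

// walkWithErrors visits every node below dirname. When a directory cannot
// be read, onError chooses whether to halt or to skip that directory.
func walkWithErrors(dirname string, visit func(string), onError func(osPathname string, err error) ErrorAction) error {
	entries, err := os.ReadDir(dirname)
	if err != nil {
		if onError(dirname, err) == SkipNode {
			return nil
		}
		return err
	}
	for _, de := range entries {
		p := filepath.Join(dirname, de.Name())
		visit(p)
		if de.IsDir() {
			if err := walkWithErrors(p, visit, onError); err != nil {
				return err
			}
		}
	}
	return nil
}

// demoErrors walks a path that does not exist, once with each action, and
// counts how often the error callback was consulted.
func demoErrors() (skipErr, haltErr error, reported int) {
	missing := filepath.Join(os.TempDir(), "godirwalk-sketch-does-not-exist")
	visit := func(string) {}
	callbackFor := func(action ErrorAction) func(string, error) ErrorAction {
		return func(osPathname string, err error) ErrorAction {
			reported++
			return action
		}
	}
	skipErr = walkWithErrors(missing, visit, callbackFor(SkipNode))
	haltErr = walkWithErrors(missing, visit, callbackFor(Halt))
	return skipErr, haltErr, reported
}

func main() {
	skipErr, haltErr, reported := demoErrors()
	fmt.Println(skipErr == nil, haltErr != nil, reported) // true true 2
}
```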