README.rst

======
bitrot
======

Detects bit rotten files on the hard drive to save your precious photo
and music collection from slow decay.

Usage
-----

Go to the desired directory and simply invoke::

  $ bitrot

This will start digging through your directory structure recursively
indexing all files found. The index is stored in a ``.bitrot.db`` file
which is a SQLite 3 database.

Next time you run ``bitrot`` it will add new files and update the index
for files with a changed modification date. Most importantly however, it
will report all errors, e.g. files that changed on the hard drive but
still have the same modification date.
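
The check described above can be sketched roughly as follows. This is
an illustration only, not bitrot's actual code: the names ``sha1_of``
and ``classify``, the returned labels, and the default chunk size are
assumptions made for this example.

```python
import hashlib
import os


def sha1_of(path, chunk_size=16384):
    """Hash a file in binary mode, reading chunk_size bytes at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path, stored):
    """Compare a file against its stored (mtime, sha1) record.

    `stored` is None for a file that is not in the index yet.
    The labels returned here are hypothetical.
    """
    if stored is None:
        return "new"
    stored_mtime, stored_hash = stored
    mtime = int(os.stat(path).st_mtime)
    if sha1_of(path) == stored_hash:
        return "ok"
    if mtime != stored_mtime:
        return "updated"  # content and mtime both changed: a normal edit
    return "error"        # content changed, mtime did not: possible bit rot
```

Only the last branch is what the README calls an error: the content
differs although the modification date says nothing was touched.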

All paths stored in ``.bitrot.db`` are relative so it's safe to rescan
a folder after moving it to another drive. Just remember to move it in
a way that doesn't touch modification dates. Otherwise the checksum
database is useless.

Performance
-----------

Performance obviously depends on how fast the underlying drive is.
Historically the script was single-threaded because back in 2013
checksum calculations on a single core still outran typical drives,
including the mobile SSDs of the day.  In 2020 this is no longer the
case so the script now uses a process pool to calculate SHA1 hashes and
perform ``stat()`` calls.
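
A minimal sketch of the process pool idea, using the standard
``concurrent.futures`` module. It is not the script's real
implementation: ``hash_and_stat``, ``scan``, and the ``executor_cls``
parameter are hypothetical names introduced here for illustration.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor


def hash_and_stat(path):
    """Worker: return (path, mtime, sha1) for one file."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(16384), b""):
            digest.update(chunk)
    return path, int(os.stat(path).st_mtime), digest.hexdigest()


def scan(paths, workers=None, executor_cls=ProcessPoolExecutor):
    """Hash files in parallel and return results in input order.

    workers=None lets the pool choose its default size; workers=1
    effectively serializes the work.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(hash_and_stat, paths))
```

The worker is a module-level function so that it can be pickled and
sent to child processes. On platforms that spawn rather than fork,
calls to ``scan`` belong under an ``if __name__ == "__main__":`` guard.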

No rigorous performance tests have been done.  Scanning a ~1000 file
directory totalling ~5 GB takes 2.2s on a 2018 MacBook Pro 15" with
an AP0512M SSD.  Back in 2013, that same feat on a 2015 MacBook Air with
an SM0256G SSD took over 20 seconds.

On that same 2018 MacBook Pro 15", scanning a 60+ GB music library takes
24 seconds.  Back in 2013, with a typical 5400 RPM laptop hard drive
it took around 15 minutes.  How times have changed!

Tests
-----

There's a simple but comprehensive test scenario using
`BATS <https://github.com/sstephenson/bats>`_.  Run the
file in the ``tests`` directory to run it.

Change Log
----------

1.0.0
~~~~~

* significantly sped up execution on solid state drives by using
  a process pool executor to calculate SHA1 hashes and perform ``stat()``
  calls; use ``-w1`` if your runs on slow magnetic drives were
  negatively affected by this change

* sped up execution by pre-loading all SQLite-stored hashes to memory
  and doing comparisons using Python sets

* all UTF-8 filenames are now normalized to NFKD in the database to
  enable cross-operating system checks

* the SQLite database is now vacuumed to minimize its size

* bugfix: additional Python 3 fixes when Unicode names were encountered
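
Two of the 1.0.0 changes, pre-loading stored hashes and NFKD
normalization, can be illustrated together. The sketch below is an
assumption-laden example: the ``files`` table with ``path`` and
``hash`` columns and the helper names are hypothetical and may not
match the real schema.

```python
import sqlite3
import unicodedata


def normalize(path):
    """Return the NFKD form so the same name compares equal across systems."""
    return unicodedata.normalize("NFKD", path)


def load_index(conn):
    """Read every stored row into memory once, keyed by normalized path.

    Assumes a hypothetical table: files(path TEXT, hash TEXT).
    """
    rows = conn.execute("SELECT path, hash FROM files")
    return {normalize(path): digest for path, digest in rows}


def diff(index, on_disk):
    """Split paths into (new, missing) using set arithmetic."""
    stored = set(index)
    seen = {normalize(p) for p in on_disk}
    return seen - stored, stored - seen
```

With everything in memory, finding new and missing files is two set
differences rather than one database query per file.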

0.9.2
~~~~~

* bugfix: one place in the code incorrectly hardcoded UTF-8 as the
  filesystem encoding

0.9.1
~~~~~

* bugfix: print the path that failed to decode with FSENCODING

* bugfix: when using ``-q``, don't hide warnings about files that can't
  be statted or read

* bugfix: ``-s`` is no longer broken on Python 3

0.9.0
~~~~~

* bugfix: ``.bitrot.db`` checksum checking messages now obey ``--quiet``

* Python 3 compatibility

0.8.0
~~~~~

* bitrot now keeps track of its own database's bitrot by storing
  a checksum of ``.bitrot.db`` in ``.bitrot.sha512``

* bugfix: now properly uses the filesystem encoding to decode file names
  for use with the ``.bitrot.db`` database. Report and original patch by
  pallinger.
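
The self-check introduced in 0.8.0 might look roughly like this. The
sidecar format (a bare hexadecimal digest stored as text) and the
function names are assumptions for this sketch; the actual contents of
``.bitrot.sha512`` may differ.

```python
import hashlib


def sha512_of(path, chunk_size=65536):
    """Return the SHA-512 hex digest of a file read in binary mode."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(db_path, sum_path):
    """Record the database's current digest in a sidecar file."""
    with open(sum_path, "w") as f:
        f.write(sha512_of(db_path))


def verify_checksum(db_path, sum_path):
    """Return True if the database still matches its recorded digest."""
    with open(sum_path) as f:
        expected = f.read().strip()
    return sha512_of(db_path) == expected
```

The checksum would be written after every successful database update
and verified before the database is trusted on the next run.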

0.7.1
~~~~~

* bugfix: SHA1 computation now works correctly on Windows; previously,
  files were opened in text mode. This fix will change hashes of files
  containing some specific bytes like 0x1A.

0.7.0
~~~~~

* when a file changes or is renamed, the timestamp of the last check is
  updated, too

* bugfix: files that disappeared during the run are now properly ignored

* bugfix: files that are locked or with otherwise denied access are
  skipped. If they were read before, they will be considered "missing"
  in the report.

* bugfix: if there are multiple files with the same content in the
  scanned directory tree, renames are now handled properly for them

* refactored some horrible code to be a little less horrible
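
One way to handle renames when several files share the same content is
to pair missing and new paths one-to-one by hash. The function below is
a hypothetical sketch of that idea, not the code bitrot uses; the
pairing order (sorted paths) is an arbitrary choice for this example.

```python
from collections import defaultdict


def match_renames(missing, new):
    """Pair missing paths with new paths that have the same hash.

    `missing` and `new` both map path -> hash. Each new path is used
    at most once, so duplicate contents are paired one-to-one.
    Returns (renames, still_missing, still_new).
    """
    new_by_hash = defaultdict(list)
    for path in sorted(new):
        new_by_hash[new[path]].append(path)

    renames, still_missing = [], []
    for old_path in sorted(missing):
        candidates = new_by_hash.get(missing[old_path])
        if candidates:
            # Consume one candidate so it cannot be matched twice.
            renames.append((old_path, candidates.pop(0)))
        else:
            still_missing.append(old_path)

    still_new = sorted(p for paths in new_by_hash.values() for p in paths)
    return renames, still_missing, still_new
```

Consuming each candidate as it is matched is what keeps two identical
files from both being reported as renamed to the same new path.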

0.6.0
~~~~~

* more control over performance with ``--commit-interval`` and
  ``--chunk-size`` command-line arguments

* bugfix: symbolic links are now properly skipped (or can be followed if
  ``--follow-links`` is passed)

* bugfix: files that cannot be opened are now gracefully skipped

* bugfix: fixed a rare division by zero when run in an empty directory
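
To show the kind of trade-off a commit interval controls, here is
a small sketch that commits to SQLite at most once per interval. It
assumes the interval is measured in seconds and that 0 means "commit
every time"; both are assumptions, and ``ThrottledCommitter`` is
a hypothetical helper, not part of bitrot.

```python
import sqlite3
import time


class ThrottledCommitter:
    """Commit at most once per `interval` seconds (0 commits every time)."""

    def __init__(self, conn, interval, clock=time.monotonic):
        self.conn = conn
        self.interval = interval
        self.clock = clock
        self.last_commit = clock()

    def maybe_commit(self):
        """Commit if enough time has passed; return True if it did."""
        now = self.clock()
        if now - self.last_commit >= self.interval:
            self.conn.commit()
            self.last_commit = now
            return True
        return False
```

A longer interval means fewer disk syncs during a scan, at the cost of
losing more uncommitted progress if the run is interrupted.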

0.5.1
~~~~~

* bugfix: warn about test mode only in test mode

0.5.0
~~~~~

* ``--test`` command-line argument for testing the state without
  updating the database on disk (works for testing databases you don't
  have write access to)

* size of the data read is reported upon finish

* minor performance updates

0.4.0
~~~~~

* renames are now reported as such

* all non-regular files (e.g. symbolic links, pipes, sockets) are now
  skipped

* progress presented in percentage
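
Skipping non-regular files can be done with the standard ``os`` and
``stat`` modules. The generator below is a sketch under assumptions:
``regular_files`` and its ``follow_links`` parameter are hypothetical
names, the latter loosely mirroring the ``--follow-links`` option.

```python
import os
import stat


def regular_files(root, follow_links=False):
    """Yield paths of regular files under root, skipping everything else."""
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() reports on the link itself; stat() follows it.
                st = os.stat(path) if follow_links else os.lstat(path)
            except OSError:
                continue  # vanished or unreadable: skip it
            if stat.S_ISREG(st.st_mode):
                yield path
```

Using ``lstat()`` by default means a symbolic link is seen as a link
rather than as its target, so it fails the ``S_ISREG`` test and is
skipped along with pipes and sockets.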

0.3.0
~~~~~

* ``--sum`` command-line argument for easy comparison of multiple
  databases

0.2.1
~~~~~

* fixed regression from 0.2.0 where new files caused a ``KeyError``
  exception

0.2.0
~~~~~

* ``--verbose`` and ``--quiet`` command-line arguments

* if a file is no longer there, its entry is removed from the database

0.1.0
~~~~~

* First published version.

Authors
-------

Glued together by `Łukasz Langa <mailto:lukasz@langa.pl>`_. Multiple
improvements by
`Ben Shepherd <mailto:bjashepherd@gmail.com>`_,
`Jean-Louis Fuchs <mailto:ganwell@fangorn.ch>`_,
`Marcus Linderoth <marcus@thingsquare.com>`_,
`p1r473 <mailto:subwayjared@gmail.com>`_,
`Peter Hofmann <mailto:scm@uninformativ.de>`_,
`Phil Lundrigan <mailto:philipbl@cs.utah.edu>`_,
`Reid Williams <rwilliams@ideo.com>`_,
`Stan Senotrusov <senotrusov@gmail.com>`_,
`Yang Zhang <mailto:yaaang@gmail.com>`_, and
`Zhuoyun Wei <wzyboy@wzyboy.org>`_.