scandir, a better directory iterator and faster os.walk()
=========================================================

.. image:: https://img.shields.io/pypi/v/scandir.svg
   :target: https://pypi.python.org/pypi/scandir
   :alt: scandir on PyPI (Python Package Index)

.. image:: https://travis-ci.org/benhoyt/scandir.svg?branch=master
   :target: https://travis-ci.org/benhoyt/scandir
   :alt: Travis CI tests (Linux)

.. image:: https://ci.appveyor.com/api/projects/status/github/benhoyt/scandir?branch=master&svg=true
   :target: https://ci.appveyor.com/project/benhoyt/scandir
   :alt: Appveyor tests (Windows)


``scandir()`` is a directory iteration function like ``os.listdir()``,
except that instead of returning a list of bare filenames, it yields
``DirEntry`` objects that include file type and stat information along
with the name. Using ``scandir()`` increases the speed of ``os.walk()``
by 2-20 times (depending on the platform and file system) by avoiding
unnecessary calls to ``os.stat()`` in most cases.


Now included in a Python near you!
----------------------------------

``scandir`` has been included in the Python 3.5 standard library as
``os.scandir()``, and the related performance improvements to
``os.walk()`` have also been included. So if you're lucky enough to be
using Python 3.5 (release date September 13, 2015) you get the benefit
immediately; otherwise, just
`download this module from PyPI <https://pypi.python.org/pypi/scandir>`_,
install it with ``pip install scandir``, and then do something like
this in your code:

.. code-block:: python

    # Use the built-in version of scandir/walk if possible, otherwise
    # use the scandir module version
    try:
        from os import scandir, walk
    except ImportError:
        from scandir import scandir, walk

`PEP 471 <https://www.python.org/dev/peps/pep-0471/>`_, which is the
PEP that proposes including ``scandir`` in the Python standard library,
was `accepted <https://mail.python.org/pipermail/python-dev/2014-July/135561.html>`_
in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This ``scandir`` module is intended to work on Python 2.6+ and Python
3.2+ (and it has been tested on those versions).


Background
----------

Python's built-in ``os.walk()`` is significantly slower than it needs to be,
because -- in addition to calling ``listdir()`` on each directory -- it calls
``stat()`` on each file to determine whether the filename is a directory or not.
But both ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux/OS
X already tell you whether the files returned are directories or not, so
no further ``stat`` system calls are needed. In short, you can reduce the number
of system calls from about 2N to N, where N is the total number of files and
directories in the tree.

In practice, removing all those extra system calls makes ``os.walk()`` about
**7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS
X.** So we're not talking about micro-optimizations. See more benchmarks
in the "Benchmarks" section below.

Somewhat relatedly, many people have also asked for a version of
``os.listdir()`` that yields filenames as it iterates instead of returning them
as one big list. This improves memory efficiency for iterating very large
directories.

So as well as a faster ``walk()``, scandir adds a new ``scandir()`` function.
They're pretty easy to use, but see "The API" below for the full docs.
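
To make the system-call argument above a bit more concrete, here's a rough
sketch of the two approaches side by side. The function names
``subdirs_listdir`` and ``subdirs_scandir`` are made up for illustration and
are not part of this module; the first does a ``stat()`` per entry via
``os.path.isdir()``, while the second can usually rely on the file type the
operating system already returned with each directory entry:

.. code-block:: python

    import os

    # Use the built-in scandir if available, otherwise the scandir module
    try:
        from os import scandir
    except ImportError:
        from scandir import scandir


    def subdirs_listdir(path):
        """Old approach: one listdir() plus one stat() call per entry."""
        names = []
        for name in os.listdir(path):
            # os.path.isdir() has to stat() the full path
            if os.path.isdir(os.path.join(path, name)):
                names.append(name)
        return sorted(names)


    def subdirs_scandir(path):
        """scandir approach: is_dir() usually needs no extra system call."""
        return sorted(entry.name for entry in scandir(path) if entry.is_dir())

Both functions should return the same names for a given directory; only the
number of system calls behind them differs.
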


Benchmarks
----------

Below are results showing how many times as fast ``scandir.walk()`` is
compared with ``os.walk()`` on various systems, found by running
``benchmark.py`` with no arguments:

====================  ==============  =============
System version        Python version  Times as fast
====================  ==============  =============
Windows 7 64-bit      2.7.7 64-bit    10.4
Windows 7 64-bit SSD  2.7.7 64-bit    10.3
Windows 7 64-bit NFS  2.7.6 64-bit    36.8
Windows 7 64-bit SSD  3.4.1 64-bit    9.9
Windows 7 64-bit SSD  3.5.0 64-bit    9.5
CentOS 6.2 64-bit     2.6.6 64-bit    3.9
Ubuntu 14.04 64-bit   2.7.6 64-bit    5.8
Mac OS X 10.9.3       2.7.5 64-bit    3.8
====================  ==============  =============

All of the above tests were done using the fast C version of scandir
(source code in ``_scandir.c``).

Note that the gains are less than the above on smaller directories and greater
on larger directories. This is why ``benchmark.py`` creates a test directory
tree with a standardized size.


The API
-------

walk()
~~~~~~

The API for ``scandir.walk()`` is exactly the same as ``os.walk()``, so just
`read the Python docs <https://docs.python.org/3.5/library/os.html#os.walk>`_.

scandir()
~~~~~~~~~

The full docs for ``scandir()`` and the ``DirEntry`` objects it yields are
available in the `Python documentation here <https://docs.python.org/3.5/library/os.html#os.scandir>`_.
But below is a brief summary as well.
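
As a quick taste of how the pieces fit together, here's a minimal sketch
that adds up the size of all regular files in a directory tree, along the
lines of the example in PEP 471. The ``tree_size`` name is just an
illustration, not something this module provides, and symlinks are
deliberately not followed:

.. code-block:: python

    # Use the built-in scandir if available, otherwise the scandir module
    try:
        from os import scandir
    except ImportError:
        from scandir import scandir


    def tree_size(path):
        """Return the total size in bytes of regular files under path."""
        total = 0
        for entry in scandir(path):
            if entry.is_dir(follow_symlinks=False):
                # Recurse into real sub-directories only
                total += tree_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # stat() results are cached on the DirEntry object
                total += entry.stat(follow_symlinks=False).st_size
        return total

Calling ``tree_size('.')`` would give the size of the current directory
tree; the sizes of the directories themselves are not counted.
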

    scandir(path='.') -> iterator of DirEntry objects for given path

Like ``listdir``, ``scandir`` calls the operating system's directory
iteration system calls to get the names of the files in the given
``path``, but it's different from ``listdir`` in two ways:

* Instead of returning bare filename strings, it returns lightweight
  ``DirEntry`` objects that hold the filename string and provide
  simple methods that allow access to the additional data the
  operating system may have returned.

* It returns a generator instead of a list, so that ``scandir`` acts
  as a true iterator instead of returning the full list immediately.

``scandir()`` yields a ``DirEntry`` object for each file and
sub-directory in ``path``. Just like ``listdir``, the ``'.'``
and ``'..'`` pseudo-directories are skipped, and the entries are
yielded in system-dependent order. Each ``DirEntry`` object has the
following attributes and methods:

* ``name``: the entry's filename, relative to the scandir ``path``
  argument (corresponds to the return values of ``os.listdir``)

* ``path``: the entry's full path name (not necessarily an absolute
  path) -- the equivalent of ``os.path.join(scandir_path, entry.name)``

* ``is_dir(*, follow_symlinks=True)``: similar to
  ``pathlib.Path.is_dir()``, but the return value is cached on the
  ``DirEntry`` object; doesn't require a system call in most cases;
  don't follow symbolic links if ``follow_symlinks`` is False

* ``is_file(*, follow_symlinks=True)``: similar to
  ``pathlib.Path.is_file()``, but the return value is cached on the
  ``DirEntry`` object; doesn't require a system call in most cases;
  don't follow symbolic links if ``follow_symlinks`` is False

* ``is_symlink()``: similar to ``pathlib.Path.is_symlink()``, but the
  return value is cached on the ``DirEntry`` object; doesn't require a
  system call in most cases

* ``stat(*, follow_symlinks=True)``: like ``os.stat()``, but the
  return value is cached on the ``DirEntry`` object; does not require a
  system call on Windows (except for symlinks); don't follow symbolic links
  (like ``os.lstat()``) if ``follow_symlinks`` is False

* ``inode()``: return the inode number of the entry; the return value
  is cached on the ``DirEntry`` object

Here's a very simple example of ``scandir()`` showing use of the
``DirEntry.name`` attribute and the ``DirEntry.is_dir()`` method:

.. code-block:: python

    import os

    def subdirs(path):
        """Yield directory names not starting with '.' under given path."""
        for entry in os.scandir(path):
            if not entry.name.startswith('.') and entry.is_dir():
                yield entry.name

This ``subdirs()`` function will be significantly faster with scandir
than ``os.listdir()`` and ``os.path.isdir()`` on both Windows and POSIX
systems, especially on medium-sized or large directories.


Further reading
---------------

* `The Python docs for scandir <https://docs.python.org/3.5/library/os.html#os.scandir>`_
* `PEP 471 <https://www.python.org/dev/peps/pep-0471/>`_, the
  (now-accepted) Python Enhancement Proposal that proposed adding
  ``scandir`` to the standard library -- a lot of details here,
  including rejected ideas and previous discussion


Flames, comments, bug reports
-----------------------------

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library
`here <https://docs.python.org/3.5/bugs.html>`_, or file bug reports
or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir