1#! /usr/bin/env python3
2
3# Copyright 2007 Google Inc.
4#
5# This program is free software; you can redistribute it and/or
6# modify it under the terms of the GNU General Public License
7# as published by the Free Software Foundation; either version 2
8# of the License, or (at your option) any later version.
9#
10# This program is distributed in the hope that it will be useful,
11# but WITHOUT ANY WARRANTY; without even the implied warranty of
12# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
13# GNU General Public License for more details.
14#
15# You should have received a copy of the GNU General Public License
16# along with this program; if not, write to the Free Software
17# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
18# USA.
19#
20
21"""Classes enabling definition and composition of caches.
22
23This file defines caches used to speed up the does-this-file-exist
24test that forms the basis of the C preprocessor's include-file
25handling, and takes most of its time.
26
27When the preprocessor sees a line like "#include <foo/bar.h>" it looks
28for a file named "bar.h" in many directories: /usr/include/foo/bar.h,
29./foo/bar.h, and so forth.  More precisely, the preprocessor is given
30a "search path", which is a list of directory-names.  (By default, the
31search-path looks like ['/usr/include', '/usr/local/include', ...],
32but it's often extended via gcc flags like -I, -isystem, -iprefix,
33etc.)  To resolve a single #include like "#include <foo/bar.h>", the
34preprocessor goes through every directory in the search path, running
35
36   os.stat(os.path.join(current_working_dir, search_dir, 'foo/bar.h'))
37
38until the stat call succeeds.  With dozens of search-dirs to look
39through, dozens of #include lines per source file, and hundreds of
40source files per compilation, this can add up to millions of stat
41calls.  Many of these calls are exactly the same, so caching is a big
42win.
43
44The cache of stat calls takes a filename as input and produces a bool
45as output, saying if the filename exists.  For reasons that will
46become clear in a moment, we actually represent the input filename as
47a triple that breaks the filename into its three components:
48
49   1) currdir: the current working directory (usually os.path.absdir('.'))
50   2) searchdir: an element of the search path (eg '/usr/include', 'base')
51   3) includepath: the thing that comes after "#include" in source files
52                   ("foo/bar.h" in our examples above).
53
54Why do we break the input into three parts?  Consider what cache-lookups
55we have to do for a single source file:
56   cache[os.path.join(currdir, searchdir1, includepath1)]   # #include <ipath1>
57   cache[os.path.join(currdir, searchdir2, includepath1)]   # #include <ipath1>
58   cache[os.path.join(currdir, searchdir3, includepath1)]   # #include <ipath1>
59   [etc...until the cache-lookup returns True]
60   cache[os.path.join(currdir, searchdir1, includepath2)]   # #include <ipath2>
61   cache[os.path.join(currdir, searchdir2, includepath2)]   # #include <ipath2>
62   cache[os.path.join(currdir, searchdir3, includepath2)]   # #include <ipath2>
63   [etc]
64
65By having the key be a triple, we avoid all those unnecessary
66os.path.join calls.  But even if we do this, we notice bigger fish
67to fry: the Python interpreter still has to do a string-hash of
68currdir for every lookup, and also has to string-hash searchdirX and
69includepathX many times.  It would be much more efficient if we did
70those hashes ourselves, reducing the number of string-hashes from
71O(|search-path| * |#include lines|) to
72O(|search-path| + |#include lines|).
73
74This motivates (finally!) the data structures in this file.  We have
75three string-to-number maps, for mapping each currdir, searchdir, and
76includepath to a small integer.  We put that all together in a cache,
77that takes a triple of integers as its key and produces True if the
78file exists, False if it does not, or None if its status is unknown.
79
80The String-to-number Map(s)
81---------------------------
82The basic map that converts a filepath-path -- a currdir, searchdir,
83or includepath -- to a small integer is called MapToIndex.  MapToIndex
84provides mapping in both directions:
85   index: a dictionary mapping paths (strings) to indices in 1..N, and
86   string: an array of size N + 1 that implements the reverse mapping
87
88So:
89   obj.string[obj.index[path_as_string]] == path_as_string
90   obj.index[obj.string[path_as_number]] == path_as_number
91
92Note we map from 1..N, and not 0..N-1, which leave us 0 free to use as
93a synonym for None or False.
94
95There are also classes that specialize MapToIndex for specific purposes.
96
97DirectoryMapToIndex assumes the input is a directory, and in
98particular a directory that does not have a slash at the end of it (eg
99"/etc").  It adds the trailing slash before inserting into the map.
100This is useful because it allows us to use + to join this directory
101with a relative filename, rather than the slower os.path.join().
102
103RelpathMapToIndex assumes the input is a relative filepath, that is,
104one that does not start with /.  When combined with DirectoryMapToIndex
105entries, + can be used as a fast alternative to os.path.join().
106
107CanonicalMapToIndex is a MapToIndex that canonializes its input before
108inserting it into the map: resolving symlinks, getting rid of ..'s,
109etc.  It takes an absolute path as input.
110
111Other Caches
112------------
113Besides the maps from strings to integers, there are three other caches.
114One is the realpath-cache, that takes a filename and returns
115os.path.realpath(filename).  We cache this because os.path.realpath()
116is very slow.  This is called CanonicalPath.
117
118The second cache, the DirnameCache, maps an arbitrary pathname to
119dirname(pathname), that is, the directory the pathname is in.  The
120input pathname is represented by a (currdir_idx, searchdir_idx,
121includepath_idx) triple.  The output is likewise represented as a
122number: an index into the DirectoryMapToIndex structure.
123
124The third cache is called SystemdirPrefixCache.  It tells you, for a
125given absolute filepath, whether it is prefixed by a systemdir (that
126is, one of the searchdirs that's built into cpp, such as /usr/include).
127This is useful to cache because there are several systemdirs, and it's
128expensive to check them all each time.
129
130Naming Conventions
131------------------
132  currdir: the current working dir.
133  searchdir: an element of the search-path (places cpp looks for .h files).
134  includepath: the string a source file #includes.
135  realpath: a full filepath with all its symlinks resolved:
136     os.path.realpath(os.path.join(currdir, searchdir, includepath))
137  FOO_idx: the small integer associated with the string FOO.
138
139  includepath_map: the map that takes includepaths to their idx and back
140     (a RelpathMapToIndex).
141  directory_map: the map that takes currdirs and searchdirs to their
142     idx and back.  It also is used to store dirname(filepath) for arbitrary
143     filepaths -- basically, anything we know is a directory (a
144     DirectoryMapToIndex).
145  realpath_map: the map that takes full filepaths to their idx and back,
146     canonicalizing them first (by resolving symlinks) (a
147     CanonicalMapToIndex).
148
149  searchlist: a list of searchdirs.  In gcc/cpp documentation, this is
150     called the "search path", but for consistency, in this code we reserve
151     the name "path" to mean "filesystem component," never "list of dirs".
152     (A list of strings).
153  systemdir: a searchdir that's built into cpp, rather than set via -I.
154     (A string.)
155
156  resolved_filepath: given an includepath, and a (possibly implicit)
157     currdir and searchlist, the resolved_filepath is
158        os.path.join(currdir, searchdir, includepath)
159     for the first searchdir in searchlist for which the joined string
160     exists.  This path can be represented in many ways: 1) a string like
161     "foo/bar/baz.h" (if so, this string has been canonicalized to resolve
162     symlinks and the like); 2) an index into realpath_map associated with
163     that string; 3) a triple of indices; or 4) a pair of indices plus an
164     assumption that os.getcwd() == currdir.
165
166Pair Representation of Filepaths
167-------------------------------
168A file is uniquely determined by the triple
169   (currdir_idx, searchdir_idx, includepath_idx)
170For a single compilation unit, the code will often start with a
171chdir(currdir).  After that, we often refer to a file by the pair
172   (searchdir_idx, includepath_idx)
173which might be either an absolute filename or relative to $PWD.
174
175We refer to this pair as a filepath_pair.
176TODO(csilvers): find a better name?
177
178The function IsFilepathPair(x) tests whether x is a pair that could
179plausibly have a searchdir_idx as its first element and an
180includepath_idx as its second.
181
182Tests
183-----
184This code is currently only tested by regression tests of modules
185using this one.
186"""
187
188__author__ = "opensource@google.com (Nils Klarlund, Craig Silverstein)"
189
190import os
191import os.path
192import sys
193
194import basics
195import statistics
196import compiler_defaults
197
198DIR_ARRAY_SIZE = 500
199
200# We currently use the stat and realpath of GNU libc stat and
201# realpath. They are about an order of magnitude faster than their
202# Python counterparts, even when called through the Python/C
203# interface.
204
205try:
206  import distcc_pump_c_extensions
207  _OsPathExists = distcc_pump_c_extensions.OsPathExists
208  _OsPathIsFile = distcc_pump_c_extensions.OsPathIsFile
209  _PathRealpath = distcc_pump_c_extensions.Realpath
210  _path_realpath_works = True
211except ImportError:
212  _OsPathExists = os.path.exists
213  _OsPathIsFile = os.path.isfile
214  _PathRealpath = os.path.realpath
215  # os.path.realpath might have some bugs.  TODO(csilvers): check that here
216  _path_realpath_works = False
217
218Debug = basics.Debug
219DEBUG_TRACE = basics.DEBUG_TRACE
220DEBUG_TRACE1 = basics.DEBUG_TRACE1
221DEBUG_TRACE2 = basics.DEBUG_TRACE2
222DEBUG_WARNING = basics.DEBUG_WARNING
223NotCoveredError = basics.NotCoveredError
224
225
226####
227#### SIMPLE CACHES
228####
229
230class CanonicalPath(object):
231  """Memoizing calculation of realpaths.  realpath(x) is the 'canonical'
232  version of x, with all symbolic links eliminated.
233  """
234
235  def __init__(self):
236    self.cache = {}
237
238  def Canonicalize(self, filepath):
239    """Find a really canonical path, possibly memoized.
240
241    Arguments:
242      filepath: a filepath (string)
243    Returns:
244      the realpath of filepath (string)
245
246    The following is irrelevant if we always use the distcc_pump_c_extensions
247    realpath function.
248    ---
249    Apparently, in some versions of Python 2.4 at least, realpath does
250    *not* resolve the last component of a filepath if it is a link:
251       https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1213894&group_id=5470
252    Make up for that: follow that final link until a real realpath has
253    been found.
254
255    Also, realpath is not idempotent.
256
257    Solution (?): turn filepath into abspath before applying realpath;
258    then we can cache results as well (without worring about value of
259    current directory).
260
261    The final problem -- that os.path.realpath is very slow, at least
262    an order of magnitude slower than the gnu libc one --- is solved
263    through caching all uses through an object of the present class.
264    """
265    assert isinstance(filepath, str)
266    try:
267      return self.cache[filepath]
268    except KeyError:
269      if _path_realpath_works:
270        r = _PathRealpath(filepath)
271        self.cache[filepath] = r
272        return r
273      # Fix for os.path.realpath idempotencey bug (Python 2.4).
274      filepath_ = os.path.abspath(filepath)
275      filepath_ = _PathRealpath(filepath_)
276      # Fix for os.path.realpath bug (Python 2.4): symlinks at end not
277      # resolved.
278      for unused_i in range(10):
279        if not os.path.islink(filepath_):
280          break
281        filepath_ = os.path.join(os.path.dirname(filepath_),
282                                 os.readlink(filepath_))
283      else:
284        raise NotCoveredError("Too many symlinks in '%s'." % filepath)
285      self.cache[filepath] = filepath_
286      return filepath_
287
288class DirnameCache(object):
289  """Cache the mapping from filepath pairs to index of their directory names.
290
291  The key is a triple (currdir_idx, searchdir_idx, includepath_idx).  The
292  value is
293
294    (dir_idx, dir_realpath_idx)
295
296  where dir_idx is the index of dirname of the corresponding filepath, which
297  possibly is relative, and dir_realpath_idx is the realpath index of the
298  absolute location of the dirname.  The value currdir_idx is of possible
299  importance for deteterming dir_realpath_idx, but plays no role in determining
300  dir_idx."""
301
302  def __init__(self, includepath_map, directory_map, realpath_map):
303    """Constructor.
304      Arguments:
305        includepath_map: the map used to construct the includepath_idx
306           that will be passed in as arguments to Lookup().
307        directory_map: the map used to construct both the currdir_idx
308           and searchdir_idx that will be passed in as arguments to
309           Lookup().  It's also the data structure that produces dir_idx.
310        realpath_map: a string-to-int map of canonicalized filepaths
311    """
312    self.includepath_map = includepath_map
313    self.directory_map = directory_map
314    self.realpath_map = realpath_map
315    self.cache = {}
316
317  def Lookup(self, currdir_idx, searchdir_idx, includepath_idx):
318    """Return the directory and realpath indices of the dirname of the input.
319
320    Arguments:
321      currdir_idx: the directory index of the current directory
322      searchdir_idx: a directory_map index
323      includepath_idx: an includepath index
324    Returns:
325      a pair (directory map index, realpath index)
326
327    See class documentation.
328
329    Example: if the strings of the arguments indices put together make
330    '/usr/include/foo/bar.h', then this routine will insert '/usr/include/foo/'
331    into self.directory_map, and then return the corresponding pair (directory
332    index of /usr/include/foo/, real path index of /usr/include/foo/).  If the
333    arguments put together form "foo.h", then the directory index returned is
334    that of "", the current directory, and the realpath index is that of
335    currdir.
336    """
337    try:
338      return self.cache[(currdir_idx, searchdir_idx, includepath_idx)]
339    except KeyError:
340      directory = os.path.dirname(os.path.join(
341         self.directory_map.string[searchdir_idx],
342         self.includepath_map.string[includepath_idx]))
343      dir_idx = self.directory_map.Index(directory)
344      rp_idx = self.realpath_map.Index(
345          os.path.join(self.directory_map.string[currdir_idx],
346                       directory))
347      self.cache[(currdir_idx, searchdir_idx, includepath_idx)] = (dir_idx,
348                                                                   rp_idx)
349      return (dir_idx, rp_idx)
350
351
352class SystemdirPrefixCache(object):
353  """A cache of information about whether a file exists in a systemdir.
354
355  A systemdir is a searchdir that is built in to the C/C++
356  preprocessor.  That is, when the preprocessor is figuring out what
357  directory an #include is in, these are the directories it's
358  hard-coded in to check (you can add other directories via -I).  This
359  cache records, for a given filepath, whether it starts with a
360  systemdir.  This is useful to identify whether the path is likely to
361  correspond to a system include-file (such as stdio.h).  Such files are
362  unlikely to change, and are likely to already exist on the distcc
363  servers, both of which are useful things to know for optimization.
364
365  For speed, users can access self.cache directly, rather than going
366  through the StartsWithSystemdir API.  Be sure to call FillCache() to
367  make sure the cache is populated, before accessing it!
368  """
369
370  def __init__(self, systemdirs):
371    """Constructor.
372
373    Argument:
374      systemdirs: the list of system-directories the preprocessor
375        uses.  It's a list of strings, probably extracted from the
376        preprocessor itself.  Each systemdir should end in a slash.
377
378    In practice, systemdirs will start empty, and later some routine
379    (in parse_command.py) will magically fill it.  So be sure to wait
380    for that before calling FillCache!
381    TODO(csilvers): normalize this; ideally pass systemdirs in to FillCache.
382    """
383    self.systemdirs = systemdirs
384    # self.cache[i] will be True, False, or None for not-yet-checked.
385    self.cache = [None]
386
387  def FillCache(self, realpath_map):
388    """Ensures that there's a cache entry for every index in realpath_map.
389
390    Argument:
391     realpath_map: a string-to-int map of canonicalized filepaths we know.
392
393    After this function is called, the cache entry is True iff
394    realpath.startswith(systemdir) is True for any of the systemdirs
395    passed in to our constructor.
396    """
397    if len(self.cache) >= realpath_map.Length():
398      return    # we're already all full
399    for realpath_idx in range(len(self.cache), realpath_map.Length()):
400      realpath = realpath_map.string[realpath_idx]
401      for systemdir in self.systemdirs:
402        if realpath.startswith(systemdir):
403          self.cache.append(True)
404          break
405      else:    # we get here if the for never 'break'ed
406        self.cache.append(False)
407
408    assert len(self.cache) == realpath_map.Length()
409
410  def StartsWithSystemdir(self, realpath_idx, realpath_map):
411    """Return True iff realpath starts with a systemdir.
412
413    Arguments:
414     realpath_idx: the index of the realpath we want to check.
415     realpath_map: the map from realpath_idx to a string.
416
417    Return True iff realpath.startswith(systemdir) for any of the
418    systemdirs passed in to our constructor.  (For speed, you can
419    access self.cache directly instead of calling this, but make
420    sure FillCache() has been called first!)
421    """
422    self.FillCache(realpath_map)
423    return self.cache[realpath_idx]
424
425
426####
427#### MAP_TO_INDEX AND ITS SPECIALIZATIONS
428####
429
430class MapToIndex(object):
431  """Maps every object it sees to a unique small integer.  In
432  practice, this class is used to map path-components (which are strings).
433  """
434
435  def __init__(self):
436    """Constructor.
437
438    Instance variables:
439      map: a dictionary such that map[path] is the index of path
440      string: a list satisfying: string[i] is the path such that map[path] = i
441    """
442
443    # Do not make the mistake of letting a real index be 0. (Hint:
444    # because "if path:" then does not distinguish between 0 and None.)
445    self.index = {None:None}
446    self.string = [None]
447
448  def _Invariant_(self):
449    return len(self.index) == len(self.string)
450
451  def Index(self, path):
452    """Returns the index i > 0 of path."""
453    assert self._Invariant_()
454    try:
455      return self.index[path]
456    except KeyError:
457      self.index[path] = len(self.string)
458      self.string.append(path)
459      return len(self.string) - 1
460
461  def String(self, i):
462    """Returns the path such that Index(path) == i."""
463    assert self._Invariant_()
464    assert 0 < i < self.Length()
465    return self.string[i]
466
467  def Length(self):
468    """One more than the number of elements indexed."""
469    assert self._Invariant_()
470    return len(self.string)
471
472
473class DirectoryMapToIndex(MapToIndex):
474  """Like a normal MapToIndex, but assumes the keys are directories,
475  and in particular, directories without a trailing slash (eg "/etc").
476  It stores the directories in the map, but appends the trailing slash
477  first.  This is another type of normalization, and useful for cheap
478  path-joining (eg using + instead of os.path.join).
479  """
480
481  def Index(self, directory):
482    """Return index d > 0 of normalized directory.
483    Argument:
484      directory: a string, either empty or not ending in '/'.
485
486    The empty string is not changed, but other strings are stored with
487    a '/' appended.
488    """
489    if directory != "" and directory != "/":
490      assert directory[-1] != '/', directory
491      directory = directory + '/'
492    return MapToIndex.Index(self, directory)
493
494
495class RelpathMapToIndex(MapToIndex):
496  """Like a normal MapToIndex, but assumes the keys are relative
497  filesystem paths, that is, filesystem paths not starting with /.
498  This is useful for "cheap" normalization: this invariant ensures that
499  os.path.join(some-directorymap-string, some-relpathmap-string) can
500  be implemented using +.
501
502  We actually do allow storing absolute paths if option
503  --unsafe_absolute_includes is in use.  But, then, we're careful in Resolve
504  (below) to bail out.
505  """
506
507  def Index(self, relpath, ignore_absolute_path_warning=False):
508    """Return index d > 0 of relative path.
509    Args:
510      directory: a string not starting with /.
511      ignore_absolute_path_warning: a Boolean
512
513    The variable ignore_absolute_path_warning is set to True in order to
514    override the requirement that filepaths are relative. This is useful for the
515    compilation unit filepath and filepaths of -include's: they are permitted to
516    be absolute because the command line can still be rewritten on the server.
517    The server tweaks their location to become relative to the server root.
518    """
519    if os.path.isabs(relpath) and not ignore_absolute_path_warning:
520      if basics.opt_unsafe_absolute_includes:
521        Debug(DEBUG_WARNING,
522              "absolute filepath '%s' was IGNORED"
523              " (correctness of build may be affected)", relpath)
524      else:
525          raise NotCoveredError("Filepath must be relative but isn't: '%s'."
526                                " Consider setting INCLUDE_SERVER_ARGS='--"
527                                "unsafe_absolute_includes'."
528                                % relpath,
529                                send_email=False)
530    # Now, remove leading "./" so as not to start an infinite regression when
531    # say foo.c contains:
532    #
533    #   #include "./foo.c"
534    #
535    # which mighy seduce a recursive include analyzer down the forbidden path:
536    #
537    #   "foo.c", # "./foo.c", "././foo.c." etc.
538    while relpath.startswith("./"):
539      relpath = relpath[2:]
540    return MapToIndex.Index(self, relpath)
541
542
543class CanonicalMapToIndex(MapToIndex):
544  """Like a normal MapToIndex, but assumes the keys are absolute
545  filepaths, and canonicalizes them before inserting into the map.
546  'Canonicalize' means to do the equivalent of os.path.realpath(),
547  which mostly involves resolving symlinks in the filepath.
548  """
549
550  def __init__(self, canonicalize):
551    """Constructor.
552    Argument:
553      canonicalize: an instance of the CanonicalPath cache."""
554    MapToIndex.__init__(self)
555    self.canonicalize = canonicalize
556
557  def Index(self, filepath):
558    """Return the realpath index r of filepath.  filepath should be
559    an absolute filename.
560    """
561    return MapToIndex.Index(self, self.canonicalize(filepath))
562
563
564def RetrieveDirectoriesExceptSys(directory_map, realpath_map,
565                                 systemdir_prefix_cache, directory_idxs):
566  """Calculate the set of non-system directories of an index list.
567
568  Arguments:
569    directory_map: a DirectoryMapToIndex cache
570    realpath_map: a CanonicalMapToIndex cache
571    directory_idxs: a list or tuple of directory_map indices
572  Returns:
573    the corresponding tuple of directories except for those whose
574    realpath has a prefix that is a sysdir
575
576  The directories in the returned list have their trailing '/'
577  stripped.
578  """
579  result = []
580  for dir_idx in directory_idxs:
581    # Index the absolute path; this will let us know whether dir_idx is under a
582    # default systemdir of the compiler.
583    rp_idx = realpath_map.Index(os.path.join(
584	os.getcwd(), directory_map.string[dir_idx]))
585    systemdir_prefix_cache.FillCache(realpath_map)
586    if not systemdir_prefix_cache.cache[rp_idx]:
587      result.append(directory_map.string[dir_idx].rstrip('/'))
588  return tuple(result)
589
590
591####
592#### THE STAT CACHES
593####
594
595class SimpleBuildStat(object):
596  """Stat cache that works with strings, not indices."""
597
598  def __init__(self):
599    self.cache = {}
600
601  def Lookup(self, filepath):
602    """Returns true if filepath exists."""
603    try:
604      return self.cache[filepath]
605    except KeyError:
606      result = self.cache[filepath] = _OsPathExists(filepath)
607      return result
608
609
610class BuildStatCache(object):
611  """A highly optimized mechanism for stat queries of filepaths,
612  as represented by a triple of indexes: currdir_idx, searchdir_idx,
613  filepath_idx.  Given this input, we can say whether a regular file
614  represented by this triple exists on the filesystem, and if so,
615  what its canonical pathname is: that is, the pathname after all
616  symlinks have been resolved.
617
618  The hash table is three-level structure:
619   - build_stat[currdir_idx] contains an array for each includepath_idx
620   - build_stat[currdir_idx][includepath_idx] is this array, and
621   - build_stat[currdir_idx][includepath_idx][searchdir_idx] is either
622      * False if os.path.join(currdir, searchdir, includepath) does not exist
623      * True if it does
624      * None when it is not known whether it exists or not
625  In addition, we keep a parallel structure for the realpath, that lets us
626  quickly map from a filepath to os.path.realpath(filepath).
627   - real_stat[currdir_idx] contains an array for each fp
628   - real_stat[currdir_idx][includepath_idx] is this array, and
629   - real_stat[currdir_idx][includepath_idx][searchdir_idx] is either
630      * realpath_idx, such that realpath_map.string[realpath_idx] =
631           os.path.realpath(os.path.join(currdir, searchdir, includepath))
632        when build_stat[currdir_idx][includepath_idx][searchdir_idx] = True
633      * None, otherwise
634  """
635
636  def __init__(self, includepath_map, directory_map, realpath_map):
637    self.build_stat = {}
638    self.real_stat = {}
639    self.includepath_map = includepath_map
640    self.directory_map = directory_map
641    self.realpath_map = realpath_map
642    self.path_observations = []
643
644  def _Verify(self, currdir_idx, searchdir_idx, includepath_idx):
645    """Verify that the cached result is the same as obtained by stat call.
646    Prerequisite: we've done a chdir(currdir) before this call.
647    """
648    assert 1 <= includepath_idx < self.includepath_map.Length()
649    assert 1 <= searchdir_idx < self.directory_map.Length()
650    if __debug__: statistics.sys_stat_counter += 1
651
652    # Since we know directory_map entries end in /, and includepaths don't
653    # start with / (who does "#include </usr/include/string.h>"??), we can
654    # use + instead of the more expensive os.path.join().
655    # Make sure $PWD is currdir, so we don't need to include it in our stat().
656    assert os.getcwd() + '/' == self.directory_map.string[currdir_idx]
657    really_exists = _OsPathIsFile(
658      self.directory_map.string[searchdir_idx]
659      + self.includepath_map.string[includepath_idx])
660    cache_exists = self.build_stat[currdir_idx][includepath_idx][searchdir_idx]
661    assert isinstance(cache_exists, bool)
662    if cache_exists != really_exists:
663      filepath = os.path.join(self.directory_map.string[currdir_idx],
664                              self.directory_map.string[searchdir_idx],
665                              self.includepath_map.string[includepath_idx])
666      sys.exit("FATAL ERROR: "
667               "Cache inconsistency: '%s' %s, but earlier this path %s." % (
668        filepath,
669        really_exists and "exists" or "does not exist",
670        cache_exists and "existed" or "did not exist"))
671
672  def WarnAboutPathObservations(self, translation_unit):
673    """Print new paths found according to path observation expression option.
674
675    Args:
676      translation_unit: a string embedded in warning
677    """
678    for (includepath, relpath, realpath) in self.path_observations:
679      Debug(DEBUG_WARNING,
680            "For translation unit '%s',"
681            " lookup of file '%s' resolved to '%s' whose realpath is '%s'.",
682             translation_unit, includepath, relpath, realpath)
683    self.path_observations = []
684
685  def Resolve(self, includepath_idx, currdir_idx, searchdir_idx,
686              searchlist_idxs):
687    """Says whether (currdir_idx, searchdir_idx, includepath_idx) exists,
688    and if so what its canonicalized form is (with symlinks resolved).
689    TODO(csilvers): rearrange the order of the arguments.
690
691    Args:
692      includepath_idx: The index of an includepath, from e.g. "#include <foo>"
693      currdir_idx:     The index of the current working dir.  Note that we
694                       require os.getcwd() == currdir before calling Resolve!
695      searchdir_idx:   A single searchdir, which is prepended to searchlist,
696                       or None to not prepend to the searchlist.
697      searchlist_idxs: A list of directory indices.
698
699    Returns:
700      1) (None, None) if, for all sl_idx in [searchdir_idx] + searchlist_idxs,
701         os.path.join(currdir, sp, includepath) does not exist.
702      2) ((sl_idx, includepath_idx), realpath_idx)
703         if, for some sl_idx in [searchdir_idx] + searchlist_idxs,
704         os.path.join(currdir, sp, includepath) does exist.  In this case,
705         sl_idx is the index of the first searchlist entry for which the
706         exists-test succeeds, and realpath_idx is the index into the
707         realpath_map of os.path.join(currdir, sp, includepath).
708
709    Again, we require as a prequesite that os.getcwd() must equal currdir:
710       os.getcwd() + '/' == self.directory_map.string[currdir_idx]
711    """
712    includepath = self.includepath_map.string[includepath_idx]
713    if includepath.startswith('/'):
714      # We really don't want to start exploring absolute includepaths; what's
715      # the sl_idx to return for example? And what about the use of '+'
716      # (as an optimization) below instead of os.path.join.
717      return (None, None)
718    dir_map_string = self.directory_map.string   # memoize the fn pointer
719    build_stat = self.build_stat
720    real_stat = self.real_stat
721    if __debug__:
722      dir_map = self.directory_map
723      assert 0 < includepath_idx < self.includepath_map.Length()
724      assert 0 < currdir_idx < dir_map.Length()
725      assert searchdir_idx is None or 1 <= searchdir_idx < dir_map.Length()
726      for sl_idx in searchlist_idxs:
727        assert sl_idx < dir_map.Length()
728      assert os.getcwd() + '/' == dir_map_string[currdir_idx]
729      Debug(DEBUG_TRACE2, "Resolve: includepath: '%s', currdir: '%s', "
730            "searchdir: '%s', searchlist: %s" %
731            (includepath,
732             dir_map_string[currdir_idx],
733             searchdir_idx and dir_map_string[searchdir_idx],
734             "  \n".join([dir_map_string[idx] for idx in searchlist_idxs])))
735    try:
736      # Locate the array (list) relative to currdir_idx and includepath_idx
737      searchdir_stats = build_stat[currdir_idx][includepath_idx]
738      # Locate the corresponding array of realpath names
739      searchdir_realpaths = real_stat[currdir_idx][includepath_idx]
740    except KeyError:    # We'll need to grow the relevant arrays
741      currdir_stats = build_stat.setdefault(currdir_idx, {})
742      currdir_realpaths = real_stat.setdefault(currdir_idx, {})
743      searchdir_stats = currdir_stats[includepath_idx] = \
744                        [None] * DIR_ARRAY_SIZE
745      searchdir_realpaths = currdir_realpaths[includepath_idx] = \
746                        [None] * DIR_ARRAY_SIZE
747
748    # Try searchdir_idx if not None, then try every index in searchlist_idxs.
749    # This inner loop may be executed tens of millions of times.
750    # Do not try to form [searchdir_idx] + searchlist_idxs -- too expensive!
751    for searchlist in (searchdir_idx and [searchdir_idx] or [],
752                       searchlist_idxs):
753      for sl_idx in searchlist:
754        if __debug__:
755          statistics.search_counter += 1
756          statistics.build_stat_counter += 1
757        try:
758          # We expect that searchdir_stats[sl_idx] == False, because
759          # we've usually seen sl_idx before for our includepath and
760          # our currdir --- and includepath does not usually exist
761          # relative to the sp directory.  We're optimizing for this
762          # case of course. That should give us a rate of a couple of
763          # million iterations per second (for this case).
764          if searchdir_stats[sl_idx] == False:
765            if __debug__: self._Verify(currdir_idx, sl_idx, includepath_idx)
766            continue
767          if searchdir_stats[sl_idx]:
768            if __debug__: self._Verify(currdir_idx, sl_idx, includepath_idx)
769            return ((sl_idx, includepath_idx), searchdir_realpaths[sl_idx])
770        except IndexError:   # DIR_ARRAY_SIZE wasn't big enough; let's double
771          searchdir_stats.extend([None] * max(sl_idx, len(searchdir_stats)))
772          searchdir_realpaths.extend([None] * max(sl_idx, len(searchdir_stats)))
773
774        # If we get here, result is not cached yet.
775        if __debug__: statistics.sys_stat_counter += 1
776        # We do not explicitly take into account currdir_idx, because
777        # of the check above that os.getcwd is set to current_dir.
778        relpath = dir_map_string[sl_idx] + includepath
779        if _OsPathIsFile(relpath):
780          searchdir_stats[sl_idx] = True
781          rpath = os.path.join(dir_map_string[currdir_idx], relpath)
782          realpath_idx = searchdir_realpaths[sl_idx] = (
783            self.realpath_map.Index(rpath))
784          # This is the place to catch errant files according to user defined
785          # regular expression path_observation_re.
786          if basics.opt_path_observation_re:
787            realpath = self.realpath_map.string[realpath_idx]
788            if basics.opt_path_observation_re.search(realpath):
789              self.path_observations.append((includepath, relpath, realpath))
790          return ((sl_idx, includepath_idx), realpath_idx)
791        else:
792          searchdir_stats[sl_idx] = False
793
794    if __debug__: Debug(DEBUG_TRACE2, "Resolve: failed")
795    return (None, None)
796
797
798class SetUpCaches(object):
799  """Erect the edifice of caches.
800
801  Instance variables:
802    includepath_map: RelpathMapToIndex
803    directory_map: DirectoryMapToIndex
804    realpath_map: CanonicalMapToIndex
805
806    canonical_path: CanonicalPath
807    build_stat_cache: BuildStatCache
808    dirname_cache: DirnameCache
809    simple_build_stat: SimpleBuildStat
810
811    client_root: a path such as /dev/shm/tmpX.include_server-X-1
812                 (used during default system dir determination)
813
814    IsFilepathIndex: test for filepath index
815    IsDirectoryIndex: test for director index
816    IsRealpathIndex: test for realpath index
817    IsFilepathPair: test for filepath pair
818  """
819
820  def __init__(self, client_root):
821
822    # A memoizing (caching) class to canonicalize a path: mostly by
823    # resolving any symlinks in the path-component.
824    self.canonical_path = CanonicalPath()
825
826    # The index-map for includepath names: things seen after '#include'.
827    self.includepath_map = RelpathMapToIndex()
828
829    # The index-map for searchdir names and currdir as well.  Also used any
830    # other time we have something we know is a directory (eg dirname(foo)).
831    self.directory_map = DirectoryMapToIndex()
832
833    # The index-map for realpaths: the full pathname of an include, with
834    # symlinks resolved and such (hence the name realpath).
835    self.realpath_map = CanonicalMapToIndex(self.canonical_path.Canonicalize)
836
837    # A cache of the directory part of filepaths.  Note it uses the
838    # directory_map to actually store the mapping.
839    self.dirname_cache = DirnameCache(self.includepath_map, self.directory_map,
840                                      self.realpath_map)
841
842    # A cache of whether a realpath starts with a system searchdir or
843    # not.  Note: at this time, system_dirs_default_all will be empty.
844    # It will get filled via processing in parse_command.py.  This is
845    # why we need to store the compiler_defaults instance, to make
846    # sure "our" system_dirs_default_all is updated.
847    # TODO(csilvers): get rid of this once prefix_cache TODO is cleaned up
848    self.compiler_defaults = compiler_defaults.CompilerDefaults(
849      self.canonical_path.Canonicalize, client_root)
850    self.systemdir_prefix_cache = SystemdirPrefixCache(
851      self.compiler_defaults.system_dirs_default_all)
852
853    # The main caches, that say whether a file exists or not.  We have
854    # two: a simple one that takes a filepath (string) as an argument,
855    # and the complicated one that works with index-triples.
856    self.simple_build_stat = SimpleBuildStat()
857    self.build_stat_cache = BuildStatCache(self.includepath_map,
858                                           self.directory_map,
859                                           self.realpath_map)
860
861    # Convenient function closures to test for various semantic datatypes.
862    self.IsIncludepathIndex = (lambda x:
863                                 isinstance(x, int)
864                                 and 0 < x < self.includepath_map.Length())
865
866    self.IsSearchdirIndex = (lambda x:
867                               isinstance(x, int)
868                               and 0 < x < self.directory_map.Length())
869
870    self.IsCurrdirIndex = (lambda x:
871                             isinstance(x, int)
872                             and 0 < x < self.directory_map.Length())
873
874    self.IsFilepathPair = (lambda x:
875                             isinstance(x, tuple)
876                             and len(x) == 2
877                             and self.IsSearchdirIndex(x[0])
878                             and self.IsIncludepathIndex(x[1]))
879
880    self.IsRealpathIndex = (lambda x:
881                              isinstance(x, int)
882                              and 0 < x < self.realpath_map.Length())
883