1#! /usr/bin/env python3 2 3# Copyright 2007 Google Inc. 4# 5# This program is free software; you can redistribute it and/or 6# modify it under the terms of the GNU General Public License 7# as published by the Free Software Foundation; either version 2 8# of the License, or (at your option) any later version. 9# 10# This program is distributed in the hope that it will be useful, 11# but WITHOUT ANY WARRANTY; without even the implied warranty of 12# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13# GNU General Public License for more details. 14# 15# You should have received a copy of the GNU General Public License 16# along with this program; if not, write to the Free Software 17# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, 18# USA. 19# 20 21"""Classes enabling definition and composition of caches. 22 23This file defines caches used to speed up the does-this-file-exist 24test that forms the basis of the C preprocessor's include-file 25handling, and takes most of its time. 26 27When the preprocessor sees a line like "#include <foo/bar.h>" it looks 28for a file named "bar.h" in many directories: /usr/include/foo/bar.h, 29./foo/bar.h, and so forth. More precisely, the preprocessor is given 30a "search path", which is a list of directory-names. (By default, the 31search-path looks like ['/usr/include', '/usr/local/include', ...], 32but it's often extended via gcc flags like -I, -isystem, -iprefix, 33etc.) To resolve a single #include like "#include <foo/bar.h>", the 34preprocessor goes through every directory in the search path, running 35 36 os.stat(os.path.join(current_working_dir, search_dir, 'foo/bar.h')) 37 38until the stat call succeeds. With dozens of search-dirs to look 39through, dozens of #include lines per source file, and hundreds of 40source files per compilation, this can add up to millions of stat 41calls. Many of these calls are exactly the same, so caching is a big 42win. 43 44The cache of stat calls takes a filename as input and produces a bool 45as output, saying if the filename exists. For reasons that will 46become clear in a moment, we actually represent the input filename as 47a triple that breaks the filename into its three components: 48 49 1) currdir: the current working directory (usually os.path.absdir('.')) 50 2) searchdir: an element of the search path (eg '/usr/include', 'base') 51 3) includepath: the thing that comes after "#include" in source files 52 ("foo/bar.h" in our examples above). 53 54Why do we break the input into three parts? Consider what cache-lookups 55we have to do for a single source file: 56 cache[os.path.join(currdir, searchdir1, includepath1)] # #include <ipath1> 57 cache[os.path.join(currdir, searchdir2, includepath1)] # #include <ipath1> 58 cache[os.path.join(currdir, searchdir3, includepath1)] # #include <ipath1> 59 [etc...until the cache-lookup returns True] 60 cache[os.path.join(currdir, searchdir1, includepath2)] # #include <ipath2> 61 cache[os.path.join(currdir, searchdir2, includepath2)] # #include <ipath2> 62 cache[os.path.join(currdir, searchdir3, includepath2)] # #include <ipath2> 63 [etc] 64 65By having the key be a triple, we avoid all those unnecessary 66os.path.join calls. But even if we do this, we notice bigger fish 67to fry: the Python interpreter still has to do a string-hash of 68currdir for every lookup, and also has to string-hash searchdirX and 69includepathX many times. It would be much more efficient if we did 70those hashes ourselves, reducing the number of string-hashes from 71O(|search-path| * |#include lines|) to 72O(|search-path| + |#include lines|). 73 74This motivates (finally!) the data structures in this file. We have 75three string-to-number maps, for mapping each currdir, searchdir, and 76includepath to a small integer. We put that all together in a cache, 77that takes a triple of integers as its key and produces True if the 78file exists, False if it does not, or None if its status is unknown. 79 80The String-to-number Map(s) 81--------------------------- 82The basic map that converts a filepath-path -- a currdir, searchdir, 83or includepath -- to a small integer is called MapToIndex. MapToIndex 84provides mapping in both directions: 85 index: a dictionary mapping paths (strings) to indices in 1..N, and 86 string: an array of size N + 1 that implements the reverse mapping 87 88So: 89 obj.string[obj.index[path_as_string]] == path_as_string 90 obj.index[obj.string[path_as_number]] == path_as_number 91 92Note we map from 1..N, and not 0..N-1, which leave us 0 free to use as 93a synonym for None or False. 94 95There are also classes that specialize MapToIndex for specific purposes. 96 97DirectoryMapToIndex assumes the input is a directory, and in 98particular a directory that does not have a slash at the end of it (eg 99"/etc"). It adds the trailing slash before inserting into the map. 100This is useful because it allows us to use + to join this directory 101with a relative filename, rather than the slower os.path.join(). 102 103RelpathMapToIndex assumes the input is a relative filepath, that is, 104one that does not start with /. When combined with DirectoryMapToIndex 105entries, + can be used as a fast alternative to os.path.join(). 106 107CanonicalMapToIndex is a MapToIndex that canonializes its input before 108inserting it into the map: resolving symlinks, getting rid of ..'s, 109etc. It takes an absolute path as input. 110 111Other Caches 112------------ 113Besides the maps from strings to integers, there are three other caches. 114One is the realpath-cache, that takes a filename and returns 115os.path.realpath(filename). We cache this because os.path.realpath() 116is very slow. This is called CanonicalPath. 117 118The second cache, the DirnameCache, maps an arbitrary pathname to 119dirname(pathname), that is, the directory the pathname is in. The 120input pathname is represented by a (currdir_idx, searchdir_idx, 121includepath_idx) triple. The output is likewise represented as a 122number: an index into the DirectoryMapToIndex structure. 123 124The third cache is called SystemdirPrefixCache. It tells you, for a 125given absolute filepath, whether it is prefixed by a systemdir (that 126is, one of the searchdirs that's built into cpp, such as /usr/include). 127This is useful to cache because there are several systemdirs, and it's 128expensive to check them all each time. 129 130Naming Conventions 131------------------ 132 currdir: the current working dir. 133 searchdir: an element of the search-path (places cpp looks for .h files). 134 includepath: the string a source file #includes. 135 realpath: a full filepath with all its symlinks resolved: 136 os.path.realpath(os.path.join(currdir, searchdir, includepath)) 137 FOO_idx: the small integer associated with the string FOO. 138 139 includepath_map: the map that takes includepaths to their idx and back 140 (a RelpathMapToIndex). 141 directory_map: the map that takes currdirs and searchdirs to their 142 idx and back. It also is used to store dirname(filepath) for arbitrary 143 filepaths -- basically, anything we know is a directory (a 144 DirectoryMapToIndex). 145 realpath_map: the map that takes full filepaths to their idx and back, 146 canonicalizing them first (by resolving symlinks) (a 147 CanonicalMapToIndex). 148 149 searchlist: a list of searchdirs. In gcc/cpp documentation, this is 150 called the "search path", but for consistency, in this code we reserve 151 the name "path" to mean "filesystem component," never "list of dirs". 152 (A list of strings). 153 systemdir: a searchdir that's built into cpp, rather than set via -I. 154 (A string.) 155 156 resolved_filepath: given an includepath, and a (possibly implicit) 157 currdir and searchlist, the resolved_filepath is 158 os.path.join(currdir, searchdir, includepath) 159 for the first searchdir in searchlist for which the joined string 160 exists. This path can be represented in many ways: 1) a string like 161 "foo/bar/baz.h" (if so, this string has been canonicalized to resolve 162 symlinks and the like); 2) an index into realpath_map associated with 163 that string; 3) a triple of indices; or 4) a pair of indices plus an 164 assumption that os.getcwd() == currdir. 165 166Pair Representation of Filepaths 167------------------------------- 168A file is uniquely determined by the triple 169 (currdir_idx, searchdir_idx, includepath_idx) 170For a single compilation unit, the code will often start with a 171chdir(currdir). After that, we often refer to a file by the pair 172 (searchdir_idx, includepath_idx) 173which might be either an absolute filename or relative to $PWD. 174 175We refer to this pair as a filepath_pair. 176TODO(csilvers): find a better name? 177 178The function IsFilepathPair(x) tests whether x is a pair that could 179plausibly have a searchdir_idx as its first element and an 180includepath_idx as its second. 181 182Tests 183----- 184This code is currently only tested by regression tests of modules 185using this one. 186""" 187 188__author__ = "opensource@google.com (Nils Klarlund, Craig Silverstein)" 189 190import os 191import os.path 192import sys 193 194import basics 195import statistics 196import compiler_defaults 197 198DIR_ARRAY_SIZE = 500 199 200# We currently use the stat and realpath of GNU libc stat and 201# realpath. They are about an order of magnitude faster than their 202# Python counterparts, even when called through the Python/C 203# interface. 204 205try: 206 import distcc_pump_c_extensions 207 _OsPathExists = distcc_pump_c_extensions.OsPathExists 208 _OsPathIsFile = distcc_pump_c_extensions.OsPathIsFile 209 _PathRealpath = distcc_pump_c_extensions.Realpath 210 _path_realpath_works = True 211except ImportError: 212 _OsPathExists = os.path.exists 213 _OsPathIsFile = os.path.isfile 214 _PathRealpath = os.path.realpath 215 # os.path.realpath might have some bugs. TODO(csilvers): check that here 216 _path_realpath_works = False 217 218Debug = basics.Debug 219DEBUG_TRACE = basics.DEBUG_TRACE 220DEBUG_TRACE1 = basics.DEBUG_TRACE1 221DEBUG_TRACE2 = basics.DEBUG_TRACE2 222DEBUG_WARNING = basics.DEBUG_WARNING 223NotCoveredError = basics.NotCoveredError 224 225 226#### 227#### SIMPLE CACHES 228#### 229 230class CanonicalPath(object): 231 """Memoizing calculation of realpaths. realpath(x) is the 'canonical' 232 version of x, with all symbolic links eliminated. 233 """ 234 235 def __init__(self): 236 self.cache = {} 237 238 def Canonicalize(self, filepath): 239 """Find a really canonical path, possibly memoized. 240 241 Arguments: 242 filepath: a filepath (string) 243 Returns: 244 the realpath of filepath (string) 245 246 The following is irrelevant if we always use the distcc_pump_c_extensions 247 realpath function. 248 --- 249 Apparently, in some versions of Python 2.4 at least, realpath does 250 *not* resolve the last component of a filepath if it is a link: 251 https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1213894&group_id=5470 252 Make up for that: follow that final link until a real realpath has 253 been found. 254 255 Also, realpath is not idempotent. 256 257 Solution (?): turn filepath into abspath before applying realpath; 258 then we can cache results as well (without worring about value of 259 current directory). 260 261 The final problem -- that os.path.realpath is very slow, at least 262 an order of magnitude slower than the gnu libc one --- is solved 263 through caching all uses through an object of the present class. 264 """ 265 assert isinstance(filepath, str) 266 try: 267 return self.cache[filepath] 268 except KeyError: 269 if _path_realpath_works: 270 r = _PathRealpath(filepath) 271 self.cache[filepath] = r 272 return r 273 # Fix for os.path.realpath idempotencey bug (Python 2.4). 274 filepath_ = os.path.abspath(filepath) 275 filepath_ = _PathRealpath(filepath_) 276 # Fix for os.path.realpath bug (Python 2.4): symlinks at end not 277 # resolved. 278 for unused_i in range(10): 279 if not os.path.islink(filepath_): 280 break 281 filepath_ = os.path.join(os.path.dirname(filepath_), 282 os.readlink(filepath_)) 283 else: 284 raise NotCoveredError("Too many symlinks in '%s'." % filepath) 285 self.cache[filepath] = filepath_ 286 return filepath_ 287 288class DirnameCache(object): 289 """Cache the mapping from filepath pairs to index of their directory names. 290 291 The key is a triple (currdir_idx, searchdir_idx, includepath_idx). The 292 value is 293 294 (dir_idx, dir_realpath_idx) 295 296 where dir_idx is the index of dirname of the corresponding filepath, which 297 possibly is relative, and dir_realpath_idx is the realpath index of the 298 absolute location of the dirname. The value currdir_idx is of possible 299 importance for deteterming dir_realpath_idx, but plays no role in determining 300 dir_idx.""" 301 302 def __init__(self, includepath_map, directory_map, realpath_map): 303 """Constructor. 304 Arguments: 305 includepath_map: the map used to construct the includepath_idx 306 that will be passed in as arguments to Lookup(). 307 directory_map: the map used to construct both the currdir_idx 308 and searchdir_idx that will be passed in as arguments to 309 Lookup(). It's also the data structure that produces dir_idx. 310 realpath_map: a string-to-int map of canonicalized filepaths 311 """ 312 self.includepath_map = includepath_map 313 self.directory_map = directory_map 314 self.realpath_map = realpath_map 315 self.cache = {} 316 317 def Lookup(self, currdir_idx, searchdir_idx, includepath_idx): 318 """Return the directory and realpath indices of the dirname of the input. 319 320 Arguments: 321 currdir_idx: the directory index of the current directory 322 searchdir_idx: a directory_map index 323 includepath_idx: an includepath index 324 Returns: 325 a pair (directory map index, realpath index) 326 327 See class documentation. 328 329 Example: if the strings of the arguments indices put together make 330 '/usr/include/foo/bar.h', then this routine will insert '/usr/include/foo/' 331 into self.directory_map, and then return the corresponding pair (directory 332 index of /usr/include/foo/, real path index of /usr/include/foo/). If the 333 arguments put together form "foo.h", then the directory index returned is 334 that of "", the current directory, and the realpath index is that of 335 currdir. 336 """ 337 try: 338 return self.cache[(currdir_idx, searchdir_idx, includepath_idx)] 339 except KeyError: 340 directory = os.path.dirname(os.path.join( 341 self.directory_map.string[searchdir_idx], 342 self.includepath_map.string[includepath_idx])) 343 dir_idx = self.directory_map.Index(directory) 344 rp_idx = self.realpath_map.Index( 345 os.path.join(self.directory_map.string[currdir_idx], 346 directory)) 347 self.cache[(currdir_idx, searchdir_idx, includepath_idx)] = (dir_idx, 348 rp_idx) 349 return (dir_idx, rp_idx) 350 351 352class SystemdirPrefixCache(object): 353 """A cache of information about whether a file exists in a systemdir. 354 355 A systemdir is a searchdir that is built in to the C/C++ 356 preprocessor. That is, when the preprocessor is figuring out what 357 directory an #include is in, these are the directories it's 358 hard-coded in to check (you can add other directories via -I). This 359 cache records, for a given filepath, whether it starts with a 360 systemdir. This is useful to identify whether the path is likely to 361 correspond to a system include-file (such as stdio.h). Such files are 362 unlikely to change, and are likely to already exist on the distcc 363 servers, both of which are useful things to know for optimization. 364 365 For speed, users can access self.cache directly, rather than going 366 through the StartsWithSystemdir API. Be sure to call FillCache() to 367 make sure the cache is populated, before accessing it! 368 """ 369 370 def __init__(self, systemdirs): 371 """Constructor. 372 373 Argument: 374 systemdirs: the list of system-directories the preprocessor 375 uses. It's a list of strings, probably extracted from the 376 preprocessor itself. Each systemdir should end in a slash. 377 378 In practice, systemdirs will start empty, and later some routine 379 (in parse_command.py) will magically fill it. So be sure to wait 380 for that before calling FillCache! 381 TODO(csilvers): normalize this; ideally pass systemdirs in to FillCache. 382 """ 383 self.systemdirs = systemdirs 384 # self.cache[i] will be True, False, or None for not-yet-checked. 385 self.cache = [None] 386 387 def FillCache(self, realpath_map): 388 """Ensures that there's a cache entry for every index in realpath_map. 389 390 Argument: 391 realpath_map: a string-to-int map of canonicalized filepaths we know. 392 393 After this function is called, the cache entry is True iff 394 realpath.startswith(systemdir) is True for any of the systemdirs 395 passed in to our constructor. 396 """ 397 if len(self.cache) >= realpath_map.Length(): 398 return # we're already all full 399 for realpath_idx in range(len(self.cache), realpath_map.Length()): 400 realpath = realpath_map.string[realpath_idx] 401 for systemdir in self.systemdirs: 402 if realpath.startswith(systemdir): 403 self.cache.append(True) 404 break 405 else: # we get here if the for never 'break'ed 406 self.cache.append(False) 407 408 assert len(self.cache) == realpath_map.Length() 409 410 def StartsWithSystemdir(self, realpath_idx, realpath_map): 411 """Return True iff realpath starts with a systemdir. 412 413 Arguments: 414 realpath_idx: the index of the realpath we want to check. 415 realpath_map: the map from realpath_idx to a string. 416 417 Return True iff realpath.startswith(systemdir) for any of the 418 systemdirs passed in to our constructor. (For speed, you can 419 access self.cache directly instead of calling this, but make 420 sure FillCache() has been called first!) 421 """ 422 self.FillCache(realpath_map) 423 return self.cache[realpath_idx] 424 425 426#### 427#### MAP_TO_INDEX AND ITS SPECIALIZATIONS 428#### 429 430class MapToIndex(object): 431 """Maps every object it sees to a unique small integer. In 432 practice, this class is used to map path-components (which are strings). 433 """ 434 435 def __init__(self): 436 """Constructor. 437 438 Instance variables: 439 map: a dictionary such that map[path] is the index of path 440 string: a list satisfying: string[i] is the path such that map[path] = i 441 """ 442 443 # Do not make the mistake of letting a real index be 0. (Hint: 444 # because "if path:" then does not distinguish between 0 and None.) 445 self.index = {None:None} 446 self.string = [None] 447 448 def _Invariant_(self): 449 return len(self.index) == len(self.string) 450 451 def Index(self, path): 452 """Returns the index i > 0 of path.""" 453 assert self._Invariant_() 454 try: 455 return self.index[path] 456 except KeyError: 457 self.index[path] = len(self.string) 458 self.string.append(path) 459 return len(self.string) - 1 460 461 def String(self, i): 462 """Returns the path such that Index(path) == i.""" 463 assert self._Invariant_() 464 assert 0 < i < self.Length() 465 return self.string[i] 466 467 def Length(self): 468 """One more than the number of elements indexed.""" 469 assert self._Invariant_() 470 return len(self.string) 471 472 473class DirectoryMapToIndex(MapToIndex): 474 """Like a normal MapToIndex, but assumes the keys are directories, 475 and in particular, directories without a trailing slash (eg "/etc"). 476 It stores the directories in the map, but appends the trailing slash 477 first. This is another type of normalization, and useful for cheap 478 path-joining (eg using + instead of os.path.join). 479 """ 480 481 def Index(self, directory): 482 """Return index d > 0 of normalized directory. 483 Argument: 484 directory: a string, either empty or not ending in '/'. 485 486 The empty string is not changed, but other strings are stored with 487 a '/' appended. 488 """ 489 if directory != "" and directory != "/": 490 assert directory[-1] != '/', directory 491 directory = directory + '/' 492 return MapToIndex.Index(self, directory) 493 494 495class RelpathMapToIndex(MapToIndex): 496 """Like a normal MapToIndex, but assumes the keys are relative 497 filesystem paths, that is, filesystem paths not starting with /. 498 This is useful for "cheap" normalization: this invariant ensures that 499 os.path.join(some-directorymap-string, some-relpathmap-string) can 500 be implemented using +. 501 502 We actually do allow storing absolute paths if option 503 --unsafe_absolute_includes is in use. But, then, we're careful in Resolve 504 (below) to bail out. 505 """ 506 507 def Index(self, relpath, ignore_absolute_path_warning=False): 508 """Return index d > 0 of relative path. 509 Args: 510 directory: a string not starting with /. 511 ignore_absolute_path_warning: a Boolean 512 513 The variable ignore_absolute_path_warning is set to True in order to 514 override the requirement that filepaths are relative. This is useful for the 515 compilation unit filepath and filepaths of -include's: they are permitted to 516 be absolute because the command line can still be rewritten on the server. 517 The server tweaks their location to become relative to the server root. 518 """ 519 if os.path.isabs(relpath) and not ignore_absolute_path_warning: 520 if basics.opt_unsafe_absolute_includes: 521 Debug(DEBUG_WARNING, 522 "absolute filepath '%s' was IGNORED" 523 " (correctness of build may be affected)", relpath) 524 else: 525 raise NotCoveredError("Filepath must be relative but isn't: '%s'." 526 " Consider setting INCLUDE_SERVER_ARGS='--" 527 "unsafe_absolute_includes'." 528 % relpath, 529 send_email=False) 530 # Now, remove leading "./" so as not to start an infinite regression when 531 # say foo.c contains: 532 # 533 # #include "./foo.c" 534 # 535 # which mighy seduce a recursive include analyzer down the forbidden path: 536 # 537 # "foo.c", # "./foo.c", "././foo.c." etc. 538 while relpath.startswith("./"): 539 relpath = relpath[2:] 540 return MapToIndex.Index(self, relpath) 541 542 543class CanonicalMapToIndex(MapToIndex): 544 """Like a normal MapToIndex, but assumes the keys are absolute 545 filepaths, and canonicalizes them before inserting into the map. 546 'Canonicalize' means to do the equivalent of os.path.realpath(), 547 which mostly involves resolving symlinks in the filepath. 548 """ 549 550 def __init__(self, canonicalize): 551 """Constructor. 552 Argument: 553 canonicalize: an instance of the CanonicalPath cache.""" 554 MapToIndex.__init__(self) 555 self.canonicalize = canonicalize 556 557 def Index(self, filepath): 558 """Return the realpath index r of filepath. filepath should be 559 an absolute filename. 560 """ 561 return MapToIndex.Index(self, self.canonicalize(filepath)) 562 563 564def RetrieveDirectoriesExceptSys(directory_map, realpath_map, 565 systemdir_prefix_cache, directory_idxs): 566 """Calculate the set of non-system directories of an index list. 567 568 Arguments: 569 directory_map: a DirectoryMapToIndex cache 570 realpath_map: a CanonicalMapToIndex cache 571 directory_idxs: a list or tuple of directory_map indices 572 Returns: 573 the corresponding tuple of directories except for those whose 574 realpath has a prefix that is a sysdir 575 576 The directories in the returned list have their trailing '/' 577 stripped. 578 """ 579 result = [] 580 for dir_idx in directory_idxs: 581 # Index the absolute path; this will let us know whether dir_idx is under a 582 # default systemdir of the compiler. 583 rp_idx = realpath_map.Index(os.path.join( 584 os.getcwd(), directory_map.string[dir_idx])) 585 systemdir_prefix_cache.FillCache(realpath_map) 586 if not systemdir_prefix_cache.cache[rp_idx]: 587 result.append(directory_map.string[dir_idx].rstrip('/')) 588 return tuple(result) 589 590 591#### 592#### THE STAT CACHES 593#### 594 595class SimpleBuildStat(object): 596 """Stat cache that works with strings, not indices.""" 597 598 def __init__(self): 599 self.cache = {} 600 601 def Lookup(self, filepath): 602 """Returns true if filepath exists.""" 603 try: 604 return self.cache[filepath] 605 except KeyError: 606 result = self.cache[filepath] = _OsPathExists(filepath) 607 return result 608 609 610class BuildStatCache(object): 611 """A highly optimized mechanism for stat queries of filepaths, 612 as represented by a triple of indexes: currdir_idx, searchdir_idx, 613 filepath_idx. Given this input, we can say whether a regular file 614 represented by this triple exists on the filesystem, and if so, 615 what its canonical pathname is: that is, the pathname after all 616 symlinks have been resolved. 617 618 The hash table is three-level structure: 619 - build_stat[currdir_idx] contains an array for each includepath_idx 620 - build_stat[currdir_idx][includepath_idx] is this array, and 621 - build_stat[currdir_idx][includepath_idx][searchdir_idx] is either 622 * False if os.path.join(currdir, searchdir, includepath) does not exist 623 * True if it does 624 * None when it is not known whether it exists or not 625 In addition, we keep a parallel structure for the realpath, that lets us 626 quickly map from a filepath to os.path.realpath(filepath). 627 - real_stat[currdir_idx] contains an array for each fp 628 - real_stat[currdir_idx][includepath_idx] is this array, and 629 - real_stat[currdir_idx][includepath_idx][searchdir_idx] is either 630 * realpath_idx, such that realpath_map.string[realpath_idx] = 631 os.path.realpath(os.path.join(currdir, searchdir, includepath)) 632 when build_stat[currdir_idx][includepath_idx][searchdir_idx] = True 633 * None, otherwise 634 """ 635 636 def __init__(self, includepath_map, directory_map, realpath_map): 637 self.build_stat = {} 638 self.real_stat = {} 639 self.includepath_map = includepath_map 640 self.directory_map = directory_map 641 self.realpath_map = realpath_map 642 self.path_observations = [] 643 644 def _Verify(self, currdir_idx, searchdir_idx, includepath_idx): 645 """Verify that the cached result is the same as obtained by stat call. 646 Prerequisite: we've done a chdir(currdir) before this call. 647 """ 648 assert 1 <= includepath_idx < self.includepath_map.Length() 649 assert 1 <= searchdir_idx < self.directory_map.Length() 650 if __debug__: statistics.sys_stat_counter += 1 651 652 # Since we know directory_map entries end in /, and includepaths don't 653 # start with / (who does "#include </usr/include/string.h>"??), we can 654 # use + instead of the more expensive os.path.join(). 655 # Make sure $PWD is currdir, so we don't need to include it in our stat(). 656 assert os.getcwd() + '/' == self.directory_map.string[currdir_idx] 657 really_exists = _OsPathIsFile( 658 self.directory_map.string[searchdir_idx] 659 + self.includepath_map.string[includepath_idx]) 660 cache_exists = self.build_stat[currdir_idx][includepath_idx][searchdir_idx] 661 assert isinstance(cache_exists, bool) 662 if cache_exists != really_exists: 663 filepath = os.path.join(self.directory_map.string[currdir_idx], 664 self.directory_map.string[searchdir_idx], 665 self.includepath_map.string[includepath_idx]) 666 sys.exit("FATAL ERROR: " 667 "Cache inconsistency: '%s' %s, but earlier this path %s." % ( 668 filepath, 669 really_exists and "exists" or "does not exist", 670 cache_exists and "existed" or "did not exist")) 671 672 def WarnAboutPathObservations(self, translation_unit): 673 """Print new paths found according to path observation expression option. 674 675 Args: 676 translation_unit: a string embedded in warning 677 """ 678 for (includepath, relpath, realpath) in self.path_observations: 679 Debug(DEBUG_WARNING, 680 "For translation unit '%s'," 681 " lookup of file '%s' resolved to '%s' whose realpath is '%s'.", 682 translation_unit, includepath, relpath, realpath) 683 self.path_observations = [] 684 685 def Resolve(self, includepath_idx, currdir_idx, searchdir_idx, 686 searchlist_idxs): 687 """Says whether (currdir_idx, searchdir_idx, includepath_idx) exists, 688 and if so what its canonicalized form is (with symlinks resolved). 689 TODO(csilvers): rearrange the order of the arguments. 690 691 Args: 692 includepath_idx: The index of an includepath, from e.g. "#include <foo>" 693 currdir_idx: The index of the current working dir. Note that we 694 require os.getcwd() == currdir before calling Resolve! 695 searchdir_idx: A single searchdir, which is prepended to searchlist, 696 or None to not prepend to the searchlist. 697 searchlist_idxs: A list of directory indices. 698 699 Returns: 700 1) (None, None) if, for all sl_idx in [searchdir_idx] + searchlist_idxs, 701 os.path.join(currdir, sp, includepath) does not exist. 702 2) ((sl_idx, includepath_idx), realpath_idx) 703 if, for some sl_idx in [searchdir_idx] + searchlist_idxs, 704 os.path.join(currdir, sp, includepath) does exist. In this case, 705 sl_idx is the index of the first searchlist entry for which the 706 exists-test succeeds, and realpath_idx is the index into the 707 realpath_map of os.path.join(currdir, sp, includepath). 708 709 Again, we require as a prequesite that os.getcwd() must equal currdir: 710 os.getcwd() + '/' == self.directory_map.string[currdir_idx] 711 """ 712 includepath = self.includepath_map.string[includepath_idx] 713 if includepath.startswith('/'): 714 # We really don't want to start exploring absolute includepaths; what's 715 # the sl_idx to return for example? And what about the use of '+' 716 # (as an optimization) below instead of os.path.join. 717 return (None, None) 718 dir_map_string = self.directory_map.string # memoize the fn pointer 719 build_stat = self.build_stat 720 real_stat = self.real_stat 721 if __debug__: 722 dir_map = self.directory_map 723 assert 0 < includepath_idx < self.includepath_map.Length() 724 assert 0 < currdir_idx < dir_map.Length() 725 assert searchdir_idx is None or 1 <= searchdir_idx < dir_map.Length() 726 for sl_idx in searchlist_idxs: 727 assert sl_idx < dir_map.Length() 728 assert os.getcwd() + '/' == dir_map_string[currdir_idx] 729 Debug(DEBUG_TRACE2, "Resolve: includepath: '%s', currdir: '%s', " 730 "searchdir: '%s', searchlist: %s" % 731 (includepath, 732 dir_map_string[currdir_idx], 733 searchdir_idx and dir_map_string[searchdir_idx], 734 " \n".join([dir_map_string[idx] for idx in searchlist_idxs]))) 735 try: 736 # Locate the array (list) relative to currdir_idx and includepath_idx 737 searchdir_stats = build_stat[currdir_idx][includepath_idx] 738 # Locate the corresponding array of realpath names 739 searchdir_realpaths = real_stat[currdir_idx][includepath_idx] 740 except KeyError: # We'll need to grow the relevant arrays 741 currdir_stats = build_stat.setdefault(currdir_idx, {}) 742 currdir_realpaths = real_stat.setdefault(currdir_idx, {}) 743 searchdir_stats = currdir_stats[includepath_idx] = \ 744 [None] * DIR_ARRAY_SIZE 745 searchdir_realpaths = currdir_realpaths[includepath_idx] = \ 746 [None] * DIR_ARRAY_SIZE 747 748 # Try searchdir_idx if not None, then try every index in searchlist_idxs. 749 # This inner loop may be executed tens of millions of times. 750 # Do not try to form [searchdir_idx] + searchlist_idxs -- too expensive! 751 for searchlist in (searchdir_idx and [searchdir_idx] or [], 752 searchlist_idxs): 753 for sl_idx in searchlist: 754 if __debug__: 755 statistics.search_counter += 1 756 statistics.build_stat_counter += 1 757 try: 758 # We expect that searchdir_stats[sl_idx] == False, because 759 # we've usually seen sl_idx before for our includepath and 760 # our currdir --- and includepath does not usually exist 761 # relative to the sp directory. We're optimizing for this 762 # case of course. That should give us a rate of a couple of 763 # million iterations per second (for this case). 764 if searchdir_stats[sl_idx] == False: 765 if __debug__: self._Verify(currdir_idx, sl_idx, includepath_idx) 766 continue 767 if searchdir_stats[sl_idx]: 768 if __debug__: self._Verify(currdir_idx, sl_idx, includepath_idx) 769 return ((sl_idx, includepath_idx), searchdir_realpaths[sl_idx]) 770 except IndexError: # DIR_ARRAY_SIZE wasn't big enough; let's double 771 searchdir_stats.extend([None] * max(sl_idx, len(searchdir_stats))) 772 searchdir_realpaths.extend([None] * max(sl_idx, len(searchdir_stats))) 773 774 # If we get here, result is not cached yet. 775 if __debug__: statistics.sys_stat_counter += 1 776 # We do not explicitly take into account currdir_idx, because 777 # of the check above that os.getcwd is set to current_dir. 778 relpath = dir_map_string[sl_idx] + includepath 779 if _OsPathIsFile(relpath): 780 searchdir_stats[sl_idx] = True 781 rpath = os.path.join(dir_map_string[currdir_idx], relpath) 782 realpath_idx = searchdir_realpaths[sl_idx] = ( 783 self.realpath_map.Index(rpath)) 784 # This is the place to catch errant files according to user defined 785 # regular expression path_observation_re. 786 if basics.opt_path_observation_re: 787 realpath = self.realpath_map.string[realpath_idx] 788 if basics.opt_path_observation_re.search(realpath): 789 self.path_observations.append((includepath, relpath, realpath)) 790 return ((sl_idx, includepath_idx), realpath_idx) 791 else: 792 searchdir_stats[sl_idx] = False 793 794 if __debug__: Debug(DEBUG_TRACE2, "Resolve: failed") 795 return (None, None) 796 797 798class SetUpCaches(object): 799 """Erect the edifice of caches. 800 801 Instance variables: 802 includepath_map: RelpathMapToIndex 803 directory_map: DirectoryMapToIndex 804 realpath_map: CanonicalMapToIndex 805 806 canonical_path: CanonicalPath 807 build_stat_cache: BuildStatCache 808 dirname_cache: DirnameCache 809 simple_build_stat: SimpleBuildStat 810 811 client_root: a path such as /dev/shm/tmpX.include_server-X-1 812 (used during default system dir determination) 813 814 IsFilepathIndex: test for filepath index 815 IsDirectoryIndex: test for director index 816 IsRealpathIndex: test for realpath index 817 IsFilepathPair: test for filepath pair 818 """ 819 820 def __init__(self, client_root): 821 822 # A memoizing (caching) class to canonicalize a path: mostly by 823 # resolving any symlinks in the path-component. 824 self.canonical_path = CanonicalPath() 825 826 # The index-map for includepath names: things seen after '#include'. 827 self.includepath_map = RelpathMapToIndex() 828 829 # The index-map for searchdir names and currdir as well. Also used any 830 # other time we have something we know is a directory (eg dirname(foo)). 831 self.directory_map = DirectoryMapToIndex() 832 833 # The index-map for realpaths: the full pathname of an include, with 834 # symlinks resolved and such (hence the name realpath). 835 self.realpath_map = CanonicalMapToIndex(self.canonical_path.Canonicalize) 836 837 # A cache of the directory part of filepaths. Note it uses the 838 # directory_map to actually store the mapping. 839 self.dirname_cache = DirnameCache(self.includepath_map, self.directory_map, 840 self.realpath_map) 841 842 # A cache of whether a realpath starts with a system searchdir or 843 # not. Note: at this time, system_dirs_default_all will be empty. 844 # It will get filled via processing in parse_command.py. This is 845 # why we need to store the compiler_defaults instance, to make 846 # sure "our" system_dirs_default_all is updated. 847 # TODO(csilvers): get rid of this once prefix_cache TODO is cleaned up 848 self.compiler_defaults = compiler_defaults.CompilerDefaults( 849 self.canonical_path.Canonicalize, client_root) 850 self.systemdir_prefix_cache = SystemdirPrefixCache( 851 self.compiler_defaults.system_dirs_default_all) 852 853 # The main caches, that say whether a file exists or not. We have 854 # two: a simple one that takes a filepath (string) as an argument, 855 # and the complicated one that works with index-triples. 856 self.simple_build_stat = SimpleBuildStat() 857 self.build_stat_cache = BuildStatCache(self.includepath_map, 858 self.directory_map, 859 self.realpath_map) 860 861 # Convenient function closures to test for various semantic datatypes. 862 self.IsIncludepathIndex = (lambda x: 863 isinstance(x, int) 864 and 0 < x < self.includepath_map.Length()) 865 866 self.IsSearchdirIndex = (lambda x: 867 isinstance(x, int) 868 and 0 < x < self.directory_map.Length()) 869 870 self.IsCurrdirIndex = (lambda x: 871 isinstance(x, int) 872 and 0 < x < self.directory_map.Length()) 873 874 self.IsFilepathPair = (lambda x: 875 isinstance(x, tuple) 876 and len(x) == 2 877 and self.IsSearchdirIndex(x[0]) 878 and self.IsIncludepathIndex(x[1])) 879 880 self.IsRealpathIndex = (lambda x: 881 isinstance(x, int) 882 and 0 < x < self.realpath_map.Length()) 883