$OpenBSD: Intro.pod,v 1.2 2022/05/11 07:51:47 espie Exp $

=head1 NAME

OpenBSD::Intro - Introduction to the pkg tools internals

=head1 SYNOPSIS

   use OpenBSD::PackingList;
   ...

=head1 DESCRIPTION

Note that the C<OpenBSD::> namespace of perl modules is not limited to
package tools, but also includes L<pkg-config(1)> support modules.
This document only covers package tools material.

The design of the package tools revolves around a few central ideas:

Design modules that manipulate some notions in a consistent way, so
that they can be used by the package tools proper, but also with a
high-level API that's useful for anything that needs to manipulate
packages.  This was validated by the ease with which we can now update
packing-lists, check for conflicts, and check various properties of
our packages.

Try to be as safe as possible where installation and update operations
are concerned.  Cut operations up into small subsets, which yields frequent
safe intermediate points where the machine is completely functional.

Traditional package tools often rely on the following model: take
a snapshot of the system, try to perform an operation, and roll back to
a stable state if anything goes wrong.

Instead, OpenBSD package tools take a computational approach: record
semantic information in a useful format, pre-compute as much as possible
about an operation, and only perform the operation when we
have proved that (almost) nothing can go wrong.  As far as possible,
the actual operation happens on the side, as a temporary scaffolding, and
we only commit to the operation once most of the work is over.

Keep high-level semantic information instead of recomputing it all the
time, but try to organize as much as possible as plain text files.
Originally, it was a bit of a challenge: trying to see how much we could
get away with, before having to define an actual database format.
It turns out that we do not need a database format, or even any cache on the
ftp server.

Avoid copying files all over the place.  Hence the L<OpenBSD::Ustar(3p)>
module that allows package tools to manipulate tarballs directly without
having to extract them first in a staging area.

All the package tools use the same internal perl modules, which gives them
some consistency about fundamental notions.

It is highly recommended to try to understand packing-lists and packing
elements first, since they are the core that unlocks most of the package
tools.


=head1 COMMON NOTIONS

=over 3

=item packing-lists and elements

Each package consists of a list of objects (mostly files, but there are some
other abstract structures, like new user accounts, or stuff to do when
the package gets installed).
They are recorded in an L<OpenBSD::PackingList(3p)>; the module offers
everything needed to manipulate packing-lists.
The packing-list format has a text representation, which is documented
in L<pkg_create(1)>.
Internally, packing-lists are heavily structured.  Objects are reordered
by the internals of L<OpenBSD::PackingList(3p)>, and there are some standard
filters defined to gain access to some commonly used information (dependencies
and conflicts mostly) without having to read and parse the whole packing-list.
Each object is an L<OpenBSD::PackingElement(3p)>, which is an abstract class
with many child classes.
The use of packing-lists most often combines two classic design patterns:
one uses Visitor to traverse a packing-list and perform an operation on
all its elements (this is where the order is important, and why some
stuff like user creation will `bubble up' to the beginning of the list), allied
to Template Method: the operation is often not determined for a
basic L<OpenBSD::PackingElement(3p)>, but will make more sense for an
L<OpenBSD::PackingElement::FileObject(3p)> or similar.
Packing-list objects have an "automatic visitor" property: if a method is not
defined for the packing-list proper, but exists for packing elements, then
invoking the method on the packing-list will traverse it and apply the method
to each element.
For instance, package installation happens through the following snippet:

    $plist->install_and_progress(...)

where C<install_and_progress> is defined at the packing element level,
and invokes C<install> and shows a progress bar if needed.

=item package names and specs

Package names and specifications for package names have a specific format,
which is described in L<packages-specs(7)>.  Package specs are objects
created in L<OpenBSD::PkgSpec(3p)>, which are then compared to objects
created in L<OpenBSD::PackageName(3p)>.  Both classes contain further functions
for high level manipulation of names and specs.
There is also a framework to organize searches based on L<OpenBSD::Search(3p)>
objects.  Specifications are structured in a specific way, which yields
a shorthand for conflict handling through L<OpenBSD::PkgCfl(3p)>,
allows the package system to resolve dependencies in
L<OpenBSD::Dependencies(3p)> and to figure out package
updates in L<OpenBSD::Update(3p)>.

=item sources of packages

Historically, L<OpenBSD::PackageInfo(3p)> was used to get to the list of
installed packages and grab information.  This is now part of a more
generic framework L<OpenBSD::PackageRepository(3p)>, which interacts with
the search objects to allow you to access packages, be they installed,
on the local machine, or remote.  Once a package is located, the repository
yields a proxy object called L<OpenBSD::PackageLocation(3p)> that can be used
to gain further info.  (There are still shortcuts for installed packages
for performance and simplicity reasons.)

=item package sets

Each operation (installation, removal, or replacement of packages)
is cut up into small atomic operations, in order to guarantee maximal
stability of the installed system.  The package tools
will try really hard to only deal with one or two packages at a time,
in order to minimize combinatorial complexity, and to have a maximal number
of safe points, where an update operation can stop without hosing the
whole system.  An update set is simply a minimal bag of packages, with old
packages that are going to be removed, new packages that are going
to replace them, and an area to record related ongoing computations.
The old set may be empty, the new set may be empty, and in all cases,
the update set shall be small (as small as possible).
We have already met update situations where
dependencies between packages invert (A-1.0 depends on B-1.0, but B-0.0
depends on A-0.0), or where files move between packages, which in
theory will require update sets with two new packages that replace two
old packages.  We still cheat in a few cases, but in most cases, L<pkg_add(1)>
will recognize those situations, and merge update sets as required.
L<pkg_delete(1)> also uses package sets, but a simpler variation, known as
delete sets.  Some update operations may produce inter-dependent packages,
and those will have to be deleted together, instead of one after another.
L<OpenBSD::UpdateSet(3p)> contains the code for both UpdateSets and DeleteSets
for historical reasons.

=item updater and tracker

PackageSets contain some initial information, such as a package name to
install, or a package location to update.

This information will be completed incrementally by a
C<OpenBSD::Update> updater object, which is responsible for figuring out
how to update each element of an update set if it is an older package, or
for resolving a hint about a package name into a full package location.

In order to avoid loops, a C<OpenBSD::Tracker> tracker
object keeps track of all the package name statuses: what's queued for
update, what is up to date, or what can't be updated.

L<pkg_delete(1)> uses a simpler tracker, which is currently located inside
the L<OpenBSD::PkgDelete(3p)> code.

=item dependency information

Dependency information exists at three levels: first, there are source
specifications within ports.  Then, those specifications turn into binary
specifications with more constraints when the package is built by
L<pkg_create(1)>, and finally, they're matched against lists of installed
objects when the package is installed, and recorded as lists of
inter-dependencies in the package system.

At the package level, there are currently two types of dependencies:
package specifications, that establish direct dependencies between
packages, and shared libraries, that are described below.

Normal dependencies are shallow: it is up to the package tools to
figure out a whole dependency tree through top-level dependencies.
None of this is hard-coded: this is a prerequisite for flavored packages to
work, as we do not want to depend on a specific package if something
more generic will do.

At the same time, shared libraries have harsher constraints: a package
won't work without the exact same shared libraries it needs (same major
number, at least), so shared libraries are handled through a want/provide
mechanism that walks the whole dependency tree to find the required shared
libraries.
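
As a rough illustration of the want/provide mechanism, the following
self-contained sketch walks a toy dependency tree and checks whether a
wanted library is provided.  The data layout and the names C<%pkgs>,
C<provided_libs> and C<satisfies> are hypothetical and do not come from the
actual modules; the rule that the minor number must be at least as large
as the requested one is an assumption made for the example.

    use strict;
    use warnings;

    # Hypothetical toy data: each package lists its direct dependencies
    # and the shared libraries it provides as name => [major, minor].
    my %pkgs = (
        'app-1.0'  => { depends => ['gtk-2.0'],  libs => {} },
        'gtk-2.0'  => { depends => ['glib-2.0'], libs => { gtk  => [3, 1] } },
        'glib-2.0' => { depends => [],           libs => { glib => [5, 2] } },
    );

    # Walk the whole dependency tree below $name and collect every
    # provided library; %$seen guards against dependency loops.
    sub provided_libs {
        my ($name, $seen) = @_;
        return +{} if $seen->{$name}++;
        my $pkg = $pkgs{$name} or return +{};
        my %found = %{ $pkg->{libs} };
        for my $dep (@{ $pkg->{depends} }) {
            my $sub = provided_libs($dep, $seen);
            $found{$_} //= $sub->{$_} for keys %$sub;
        }
        return \%found;
    }

    # A want is satisfied by the same major number and (assumption)
    # a minor number at least as large as the one requested.
    sub satisfies {
        my ($provided, $lib, $major, $minor) = @_;
        my $have = $provided->{$lib} or return 0;
        return $have->[0] == $major && $have->[1] >= $minor ? 1 : 0;
    }

    my $libs = provided_libs('app-1.0', {});
    print satisfies($libs, 'glib', 5, 0) ? "ok\n" : "missing\n";
    print satisfies($libs, 'glib', 6, 0) ? "ok\n" : "missing\n";

Note that C<app-1.0> only depends directly on C<gtk-2.0> in this toy data,
yet the C<glib> library is still found, because the walk goes through the
whole tree rather than stopping at top-level dependencies.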

Dependencies are just a subclass of the packing-elements, rooted at
the C<OpenBSD::PackingElement::Depend> class.

A specific C<OpenBSD::Dependencies::Solver> object is used for the resolution
of dependencies (see L<OpenBSD::Dependencies(3p)>).  The solver is mostly
a tree-walker, but there are performance considerations, so it also caches
a lot of information and cooperates with the C<OpenBSD::Tracker>.
The specifics of shared libraries are handled by L<OpenBSD::SharedLibs(3p)>.
In particular, the base system also provides some shared libraries which are
not recorded within the dependency tree.

Lists of inter-dependencies are recorded in both directions
(RequiredBy/Requiring).  The L<OpenBSD::RequiredBy(3p)> module handles the
subtleties (removing duplicates, keeping things ordered, and handling
pretend operations).

=item shared items

Some items may be recorded multiple times within several packages (mostly
directories, users and groups).  There is a specific L<OpenBSD::SharedItems(3p)>
module which handles these.  Mostly, removal operations will scan
all packing-lists at high speed to figure out shared items, and remove
stuff that's no longer in use.

=item virtual file system

Most package operations will lead to the installation and removal of some
files.  Everything is checked beforehand: the package system must verify
that no new file will erase an existing file, and that the file system
won't overflow during the package installation.
The package tools also have a "pretend" mode where the user can check what
will happen before doing an operation.  All the computations and caching
are handled through the L<OpenBSD::Vstat(3p)> module, which is designed
to hide file system oddities, and to perform addition/deletion operations
virtually before doing them for real.
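
The idea of performing additions and deletions virtually can be sketched
as follows.  This is only an illustration under simplifying assumptions
(a single file system, sizes counted in blocks): the C<VirtualFS> class
and its methods are hypothetical and are not the L<OpenBSD::Vstat(3p)>
interface.

    use strict;
    use warnings;

    package VirtualFS;

    # Hypothetical stand-in for a virtual view of one file system:
    # which paths exist, and how many blocks are still available.
    sub new {
        my ($class, %args) = @_;
        return bless {
            files => { map { ($_ => 1) } @{ $args{files} // [] } },
            avail => $args{avail} // 0,
        }, $class;
    }

    # Record a deletion without touching the disk.
    sub remove {
        my ($self, $path, $blocks) = @_;
        return unless delete $self->{files}{$path};
        $self->{avail} += $blocks;
    }

    # Record an addition; returns a list of problems instead of failing,
    # so that everything can be reported before anything is done for real.
    sub add {
        my ($self, $path, $blocks) = @_;
        my @problems;
        push @problems, "collision: $path" if $self->{files}{$path};
        $self->{files}{$path} = 1;
        $self->{avail} -= $blocks;
        push @problems, "overflow after $path" if $self->{avail} < 0;
        return @problems;
    }

    package main;

    my $fs = VirtualFS->new(files => ['/usr/local/bin/foo'], avail => 10);
    my @problems;
    push @problems, $fs->add('/usr/local/bin/foo', 4);   # already exists
    $fs->remove('/usr/local/bin/foo', 4);
    push @problems, $fs->add('/usr/local/bin/bar', 20);  # too large
    print "$_\n" for @problems;

Collecting problems rather than dying on the first one mirrors the goal
stated above: every check happens beforehand, and the same bookkeeping can
serve a pretend mode, since nothing in it touches the real file system.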

=item framework for user interaction

Most commands are now implemented as perl modules, with C<pkg(1)> requiring
the correct module C<M>, and invoking C<M-E<gt>parse_and_run("command")>.

All those commands use a class derived from C<OpenBSD::State> for user
interaction.  Among other things, C<OpenBSD::State> provides for printable,
translatable messages, consistent option handling and usage messages.

All commands that provide a progress meter use the derived module
C<OpenBSD::AddCreateDelete>, which contains a derived state class
C<OpenBSD::AddCreateDelete::State>, and a main command class
C<OpenBSD::AddCreateDelete>, with consistent options.

Eventually, this will allow third party tools to simply override the user
interface part of C<OpenBSD::State>/C<OpenBSD::ProgressMeter> to provide
alternate displays.

=back

=head1 BASIC ALGORITHMS

There are three basic operations: package addition (installation),
package removal (deinstallation), and package replacement (update).

These operations are achieved by repeating the appropriate
operations on all elements of a packing-list.

=head2 PACKAGE ADDITION

For package addition, L<pkg_add(1)> first checks that everything is correct,
then runs through the packing-list, and extracts elements from the archive.

=head2 PACKAGE DELETION

For package deletion, L<pkg_delete(1)> removes elements from the packing-list,
and marks `common' stuff that may need to be unregistered, then walks quickly
through all installed packages and removes stuff that's no longer used
(directories, users, groups...).

=head2 PACKAGE REPLACEMENT

Package replacement is more complicated.  It relies on package names
and conflict markers.

In normal usage, L<pkg_add(1)> installs only new stuff, and checks that all
files in the new package don't already exist in the file system.
By convention, packages with the same stem are assumed to be different
versions of the same package, e.g., screen-1.0 and screen-1.1 correspond
to the same software, and users are not expected to be able to install
both at the same time.

This is a conflict.

One can also mark extra conflicts (if two software distributions install
the same file, generally a bad idea), or remove default conflict markers
(for instance, so that the user can install several versions of autoconf at
the same time).

If L<pkg_add(1)> is invoked in replacement mode (-r), it will use conflict
information to figure out which package(s) it should replace.  It will then
operate in a specific mode, where it replaces old package(s) with a new one.

=over

=item *

determine which package to replace through conflict information

=item *

extract the new package 'alongside' the existing package(s) using
temporary filenames.

=item *

remove the old package

=item *

finish installing the new package by renaming the temporary files.

=back

Thus replacements will work without needing any extra information besides
conflict markers.  pkg_add -r will happily replace any package with a
conflicting package.  Due to missing information (one can't predict the
future), conflict markers work both ways: packages a and b conflict as
soon as a conflicts with b, or b conflicts with a.

=head2 PACKAGE UPDATES

Package replacement is the basic operation behind package updates.
In your average update, each individual package will be replaced
by a more recent one, starting with dependencies, so that the installation
stays functional the whole time.  Shared libraries enjoy a special status:
old shared libraries are kept around in a stub .lib-* package, so that
software that depends on them keeps running.
(Thus, it is vital that porters
pay attention to shared library version numbers during an update.)

An update operation starts with update sets that contain only old packages.
There is some specific code (the C<OpenBSD::Update> module) which is used
to figure out the new package name from the old one.

Note that updates are slightly more complicated than straight replacement:
a package may replace an older one if it conflicts with it.  But an older
package can only be updated if the new package matches (both conflicts and
correct pkgpath markers).

In every update or replacement, pkg_add will first try to install or update
the quirks package, which contains a global list of exceptions, such as
extra stems to search for (allowing for package renames), or packages to
remove as they've become part of base OpenBSD.

This search relies on stem names first (e.g., to update package
foo-1.0, pkg_add -u will look for foo-* in the PKG_PATH), then it trims
the search results by looking more closely inside the package candidates,
more specifically at their pkgpath (the directory in the ports tree from which
they were compiled).  Thus, a package
that comes from category/someport/snapshot will never replace a package
that comes from category/someport/stable.  Likewise for flavors.

Finally, pkg_add -u decides whether the update is needed by comparing
the package version and the package signatures: a package will not be
downgraded to an older version.  A package signature is composed of
the name of a package, together with relevant dependency information:
all wantlib versions, and all run dependency versions.
pkg_add only replaces packages with different signatures.

Currently, pkg_add -u stops at the first entry in the PKG_PATH from which
suitable candidates are found.
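
The candidate selection described in this section can be sketched as a
chain of filters: search by stem, keep only candidates with the same
pkgpath, refuse downgrades, then compare signatures.  The sketch below is a
simplification under stated assumptions: the hash layout, the
C<pick_update> function, the choice of the highest remaining version, and
the naive numeric version comparison are all hypothetical, and it ignores
quirks and flavors entirely.

    use strict;
    use warnings;

    # Naive dotted-version comparison (assumption: purely numeric
    # components; real package versions are richer than this).
    sub cmp_version {
        my @x = split /\./, shift;
        my @y = split /\./, shift;
        while (@x || @y) {
            my $r = (shift(@x) // 0) <=> (shift(@y) // 0);
            return $r if $r;
        }
        return 0;
    }

    # Hypothetical signature: package name plus dependency information.
    sub signature {
        my $p = shift;
        return join ',', "$p->{stem}-$p->{version}",
            sort(@{ $p->{wantlibs} }, @{ $p->{rundeps} });
    }

    # Return the candidate that should replace $old, or undef.
    sub pick_update {
        my ($old, @candidates) = @_;
        my ($best) =
            sort { cmp_version($b->{version}, $a->{version}) }
            grep { cmp_version($_->{version}, $old->{version}) >= 0 } # no downgrade
            grep { $_->{pkgpath} eq $old->{pkgpath} }                 # same origin
            grep { $_->{stem} eq $old->{stem} }                       # stem search
            @candidates;
        return undef unless $best;
        # Identical signatures mean there is nothing to replace.
        return signature($best) eq signature($old) ? undef : $best;
    }

    my $old = { stem => 'foo', version => '1.0', pkgpath => 'devel/foo',
                wantlibs => ['c.96.0'], rundeps => [] };
    my @found = (
        { %$old, version => '1.1' },
        { %$old, version => '2.0', pkgpath => 'devel/foo/snapshot' },
        { %$old, version => '0.9' },
    );
    my $new = pick_update($old, @found);
    print $new ? "update to foo-$new->{version}\n" : "up to date\n";

In this toy run the 2.0 candidate is discarded because it comes from a
different pkgpath, and the 0.9 candidate because it would be a downgrade.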

=head1 LIST OF MODULES

=over 3

=item OpenBSD::Add

common operations related to package addition.

=item OpenBSD::AddCreateDelete

common operations related to package addition/creation/deletion.
Mainly C<OpenBSD::ProgressMeter> related.

=item OpenBSD::AddDelete

common operations used during addition and deletion.
Mainly due to the fact that C<pkg_add(1)> will remove packages during
updates, and that addition/suppression operations are only allowed to
fail at specific times.  Most updateset algorithms live there, as does
the upper layer framework for handling signals safely.

=item OpenBSD::ArcCheck

additional layer on top of C<OpenBSD::Ustar> that enforces extra
rules specific to packages.
In particular, we don't store timestamps in the packing-list to
avoid gratuitous changes, and also, a lot of sensitive information
is not allowed if it's not also annotated in the PackingList.

=item OpenBSD::CollisionReport

checks a collision list obtained through C<OpenBSD::Vstat> against the
full list of installed files, and reports the origin of existing files.

=item OpenBSD::Delete

common operations related to package deletion.

=item OpenBSD::Dependencies

looking up all kinds of dependencies.  Contains rather complicated caching
to speed things up.  Interacts with the global tracker object.

=item OpenBSD::Error

handles signal registration, the exception mechanism, and auto-caching
methods.  Most I/O operations have moved to C<OpenBSD::State>.

=item OpenBSD::Getopt

L<Getopt::Std(3p)>-like with extra hooks for special options.

=item OpenBSD::Handle

proxy class to go from a package location to an opened package with plist,
including state information to cache errors.

=item OpenBSD::IdCache

caches the correspondence between uids/gids and user/group names.

=item OpenBSD::Interactive

handles user questions (do not call directly, go through C<OpenBSD::State>
and derivatives).

=item OpenBSD::LibSpec

interactions between library objects from packing-lists, library specifications,
and matching those against actual lists of libraries (from packages or from
the system).

=item OpenBSD::LibSpec::Build

extends C<OpenBSD::LibSpec> for matching during ports builds.

=item OpenBSD::Log

component for printing information later, to be used by derivative classes
of C<OpenBSD::State>.

=item OpenBSD::Mtree

simple parser for L<mtree(8)> specifications.

=item OpenBSD::OldLibs

code required by C<pkg_add(1)> to handle the removal of old libraries during
update.

=item OpenBSD::PackageInfo

handles package meta-information (all the +CONTENTS, +DESCR, etc. files).

=item OpenBSD::PackageLocation

proxy for a package, either as a tarball, or an installed package.
Obtained through C<OpenBSD::PackageRepository>.

=item OpenBSD::PackageLocator

central non-OO hub for the normal repository list
(should use a singleton pattern instead).

=item OpenBSD::PackageName

common operations on package names.

=item OpenBSD::PackageRepository

base class for all package sources.  Actual packages instantiate as
C<OpenBSD::PackageLocation>.

=item OpenBSD::PackageRepositoryList

list of package repositories, provided as a front to search objects,
because searching through a repository list has L<ld(1)>-like semantics
(stops at the first repository that matches).

=item OpenBSD::PackingElement

all the packing-list elements class hierarchy, together with common
methods that do not belong elsewhere.

=item OpenBSD::PackingList

responsible for reading/writing packing-lists, copying them, comparing them.

=item OpenBSD::Paths

hardcoded paths to external programs and locations.

=item OpenBSD::PkgAdd, OpenBSD::PkgCreate, OpenBSD::PkgCheck, OpenBSD::PkgDelete, OpenBSD::PkgInfo

implement the corresponding commands.

=item OpenBSD::PkgCfl

conflict lists handling in an efficient way.

=item OpenBSD::PkgSpec

ad-hoc search for package specifications.  External API is stable, but it
needs to be updated to use C<OpenBSD::PackageName> objects now that they
exist.

=item OpenBSD::ProgressMeter

handles display of a progress meter when a terminal is available, devolves
to nothing otherwise.

=item OpenBSD::Replace

common operations related to package replacement.

=item OpenBSD::RequiredBy

handles requiredby and requiring lists.

=item OpenBSD::Search

search object for package repositories: specs, stems, and pkgpaths.

=item OpenBSD::SharedItems

handles items that may be shared by several packages.

=item OpenBSD::SharedLibs

shared library specificities when handled as dependencies.

=item OpenBSD::Signature

handles package signatures and the corresponding version comparison (do not
confuse with cryptographic signatures, as handled through C<OpenBSD::x509>).

=item OpenBSD::State

base class for UI and option handling.

=item OpenBSD::Subst

conventions used for substituting variables during L<pkg_create(1)>,
and related algorithms.

=item OpenBSD::Temp

safe creation of temporary files as a light-weight module that also
deals with signal issues.

=item OpenBSD::Tracker

tracks all package names through update operations, in order to avoid
loops while doing incremental updates.

=item OpenBSD::Update

incremental computation of package replacements required by an update or
installation.

=item OpenBSD::UpdateSet

operations common to all package tools that manipulate update sets.

=item OpenBSD::Ustar

simple API that allows for Ustar (new tar) archive manipulation,
allowing for extraction and copies on the fly.

=item OpenBSD::Vstat

virtual file system (pretend) operations.

=item OpenBSD::md5

simple interface to the L<Digest::MD5(3p)> and L<Digest::SHA(3p)> modules.

=item OpenBSD::x509

cryptographic signature through x509 certificates.  Mostly calls C<openssl(1)>.
Note that C<OpenBSD::ArcCheck> is vital in ensuring archive meta-info has
not been tampered with.

=back