$OpenBSD: Intro.pod,v 1.2 2022/05/11 07:51:47 espie Exp $

=head1 NAME

OpenBSD::Intro - Introduction to the pkg tools internals

=head1 SYNOPSIS

   use OpenBSD::PackingList;
   ...

=head1 DESCRIPTION

Note that the C<OpenBSD::> namespace of perl modules is not limited to
package tools, but also includes L<pkg-config(1)> support modules.
This document only covers package tools material.

The design of the package tools revolves around a few central ideas:

Design modules that manipulate some notions in a consistent way, so
that they can be used by the package tools proper, but also with a
high-level API that's useful for anything that needs to manipulate
packages.  This was validated by the ease with which we can now update
packing-lists, check for conflicts, and check various properties of
our packages.

Try to be as safe as possible where installation and update operations
are concerned.  Cut operations up into small subsets, which yields frequent
safe intermediate points where the machine is completely functional.

Traditional package tools often rely on the following model: take
a snapshot of the system, try to perform an operation, and roll back to
a stable state if anything goes wrong.

Instead, OpenBSD package tools take a computational approach: record
semantic information in a useful format, pre-compute as much as can be
about an operation, and only perform the operation when we
have proved that (almost) nothing can go wrong.  As far as possible,
the actual operation happens on the side, as a temporary scaffolding, and
we only commit to the operation once most of the work is over.

Keep high-level semantic information instead of recomputing it all the
time, but try to organize as much as possible as plain text files.
Originally, it was a bit of a challenge: trying to see how much we could
get away with, before having to define an actual database format.
Turns out we do not need a database format, or even any cache on the
ftp server.

Avoid copying files all over the place. Hence the L<OpenBSD::Ustar(3p)>
module that allows package tools to manipulate tarballs directly without
having to extract them first in a staging area.

All the package tools use the same internal perl modules, which gives them
some consistency about fundamental notions.

It is highly recommended to try to understand packing-lists and packing
elements first, since they are the core that unlocks most of the package
tools.

=head1 COMMON NOTIONS

=over 3

=item packing-lists and elements

Each package consists of a list of objects (mostly files, but there are some
other abstract structures, like new user accounts, or stuff to do when
the package gets installed).
They are recorded in a L<OpenBSD::PackingList(3p)>; the module offers
everything needed to manipulate packing-lists.
The packing-list format has a text representation, which is documented
in L<pkg_create(1)>.
Internally, packing-lists are heavily structured. Objects are reordered
by the internals of L<OpenBSD::PackingList(3p)>, and there are some standard
filters defined to gain access to some commonly used information (dependencies
and conflicts mostly) without having to read and parse the whole packing-list.
Each object is an L<OpenBSD::PackingElement(3p)>, which is an abstract class
with many child classes.
The use of packing-lists most often combines two classic design patterns:
one uses Visitor to traverse a packing-list and perform an operation on
all its elements (this is where the order is important, and why some
stuff like user creation will `bubble up' to the beginning of the list), allied
to Template Method: the operation is often not determined for a
basic L<OpenBSD::PackingElement(3p)>, but will make more sense to an
L<OpenBSD::PackingElement::FileObject(3p)> or similar.
Packing-list objects have an "automatic visitor" property: if a method is not
defined for the packing-list proper, but exists for packing elements, then
invoking the method on the packing-list will traverse it and apply the method
to each element.
For instance, package installation happens through the following snippet:

    $plist->install_and_progress(...)

where C<install_and_progress> is defined at the packing element level,
and invokes C<install> and shows a progress bar if needed.

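The "automatic visitor" idea can be illustrated with a small self-contained
sketch.  Everything in it is hypothetical: the C<Toy::> classes and the
C<describe> method are invented for illustration and are not the real
C<OpenBSD::PackingList> API.  It uses Perl's C<AUTOLOAD> as one possible way
to forward an unknown method to every element; how the real module achieves
this is not stated here.

```perl
use strict;
use warnings;

# Hypothetical toy classes; these are NOT the real OpenBSD:: modules.
package Toy::Element;
sub new { my ($class, $name) = @_; return bless { name => $name }, $class; }
# Default behaviour: a generic element has nothing to report.
sub describe { my ($self, $out) = @_; return; }

package Toy::File;
our @ISA = ('Toy::Element');
# Specialised behaviour (Template Method): files report themselves.
sub describe { my ($self, $out) = @_; push @$out, "file $self->{name}"; return; }

package Toy::NewUser;
our @ISA = ('Toy::Element');
sub describe { my ($self, $out) = @_; push @$out, "user $self->{name}"; return; }

package Toy::List;
our $AUTOLOAD;
sub new { my ($class, @items) = @_; return bless { items => [@items] }, $class; }
# Any method that the list itself lacks is applied to each element in order.
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    $_->$method(@args) for @{ $self->{items} };
    return;
}

package main;

sub demo_visit {
    my $list = Toy::List->new(
        Toy::NewUser->new('_daemon'),
        Toy::File->new('bin/tool'),
        Toy::Element->new('comment'),
    );
    my @out;
    $list->describe(\@out);    # not defined on Toy::List, so it is forwarded
    return @out;
}

print "$_\n" for demo_visit();
```

The list never needs to know which operations exist; adding a new operation
only means adding methods to the element classes that care about it.
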
=item package names and specs

Package names and specifications for package names have a specific format,
which is described in L<packages-specs(7)>.   Package specs are objects
created in L<OpenBSD::PkgSpec(3p)>, which are then compared to objects
created in L<OpenBSD::PackageName(3p)>.  Both classes contain further functions
for high level manipulation of names and specs.
There is also a framework to organize searches based on L<OpenBSD::Search(3p)>
objects.  Specifications are structured in a specific way, which yields
a shorthand for conflict handling through L<OpenBSD::PkgCfl(3p)>,
allows the package system to resolve dependencies in
L<OpenBSD::Dependencies(3p)> and to figure out package
updates in L<OpenBSD::Update(3p)>.

=item sources of packages

Historically, L<OpenBSD::PackageInfo(3p)> was used to get to the list of
installed packages and grab information.  This is now part of a more
generic framework L<OpenBSD::PackageRepository(3p)>, which interacts with
the search objects to allow you to access packages, be they installed,
on the local machine, or distant.  Once a package is located, the repository
yields a proxy object called L<OpenBSD::PackageLocation(3p)> that can be used
to gain further info.  (There are still shortcuts for installed packages
for performance and simplicity reasons.)

=item package sets

Each operation (installation, removal, or replacement of packages)
is cut up into small atomic operations, in order to guarantee maximal
stability of the installed system. The package tools
will try really hard to only deal with one or two packages at a time,
in order to minimize combinatorial complexity, and to have a maximal number
of safe points, where an update operation can stop without hosing the
whole system. An update set is simply a minimal bag of packages, with old
packages that are going to be removed, new packages that are going
to replace them, and an area to record related ongoing computations.
The old set may be empty, the new set may be empty, and in all cases,
the update set shall be small (as small as possible).
We have already met update situations where
dependencies between packages invert (A-1.0 depends on B-1.0, but B-0.0
depends on A-0.0), or where files move between packages, which in
theory will require update sets with two new packages that replace two
old packages.  We still cheat in a few cases, but in most cases, L<pkg_add(1)>
will recognize those situations, and merge update sets as required.
L<pkg_delete(1)> also uses package sets, but a simpler variation, known as
delete sets. Some update operations may produce inter-dependent packages,
and those will have to be deleted together, instead of one after another.
L<OpenBSD::UpdateSet(3p)> contains the code for both UpdateSets and DeleteSets
for historical reasons.

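As a rough illustration of merging, here is a minimal sketch of two update
sets being combined when dependencies invert.  The hash layout and the
function names (C<new_set>, C<merge_sets>, C<describe_set>) are assumptions
made for this example; they do not reflect the actual
L<OpenBSD::UpdateSet(3p)> interface.

```perl
use strict;
use warnings;

# Hypothetical structure: an update set is a small bag holding the
# packages to remove ("older") and the packages to install ("newer").
sub new_set {
    my (%args) = @_;
    return {
        older => [ @{ $args{older} // [] } ],
        newer => [ @{ $args{newer} // [] } ],
    };
}

# Merge the other sets into the first one, emptying them on the way.
sub merge_sets {
    my ($set, @others) = @_;
    for my $o (@others) {
        push @{ $set->{older} }, splice(@{ $o->{older} });
        push @{ $set->{newer} }, splice(@{ $o->{newer} });
    }
    return $set;
}

# Render a set as "-old ... -> +new ..." for display.
sub describe_set {
    my ($set) = @_;
    return join(' ', map("-$_", @{ $set->{older} }))
        . ' -> '
        . join(' ', map("+$_", @{ $set->{newer} }));
}

# A-1.0 depends on B-1.0 while B-0.0 depends on A-0.0: neither package
# can be replaced alone, so both replacements go into a single set.
my $set_a = new_set(older => ['A-0.0'], newer => ['A-1.0']);
my $set_b = new_set(older => ['B-0.0'], newer => ['B-1.0']);
merge_sets($set_a, $set_b);
print describe_set($set_a), "\n";
```
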
=item updater and tracker

PackageSets contain some initial information, such as a package name to
install, or a package location to update.

This information will be completed incrementally by a
C<OpenBSD::Update> updater object, which is responsible for figuring out
how to update each element of an update set, if it is an older package, or
for resolving a hint for a package name into a full package location.

In order to avoid loops, a C<OpenBSD::Tracker> tracker
object keeps track of all the package name statuses: what's queued for
update, what is up to date, or what can't be updated.

L<pkg_delete(1)> uses a simpler tracker, which is currently located inside
the L<OpenBSD::PkgDelete(3p)> code.

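The loop-avoidance role of a tracker can be sketched as follows.  This is a
toy model under stated assumptions: the C<process_all> function, the status
strings, and the "pulls in" table are all hypothetical, and the real
C<OpenBSD::Tracker> records considerably more state.

```perl
use strict;
use warnings;

# Hypothetical worklist: $pulls maps a package name to the names that
# processing it brings into the operation.  A status table ensures that
# every name is queued at most once, even when the table has cycles.
sub process_all {
    my ($start, $pulls) = @_;
    my %status = ($start => 'queued');
    my @queue  = ($start);
    my @order;
    while (defined(my $name = shift @queue)) {
        push @order, $name;
        $status{$name} = 'done';
        for my $next (@{ $pulls->{$name} // [] }) {
            next if exists $status{$next};    # already known: do not loop
            $status{$next} = 'queued';
            push @queue, $next;
        }
    }
    return @order;
}

# "a" pulls in "b", which pulls "a" back in: without the status table
# this would never terminate.
my @order = process_all('a', { a => ['b'], b => ['a', 'c'], c => ['a'] });
print "@order\n";
```
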
=item dependency information

Dependency information exists at three levels: first, there are source
specifications within ports. Then, those specifications turn into binary
specifications with more constraints when the package is built by
L<pkg_create(1)>, and finally, they're matched against lists of installed
objects when the package is installed, and recorded as lists of
inter-dependencies in the package system.

At the package level, there are currently two types of dependencies:
package specifications, that establish direct dependencies between
packages, and shared libraries, that are described below.

Normal dependencies are shallow: it is up to the package tools to
figure out a whole dependency tree through top-level dependencies.
None of this is hard-coded: this is a prerequisite for flavored packages to
work, as we do not want to depend on a specific package if something
more generic will do.

At the same time, shared libraries have harsher constraints: a package
won't work without the exact same shared libraries it needs (same major
number, at least), so shared libraries are handled through a want/provide
mechanism that walks the whole dependency tree to find the required shared
libraries.

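A want/provide lookup of this kind might look like the following sketch.
The data layout, the C<name.major.minor> notation, and the function names
are assumptions made for the example.  The text above only requires the
major number to match; the additional rule that the provided minor must be
at least the wanted minor is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical notation: a library is written "name.major.minor".
sub parse_lib {
    my ($s) = @_;
    $s =~ /^(.+)\.(\d+)\.(\d+)$/ or die "bad library spec: $s";
    return { name => $1, major => $2, minor => $3 };
}

# Same name, same major; minor at least as large (sketch assumption).
sub lib_satisfies {
    my ($provided, $wanted) = @_;
    my ($p, $w) = (parse_lib($provided), parse_lib($wanted));
    return $p->{name} eq $w->{name}
        && $p->{major} == $w->{major}
        && $p->{minor} >= $w->{minor};
}

# Walk the whole dependency tree below $pkg, looking for a provider.
# $tree maps a package name to { libs => [...], deps => [...] }.
sub find_provider {
    my ($tree, $pkg, $wanted, $seen) = @_;
    $seen //= {};
    return if $seen->{$pkg}++;
    my $node = $tree->{$pkg} or return;
    for my $lib (@{ $node->{libs} // [] }) {
        return $pkg if lib_satisfies($lib, $wanted);
    }
    for my $dep (@{ $node->{deps} // [] }) {
        my $found = find_provider($tree, $dep, $wanted, $seen);
        return $found if defined $found;
    }
    return;
}

my %tree = (
    app => { deps => ['gui'] },
    gui => { deps => ['png'] },
    png => { libs => ['png.18.0'] },
);
my $provider = find_provider(\%tree, 'app', 'png.18.0');
print "provider: ", $provider // 'none', "\n";
```
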
Dependencies are just a subclass of the packing-elements, rooted at
the C<OpenBSD::PackingElement::Depend> class.

A specific C<OpenBSD::Dependencies::Solver> object is used for the resolution
of dependencies (see L<OpenBSD::Dependencies(3p)>).  The solver is mostly
a tree-walker, but there are performance considerations, so it also caches
a lot of information and cooperates with the C<OpenBSD::Tracker>.
Specificities of shared libraries are handled by L<OpenBSD::SharedLibs(3p)>.
In particular, the base system also provides some shared libraries which are
not recorded within the dependency tree.

Lists of inter-dependencies are recorded in both directions
(RequiredBy/Requiring). The L<OpenBSD::RequiredBy(3p)> module handles the
subtleties (removing duplicates, keeping things ordered, and handling
pretend operations).

=item shared items

Some items may be recorded multiple times within several packages (mostly
directories, users and groups). There is a specific L<OpenBSD::SharedItems(3p)>
module which handles these. Mostly, removal operations will scan
all packing-lists at high speed to figure out shared items, and remove
stuff that's no longer in use.

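The scan described above can be approximated by a short sketch.  The
C<unused_shared_items> function and the table of installed packages are
hypothetical; real packing-lists carry typed elements rather than plain
strings.

```perl
use strict;
use warnings;

# Hypothetical table: package name => list of shared items it records.
# Returns the items of $removed that no other installed package uses.
sub unused_shared_items {
    my ($removed, $installed) = @_;
    my %still_used;
    for my $pkg (keys %$installed) {
        next if $pkg eq $removed;
        $still_used{$_} = 1 for @{ $installed->{$pkg} };
    }
    return sort grep { !$still_used{$_} } @{ $installed->{$removed} };
}

my %installed = (
    'foo-1.0' => ['dir:/usr/local/share/foo', 'user:_web', 'dir:/var/www/htdocs'],
    'bar-2.0' => ['user:_web', 'dir:/var/www/htdocs'],
);
my @gone = unused_shared_items('foo-1.0', \%installed);
print "$_\n" for @gone;
```

Only items that are no longer referenced by any remaining package are
candidates for removal; the shared user and directory stay in place.
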
=item virtual file system

Most package operations will lead to the installation and removal of some
files.   Everything is checked beforehand: the package system must verify
that no new file will erase an existing file, or that the file system
won't overflow during the package installation.
The package tools also have a "pretend" mode where the user can check what
will happen before doing an operation.  All the computations and caching
are handled through the L<OpenBSD::Vstat(3p)> module, which is designed
to hide file system oddities, and to perform addition/deletion operations
virtually before doing them for real.

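The two checks mentioned above (no overwriting, no overflow) can be sketched
without touching any real file system.  The C<plan_install> function and its
arguments are hypothetical and much simpler than L<OpenBSD::Vstat(3p)>, which
also has to deal with mount points and other file system details.

```perl
use strict;
use warnings;

# Hypothetical planner: nothing here touches the real file system.
# $existing and $incoming map path => size in bytes; $free is the
# number of bytes assumed to be available.
sub plan_install {
    my ($existing, $free, $incoming) = @_;
    my @collisions;
    my $needed = 0;
    for my $path (sort keys %$incoming) {
        if (exists $existing->{$path}) {
            push @collisions, $path;    # would erase an existing file
        } else {
            $needed += $incoming->{$path};
        }
    }
    return {
        collisions => \@collisions,
        needed     => $needed,
        overflow   => ($needed > $free ? 1 : 0),
        ok         => (!@collisions && $needed <= $free ? 1 : 0),
    };
}

my $plan = plan_install(
    { '/usr/local/bin/foo' => 10 },
    100,
    { '/usr/local/bin/foo' => 20, '/usr/local/bin/bar' => 30 },
);
print "collisions: @{ $plan->{collisions} }\n";
print "needed: $plan->{needed}, ok: $plan->{ok}\n";
```
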
=item framework for user interaction

Most commands are now implemented as perl modules, with C<pkg(1)> requiring
the correct module C<M>, and invoking C<M-E<gt>parse_and_run("command")>.

All those commands use a class derived from C<OpenBSD::State> for user
interaction. Among other things, C<OpenBSD::State> provides for printable,
translatable messages, consistent option handling and usage messages.

All commands that provide a progress meter use the derived module
C<OpenBSD::AddCreateDelete>, which contains a derived state class
C<OpenBSD::AddCreateDelete::State>, and a main command class
C<OpenBSD::AddCreateDelete>, with consistent options.

Eventually, this will allow third party tools to simply override the user
interface part of C<OpenBSD::State>/C<OpenBSD::ProgressMeter> to provide
alternate displays.

=back

=head1 BASIC ALGORITHMS

There are three basic operations: package addition (installation),
package removal (deinstallation), and package replacement (update).

These operations are achieved through repeating the correct
operations on all elements of a packing-list.

=head2 PACKAGE ADDITION

For package addition, L<pkg_add(1)> first checks that everything is correct,
then runs through the packing-list, and extracts elements from the archive.

=head2 PACKAGE DELETION

For package deletion, L<pkg_delete(1)> removes elements from the packing-list,
and marks `common' stuff that may need to be unregistered, then walks quickly
through all installed packages and removes stuff that's no longer used
(directories, users, groups...).

=head2 PACKAGE REPLACEMENT

Package replacement is more complicated. It relies on package names
and conflict markers.

In normal usage, L<pkg_add(1)> installs only new stuff, and checks that all
files in the new package don't already exist in the file system.
By convention, packages with the same stem are assumed to be different
versions of the same package, e.g., screen-1.0 and screen-1.1 correspond
to the same software, and users are not expected to be able to install
both at the same time.

This is a conflict.

One can also mark extra conflicts (if two software distributions install
the same file, generally a bad idea), or remove default conflict markers
(for instance, so that the user can install several versions of autoconf at
the same time).

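The default, stem-based conflict can be sketched as below.  The helper names
are hypothetical, and the stem rule used here (everything before the first
C<-> that is followed by a digit) is a simplification assumed for the
example; L<packages-specs(7)> is the authoritative description of the name
format.

```perl
use strict;
use warnings;

# Hypothetical helper: the stem is taken to be everything before the
# first "-" that is followed by a digit (a simplification).
sub stem_of {
    my ($pkgname) = @_;
    return $pkgname =~ /^(.*?)-\d/ ? $1 : $pkgname;
}

# Two distinct package names conflict by default when their stems match.
sub default_conflict {
    my ($p, $q) = @_;
    return $p ne $q && stem_of($p) eq stem_of($q);
}

for my $pair (['screen-1.0', 'screen-1.1'], ['screen-1.0', 'tmux-3.3']) {
    my ($p, $q) = @$pair;
    printf "%s vs %s: %s\n", $p, $q,
        default_conflict($p, $q) ? 'conflict' : 'no conflict';
}
```
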
If L<pkg_add(1)> is invoked in replacement mode (-r), it will use conflict
information to figure out which package(s) it should replace. It will then
operate in a specific mode, where it replaces old package(s) with a new one:

=over

=item *

determine which package to replace through conflict information

=item *

extract the new package 'alongside' the existing package(s) using
temporary filenames.

=item *

remove the old package

=item *

finish installing the new package by renaming the temporary files.

=back

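The extract, remove, rename sequence above can be sketched on a scratch
directory.  This is a toy under stated assumptions: the C<replace_package>
and C<write_file> helpers, the C<.tmp.> prefix, and the flat directory
layout are invented for illustration, and a real tool would need unique
temporary names and far more error handling.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write $content to $path.
sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "open $path: $!";
    print {$fh} $content;
    close($fh) or die "close $path: $!";
    return;
}

# Replace the files of an old package with those of a new one in $dir.
# $old is a list of file names; $new maps file name => content.
sub replace_package {
    my ($dir, $old, $new) = @_;
    my %tmp;
    # Extract the new files alongside the old ones, under temporary names.
    for my $name (sort keys %$new) {
        $tmp{$name} = File::Spec->catfile($dir, ".tmp.$name");
        write_file($tmp{$name}, $new->{$name});
    }
    # Remove the old package's files.
    for my $name (@$old) {
        my $path = File::Spec->catfile($dir, $name);
        unlink($path) or die "unlink $path: $!";
    }
    # Commit: rename the temporary files to their final names.
    for my $name (sort keys %tmp) {
        my $final = File::Spec->catfile($dir, $name);
        rename($tmp{$name}, $final) or die "rename $tmp{$name}: $!";
    }
    return;
}

# Returns the sorted directory listing after a sample replacement.
sub demo_replace {
    my $dir = tempdir(CLEANUP => 1);
    write_file(File::Spec->catfile($dir, 'screen'),   "1.0\n");
    write_file(File::Spec->catfile($dir, 'old.conf'), "x\n");
    replace_package($dir, ['screen', 'old.conf'], { screen => "1.1\n" });
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = sort grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);
    return @names;
}

print join(' ', demo_replace()), "\n";
```

Most of the work (writing the new files) happens before anything is removed,
so the window in which the old files are gone and the new ones are not yet
in place is limited to the final renames.
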
Thus replacements will work without needing any extra information besides
conflict markers. pkg_add -r will happily replace any package with a
conflicting package.  Due to missing information (one can't predict the
future), conflict markers work both ways: packages a and b conflict as
soon as a conflicts with b, or b conflicts with a.

=head2 PACKAGE UPDATES

Package replacement is the basic operation behind package updates.
In your average update, each individual package will be replaced
by a more recent one, starting with dependencies, so that the installation
stays functional the whole time.  Shared libraries enjoy a special status:
old shared libraries are kept around in a stub .lib-* package, so that
software that depends on them keeps running. (Thus, it is vital that porters
pay attention to shared library version numbers during an update.)

An update operation starts with update sets that contain only old packages.
There is some specific code (the C<OpenBSD::Update> module) which is used
to figure out the new package name from the old one.

Note that updates are slightly more complicated than straight replacement:
a package may replace an older one if it conflicts with it. But an older
package can only be updated if the new package matches (both conflicts and
correct pkgpath markers).

334
335In every update or replacement, pkg_add will first try to install or update
336the quirks package, which contains a global list of exceptions, such as
337extra stems to search for (allowing for package renames), or packages to
338remove as they've become part of base OpenBSD.
339
340This search relies on stem names first (e.g., to update package
341foo-1.0, pkg_add -u will look for foo-* in the PKG_PATH), then it trims
342the search results by looking more closely inside the package candidates.
343More specifically, their pkgpath (the directory in the ports tree from which
344they were compiled). Thus, a package
345that comes from category/someport/snapshot will never replace a package
346that comes from category/someport/stable. Likewise for flavors.
347
348Finally, pkg_add -u decides whether the update is needed by comparing
349the package version and the package signatures: a package will not be
350downgraded to an older version. A package signature is composed of
351the name of a package, together with relevant dependency information:
352all wantlib versions, and all run dependencies versions.
353pkg_add only replaces packages with different signatures.
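The signature comparison can be pictured with a small sketch.  The record
layout and the comma-joined string are assumptions made for this example
and do not reproduce the format used by C<OpenBSD::Signature>; version
ordering (the "no downgrade" rule) is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical signature: package name, then sorted run dependencies,
# then sorted wantlib versions, joined into one comparable string.
sub signature {
    my ($pkg) = @_;
    return join(',',
        $pkg->{name},
        (sort @{ $pkg->{depends} }),
        (sort @{ $pkg->{wantlib} }),
    );
}

# A candidate only replaces the installed package if signatures differ.
sub needs_replacement {
    my ($installed, $candidate) = @_;
    return signature($installed) ne signature($candidate);
}

my $installed = { name => 'foo-1.0', depends => ['bar-2.0'], wantlib => ['c.96.1'] };
my $rebuilt   = { name => 'foo-1.0', depends => ['bar-2.0'], wantlib => ['c.97.0'] };
print "replace: ", (needs_replacement($installed, $rebuilt) ? 'yes' : 'no'), "\n";
```

In this model a package with an unchanged name still gets replaced when one
of the libraries it was built against has changed version.
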
354
355Currently, pkg_add -u stops at the first entry in the PKG_PATH from which
356suitable candidates are found.
357
=head1 LIST OF MODULES

=over 3

=item OpenBSD::Add

common operations related to a package addition.

=item OpenBSD::AddCreateDelete

common operations related to package addition/creation/deletion.
Mainly C<OpenBSD::ProgressMeter> related.

=item OpenBSD::AddDelete

common operations used during addition and deletion.
It exists mainly because C<pkg_add(1)> will remove packages during
updates, and because addition/deletion operations are only allowed to
fail at specific times. Most updateset algorithms live there, as does
the upper layer framework for handling signals safely.

=item OpenBSD::ArcCheck

additional layer on top of C<OpenBSD::Ustar> that enforces extra
rules specific to packages.
In particular, we don't store timestamps in the packing-list to
avoid gratuitous changes, and also, a lot of sensitive information
is not allowed in the archive unless it is also annotated in the
packing-list.

=item OpenBSD::CollisionReport

checks a collision list obtained through C<OpenBSD::Vstat> against the
full list of installed files, and reports the origin of existing files.

=item OpenBSD::Delete

common operations related to package deletion.

=item OpenBSD::Dependencies

looking up all kinds of dependencies.  Contains rather complicated caching
to speed things up.  Interacts with the global tracker object.

=item OpenBSD::Error

handles signal registration, the exception mechanism, and auto-caching
methods.  Most I/O operations have moved to C<OpenBSD::State>.

=item OpenBSD::Getopt

L<Getopt::Std(3p)>-like with extra hooks for special options.

=item OpenBSD::Handle

proxy class to go from a package location to an opened package with plist,
including state information to cache errors.

=item OpenBSD::IdCache

caches the correspondence between uids and user names, and between gids
and group names.

=item OpenBSD::Interactive

handles user questions (do not call directly, go through C<OpenBSD::State>
and derivatives).

=item OpenBSD::LibSpec

interactions between library objects from packing-lists, library specifications,
and matching those against actual lists of libraries (from packages or from
the system).

=item OpenBSD::LibSpec::Build

extends C<OpenBSD::LibSpec> for matching during ports builds.

=item OpenBSD::Log

component for printing information later, to be used by derivative classes
of C<OpenBSD::State>.

=item OpenBSD::Mtree

simple parser for L<mtree(8)> specifications.

=item OpenBSD::OldLibs

code required by C<pkg_add(1)> to handle the removal of old libraries during
update.

=item OpenBSD::PackageInfo

handles package meta-information (all the +CONTENTS, +DESCR, etc. files).

=item OpenBSD::PackageLocation

proxy for a package, either as a tarball, or an installed package.
Obtained through C<OpenBSD::PackageRepository>.

=item OpenBSD::PackageLocator

central non-OO hub for the normal repository list
(should use a singleton pattern instead).

=item OpenBSD::PackageName

common operations on package names.

=item OpenBSD::PackageRepository

base class for all package sources. Actual packages instantiate as
C<OpenBSD::PackageLocation>.

=item OpenBSD::PackageRepositoryList

list of package repositories, provided as a front to search objects,
because searching through a repository list has L<ld(1)>-like semantics
(stops at the first repository that matches).

=item OpenBSD::PackingElement

all the packing-list elements class hierarchy, together with common
methods that do not belong elsewhere.

=item OpenBSD::PackingList

responsible for reading/writing packing-lists, copying them, comparing them.

=item OpenBSD::Paths

hardcoded paths to external programs and locations.

=item OpenBSD::PkgAdd, OpenBSD::PkgCreate, OpenBSD::PkgCheck, OpenBSD::PkgDelete, OpenBSD::PkgInfo

implement the corresponding commands.

=item OpenBSD::PkgCfl

efficient handling of conflict lists.

=item OpenBSD::PkgSpec

ad-hoc search for package specifications. External API is stable, but it
needs to be updated to use C<OpenBSD::PackageName> objects now that they
exist.

=item OpenBSD::ProgressMeter

handles display of a progress meter when a terminal is available, does
nothing otherwise.

=item OpenBSD::Replace

common operations related to package replacement.

=item OpenBSD::RequiredBy

handles requiredby and requiring lists.

=item OpenBSD::Search

search object for package repositories: specs, stems, and pkgpaths.

=item OpenBSD::SharedItems

handles items that may be shared by several packages.

=item OpenBSD::SharedLibs

shared library specificities when handled as dependencies.

=item OpenBSD::Signature

handles package signatures and the corresponding version comparison (do not
confuse with cryptographic signatures, as handled through C<OpenBSD::x509>).

=item OpenBSD::State

base class for UI and option handling.

=item OpenBSD::Subst

conventions used for substituting variables during L<pkg_create(1)>,
and related algorithms.

=item OpenBSD::Temp

safe creation of temporary files as a light-weight module that also
deals with signal issues.

=item OpenBSD::Tracker

tracks all package names through update operations, in order to avoid
loops while doing incremental updates.

=item OpenBSD::Update

incremental computation of package replacements required by an update or
installation.

=item OpenBSD::UpdateSet

common operations to all package tools that manipulate update sets.

=item OpenBSD::Ustar

simple API that allows for Ustar (new tar) archive manipulation,
allowing for extraction and copies on the fly.

=item OpenBSD::Vstat

virtual file system (pretend) operations.

=item OpenBSD::md5

simple interface to the L<Digest::MD5(3p)> and L<Digest::SHA(3p)> modules.

=item OpenBSD::x509

cryptographic signature through x509 certificates. Mostly calls C<openssl(1)>.
Note that C<OpenBSD::ArcCheck> is vital in ensuring archive meta-info has
not been tampered with.

=back

582