# pantry

[![Build Status](https://dev.azure.com/commercialhaskell/pantry/_apis/build/status/commercialhaskell.pantry?branchName=master)](https://dev.azure.com/commercialhaskell/pantry/_build/latest?definitionId=6&branchName=master)

Content addressable Haskell package management, providing for secure,
reproducible acquisition of Haskell package contents and metadata.

## What is Pantry

* A Haskell library, storage specification, and network protocol
* Intended for content-addressable storage of Haskell packages
* Allows non-centralized package storage
* Primarily for use by Stackage and Stack, hopefully other tools as well

## Goals

* Efficient, distributed package storage for Haskell
* Superset of existing storage mechanisms
* Security via content addressable storage
* Allow more Stackage-style snapshots to exist
* Allow authors to bypass Hackage for uploads
* Allow Stackage to create forks of packages on Hackage

__TODO__

Content below needs to be updated.

* Support for hpack in PackageLocationImmutable?

## Package definition

Pantry defines the following concepts:

* __Blob__: a raw byte sequence, identified by its key (SHA256 of the
  contents)
* __Tree entry__: contents of a single file (identified by blob key)
  and whether or not it is executable.
    * NOTE: existing package formats like tarballs support more
      sophisticated options. We explicitly do not support those. If
      such functionality is needed, falling back to those mechanisms
      is required.
* __Tree__: mapping from relative path to a tree entry. Some basic
  sanity rules apply to the paths: no `.` or `..` directory
  components, no newlines in filepaths, no leading `/`, and no
  `\\` (we normalize to POSIX-style paths). A tree is identified by a
  tree key (SHA256 of the tree's serialized format).
* __Package__: a tree key for the package contents, package name,
  version number, and cabal file blob key. Requirements: there must be
  a single file with a `.cabal` file extension at the root of the
  tree, and it must match the cabal file blob key. The cabal file must
  be located at `pkgname.cabal`. Each tree can be in at most one
  package, and therefore tree keys work as package keys too.

Note that with the above, a tree key is all the information necessary
to uniquely identify a package. However, including additional
information (package name, version, cabal key) in config files may be
useful for optimizations or user friendliness. If such extra
information is ever included, it must be validated against the
package contents.

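As a rough illustration of the concepts above, here is a minimal Haskell
sketch of the data model and the path sanity rules. The names `BlobKey`,
`TreeKey`, `TreeEntry`, `Tree`, `validPath`, and `validTree` are chosen for
this example and are not taken from Pantry's API; keys are plain strings
standing in for SHA256 digests.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for SHA256 digests (e.g. hex strings).
newtype BlobKey = BlobKey String deriving (Eq, Ord, Show)
newtype TreeKey = TreeKey String deriving (Eq, Ord, Show)

-- A tree entry: the blob holding the file contents, plus the executable bit.
data TreeEntry = TreeEntry
  { entryBlob       :: BlobKey
  , entryExecutable :: Bool
  } deriving (Eq, Show)

-- A tree maps relative POSIX-style paths to tree entries.
type Tree = Map FilePath TreeEntry

-- Split a path into its components on '/'.
splitPath :: FilePath -> [String]
splitPath p = case break (== '/') p of
  (component, [])       -> [component]
  (component, _ : rest) -> component : splitPath rest

-- The sanity rules listed above: no leading '/', no newlines,
-- no backslashes, and no "." or ".." components.
validPath :: FilePath -> Bool
validPath p =
     take 1 p /= "/"
  && '\n' `notElem` p
  && '\\' `notElem` p
  && all (`notElem` [".", ".."]) (splitPath p)

-- A tree is acceptable only if every path in it passes the rules.
validTree :: Tree -> Bool
validTree = all validPath . Map.keys
```

Under these assumptions, `validPath "src/Foo.hs"` holds, while `/etc/passwd`
and `a/../b` are rejected.
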
### Package location

Packages will optionally be sourced from some location:

* __Hackage__ requires the package name, version number, and revision
  number. Each revision of a package will end up with a different tree
  key.
* __Archive__ takes a URL pointing to a tarball (gzipped or not) or a
  ZIP file. An implicit assumption is that archives remain immutable
  over time. Use tree keys to verify this assumption. (The same applies
  to Hackage, for that matter.)
* __Repository__ takes a repo type (Git or Mercurial), URL, and
  commit. Assuming the veracity of the cryptographic hashes on the
  repos, this should guarantee a unique set of files.

In order to deal with _megarepos_ (repos and archives containing more
than one package), there is also a subdirectory for the archive and
repository cases. An empty subdir `""` would be the case for a
standard repo/archive.

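The three location kinds could be modeled as a sum type. The sketch below is
hypothetical and simplified; the constructor and function names are
assumptions for illustration, not Pantry's actual types.

```haskell
data RepoType = Git | Mercurial
  deriving (Eq, Show)

-- Where a package's files come from (illustrative only).
data PackageLocation
  = Hackage String String Int
    -- package name, version number, revision number
  | Archive String FilePath
    -- URL of a tarball or ZIP file, subdirectory
  | Repository RepoType String String FilePath
    -- repo type, URL, commit, subdirectory
  deriving (Eq, Show)

-- The subdirectory to descend into. Only archives and repositories
-- carry one; the empty string means "the whole archive or repo".
subdirOf :: PackageLocation -> FilePath
subdirOf (Hackage _ _ _)           = ""
subdirOf (Archive _ subdir)        = subdir
subdirOf (Repository _ _ _ subdir) = subdir
```

Keeping the subdirectory inside the archive and repository constructors
mirrors the text: a Hackage location never needs one.
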
In order to meet the rules of a package listed above, the following
logic is applied to all three types above:

* Find all of the files in the raw location, and represent them as
  `Map FilePath TreeEntry` (or equivalent).
* Remove a wrapper directory. If _all_ filepaths in that `Map` are
  contained within the same directory, strip it from all of the
  paths. For example, if the paths are `foo/bar` and `foo/baz`, the
  paths will be reduced to `bar` and `baz`.
* After this wrapper is removed, subdirectory logic is applied,
  essentially applying `stripPrefix` to the filepaths. If the subdir
  is `yesod-bin` and files exist called `yesod-core/yesod-core.cabal`
  and `yesod-bin/yesod-bin.cabal`, the only file remaining after
  subdir stripping would be `yesod-bin.cabal`. Note that trailing
  slashes must be handled appropriately, and that an empty subdir
  string results in this step being a no-op.

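The wrapper-directory and subdirectory steps can be sketched over any
`Map FilePath a`. This is a minimal illustration under the rules stated
above, not Pantry's implementation; `splitTopDir`, `stripWrapper`, and
`stripSubdir` are names invented for the example.

```haskell
import Data.List (dropWhileEnd, stripPrefix)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

-- Split "foo/bar" into ("foo", "bar"); Nothing if the path has no
-- leading directory component.
splitTopDir :: FilePath -> Maybe (FilePath, FilePath)
splitTopDir p = case break (== '/') p of
  (dir, _ : rest) | not (null dir) -> Just (dir, rest)
  _                                -> Nothing

-- Remove a wrapper directory, but only when every path shares it.
stripWrapper :: Map FilePath a -> Map FilePath a
stripWrapper files =
  case traverse splitTopDir (Map.keys files) of
    Just parts@((dir, _) : _)
      | all ((== dir) . fst) parts ->
          Map.mapKeys (drop (length dir + 1)) files
    _ -> files

-- Keep only the files under the subdirectory and drop that prefix.
-- Trailing slashes are ignored; an empty subdir is a no-op.
stripSubdir :: FilePath -> Map FilePath a -> Map FilePath a
stripSubdir subdir files
  | null dir  = files
  | otherwise = Map.fromList (mapMaybe strip (Map.toList files))
  where
    dir = dropWhileEnd (== '/') subdir
    strip (path, entry) = do
      rest <- stripPrefix (dir ++ "/") path
      pure (rest, entry)
```

Applied to the examples in the list, `stripWrapper` turns `foo/bar` and
`foo/baz` into `bar` and `baz`, and `stripSubdir "yesod-bin"` leaves only
`yesod-bin.cabal`.
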
The result of all of this is that, given one of the three package
locations above, we can receive a tree key which will provide an
installable package. That tree key will remain immutable.

### How tooling refers to packages

We'll get to the caching mechanism for Pantry below. However, the
recommended approach for tooling is to support some kind of composite
of the Pantry keys, parsed info, and raw package location. This allows
for more efficient lookups when available, with a fallback when
mirrors don't have the needed information.

An example:

```yaml
extra-deps:
- name: foobar
  version: 1.2.3.4
  pantry: deadbeef # tree key
  cabal-file: 12345678 # blob key
  archive: https://example.com/foobar-1.2.3.4.tar.gz
```

It is also recommended that tooling provide an easy way to generate
such complete information from, e.g., just the URL of the tarball, and
that upon reading information, hashes, package names, and version
numbers are all checked for correctness.

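One way to picture the correctness check is a comparison between what a
config entry claims and what was derived from the package contents. The
sketch below is an assumption-laden illustration: the record types and the
`mismatches` function are hypothetical, and all values are plain strings.

```haskell
import Data.Maybe (catMaybes)

-- What a config entry claims; every field is optional.
data Claimed = Claimed
  { claimedName     :: Maybe String
  , claimedVersion  :: Maybe String
  , claimedTreeKey  :: Maybe String
  , claimedCabalKey :: Maybe String
  }

-- What was actually derived from the package contents.
data Actual = Actual
  { actualName     :: String
  , actualVersion  :: String
  , actualTreeKey  :: String
  , actualCabalKey :: String
  }

-- List every claimed field that disagrees with the derived value.
-- Fields the config leaves out are not checked.
mismatches :: Claimed -> Actual -> [String]
mismatches c a = catMaybes
  [ check "name"       (claimedName c)     (actualName a)
  , check "version"    (claimedVersion c)  (actualVersion a)
  , check "pantry"     (claimedTreeKey c)  (actualTreeKey a)
  , check "cabal-file" (claimedCabalKey c) (actualCabalKey a)
  ]
  where
    check label claimed actual = case claimed of
      Just v | v /= actual ->
        Just (label ++ ": config says " ++ v ++ ", contents say " ++ actual)
      _ -> Nothing
```

An empty result means the entry agrees with the contents; a tool would
presumably refuse to proceed otherwise.
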
## Pantry caching

One simplistic option for Pantry would be that, every time a piece of
data is needed, Pantry downloads the necessary tarball/Git
repo/etc. However, this would in practice be highly wasteful, since
downloading Git repos and archives just to get a single cabal file
(for plan construction purposes) is overkill. Instead, here's the
basic idea for how caching works:

* All data for Pantry can be stored in a SQL database. Local tools
  like Stack will use an SQLite database. Servers will use PostgreSQL.
* We'll define a network protocol (initially just HTTP, maybe
  extending to something more efficient if desired) for querying blobs
  and trees.
* When a blob or tree is needed, the local SQLite cache is checked
  first. If the data is not available there, a request to the
  (configurable) Pantry mirrors will be made for it. Since everything
  is content addressable, it is safe to use untrusted mirrors.
* If the data is not available in a mirror, and a location is
  provided, the location will be downloaded and cached locally.

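The lookup order above can be sketched as a small function that is generic
in the monad and the storage. Everything here is an assumption for
illustration: `fetchBlob` is a hypothetical name, the hash function is passed
in as a parameter (SHA256 in Pantry's case), and the remote sources stand for
mirrors followed by the package location.

```haskell
-- Look in the local cache first, then try each remote source in order.
-- Remote content is accepted only if it hashes to the requested key,
-- which is what makes untrusted mirrors safe to use.
fetchBlob
  :: (Monad m, Eq key)
  => (content -> key)             -- hash function (an assumed parameter)
  -> (key -> m (Maybe content))   -- local cache lookup
  -> (key -> content -> m ())     -- local cache insert
  -> [key -> m (Maybe content)]   -- remote sources, in priority order
  -> key
  -> m (Maybe content)
fetchBlob hash lookupLocal storeLocal remotes key = do
  cached <- lookupLocal key
  case cached of
    Just content -> pure (Just content)
    Nothing      -> go remotes
  where
    go [] = pure Nothing
    go (remote : rest) = do
      result <- remote key
      case result of
        Just content | hash content == key -> do
          storeLocal key content   -- cache it for next time
          pure (Just content)
        _ -> go rest               -- missing or corrupt: try the next source
```

Verifying the hash on every remote response is the design choice that
matters here: a mirror that returns wrong bytes is simply skipped.
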
We may also allow these Pantry mirrors to provide some kind of query
interface to find out, e.g., the latest version of a package on
Hackage. That's still TBD.

## Example: resolving a package location

To work through a full example, the following three stanzas are intended to
have equivalent behavior:

```yaml
- archive: https://example.com/foobar-1.2.3.4.tar.gz

- name: foobar
  version: 1.2.3.4
  pantry: deadbeef # tree key
  cabal-file: 12345678 # blob key
  archive: https://example.com/foobar-1.2.3.4.tar.gz

- pantry: deadbeef
```

The question is: how is the first one (presumably what a user would want to
enter) resolved into the second and third? Pantry would follow this set of
steps:

* Download the tarball from the given URL
* Place each file in the tarball into its store as a blob, getting a blob key
  for each. The tarball is now represented as `Map FilePath BlobKey`
* Perform the root directory stripping step, removing a shared path
* Since there's no subdirectory, no subdirectory stripping is performed
* Serialize the `Map FilePath BlobKey` to a binary format and take its hash to
  get a tree key
* Store the tree in the store, referenced by its tree key. In our example, the
  tree key is `deadbeef`.
* Ensure that the tree is a valid package by checking for a single cabal file
  at the root. In our example, that's found in `foobar.cabal` with blob key
  `12345678`.
* Parse the cabal file and ensure that it is a valid cabal file, and that its
  package name is `foobar`. Grab the version number (1.2.3.4).
* We now know that tree key `deadbeef` is a valid package, and can refer to it
  by tree key exclusively. However, including the other information allows us
  to verify our assumptions, provide user-friendly readable data, and provide a
  fallback if the package isn't in the Pantry cache.

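The package validity check in the steps above (a single cabal file at the
root, named after the package) might look roughly like this. The function
names are hypothetical, and the package name is taken as a parameter because
parsing the cabal file itself is outside the scope of this sketch.

```haskell
import Data.List (isSuffixOf)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Find the single root-level .cabal file, or explain what is wrong.
findCabalFile :: Map FilePath a -> Either String (FilePath, a)
findCabalFile tree =
  case filter (isRootCabal . fst) (Map.toList tree) of
    [found] -> Right found
    []      -> Left "no .cabal file at the root of the tree"
    _       -> Left "multiple .cabal files at the root of the tree"
  where
    -- Root-level means no directory separator in the path.
    isRootCabal path = '/' `notElem` path && ".cabal" `isSuffixOf` path

-- The cabal file must be located at pkgname.cabal.
checkCabalName :: String -> FilePath -> Either String ()
checkCabalName pkgName path
  | path == pkgName ++ ".cabal" = Right ()
  | otherwise =
      Left ("expected " ++ pkgName ++ ".cabal but found " ++ path)
```

With the example tree, `findCabalFile` would return `foobar.cabal` together
with its blob key `12345678`, and `checkCabalName "foobar"` would accept that
path.
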
## More advanced content discovery

There are three more advanced cases to consider:

* Providing fall-back locations for content, such as out of concern for a
  single URL being removed in the future
* Closed corporate setups, where access to the general internet may either be
  impossible or undesirable
* Automatic discovery of missing content by hash

The following extensions are possible to address these cases:

* Instead of a single package location, provide a list of package locations
  with fallback semantics.
* Corporate environments will be encouraged to run a local Pantry mirror, and
  configure clients like Stack to speak to these mirrors instead of (or in
  addition to) the default ones.
* Provide some kind of federation protocol for Pantry where servers can
  register with each other and requests for content can be forwarded between
  them.

Providing an override at the client level for Pantry mirror locations is a
__MUST__. Making it easy to run in a corporate environment is a __SHOULD__.
Providing the fallback package locations seems easy enough that we should
include it initially, but falls under a __SHOULD__. The federated protocol
should be added on-demand.