JARs and JAR entries in ABCL
============================

    Mark Evenson
    Created:  09 JAN 2010
    Modified: 02 NOV 2019

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Broken implementation
---------------------

abcl-1.5.0 was discovered to be broken with respect to nested jar
entries in November 2019. This is evidenced by the tests invoked via

    (asdf:test-system :abcl)

failing with

    Failed to parse URL 'jar:jar:file:a/baz.jar!/b/c/foo.abcl!/'Nested JAR URLs are not supported

In researching where to fix this, a flaw in the reasoning about nesting
jar pathnames emerged. The current implementation uses the device as a
CONS for storing the results of the hacky processing around the `jar`
scheme. This was reasoned to be "good enough" in that it kept the
pathnames referencing pathnames to a minimum, and no meaningful
counter-case had been put forward. In the days of Überjars, where it is
perfectly acceptable to have jars within jars, here is a counter-example:

    The jar containing the jar containing the abcl fasl

We need to name all possible locations of ABCL fasl files.

To fix this, we need to allow the following structure

    #p"jar:jar:jar:file:abcl.jar!/b/c/foo.abcl!/foo.cls"

to resolve to linked PATHNAME-DEVICE references:

    "foo.cls" --device--> "foo.abcl" --device--> "abcl.jar"

Towards Fixing
==============

It would be better to reflect the pathname hierarchy as Java classes.
Although hooking things up is gonna take some elbow grease, being able
to cleanly separate the logic for our schemes like "jar" and the special
handling that should happen with all pathnames whose namestring starts
with a scheme we handle (like HTML encoding into/out of expression)
would be helpful.

We make a breaking change with how we abstract the notion of "Archive"
and "Archive Entries".
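
The linked PATHNAME-DEVICE references described above can be sketched in
Java roughly as follows. This is only an illustration under stated
assumptions: the class `JarChainSketch`, its `Node` type, and the `parse`
and `describe` methods are hypothetical names that do not exist in ABCL,
and the sketch simply splits the namestring on "!/" without the
validation or URI decoding a real implementation would need.

```java
import java.util.Objects;

/** Hypothetical sketch of linked archive references; not ABCL's Pathname classes. */
class JarChainSketch {

    /** One link in the chain: a path plus the archive that contains it. */
    static final class Node {
        final String path;  // e.g. "b/c/foo.abcl"
        final Node device;  // enclosing archive, or null for the outermost jar

        Node(String path, Node device) {
            this.path = Objects.requireNonNull(path);
            this.device = device;
        }
    }

    /** Turns a "jar:" namestring into a chain, returning the innermost node. */
    static Node parse(String namestring) {
        String rest = namestring;
        // Strip every leading "jar:" prefix; the nesting depth is
        // recovered from the "!/" separators instead.
        while (rest.startsWith("jar:")) {
            rest = rest.substring("jar:".length());
        }
        Node current = null;
        for (String part : rest.split("!/", -1)) {
            if (part.isEmpty()) {
                continue; // a trailing "!/" names the archive itself
            }
            current = new Node(part, current);
        }
        return current;
    }

    /** Renders the chain from the innermost entry to the outermost jar. */
    static String describe(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node; n != null; n = n.device) {
            if (sb.length() > 0) {
                sb.append(" --device--> ");
            }
            sb.append(n.path);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(parse(
            "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls")));
    }
}
```

Each node holds exactly one reference to its enclosing archive, so the
nesting depth is not limited by the shape of the data structure.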

Pathname DEVICE fields currently contain either

+ a single digit denoting a UNC drive (Windows)

+ a list containing one or two pathnames denoting paths within archives

It is conceptually much more correct to only have a single Pathname in
a file to denote the source of an archive.


Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.

2.  Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

        #p"jar:file:baz.jar!/foo"

    A JAR file:

        #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

        #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL-accessible JAR file:

        #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"

[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:", with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within JAR in JAR).

6.  References of the form `"jar:<URL>"` work for all strings `<URL>`
    that java.net.URL can resolve.

7.  Make jar pathnames work as a valid argument for OPEN with
    :DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P:

        (pathname-match-p
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")
        ==> t

Status
------

All the above goals have been implemented and tested.
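
Goal 2 adopts the namestring syntax of `java.net.JarURLConnection`. As a
point of comparison, the sketch below shows how plain Java resolves a
non-nested "jar:" URL of that form. It is not ABCL code: the class
`JarUrlSketch` and its `roundTrip` method are hypothetical names, and the
temporary jar it writes exists only for the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

/** Hypothetical demonstration of the "jar:<url>!/<entry>" syntax in plain Java. */
class JarUrlSketch {

    /** Writes a one-entry jar to a temp file, then reads the entry back via a "jar:" URL. */
    static String roundTrip(String entryName, String contents) {
        try {
            File jar = File.createTempFile("sketch", ".jar");
            jar.deleteOnExit();
            try (OutputStream out = Files.newOutputStream(jar.toPath());
                 JarOutputStream jos = new JarOutputStream(out)) {
                jos.putNextEntry(new JarEntry(entryName));
                jos.write(contents.getBytes(StandardCharsets.UTF_8));
                jos.closeEntry();
            }
            // Same shape as #p"jar:file:baz.jar!/foo": archive URL, "!/", entry.
            URL url = URI.create("jar:" + jar.toURI() + "!/" + entryName).toURL();
            URLConnection connection = url.openConnection();
            connection.setUseCaches(false); // do not keep the temp jar open
            try (InputStream in = connection.getInputStream()) {
                // InputStream.readAllBytes requires Java 9 or later.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("foo/bar.txt", "hello from a jar entry"));
    }
}
```

The sketch deliberately stays with a single level of nesting, which is
the case the `JarURLConnection` documentation describes.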


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can either refer to the entire JAR file or an entry within the JAR
file.

A JAR PATHNAME always has a DEVICE which is a proper list. This
distinguishes it from other uses of Pathname.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements. The first element of this list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise the DEVICE PATHNAME denotes the PATHNAME of the JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost. The implementation currently limits this list to have at
most two elements.

The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword. Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.


BNF
---

An incomplete BNF of the syntax of JAR PATHNAME would be:

    JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

    URL ::= <URL parsable via java.net.URL.URL()>
          | JAR-FILE-PATHNAME

    JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

    JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
                       | RELATIVE-FILE-NAMESTRING

    ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
the local filesystem conventions, meaning that on Windows this could
contain '\' as the directory separator, which is always normalized to
'/'.
An `ENTRY` always uses '/' to separate directories within the
jar archive.


Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: (
        pathname: {
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: ( pathname: {
                  device: "jar:file:"
                  name: "baz"
                  type: "jar"
                })
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: (
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "this" "that")
      name: "foo-20"
      type: "cls"
    }

    // UC5 -- JAR Entry in a JAR Entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in a http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class",
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:ABSOLUTE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
    pathname: {
      namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
      name: "foo-1"
      type: "cls"
    }

    // UC8 -- JAR in an absolute directory

    pathname: {
      namestring: "jar:file:/a/b/foo.jar!/"
      device: (
        pathname: {
          directory: (:ABSOLUTE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
      namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
      directory: (:ABSOLUTE "c" "d")
      name: "foo"
      type: "lisp"
    }


URI Encoding
------------

As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
that type. Most notably this means that all #\Space characters should
be encoded as '%20' when dealing with jar entries.


History
-------

Previously, ABCL did have some support for jar pathnames. This support
used the convention that if the device field was itself a
pathname, the device pathname contained the location of the jar.

In analyzing the desire to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in a jar. At first
we thought we could support such a syntax by having the device
pathname's device refer to the inner jar. But within this use of
`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
path support uses the `DEVICE` field, so JARs located on UNC mounts can't
be referenced via `\\`, i.e.

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: (
                pathname: {
                  host: "server"
                  device: "share"
                  name: "foo"
                  type: "jar"
                }
                pathname: {
                  name: "foo"
                  type: "abcl"
                }
              )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.