Packaging System

EXPath Candidate Module

9 May 2012

Florent Georges, H2O Consulting

This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

This proposal defines a packaging system for various core XML technologies: XSLT, XQuery, and XProc. The goal is to define it generically enough that it can be adapted to other technologies in the future (such as XML Schema, XForms, etc.) using the same framework. Besides enabling the delivery of libraries written in standard XSLT, XQuery and XProc, it provides support for extensions specific to some processors, and allows new processors to be supported using the same framework.

Introduction

XSLT, XQuery and XProc are amazing programming languages. But they lack a large choice of libraries, and when such libraries do exist, they are a challenge to install. There is no automatic install process, the rules are different for each processor, and library authors do not follow the same rules regarding the information they provide, the cataloging, the way they reference third-party libraries, etc.

All those problems (well, most of them) can be addressed by a packaging system that is broadly adopted by processor vendors and library authors. The cornerstone of such a system is the packaging format: a description of the information to be provided by the library authors and how to provide and structure it.

Concepts

A package is a set of files fulfilling a common purpose. A package could for instance provide XML schemas for a particular document type, alongside a set of XSLT stylesheets to format the same document type to HTML or XSL-FO. Or a package can simply provide an XQuery library module containing a set of functions (directly supporting the notion of library). Those files composing a package (stylesheets, schemas, pipelines, queries, etc.) are called its components.

A package has a unique name (a URI) as well as a convenient short name, also known as its abbrev (an NCName), used in contexts where a URI is not suitable (or simply not convenient) and where the uniqueness of the name is not necessary. A package has a set of dependencies too. A dependency is another package, identified by its full name, that the package depends on.

A package provides all this information, as well as its components, as a single file. This makes it very convenient to organize packages, publish them, give them to a processor to install, etc.

A component can be one of several types: an XSLT stylesheet, an XQuery main or library module, an XProc pipeline, an XML schema, a RELAX NG schema (XML or compact syntax), a Schematron schema, an NVDL schema, or a plain resource file. An implementation can add support for additional types. A component is identified by a public URI. This public URI is used to access the component from outside the package. Each component type has its own URI space, within which a component's URI has to be unique (so two different components can have the same public URI if they are of different types, but this is not recommended).

All the components composing the package, alongside an additional package descriptor, are used to create a ZIP file. This file is the physical package, and can be used to distribute the package to the users. The package descriptor is a plain XML file, named expath-pkg.xml at the root of the ZIP file, and containing information about the package (like its name and its version number) and about the components it provides and how to reference them.

Package descriptor

The package descriptor is an XML file, named expath-pkg.xml and located at the root of the ZIP file. It describes the whole package, and all of its components. Alongside this descriptor, the root of the ZIP file contains a directory named content, containing the components and any other file the package needs. All the relative URIs used to identify components are relative to the content directory. See the section Package descriptor schema for a normative schema.

Because this package format is designed to be extensible and used as a building block by other specifications, the ZIP file can contain other entries at the top level. They are simply ignored when the package is deployed as a simple package following this specification.
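The physical layout described above can be illustrated with a short Python sketch that writes a descriptor at the root of a ZIP file and one component under content/. The package name, the file names and the helper build_package are hypothetical and serve only as an illustration; this is not a reference implementation.

```python
import io
import zipfile

# Hypothetical descriptor and component, for illustration only.
DESCRIPTOR = """<package xmlns="http://expath.org/ns/pkg"
         spec="1.0"
         name="http://example.org/hello"
         abbrev="hello"
         version="0.1">
   <title>Hello library</title>
   <xquery>
      <namespace>http://example.org/hello</namespace>
      <file>hello.xql</file>
   </xquery>
</package>
"""

QUERY = (
    'module namespace h = "http://example.org/hello";\n'
    'declare function h:hello() { "Hello, world!" };\n'
)


def build_package(descriptor: str, components: dict[str, str]) -> bytes:
    """Return the bytes of a ZIP file laid out as described in the text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The descriptor sits at the root of the ZIP file.
        archive.writestr("expath-pkg.xml", descriptor)
        # Every component path is relative to the content directory.
        for path, text in components.items():
            archive.writestr("content/" + path, text)
    return buffer.getvalue()


xar = build_package(DESCRIPTOR, {"hello.xql": QUERY})
names = zipfile.ZipFile(io.BytesIO(xar)).namelist()
print(names)
```

Building the archive in memory keeps the sketch free of side effects; writing the returned bytes to a file would give a package that can be handed to an installer.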

Overall structure

All the elements in the package descriptor are defined in the namespace http://expath.org/ns/pkg. All the elements defined in this specification or used in samples and in text are in this namespace, even if no prefix is used. The root element is package, and contains, besides some naming and versioning attributes, a title, an optional home URI, an optional list of dependencies on other packages and on processors, and a list of components:

name is the name of the package. A package is named using an IRI, as defined by RFC 3987, except that an IRI using the file: scheme is not allowed (the most frequent choices are http: and urn: scheme URIs). Note that the definition of IRI excludes relative references. abbrev is the package abbrev, and version its version. spec is the version of the packaging specification the package conforms to. The current specification requires the package to use the spec number 1.0 (no forward compatibility rules are defined; a processor conforming to this specification has to generate an error if the spec number is different from the string 1.0). The package then contains a title, which is a plain string intended to be a simple description of the package for humans, and a home, which is a URI where more information about the package can be found. It then contains the list of its dependencies and the list of its components.
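The constraints on the root element can be sketched as a small check in Python. The function name check_package_element is hypothetical. Two assumptions are made here: all four attributes are treated as required, and urllib.parse.urlsplit is used as an approximation of IRI parsing, which is enough to detect relative references and the file: scheme but is not a full RFC 3987 validator.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

PKG_NS = "http://expath.org/ns/pkg"

SAMPLE = """<package xmlns="http://expath.org/ns/pkg"
         spec="1.0"
         name="http://www.functx.com"
         abbrev="functx"
         version="1.0">
   <title>FunctX library</title>
</package>
"""


def check_package_element(descriptor: str) -> dict[str, str]:
    """Check the root element and return its naming and versioning attributes."""
    root = ET.fromstring(descriptor)
    if root.tag != f"{{{PKG_NS}}}package":
        raise ValueError("the root element must be package in the pkg namespace")
    attrs = {}
    # Assumption: all four attributes are treated as required.
    for attr in ("name", "abbrev", "version", "spec"):
        value = root.get(attr)
        if value is None:
            raise ValueError(f"missing attribute: {attr}")
        attrs[attr] = value
    # No forward compatibility: anything other than the string 1.0 is an error.
    if attrs["spec"] != "1.0":
        raise ValueError(f"unsupported spec number: {attrs['spec']}")
    # Approximate IRI check: no relative reference, no file: scheme.
    scheme = urlsplit(attrs["name"]).scheme.lower()
    if not scheme:
        raise ValueError("the package name must not be a relative reference")
    if scheme == "file":
        raise ValueError("the package name must not use the file: scheme")
    if root.find(f"{{{PKG_NS}}}title") is None:
        raise ValueError("missing title element")
    return attrs


print(check_package_element(SAMPLE))
```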

<dependency
   package = uri
   processor = string
   versions = string
   semver = string
   semver-min = string
   semver-max = string>
   empty
</dependency>

Dependencies are set by using the name of other packages this package depends on. The dependency can also define the version of the package or the set of processors it depends on by using one of the few available strategies; see the section Versioning and dependency management for a complete description of them.

Components

This section describes the standard component kinds supported by this specification, and how they contribute to the package descriptor document type. Every component has the same basic information: it associates a public URI to a specific file within the content directory. The file element contains a path, relative to the package content directory. Both elements in a component are of type anyURI.
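As a rough illustration of this mapping, the following Python sketch reads a descriptor and builds a table from (component kind, public URI) to a path inside the package. The function name component_table is hypothetical. Using the element name as the URI space is a simplifying assumption, and the dtd element is left out because it is identified by a public and system identifier rather than by a single URI.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://expath.org/ns/pkg"

# Component elements handled by this sketch (dtd is deliberately left out).
COMPONENT_KINDS = (
    "xslt", "xquery", "xproc", "xsd", "rng", "rnc",
    "schematron", "nvdl", "resource",
)
# Child elements that carry the public URI of a component.
URI_ELEMENTS = ("namespace", "import-uri", "public-uri")

SAMPLE = """<package xmlns="http://expath.org/ns/pkg" spec="1.0"
         name="http://www.functx.com" abbrev="functx" version="1.0">
   <title>FunctX library</title>
   <xquery>
      <namespace>http://www.functx.com</namespace>
      <file>functx.xql</file>
   </xquery>
   <xslt>
      <import-uri>http://www.functx.com/functx.xsl</import-uri>
      <file>functx.xsl</file>
   </xslt>
</package>
"""


def component_table(descriptor: str) -> dict[tuple[str, str], str]:
    """Map (kind, public URI) to the component path within the package."""
    table = {}
    for element in ET.fromstring(descriptor):
        kind = element.tag.removeprefix(f"{{{PKG_NS}}}")
        if kind not in COMPONENT_KINDS:
            continue  # title, home, dependency, extension elements, ...
        file_path = element.findtext(f"{{{PKG_NS}}}file")
        if file_path is None:
            continue
        for uri_element in URI_ELEMENTS:
            uri = element.findtext(f"{{{PKG_NS}}}{uri_element}")
            if uri is not None:
                # The file path is relative to the content directory.
                table[(kind, uri.strip())] = "content/" + file_path.strip()
    return table


table = component_table(SAMPLE)
print(table)
```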

XSLT

An XSLT file is associated with a public import URI. This is the URI to use in an XSLT import instruction (aka xsl:import) to import the XSLT file provided in the package. This file is configured with the element xslt.

<xslt> import-uri, file </xslt>

XQuery

An XQuery library module is referenced by its namespace URI. Thus the xquery element associates a namespace URI with an XQuery file. An importing module just needs to use an import statement of the form import module namespace xx = "<namespace-uri>";.

An XQuery main module is associated with a public URI. Usually an XQuery package will provide functions through library modules, but in some cases one may want to provide main modules as well.

<xquery> (namespace |import-uri), file </xquery>

Note that there is no way to set a location hint (such as the at clause in the import statement). To use this packaging system, an XQuery library module must be referenced by its target namespace.

XProc

An XProc pipeline, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an XProc p:import statement.

<xproc> import-uri, file </xproc>

XML Schema

An XML schema can be imported using its target namespace. It is not possible to set several files as several sources for the schema. If the schema is spread over multiple files, there must be one top-level file that includes the other files.

<xsd> (namespace |import-uri), file </xsd>

The import-uri can be used to define a schema location for this schema component. This can be useful for schemas without a target namespace, or for some specific usages, such as xs:redefine.

RelaxNG

A RelaxNG schema, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an import statement (either the include element for an RNG schema or an import directive for an RNC schema).

<rng> import-uri, file </rng>
<rnc> import-uri, file </rnc>

Schematron

A Schematron schema is associated with a public URI.

<schematron> import-uri, file </schematron>

NVDL

An NVDL script is associated with a public URI.

<nvdl> import-uri, file </nvdl>

DTD

A DTD file is associated with a system identifier and, optionally, a public identifier.

<dtd> public-id?, system-id, file </dtd>

Resource

A resource file is associated with a public URI. This can be any kind of file. It has to be used in accordance with its content. For instance, accessing a text file through fn:unparsed-text() is correct, while using fn:doc() is not (it will raise an error because it parses the content as XML).

<resource> public-uri, file </resource>

Not supported file kinds

Documentation (like the output of XSLStyle or xqDoc) is not taken into account in the packaging format, though it could be used, for instance, by IDEs to provide documentation for functions in an editor with a live completion feature. Some support for documentation can of course be added to the package descriptor as a product-specific feature.

DEPRECATED: Processor behaviour

This section is deprecated, and will either be removed completely or rewritten (in case some pieces should be kept here). Most of this material should be moved to the new section "URI resolution".

A processor is any program that uses packaged components. For instance, an XSLT processor uses XSLT stylesheets (as well as XML schemas for an XSLT 2.0 SA processor), an XML database can use XQuery modules, XSLT stylesheets or any other kind of component, etc. But processors also include IDEs, editors, and any program that may want to use packaged components and that supports this specification.

The installed package list is implementation-defined. Each implementation (for a specific processor) can define its own way to install and remove packages, as long as it properly documents it. A processor should use, when appropriate, the on-disk repository layout as defined below.

Whether or not such a repository exists (or several repositories), the implementation must define an installed packages list. How this is done is outside the scope of this spec. An XML IDE could provide a way to select packages to activate for a specific scenario, or a web server container could activate packages on a per-web application basis.

When a reference to a file of a specific kind is made via an absolute URI, a processor must look up this URI in the corresponding URI space in the repository. How the repository is made known to the processor is implementation-defined (a processor can also use a list of repositories, and enable or disable some libraries in any implementation-defined way).

The URI space to use is defined by the nature of the reference. An XSLT href attribute on xsl:import will use the xslt URI space, while it will use the xsd space for xsl:import-schema.

XProc pipelines

An XProc processor in particular has to pay great attention to the space it uses, depending on the step that is being evaluated. Any xsl:import instruction encountered on the stylesheet port of the step p:xslt has to be looked up in the xslt space (regardless of whether the stylesheet document is inlined in the pipeline, computed, loaded from the file system or retrieved from the Internet, or whether the containing stylesheet has itself been imported).

The XProc elements p:document and p:data, as well as the step p:load, are handled specially. They can be used to access any kind of resource, including but not limited to components in a repository. The user has to tell the processor explicitly what kind of component is being looked for, by using the pkg:kind extension attribute. For instance, a stylesheet can be loaded from a repository as input to the step p:xslt as follows:

<p:xslt>
   <p:input port="stylesheet">
      <p:document href="..." pkg:kind="xslt"/>
   </p:input>
   ...
</p:xslt>

Versioning and dependency management

Every package has a name (a URI) and a version number. It can also have a set of dependencies on other packages, identified by their names. Each dependency can also define a specific version (or range of versions) of the package it depends on. In addition to depending on other packages, a package can also depend on a specific processor (or on one among a set of specific processors). This section uses the following example package to illustrate those concepts:

At first glance, we can see that this package (which is named http://example.org/app and has the version number 1.0) depends on the package http://external.org/library (no specific version), on the package http://partner.com/lib-2 (version 2, whatever the minor revision numbers) and on the Saxon processor, Professional Edition. The rest of this section explains in detail the rules behind those dependencies, how to represent them, and their semantics.
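The example descriptor is not reproduced in this text. The Python sketch below embeds one possible reconstruction, consistent with the description above, and lists its dependencies. The abbrev, the title, the use of semver="2" for the second dependency, and the processor identifier http://saxon.sf.net/pe are all assumptions made for illustration, not values taken from the specification.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://expath.org/ns/pkg"

# Hypothetical reconstruction of the example package descriptor.
EXAMPLE = """<package xmlns="http://expath.org/ns/pkg"
         spec="1.0"
         name="http://example.org/app"
         abbrev="app"
         version="1.0">
   <title>Example application</title>
   <dependency package="http://external.org/library"/>
   <dependency package="http://partner.com/lib-2" semver="2"/>
   <!-- The processor identifier below is an assumption. -->
   <dependency processor="http://saxon.sf.net/pe"/>
</package>
"""


def dependencies(descriptor: str) -> list[dict[str, str]]:
    """Return the attributes of every dependency element, in document order."""
    root = ET.fromstring(descriptor)
    return [dict(dep.attrib) for dep in root.findall(f"{{{PKG_NS}}}dependency")]


deps = dependencies(EXAMPLE)
for dep in deps:
    print(dep)
```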

The remainder of this section will use the term primary for the current package, described by this package descriptor, and secondary for a package it depends on.

Package dependency

A dependency on another package (usually called a library) is represented by the element dependency, with an attribute package which is the name of the library the package depends on. The dependency can be versioned; that is, only some versions of the library are acceptable for this package.

The versioning attributes are versions, semver, semver-min and semver-max. They are all mutually exclusive, except semver-min and semver-max, which can be used together. If no versioning attribute is set on the dependency, any version of the secondary package, identified by its name (the URI in the package attribute), is acceptable. If the versions attribute is used, it defines the exact set of acceptable versions for the secondary package, separated by spaces.

The other versioning attributes use SemVer to define the set of acceptable versions for the secondary package. We use only the syntax of SemVer, and define in addition the format of a SemVer template as being either a SemVer version number or a subpart of it (i.e. only the major version, or the major and the minor versions, or the major, the minor and the patch version). For instance 1.9 is a valid SemVer template (though not a complete version number, because it does not have any patch number), while 1.9.0 and 1.9.23 are actual version numbers compatible with this template.

If the semver attribute is used, the secondary package version must be compatible with this template. If semver-min is used, the secondary package version must be either compatible with that SemVer template, or greater than it. If semver-max is used, the secondary package version must be either compatible with that SemVer template, or lower than it. For instance, with semver-min="2.3" and semver-max="3", any version of the secondary package which is equal to or above 2.3.0 and strictly below 4.0.0 is valid (that is, 2.3.0, 3.0.0, 3.99.87, etc.)
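The template rules can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes purely numeric version parts (no pre-release or build suffixes); it is one possible reading of the rules above rather than a normative algorithm.

```python
from typing import Optional


def parse_parts(text: str) -> tuple[int, ...]:
    """Split a version number or SemVer template into numeric parts."""
    parts = tuple(int(part) for part in text.split("."))
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected one to three numeric parts: {text!r}")
    return parts


def compatible(version: str, template: str) -> bool:
    """True if the version starts with exactly the parts of the template."""
    v, t = parse_parts(version), parse_parts(template)
    return v[:len(t)] == t


def satisfies(version: str,
              semver: Optional[str] = None,
              semver_min: Optional[str] = None,
              semver_max: Optional[str] = None) -> bool:
    """Check a version against the semver, semver-min and semver-max attributes."""
    v = parse_parts(version)
    if semver is not None:
        if semver_min is not None or semver_max is not None:
            raise ValueError("semver excludes semver-min and semver-max")
        return compatible(version, semver)
    if semver_min is not None:
        t = parse_parts(semver_min)
        # Compatible with the template, or greater than it.
        if v[:len(t)] < t:
            return False
    if semver_max is not None:
        t = parse_parts(semver_max)
        # Compatible with the template, or lower than it.
        if v[:len(t)] > t:
            return False
    return True


for candidate in ("2.2.9", "2.3.0", "3.0.0", "3.99.87", "4.0.0"):
    print(candidate, satisfies(candidate, semver_min="2.3", semver_max="3"))
```

Comparing only the leading parts of the version against the template is what makes 3.99.87 acceptable for semver-max="3": its major number is compatible with the template, whatever the minor and patch numbers are.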

Processor dependency

In addition to depending on other packages, a dependency can tell which processor(s) is (are) needed for this package:

If there are several such dependencies, the package is supported by a specific processor as soon as one of the processor dependencies matches. The meaning of the versioning attributes on processor dependencies is defined by each processor.
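The "at least one match" rule can be sketched in a few lines of Python. The function name and the processor identifiers are hypothetical, versioning attributes are ignored because their meaning is processor-specific, and treating a package without any processor dependency as supported everywhere is an assumption.

```python
def supported_by(dependencies: list[dict[str, str]], processor: str) -> bool:
    """True if the package can be used with the given processor."""
    required = [dep["processor"] for dep in dependencies if "processor" in dep]
    # Assumption: no processor dependency means any processor is acceptable.
    return not required or processor in required


# Hypothetical dependency list, as attribute dictionaries.
deps = [
    {"package": "http://external.org/library"},
    {"processor": "http://example.org/processor/a"},
    {"processor": "http://example.org/processor/b"},
]

print(supported_by(deps, "http://example.org/processor/b"))
print(supported_by(deps, "http://example.org/processor/c"))
```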

Dependency enforcement

How those rules are enforced is implementation-defined. For instance, a command line tool to install a package in a repository can generate an error if the required dependencies are not installed, try to install them automagically from the web, or just emit a warning and install the package, hoping the dependencies will be installed later on; it can also provide an option to select the behaviour to use. Another implementation can enforce the dependencies at compile time or at run time. An implementation SHOULD provide a way to enforce those rules, but can also provide an option to ignore them.

As follows from the above, a version number is not defined to have any particular semantics, except when used with one of the SemVer attributes. The only constraint is that a version number cannot contain any whitespace character (as defined by the XML recommendation). A library author SHOULD use version numbers compatible with SemVer though, unless there is a justification not to.

URI resolution

At the end of the day, having packages installed in a repository is interesting only if one is able to use the installed components. This can be done by invoking the component directly, e.g. an XProc pipeline from the command line or an XSLT stylesheet from an IDE, or most of the time by importing the component from within the user's own components. For instance, a user can import the FunctX XSLT implementation in their own stylesheet, by referring to it only by its public import URI, provided the FunctX package is available to the processor used to evaluate the overall transform.

In order to promote interoperability, this specification defines a standard URI resolution mechanism, and how to import installed components from within the user's own components. In order to stay generic enough though, the initial list of available packages at runtime, as well as the way to set the initial options of this algorithm, are implementation-defined. Indeed, the way to set the available package list will be different on an application server and on an IDE, or on a standalone processor and on an XML database. In the same way, although the algorithm defines a set of options to control its behaviour, the way to set those is out of the scope of this specification; a .NET API, an IDE and a webapp container, to mention only those, have very different mechanisms to let the user or developer set options.

A public component is a component which is part of a package, and is associated with a public import URI within the package descriptor. A public component has an associated URI space, as explained earlier. A universe is a set of packages available to some processor, at some point in time, in some context. As described later, a universe can be e.g. the latest version of every package installed in a specific repository; another example is a single package (e.g. used as a web application) and all its explicit dependencies, recursively.

The way to define a universe is implementation-defined, but implementations SHOULD allow a user to create a universe from either: a repository or a set of repositories (actually choosing the latest version for each package in the repository), or from an explicit set of packages, given their names (actually choosing the latest version for each package in the list), or from an explicit set of packages, given their names and their version numbers (or if no explicit version number is available, choosing the latest version for the given package). In the two latter cases, an implementation SHOULD provide the user with the ability to include explicit package dependencies as well, either recursively or direct dependencies only. A universe SHOULD NOT contain several packages with the same name but different versions. A universe cannot contain several packages with the same name and the same version number, because in that case they are considered the same package, regardless of whether they are actually duplicated in the repository (it is the responsibility of the user to prevent inconsistencies in that case).
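The three ways of building a universe can be sketched in Python as follows. The function names and package names are hypothetical. Since version numbers have no defined ordering outside SemVer, "latest" is interpreted here under the assumption that every version is a dotted sequence of integers; dependency inclusion is left out of the sketch.

```python
from typing import Iterable, Optional


def version_key(version: str) -> tuple[int, ...]:
    """Ordering key, assuming dotted numeric version numbers."""
    return tuple(int(part) for part in version.split("."))


def build_universe(installed: Iterable[tuple[str, str]],
                   selection: Optional[dict[str, Optional[str]]] = None
                   ) -> dict[str, str]:
    """Return a mapping from package name to the single version retained.

    installed: (name, version) pairs found in the repository or repositories.
    selection: None to take every package, or a mapping from package name to
               an explicit version (None meaning "the latest one").
    """
    universe: dict[str, str] = {}
    for name, version in installed:
        if selection is not None:
            if name not in selection:
                continue
            wanted = selection[name]
            if wanted is not None and wanted != version:
                continue
        current = universe.get(name)
        # Keep one version per name: the latest among the candidates.
        if current is None or version_key(version) > version_key(current):
            universe[name] = version
    return universe


installed = [
    ("http://example.org/a", "1.0.0"),
    ("http://example.org/a", "1.2.0"),
    ("http://example.org/b", "0.1.0"),
]

print(build_universe(installed))
print(build_universe(installed, {"http://example.org/a": "1.0.0"}))
print(build_universe(installed, {"http://example.org/b": None}))
```

Keying the universe by package name guarantees that it never holds two versions of the same package, which matches the SHOULD NOT constraint above.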

TODO: Define what's a processor, what's compilation, and the case or 'resource'... (that is, the context of a resolution)

When a processor, whilst compiling a component, encounters an import statement, it SHOULD perform the following steps:

if the URI of the imported module is relative or uses the file URI scheme, then the usual resolving mechanism of the processor is used (if the import statement appears within a component from a package, it must resolve relatively to that component, according to the original package structure),

depending on the context of the import statement, the processor determines the URI space to look into,

if there is a public component in the correct URI space in the initial universe, the processor uses it,

if there is nothing corresponding in the universe, the usual resolving mechanism of the processor is used (for instance trying to resolve the URI as a URL or generating an error).
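The steps above can be sketched in Python as follows. The function name resolve_import, the shape of the universe (a mapping from a URI space and a public URI to a component location) and the fallback callable are assumptions made for illustration; how the URI space is derived from the context of the import statement is left to the caller.

```python
from typing import Callable
from urllib.parse import urlsplit


def resolve_import(uri: str,
                   space: str,
                   universe: dict[tuple[str, str], str],
                   fallback: Callable[[str], str]) -> str:
    """Resolve the URI of an import statement, following the four steps."""
    scheme = urlsplit(uri).scheme.lower()
    # Step 1: relative and file: URIs use the usual resolving mechanism.
    if scheme in ("", "file"):
        return fallback(uri)
    # Step 2: the URI space has been determined by the caller from the
    # context of the import statement (e.g. "xslt" for xsl:import).
    # Step 3: use the public component from the universe, if there is one.
    component = universe.get((space, uri))
    if component is not None:
        return component
    # Step 4: nothing matches, so fall back to the usual mechanism.
    return fallback(uri)


# Hypothetical universe and fallback, for illustration only.
universe = {
    ("xslt", "http://www.functx.com/functx.xsl"): "functx-1.0/content/functx.xsl",
    ("xquery", "http://www.functx.com"): "functx-1.0/content/functx.xql",
}


def usual_mechanism(uri: str) -> str:
    return "usual:" + uri


print(resolve_import("http://www.functx.com/functx.xsl", "xslt", universe, usual_mechanism))
print(resolve_import("lib/util.xsl", "xslt", universe, usual_mechanism))
```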

The rule to define the proper URI space to look into in order to resolve the URI of an import statement

On-disk repository layout

This section defines a standard structure for on-disk repositories. An implementation can choose not to support this kind of repository and to define its own (or even not to define it publicly, just providing the ability to install and remove packages in a clearly documented way). However, there are several advantages to supporting this structure; the most obvious one is being able to benefit from existing tools to manage such repositories, as well as from existing libraries to access them.

The repository is a directory on the file system, within which the packages are installed. Each package results in a subdirectory in the repository, where the content of the XAR file is unzipped. Next to those package directories, there is also a subdirectory named .expath-pkg/, which contains administrative information about the packages installed in this repository.

The package directories SHOULD be named after the package abbreviation (the abbrev attribute in the package descriptor) and the version number, separated by a dash. A package directory name must not contain the space character.

Because these directory names cannot be guaranteed to be unique among all the packages installed in the repository, and because the processors will in some cases have to translate some characters (depending on the rules of the filesystem they put the repository on), they cannot be used as a perfect mapping between package abbreviations and directories. The naming scheme is only described here as a note, in the hope that the various implementations will use similar naming strategies, meaningful for humans.

The repository maintains a list of installed packages. For each of them, this list contains its name (the name URI), its subdirectory name, and its version number. This list is maintained within two files: .expath-pkg/packages.xml and .expath-pkg/packages.txt. Both contain the same information, the former in an XML format and the latter as plain text. The XML format looks like the following (see the section Package list schema for a normative schema):

The text file is a line-separated file (any line termination can be used: \r, \n or \r\n, though the standard Internet line termination is preferred: \r\n). Each line represents a package, and is a space-separated record. The first field is the package directory (the name of the subdirectory within the repository where the package has been unzipped), the second one is the package name (the name URI), and the last one is the version number. None of those fields can contain any space character. Every field is separated by exactly one space character (the Unicode character with code point #x20). This file then looks like:

some-package-2.3 http://example.org/some/package 2.3
some-lib-0.1 http://example.com/some-library 0.1
...
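Reading this text format takes only a few lines of code. The following Python sketch is one possible parser; the names InstalledPackage and parse_packages_txt are hypothetical. It relies on str.splitlines(), which accepts \r, \n and \r\n terminators, and it skips empty lines.

```python
from typing import NamedTuple


class InstalledPackage(NamedTuple):
    directory: str  # subdirectory within the repository
    name: str       # package name (the name URI)
    version: str    # version number


def parse_packages_txt(text: str) -> list[InstalledPackage]:
    """Parse the content of .expath-pkg/packages.txt."""
    packages = []
    for line in text.splitlines():
        if not line:
            continue
        # Fields are separated by exactly one space character (#x20).
        fields = line.split(" ")
        if len(fields) != 3 or "" in fields:
            raise ValueError(f"malformed record: {line!r}")
        packages.append(InstalledPackage(*fields))
    return packages


SAMPLE = (
    "some-package-2.3 http://example.org/some/package 2.3\r\n"
    "some-lib-0.1 http://example.com/some-library 0.1\r\n"
)

packages = parse_packages_txt(SAMPLE)
for package in packages:
    print(package.directory, package.name, package.version)
```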

The overall repository thus looks like:

[repository-dir]/
   .expath-pkg/
      packages.txt
      packages.xml
   some-package-2.3/
      expath-pkg.xml
      content/
         query.xq
         style.xsl
   some-lib-0.1/
      expath-pkg.xml
      content/
         ...

Because the package descriptor is unzipped in the package directory, as well as the rest of the files contained in the package, it is easy to find the mapping between package directories and package names and versions: just look into the package descriptor in every package directory, and you will find what you are looking for. But in order to allow other programs to easily access the list of installed packages, we provide the list in .expath-pkg/packages.xml. And because shell scripts must be able to access that list as well, a second, text-oriented file is provided; of course this is redundant information, but it is maintained automatically by the repository manager, so the redundancy is not an issue.
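As a minimal sketch of how a repository manager might regenerate the text list from the unzipped descriptors, the Python code below scans a repository directory and rebuilds the records. The function names are hypothetical, and the demonstration repository is created in a temporary directory purely for illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def scan_repository(repo: Path) -> list[tuple[str, str, str]]:
    """Return (directory, name, version) for every installed package."""
    rows = []
    for child in sorted(repo.iterdir()):
        descriptor = child / "expath-pkg.xml"
        # Dot-directories hold administrative or processor-specific data.
        if child.name.startswith(".") or not descriptor.is_file():
            continue
        root = ET.parse(str(descriptor)).getroot()
        rows.append((child.name, root.get("name"), root.get("version")))
    return rows


def packages_txt(rows: list[tuple[str, str, str]]) -> str:
    """Serialize the records, one package per line, fields separated by a space."""
    return "".join(" ".join(row) + "\r\n" for row in rows)


# Demonstration on a throwaway repository holding a single package.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / ".expath-pkg").mkdir()
    package_dir = repo / "functx-1.0"
    (package_dir / "content").mkdir(parents=True)
    (package_dir / "expath-pkg.xml").write_text(
        '<package xmlns="http://expath.org/ns/pkg" spec="1.0"'
        ' name="http://www.functx.com" abbrev="functx" version="1.0">'
        "<title>FunctX library</title></package>",
        encoding="utf-8",
    )
    rows = scan_repository(repo)
    listing = packages_txt(rows)

print(listing, end="")
```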

In addition to this structure, a specific processor can store information in subdirectories (either at the repository level or within each package directory), provided the names of those directories start with a dot and unambiguously identify the processor. For instance [repository]/.saxon/ for repository-wide config for Saxon, or [repository]/[some-pkg]/.exist/ for eXist config for a specific package.

Example

This section provides a non-normative example to illustrate the concepts defined here. Instead of using a hello world example, it describes the packaging of the existing FunctX library. This library consists of a standard XQuery 1.0 library module and a standard XSLT 2.0 stylesheet (both provide the same set of functions to either XQuery or XSLT, but this is not relevant to packaging).

The first thing to do is to create a ZIP file with both of those components, alongside a package descriptor. The constraints are: 1) the package descriptor is named expath-pkg.xml at the root of the package, and 2) the library content is in a directory named content at the root of the package (aka the content directory). The structure of the content directory is completely free. In our case, let's just put both component files directly in the content directory, and define the module name as functx:

expath-pkg.xml
content/
   functx.xql
   functx.xsl

The XQuery library module's target namespace is defined by the module itself. For the XSLT stylesheet, we have to define its public URI, used to identify it within an xsl:import (or by any other means, for instance within XProc or an IDE scenario system). Let's define it as http://www.functx.com/functx.xsl. The package descriptor thus looks like the following:

<package xmlns="http://expath.org/ns/pkg"
         spec="1.0"
         name="http://www.functx.com"
         abbrev="functx"
         version="1.0">
   <title>FunctX library</title>
   <xquery>
      <namespace>http://www.functx.com</namespace>
      <file>functx.xql</file>
   </xquery>
   <xslt>
      <import-uri>http://www.functx.com/functx.xsl</import-uri>
      <file>functx.xsl</file>
   </xslt>
</package>

We just have to create a ZIP file with this structure and content. The convention is to call this file functx-1.0.xar (that is, [abbrev]-[version].xar). That's all for the package itself. If the target processor supports the on-disk repository layout, here is what the repository could look like after the package has been installed:

[repository-dir]/
   .expath-pkg/
      packages.txt
      packages.xml
   functx-1.0/
      expath-pkg.xml
      content/
         functx.xql
         functx.xsl

The content of .expath-pkg/packages.xml is:

The content of .expath-pkg/packages.txt is:

functx-1.0 http://www.functx.com 1.0

The package directory functx-1.0/ has been named after the abbrev and the version number. Its content is simply the content of the XAR file, unzipped. A user can then import the FunctX library from either an XQuery module or an XSLT stylesheet by using respectively an import statement:

import module namespace fx = "http://www.functx.com";

or an import instruction:

<xsl:import href="http://www.functx.com/functx.xsl"/>

Extensibility

The package format defined in this specification is a complete system to package libraries, for the right definition of a library. But a specific kind of library can require more information. For instance, a web application is also a set of components, but it requires more configuration. An additional descriptor format for such applications could be defined and added to this packaging framework to define a specialized kind of package. By building upon this packaging format, the new format would benefit from the existing mechanism to map public URIs to components.

Another extensibility point is the definition of additional component types. The component types defined here are the standard types, but an implementation may support additional implementation-defined types.

More generally, an implementation may define any extension element or attribute in the package descriptor to achieve its purposes, provided that the elements and attributes it defines are neither in the EXPath Packaging namespace nor in the null namespace (an attribute on an extension element can be in no namespace, but an extension attribute on a standard element must be in a specific namespace).

Package descriptor schema

Package list schema

References

FunctX library of functions for XQuery and XSLT 2.0, Datypic.

RFC 3987: Internationalized Resource Identifiers (IRIs). M. Duerst and M. Suignard, editors. Internet Engineering Task Force. January, 2005.

Semantic Versioning, Tom Preston-Werner.

Extensible Markup Language (XML) 1.0, Tim Bray et al.