EXPath

Packaging System

EXPath Candidate Module 7 January 2010

This version:
Latest version:
http://expath.org/spec/pkg
Previous versions:

http://expath.org/spec/pkg/20090901
Editor:
Florent Georges, H2O Consulting

This document is also available in these non-normative formats: XML and Revision markup.


Abstract

This proposal defines a packaging system for various core XML technologies: XSLT, XQuery, and XProc. The goal is to define it generically enough that the same framework can be adapted to other technologies in the future (such as XML Schema, XForms, etc.). Besides enabling the delivery of libraries written in standard XSLT, XQuery and XProc, it provides support for extensions specific to some processors, and allows new processors to be supported within the same framework.

Table of Contents

1 Introduction
2 Concepts
3 Package descriptor
    3.1 Overall structure
    3.2 XSLT
    3.3 XQuery
    3.4 XProc
    3.5 XML Schema
    3.6 RelaxNG
    3.7 Schematron
    3.8 NVDL
    3.9 Not supported file kinds
4 Processor behaviour
    4.1 XProc pipelines
5 On-disk repository layout
6 Example
7 Extensibility
8 Editorial notes

Appendices

A Package descriptor schema
B References


1 Introduction

XSLT, XQuery and XProc are amazing programming languages. But they lack a large choice of libraries, and when such libraries do exist, they are a challenge to install. There is no automatic install process, the rules are different for each processor, and library authors do not follow the same rules regarding the information they provide, the cataloging, the way they reference third-party libraries, etc.

All those problems (well, most of them) can be addressed by a packaging system that would be broadly adopted by processor vendors and library authors. The cornerstone of such a system is the packaging format: a description of the information to be provided by the library authors and of how to provide and structure it.

2 Concepts

A package is a set of files fulfilling a common purpose. A package could for instance provide XML schemas for a particular document type, alongside a set of XSLT stylesheets to format the same document type to HTML or XSL-FO. Or a package can simply provide an XQuery library module containing a set of functions (directly supporting the notion of library). Those files composing a package (stylesheets, schemas, pipelines, queries, etc.) are called its components.

A package has a unique name (a URI) as well as a convenient short name, also known as its abbrev (an NCName), used in contexts where a URI is not suitable (or simply not convenient) and where the uniqueness of the name is not necessary. A package has a set of dependencies too. A dependency is another package, identified by its full name, that the package depends on. [[ TODO: Introduce other kinds of dependencies, for instance collections... ? ]]

A package provides all this information, as well as its components, as a single file. This makes it convenient to organize packages, publish them, give them to a processor to install, etc.

A component can be of one of several types: an XSLT stylesheet, an XML schema, a RELAX NG schema, an XProc pipeline, but also any other type a particular implementation chooses to support. It is identified by a public URI. This public URI is used to access the component from outside the package. Each component type has its own URI space, within which a component's URI has to be unique (so two different components can have the same public URI if they are of different types, but this is not recommended).

All the components composing the package, alongside an additional package descriptor, are used to create a ZIP file. This file is the physical package, and can be used to distribute the package to the users. The package descriptor is a plain XML file, named expath-pkg.xml at the root of the ZIP file, and containing information about the package (like its name and its version number) and about the components it provides and how to reference them.

3 Package descriptor

The package descriptor is an XML file, named expath-pkg.xml and located at the root of the ZIP file. It describes the whole package and all of its components. Alongside this descriptor, the root of the ZIP file contains a directory named after the module name (see below), which contains the components and any other file the package needs. This directory is called the package directory. All the relative URIs used to identify components are relative to the package directory.

Note:

Because this package format is designed to be extensible and used as a building block by other specifications, the ZIP file can contain other entries at the top level. They are simply ignored when the package is deployed as a simple package following this specification.

3.1 Overall structure

All the elements in the package descriptor are defined in the namespace http://expath.org/ns/pkg. All the elements defined in this specification or used in samples and in text are in this namespace, even if no prefix is used. The root element is package. It has a name, and contains an optional list of dependencies on other packages followed by exactly one child element module:

<package name = uri>
   dependency*,
   module
</package>

<dependency name = uri>
   empty
</dependency>

name is the name of the package. A package is named using an absolute URI of any scheme except file: (the most frequent choices are http: and urn: URIs). Dependencies are declared by giving the names of the other packages this package depends on. The module content model is:

<module name = NCName
        version = string>
   title,
   (xslt
   |xquery
   |xproc
   |xsd
   |rng
   |rnc
   |...)+
</module>

name is a short module name. The package directory must have exactly the same name. The module also has a version number and a human-readable title. It then provides information about one or several components. In addition to those standard components, it can also contain elements specific to some processors (for instance an element for Saxon, eXist, etc.)
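As a rough illustration of how the two content models above fit together, a complete descriptor might look like the following sketch. The package name, the dependency name, the module name and the file name are all hypothetical, chosen only for the example.

```xml
<!-- Illustrative only: every name and URI below is hypothetical. -->
<package xmlns="http://expath.org/ns/pkg"
         name="http://example.org/lib/hello">
   <!-- another package, referenced by its full (URI) name -->
   <dependency name="http://example.org/lib/util"/>
   <!-- the package directory in the ZIP file must be named "hello" -->
   <module name="hello" version="0.1">
      <title>Hello library</title>
      <xslt>
         <import-uri>http://example.org/lib/hello.xsl</import-uri>
         <!-- path relative to the package directory, i.e. hello/hello.xsl -->
         <file>hello.xsl</file>
      </xslt>
   </module>
</package>
```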

The following is the description of the standard component kinds supported by this specification, and how they contribute to the package descriptor document type.

3.2 XSLT

An XSLT file is associated with a public import URI. This is the URI to use in an XSLT import instruction (aka xsl:import) to import the XSLT file provided in the package. This file is configured with the element xslt.

<xslt>
   import-uri,
   file
</xslt>

The element file contains the path to the file within the package structure, relative to the package directory. Both elements import-uri and file are of type anyURI.

3.3 XQuery

An XQuery library module is referenced by its namespace URI. Thus the xquery element associates a namespace URI with an XQuery file. An importing module just needs to use an import statement of the form import module namespace xx = "<namespace-uri>";.

An XQuery main module is associated with a public URI. Usually an XQuery package will provide functions through library modules, but in some cases one may want to provide main modules as well.

<xquery>
   (namespace
   |import-uri),
   file
</xquery>

Note that there is no way to set any location hint (such as the at clause in the import statement.) To use this packaging system, an XQuery library module must be referenced by its target namespace.
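The two cases can be sketched in one module as follows: a library module identified by its target namespace, and a main module identified by a public URI. The module name, URIs and file names are hypothetical.

```xml
<!-- Illustrative only: names, URIs and file names are hypothetical. -->
<module xmlns="http://expath.org/ns/pkg" name="hello" version="0.1">
   <title>Hello library</title>
   <!-- library module: referenced by its target namespace -->
   <xquery>
      <namespace>http://example.org/lib/hello</namespace>
      <file>hello.xq</file>
   </xquery>
   <!-- main module: referenced by a public URI -->
   <xquery>
      <import-uri>http://example.org/lib/hello-main.xq</import-uri>
      <file>hello-main.xq</file>
   </xquery>
</module>
```

With such a descriptor, an importing query would write import module namespace h = "http://example.org/lib/hello"; without any at clause.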

3.4 XProc

An XProc pipeline, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an XProc p:import statement.

<xproc>
   import-uri,
   file
</xproc>
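On the consuming side, a pipeline would reference the packaged library through that public URI. The following is a minimal sketch, assuming a hypothetical package that declares http://example.org/lib/hello.xpl as the import-uri of one of its xproc components.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
   <!-- hypothetical public import URI, resolved in the xproc URI space -->
   <p:import href="http://example.org/lib/hello.xpl"/>
   <!-- placeholder subpipeline; a real pipeline would call the imported steps -->
   <p:identity/>
</p:pipeline>
```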

3.5 XML Schema

An XML schema can be imported using its target namespace. As for XQuery, there is no way to use a schemaLocation instead. Nor is it possible to list several files as sources for the same schema. If the schema is spread over multiple files, there must be one top-level file that includes the other files.

<xsd>
   namespace,
   file
</xsd>
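For instance, a schema spread over several files could be declared as in the following sketch, where only the top-level schema document is listed. The namespace and the file name are hypothetical.

```xml
<!-- Illustrative only: the namespace and file name are hypothetical. -->
<xsd xmlns="http://expath.org/ns/pkg">
   <namespace>http://example.org/ns/hello</namespace>
   <!-- the single top-level schema document; it includes the other files itself -->
   <file>hello.xsd</file>
</xsd>
```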

(TODO: Should we support schemas with empty target namespace? I am not sure this is a good idea in a packaging system...) (TODO: This does not support xs:redefine, as it requires a hint, not a target namespace)

3.6 RelaxNG

A RelaxNG schema, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an import statement (either the include element for an RNG schema or an include directive for an RNC schema.)

<rng>
   import-uri,
   file
</rng>

<rnc>
   import-uri,
   file
</rnc>
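A schema using a packaged RNG component might then look like the following sketch. It assumes a hypothetical package that declares http://example.org/lib/hello.rng as an import-uri, and that the included grammar defines a pattern named hello.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <!-- hypothetical public import URI, resolved in the rng URI space -->
   <include href="http://example.org/lib/hello.rng"/>
   <start>
      <!-- assumes the included grammar defines a pattern named "hello" -->
      <ref name="hello"/>
   </start>
</grammar>
```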

3.7 Schematron

A Schematron schema is associated with a public URI.

<schematron>
   import-uri,
   file
</schematron>

3.8 NVDL

An NVDL script is associated with a public URI.

<nvdl>
   import-uri,
   file
</nvdl>

3.9 Not supported file kinds

Documentation (such as the output of XSLStyle or xqDoc) is not taken into account in the packaging format, though it could be used, for instance, by IDEs to document functions in an editor with a live completion feature. Some support for documentation can of course be added to the package descriptor as a product-specific feature.

4 Processor behaviour

A processor is any program that uses packaged components. For instance, an XSLT processor uses XSLT stylesheets (as well as XML schemas for an XSLT 2.0 SA processor), an XML database can use XQuery modules, XSLT stylesheets or any other kind of component, etc. But processors also include IDEs, editors, and any program that wants to use packaged components and that supports this specification.

The installed package list is implementation-defined. Each implementation (for a specific processor) can define its own way to install and remove packages, as long as it properly documents it. A processor should use, when appropriate, the on-disk repository layout as defined below.

Whether or not such a repository exists (or several repositories), the implementation must define an installed packages list. How this is done is outside the scope of this spec. An XML IDE could provide a way to select packages to activate for a specific scenario, or a web server container could activate packages on a per-web application basis.

When a reference to a file of a specific kind is made via an absolute URI, a processor must look up this URI in the corresponding URI space in the repository. How the repository is configured for the processor is implementation-defined (a processor can also use a list of repositories, and enable or disable some libraries in any implementation-defined way.)

The URI space to use is defined by the nature of the reference. An XSLT href attribute on xsl:import will use the xslt URI space, while it will use the xsd space for xsl:import-schema.
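The following sketch shows both cases in one stylesheet for a schema-aware processor. The two URIs are hypothetical and would have to be declared by installed packages, the first as the import-uri of an xslt component and the second as the namespace of an xsd component.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
   <!-- looked up in the xslt URI space -->
   <xsl:import href="http://example.org/lib/hello.xsl"/>
   <!-- looked up in the xsd URI space, by target namespace -->
   <xsl:import-schema namespace="http://example.org/ns/hello"/>
</xsl:stylesheet>
```

Even if the same URI were used for both references, the two lookups would remain independent because each one happens in its own URI space.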

4.1 XProc pipelines

An XProc processor in particular has to pay close attention to the URI space it uses for the step being evaluated. Any xsl:import instruction encountered on the stylesheet port of the step p:xslt has to be looked up in the xslt space (regardless of whether the stylesheet document is inlined in the pipeline, computed, loaded from the file system or retrieved from the Internet, or whether the containing stylesheet has itself been imported.)

The XProc elements p:document and p:data, as well as the step p:load, are handled specially. They can be used to access any kind of resource, including but not limited to components in a repository. The user has to tell the processor explicitly what kind of component is being looked for by using the pkg:kind extension attribute. For instance, a stylesheet can be loaded from a repository as input to the step p:xslt as follows:

<p:xslt>
   <p:input port="stylesheet">
      <p:document href="..." pkg:kind="xslt"/>
   </p:input>
   ...
</p:xslt>

5 On-disk repository layout

This section defines a standard structure for on-disk repositories. An implementation can choose not to support this kind of repository and to define its own (or even not to define it publicly, and just provide the ability to install and remove packages in a clearly documented way.) However, supporting this structure has several advantages, the most obvious being the ability to benefit from existing tools that manage such repositories as well as existing libraries that access them.

The resolving machinery is based on OASIS XML Catalogs [Catalogs]. The repository is a simple directory, each subdirectory of which is an installed package (aka a package dir.) The only exception to this is the subdirectory .expath-pkg/ which is dedicated to store working information about the installed packages, among which the catalogs (aka the admin dir.)

[repository-dir]/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      lib1/
         xquery-catalog.xml
         xslt-catalog.xml
      lib2/
         ...
   lib1/
      query.xq
      style.xsl
   lib2/
      ...

The package dirs are really simple: each is just an unzipped version of the XAR file. The name of the directory is the same as the name of the module in the package. The admin dir contains a catalog for each URI space (the catalog for a given URI space may be absent if no file in the whole repository belongs to that space.) The name of such a catalog is [space]-catalog.xml where [space] is one of xslt, xquery, rnc, etc. Those catalogs are called repository catalogs. The admin dir also contains a subdirectory for each installed package, with the same naming convention. In turn, those directories contain catalog files, containing the mappings defined in the corresponding package descriptors (pointing to the actual files installed in the package dirs.) Those are called the package catalogs. They follow the same naming convention as the repository catalogs (divided by URI spaces.) The repository catalogs just include the several package catalogs for the same URI space.
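For the layout shown above, the repository catalog .expath-pkg/xslt-catalog.xml could be sketched as follows, assuming that both lib1 and lib2 provide XSLT components and that inclusion is expressed with the nextCatalog entry of OASIS XML Catalogs.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <!-- one entry per installed package having components in the xslt space -->
   <nextCatalog catalog="lib1/xslt-catalog.xml"/>
   <nextCatalog catalog="lib2/xslt-catalog.xml"/>
</catalog>
```

The catalog attributes are relative to the admin dir, so each entry points to the package catalog stored in the corresponding subdirectory of .expath-pkg/.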

[ ... TODO ... ]

6 Example

This section provides a non-normative example to illustrate the concepts defined here. Instead of using a hello world example, it describes the packaging of the existing FunctX library [FunctX]. This library consists of a standard XQuery 1.0 library module and a standard XSLT 2.0 stylesheet (both provide the same set of functions to either XQuery or XSLT, but this is not relevant to packaging.)

The first thing to do is to create a ZIP file with both of those components, alongside a package descriptor. The constraints are: 1/ the package descriptor is named expath-pkg.xml at the root of the package, 2/ the library content is in a directory at the root of the package (aka the package directory), and 3/ the name of this directory must be the value of the name attribute of the module element, and must be a valid NCName. The structure (the content) of the package directory is completely free. In our case, let's just put both component files directly in the package directory, and define the module name as functx:

expath-pkg.xml
functx/
   functx.xql
   functx.xsl

The XQuery library module's target namespace is defined by the module itself. For the XSLT stylesheet, we have to define its public URI, used to identify it within an xsl:import (or by any other means, for instance within XProc or an IDE scenario system). Let's define it as http://www.functx.com/functx.xsl. The package descriptor thus looks like the following:

<package xmlns="http://expath.org/ns/pkg">
   <module name="functx" version="1.0">
      <title>FunctX library</title>
      <xquery>
         <namespace>http://www.functx.com</namespace>
         <file>functx.xql</file>
      </xquery>
      <xslt>
         <import-uri>http://www.functx.com/functx.xsl</import-uri>
         <file>functx.xsl</file>
      </xslt>
   </module>
</package>

We just have to create a ZIP file with this structure and content. The convention is to call this file functx-1.0.xar (that is, [name]-[version].xar). That's all for the package itself.

If the target processor supports the on-disk repository layout, here is what the repository could look like after the package has been installed:

[repository-dir]/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      functx/
         xquery-catalog.xml
         xslt-catalog.xml
   functx/
      functx.xql
      functx.xsl

The package directory has been copied verbatim, and two new catalogs have been created for the package, pointing to the components in the package directory. The top-level catalogs in the admin directory (one per URI space) just point to the other package-specific catalogs. Here is for instance the content of .expath-pkg/xslt-catalog.xml, initiating the catalog list for the URI space for XSLT (in our case the catalog for the URI space for XSLT of the single installed package):

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <nextCatalog catalog="functx/xslt-catalog.xml"/>
</catalog>

The catalogs at the package-level point to the actual components by using a relative path within the repository. The following is for example the content of the catalog .expath-pkg/functx/xslt-catalog.xml:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <!-- TODO: Should there be a system entry as well? -->
   <uri name="http://www.functx.com/functx.xsl"
        uri="../../functx/functx.xsl"/>
</catalog>
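By analogy, the package catalog for the XQuery URI space, .expath-pkg/functx/xquery-catalog.xml, might look like the following sketch. This specification does not show that file, so the use of a uri entry keyed by the module's target namespace is an assumption made for illustration.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <!-- assumption: the target namespace of the module is the entry name -->
   <uri name="http://www.functx.com"
        uri="../../functx/functx.xql"/>
</catalog>
```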

A user can then import the FunctX library from either an XQuery module or an XSLT stylesheet by using respectively an import statement:

import module namespace fx = "http://www.functx.com";

or an import instruction:

<xsl:import href="http://www.functx.com/functx.xsl"/>

7 Extensibility

The package format defined in this specification is a complete system to package libraries, for the right definition of a library. But a specific kind of library can require more information. For instance, a web application is also a set of components, but one that requires more configuration. An additional descriptor format for such applications could be defined and added to this packaging framework to define a specialized kind of package. By building upon this packaging format, the new format would benefit from the existing mechanism to map public URIs to components.

Another extensibility point is the definition of additional component types. The component types defined here are the standard types, but an implementation may support additional implementation-defined types.

More generally, an implementation may define any extension element in the package descriptor to achieve its purposes, provided that the new elements it defines are neither in the EXPath Packaging namespace nor in the null namespace.
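As a sketch of what such an extension might look like, the following descriptor carries one implementation-specific element in its own namespace. The ex:doc element, its namespace and its href attribute are purely hypothetical and are not defined by this specification or by any known processor.

```xml
<!-- Illustrative only: every name and URI below is hypothetical. -->
<package xmlns="http://expath.org/ns/pkg"
         xmlns:ex="http://example.org/ns/pkg-ext"
         name="http://example.org/lib/hello">
   <module name="hello" version="0.1">
      <title>Hello library</title>
      <xquery>
         <namespace>http://example.org/lib/hello</namespace>
         <file>hello.xq</file>
      </xquery>
      <!-- extension element: neither in the EXPath Packaging namespace
           nor in the null namespace -->
      <ex:doc href="doc/index.html"/>
   </module>
</package>
```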

8 Editorial notes

A Package descriptor schema

[ ... TODO ... ]

B References

Catalogs
XML Catalogs, OASIS Standard V1.1, 7 October 2005
FunctX
FunctX library of functions for XQuery and XSLT 2.0, Datypic