This document is also available in these non-normative formats: XML.
This proposal defines a packaging system for various core XML technologies: XSLT, XQuery, and XProc. The goal is to define it generically enough that it can be adapted to other technologies in the future (such as XML Schema, XForms, etc.) using the same framework. Besides enabling the delivery of libraries written in standard XSLT, XQuery and XProc, it supports extensions specific to some processors, and allows new processors to be supported within the same framework.
1 Introduction
2 Concepts
3 Standard components
3.1 XSLT
3.2 XQuery
3.3 XProc
3.4 XML Schema
3.5 RelaxNG
3.6 Schematron
3.7 NVDL
3.8 Not supported file kinds
4 Processor behaviour
4.1 XProc pipelines
5 On-disk repository layout
6 Example
7 Extensibility
8 Editorial notes
XSLT, XQuery and XProc are amazing programming languages. But they lack a large choice of libraries, and when such libraries do exist, they are a challenge to install. There is no automatic install process, the rules are different for each processor, and library authors do not follow the same rules regarding the information they provide, the cataloging, the way they reference third-party libraries, etc.
All those problems (well, most of them) can be addressed by a packaging system that would be broadly adopted by processor vendors and library authors. The cornerstone of such a system is the packaging format: a description of the information to be provided by the library authors and how to provide and structure it.
A library is a set of files fulfilling a common purpose. An XSLT library can, for instance, provide a set of template rules and functions to help format a particular XML document type. A package is a way to bundle those files into a single ZIP file, following a defined structure and providing more information within the package descriptor. The package descriptor is a plain XML file, named expath-pkg.xml, at the root of the ZIP file. It contains information about the library (such as its name and version number), about the files it provides (for instance stylesheets and query modules), and about how to reference them.
The ZIP file structure (aka the package structure) must have exactly two entries at the top level: the package descriptor and one directory entry. This directory contains all the library files, and all file references in the package descriptor are relative to this directory. This directory is called the library directory.
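The structural rule above can be checked mechanically. The following is a minimal sketch in Python, assuming the package is available as bytes in memory; the function names top_level_entries and library_directory are hypothetical and not part of this proposal.

```python
import io
import zipfile

DESCRIPTOR = "expath-pkg.xml"


def top_level_entries(names):
    """Return the distinct top-level entries of a list of ZIP member names."""
    tops = set()
    for name in names:
        first, sep, _rest = name.partition("/")
        # A slash after the first segment means that segment is a directory.
        tops.add(first + "/" if sep else first)
    return tops


def library_directory(xar_bytes):
    """Return the library directory name, or raise ValueError on a bad layout."""
    with zipfile.ZipFile(io.BytesIO(xar_bytes)) as zf:
        tops = top_level_entries(zf.namelist())
    dirs = [t for t in tops if t.endswith("/")]
    # Exactly two top-level entries: the descriptor and one directory.
    if DESCRIPTOR not in tops or len(tops) != 2 or len(dirs) != 1:
        raise ValueError(f"unexpected top-level entries: {sorted(tops)}")
    return dirs[0].rstrip("/")
```

This sketch only checks the top-level layout; it does not compare the directory name with the module name given in the descriptor.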
All the elements in the package descriptor are defined in the namespace http://expath.org/mod/expath-pkg. All the elements defined in this specification or used in samples and in text are in this namespace, even if no prefix is used. The root element is package, and it contains exactly one child element, module:
<package>
   module
</package>

<module name = NCName
        version = string>
   title,
   (xslt | xquery | xproc | xsd | rng | rnc | ...)+
</module>
name is the library name. The top-level directory in the package structure must have exactly the same name. The module also has a version number and a human-readable title. It then provides information about one or several files. Those files are called the components. In addition to those standard file descriptors, it can also contain elements specific to some processors (for instance an element for Saxon, eXist, etc.). Details are provided below.
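To illustrate how a tool might read this structure, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The function name read_descriptor and the shape of the returned dictionary are assumptions made for illustration; the proposal does not define any API.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://expath.org/mod/expath-pkg"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def read_descriptor(xml_text):
    """Parse a package descriptor into a plain dictionary (hypothetical shape)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{PKG_NS}}}package":
        raise ValueError("root element must be 'package' in the package namespace")
    module = root.find(f"{{{PKG_NS}}}module")
    if module is None:
        raise ValueError("missing 'module' element")
    components = []
    for child in module:
        kind = local_name(child.tag)
        if kind == "title":
            continue
        # Each component lists simple child elements such as import-uri, file.
        fields = {local_name(f.tag): (f.text or "").strip() for f in child}
        components.append((kind, fields))
    return {
        "name": module.get("name"),
        "version": module.get("version"),
        "title": module.findtext(f"{{{PKG_NS}}}title"),
        "components": components,
    }
```

In this sketch any processor-specific element inside module is reported like a component, under its own element name; a real tool would probably treat those separately.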
The components are the files exported by the module. But the whole library directory must be preserved. Indeed, it can contain other, private files, intended to be used only from within the library files, not from the outside.
The components are accessed from outside the package by using a URI. This URI is the public URI, an absolute URI that cannot use the file: scheme. Its exact usage depends on the kind of component (for instance, with XSLT it is intended to be used in xsl:import, and in XQuery it is the target namespace of an XQuery library module). Each kind of component defines its own URI space. So, to uniquely identify a component in the repository, one needs both the public URI and the URI space to use.
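One way to picture URI spaces is as a lookup table keyed by the pair (URI space, public URI). The sketch below, in Python, is only an illustration under that assumption; the class name ComponentIndex and its methods are hypothetical, and the file paths in the usage note are invented.

```python
from urllib.parse import urlsplit


class ComponentIndex:
    """Hypothetical in-memory index of components, keyed by (space, public URI)."""

    def __init__(self):
        self._entries = {}

    def register(self, space, public_uri, path):
        scheme = urlsplit(public_uri).scheme
        # A public URI must be absolute and must not use the file: scheme.
        if not scheme or scheme == "file":
            raise ValueError(f"not a valid public URI: {public_uri!r}")
        self._entries[(space, public_uri)] = path

    def resolve(self, space, public_uri):
        """Return the registered path, or None if the pair is unknown."""
        return self._entries.get((space, public_uri))
```

With such an index, the same URI could be registered once in the xslt space and once in the xsd space, for example index.register("xslt", "http://example.org/lib", "lib/style.xsl") and index.register("xsd", "http://example.org/lib", "lib/schema.xsd"), and each lookup would return the file for its own space.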
Here is the description of the standard component kinds supported by this specification, and how they contribute to the package descriptor document type.
An XSLT file is associated with a public import URI. This is the URI to use in an XSLT import instruction (aka xsl:import) to import the XSLT file provided in the package. This file is configured with the element xslt.
<xslt> import-uri, file </xslt>
The element file contains the path to the file within the package structure, relative to the library directory. Both elements import-uri and file are of type anyURI.
An XQuery library module is referenced by its namespace URI. Thus the xquery element associates a namespace URI with an XQuery file. An importing module just needs to use an import statement of the form import module namespace xx = "<namespace-uri>";.
<xquery> namespace, file </xquery>
Note that there is no way to set any location hint (like the at clause in the import statement). To use this packaging system, an XQuery library module must be referenced by its target namespace.
An XProc pipeline, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an XProc p:import statement.
<xproc> import-uri, file </xproc>
An XML schema can be imported using its target namespace. As for XQuery, there is no way to use any schemaLocation instead. Nor is there any way to specify several files as several sources for the schema. If the schema is spread over multiple files, there must be one top-level file that includes the other files.
<xsd> namespace, file </xsd>
(TODO: Should we support schemas with an empty target namespace? I am not sure this is a good idea in a packaging system...) (TODO: This does not support xs:redefine, as it requires a hint, not a TNS.)
A RelaxNG schema, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an import statement (either the include element for an RNG schema or an import directive for an RNC schema).
<rng> import-uri, file </rng>
<rnc> import-uri, file </rnc>
A Schematron schema is associated with a public URI.
<schematron> import-uri, file </schematron>
Documentation (such as the output of XSLStyle or xqDoc) is not taken into account in the packaging format, though it could be used by IDEs, for instance to provide documentation for functions in an editor with a live completion feature. Some support for documentation can of course be added to the package descriptor as a product-specific feature.
A processor is any program that uses packaged components. For instance, an XSLT processor uses XSLT stylesheets (as well as XML schemas for a schema-aware XSLT 2.0 processor), an XML database can use XQuery modules, XSLT stylesheets or any other kind of component, etc. But processors also include IDEs, editors, and any program that might want to use packaged components and that supports this specification.
The installed package list is implementation-defined. Each implementation (for a specific processor) can define its own way to install and remove packages, as long as it properly documents it. A processor should use, when appropriate, the on-disk repository layout as defined below.
When a reference to a file of a specific kind is made via an absolute URI, a processor must look up this URI in the corresponding URI space in the repository. How the repository is configured for the processor is implementation-defined (a processor can also use a list of repositories, and enable or disable some libraries in any implementation-defined way).
The URI space to use is defined by the nature of the reference. An XSLT href attribute on xsl:import will use the xslt URI space, while it will use the xsd space for xsl:import-schema.
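The choice of URI space can be thought of as a simple table from the referencing construct to a space name. The Python sketch below is an illustration only: the first two rows come from the text above, while the other rows and the names SPACE_FOR_REFERENCE and uri_space_for are assumptions inferred from the component descriptions.

```python
# Assumed mapping from the referencing construct to the URI space to search.
SPACE_FOR_REFERENCE = {
    "xsl:import": "xslt",         # stated in the text
    "xsl:import-schema": "xsd",   # stated in the text
    "import module": "xquery",    # inferred: XQuery module import
    "p:import": "xproc",          # inferred: XProc import
}


def uri_space_for(reference_kind):
    """Return the URI space for a reference kind, or raise LookupError."""
    try:
        return SPACE_FOR_REFERENCE[reference_kind]
    except KeyError:
        raise LookupError(f"no URI space known for {reference_kind!r}") from None
```

A processor would then combine the returned space with the public URI to find the component in its repository.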
An XProc processor in particular has to pay close attention to the space it uses, depending on the step that is being evaluated. Any xsl:import instruction encountered on the stylesheet port of the step p:xslt has to be looked up in the xslt space (regardless of whether the stylesheet document is inlined in the pipeline, computed, loaded from the file system or retrieved from the Internet, or whether the containing stylesheet has itself been imported).
The XProc elements p:document and p:data, as well as the step p:load, are handled specially. They can be used to access any kind of resource, including but not limited to components in a repository. The user has to tell the processor explicitly what kind of component is being looked up by using the pkg:kind extension attribute. For instance, a stylesheet can be loaded from a repository as input to the step xslt as follows:
<p:xslt>
   <p:input port="stylesheet">
      <p:document href="..." pkg:kind="xslt"/>
   </p:input>
   ...
</p:xslt>
This section defines a standard structure for on-disk repositories. An implementation can choose not to support this kind of repository and to define its own (or even not to define it publicly, and just provide the ability to install and remove packages in a clearly documented way). However, there are several advantages to supporting this structure. The most obvious one is being able to benefit from existing tools to manage such repositories, as well as from existing libraries to access them.
The resolving machinery is based on OASIS XML Catalogs [Catalogs]. The repository is a simple directory, each subdirectory of which is an installed package (aka a package dir). The only exception is the subdirectory .expath-pkg/, which is dedicated to storing working information about the installed packages, including the catalogs (aka the admin dir).
[repository-dir]/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      lib1/
         xquery-catalog.xml
         xslt-catalog.xml
      lib2/
         ...
   lib1/
      query.xq
      style.xsl
   lib2/
      ...
The package dirs are really simple: each one is an unzipped version of the XAR file. The name of the directory is the same as the name of the module in the package. The admin dir contains a catalog for each URI space (the catalog for a specific URI space may be absent if no file in the whole repository belongs to that URI space). The name of such a catalog is [space]-catalog.xml, where [space] is xslt, xquery, rnc, etc. Those catalogs are called repository catalogs. The admin dir also contains a subdirectory for each installed package, with the same naming convention as the package dirs. In turn, those directories contain catalog files, holding the mappings defined in the corresponding package descriptors (pointing to the actual files installed in the package dirs). Those are called the package catalogs. They follow the same naming convention as the repository catalogs (one per URI space). The repository catalogs just include the several package catalogs for the same URI space.
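The relationship between the two kinds of catalogs can be sketched as follows, in Python with the standard xml.etree.ElementTree module. This is an illustration under assumptions: the function names package_catalog and repository_catalog are hypothetical, the namespace is the one defined by OASIS XML Catalogs, and the use of uri and nextCatalog entries is one possible way to express the mappings and the inclusion.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs documents.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)


def package_catalog(module_name, mappings):
    """Build .expath-pkg/<module>/<space>-catalog.xml for one URI space.

    mappings is a list of (public URI, file path relative to the library dir).
    """
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for public_uri, file_path in mappings:
        # The catalog lives two levels below the repository directory.
        target = f"../../{module_name}/{file_path}"
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri",
                      {"name": public_uri, "uri": target})
    return ET.tostring(root, encoding="unicode")


def repository_catalog(space, module_names):
    """Build .expath-pkg/<space>-catalog.xml, which includes the package catalogs."""
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for name in module_names:
        ET.SubElement(root, f"{{{CATALOG_NS}}}nextCatalog",
                      {"catalog": f"{name}/{space}-catalog.xml"})
    return ET.tostring(root, encoding="unicode")
```

Installing or removing a package then amounts to writing or deleting its package catalogs and regenerating the repository catalog for each affected URI space.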
[ ... TODO ... ]
This section provides a non-normative example to illustrate the concepts defined here. Instead of using a hello world example, it describes the packaging of the existing FunctX library. This library consists of a standard XQuery 1.0 library module and a standard XSLT 2.0 stylesheet (both provide the same set of functions to either XQuery or XSLT, but this is not relevant to packaging.)
The first thing to do is to create a ZIP file with both of those components, alongside a package descriptor. The constraints are: 1/ the package descriptor is named expath-pkg.xml and sits at the root of the package, 2/ the library content is in a directory at the root of the package (aka the library directory), and 3/ the name of this directory must be the name of the library, and must be a valid NCName. The structure (the content) of the library directory is completely free. In our case, let's just put both component files directly in the library directory, and define the library name as functx:
expath-pkg.xml
functx/
   functx.xql
   functx.xsl
The XQuery library module's target namespace is defined by the module itself. For the XSLT stylesheet, we have to define its public URI, used to identify it within an xsl:import (or by any other means, for instance within XProc or an IDE scenario system). Let's define it as http://www.functx.com/functx.xsl.
The package descriptor thus looks like the following:
<package xmlns="http://expath.org/mod/expath-pkg">
   <module name="functx" version="1.0">
      <title>FunctX library</title>
      <xquery>
         <namespace>http://www.functx.com</namespace>
         <file>functx.xql</file>
      </xquery>
      <xslt>
         <import-uri>http://www.functx.com/functx.xsl</import-uri>
         <file>functx.xsl</file>
      </xslt>
   </module>
</package>
We just have to create a ZIP file with this structure and content. The convention is to call this file functx-1.0.xar (that is, [name]-[version].xar). That's all for the package itself.
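As an illustration, the packaging step could be scripted. The following Python sketch builds the archive in memory with the standard zipfile module; the function name build_xar and its parameters are hypothetical, and the file contents passed to it are up to the caller.

```python
import io
import zipfile


def build_xar(name, version, descriptor, files):
    """Return (file name, archive bytes) for a package.

    descriptor is the text of expath-pkg.xml; files maps paths relative to
    the library directory to their content.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The descriptor sits at the root of the archive.
        zf.writestr("expath-pkg.xml", descriptor)
        # Everything else goes under the library directory, named after the module.
        for rel_path, content in files.items():
            zf.writestr(f"{name}/{rel_path}", content)
    # Naming convention: [name]-[version].xar
    return f"{name}-{version}.xar", buf.getvalue()
```

Calling build_xar("functx", "1.0", descriptor_text, {"functx.xql": xquery_source, "functx.xsl": xslt_source}) would produce the bytes to write to functx-1.0.xar, where the three variables hold the corresponding file contents.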
[... TODO ...] (directory layout)
[repository-dir]/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      functx/
         xquery-catalog.xml
         xslt-catalog.xml
   functx/
      functx.xql
      functx.xsl
[ ... ] (content of .expath-pkg/xslt-catalog.xml)
<nextCatalog catalog="functx/xslt-catalog.xml"/>
[ ... ] (content of .expath-pkg/functx/xslt-catalog.xml)
<!-- TODO: Should there be a system entry as well? -->
<uri name="http://www.functx.com/functx.xsl" uri="../../functx/functx.xsl"/>
[ ... ] (processor behaviour)
Should the package system define a set of XPath functions? Instead of just defining the package format and leaving everything else implementation-defined, should it in addition define a module of functions to install, delete, and more generally manage packages from within a processor?
Drawback: potential problems if the processor needs to be stopped?
Advantages: enables writing tools on top of the system (one single graphical package manager for one system, simply using the XPath functions, as well as easy integration within IDEs; other systems could also more easily be built on top of it, like a packaging system for XRX applications for instance).
Should we add a generic "xml" URI space, for any XML document?
Should we add a restriction on public URIs: prohibiting the FILE scheme?
Interesting use case with XProc and NVDL: How to configure the XSLT processor in Calabash with the NVDL URI space for an NVDL implementation for XProc written using XSLT as a plain library step?
About the standard directory layout: should we still use XML Catalogs, or just a simple format to map public URIs to files?
Should we add support for XML Catalogs (written by the library author), in any way?
Add to the package descriptor an element for a description, or at the very least a URL to the home of the project.