This proposal defines a packaging system for several core XML technologies: XSLT, XQuery, and XProc. The goal is to define it generically enough that it can be adapted to other technologies in the future (such as XML Schema, XForms, etc.) using the same framework. Besides enabling the delivery of libraries written in standard XSLT, XQuery and XProc, it provides support for extensions specific to some processors, and allows new processors to be supported using the same framework.
XSLT, XQuery and XProc are amazing programming languages. But they lack a wide choice of libraries, and when such libraries do exist, they are challenging to install. There is no automatic install process, the rules are different for each processor, and library authors do not follow the same rules regarding the information they provide, the cataloging, the way they reference third-party libraries, etc.
All those problems (well, most of them) can be addressed by a packaging system that is broadly adopted by processor vendors and library authors. The cornerstone of such a system is the packaging format: a description of the information library authors must provide, and of how to provide and structure it.
A
A package has a unique name (a URI) as well as a convenient short name, also known as its module name. A package provides all this information, as well as its components, as a single file. This makes it very convenient to organize packages, publish them, give them to a processor to install, etc.
A
All the components composing the package, alongside an additional expath-pkg.xml file at the root of the ZIP file, containing information about the package (like its name and version number) and about the components it provides and how to reference them.
The package descriptor is an XML file named expath-pkg.xml and located at the root of the ZIP file. It describes the whole package and all of its components. Alongside this descriptor, the root of the ZIP file contains a directory named after the module name (see below), which contains the components and any other files the package needs. All the relative URIs used to identify components are relative to the package directory.
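The following minimal Python sketch illustrates the layout rules above. It takes a package ZIP file in memory and checks for two things: the descriptor at the root, and a package directory named after the module. The module name mylib is hypothetical, and the function is an illustration only. This specification does not define it.

```python
import io
import zipfile

DESCRIPTOR_NAME = "expath-pkg.xml"

def check_layout(xar_bytes: bytes, module_name: str) -> list:
    """Return layout problems found in a package ZIP file (empty list if none)."""
    with zipfile.ZipFile(io.BytesIO(xar_bytes)) as zf:
        names = zf.namelist()
    problems = []
    if DESCRIPTOR_NAME not in names:
        problems.append(f"no {DESCRIPTOR_NAME} at the root of the ZIP file")
    # The package directory is named after the module name.
    if not any(n.startswith(module_name + "/") for n in names):
        problems.append(f"no package directory named {module_name}/")
    # Other top-level entries are not checked: they are simply ignored.
    return problems


# Usage: build a small package in memory (hypothetical content) and check it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(DESCRIPTOR_NAME, '<package xmlns="http://expath.org/ns/pkg"/>')
    zf.writestr("mylib/mylib.xsl", "<!-- stylesheet -->")
print(check_layout(buf.getvalue(), "mylib"))  # []
```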
Because this package format is designed to be extensible and used as a building block by other specifications, the ZIP file can contain other top-level entries. They are simply ignored when the package is deployed as a simple package following this specification.
All the elements in the package descriptor are defined in the namespace http://expath.org/ns/pkg. All the elements defined in this specification or used in samples and in text are in this namespace, even if no prefix is used. The root element is package. It contains a name, an optional list of dependencies on other packages, and exactly one child element module.
name is the name of the package. A package is named using an absolute URI other than a file: scheme URI (the most frequent choices are http: and urn: scheme URIs). Dependencies are declared by using the names of the other packages this package depends on. The content of module is as follows.
name is a short module name. The package directory must have exactly the same name. The module also has a version number and a human-readable title. It then provides information about one or several components. In addition to those standard components, it can also contain elements specific to some processors (for instance an element specific to Saxon or eXist).
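The following is a minimal Python sketch of how a tool might read a descriptor with this structure. The text above names the package, name, module, xslt, import-uri and file elements. The version and title element names in the sample are assumptions. The sample URIs and the mylib module name are hypothetical. The sketch skips children of module that are in another namespace, which is where processor-specific elements would live.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

PKG_NS = "http://expath.org/ns/pkg"
NS = {"pkg": PKG_NS}
# Assumed names of the module metadata elements (not components).
MODULE_METADATA = {"name", "version", "title"}

def read_descriptor(text: str) -> dict:
    root = ET.fromstring(text)
    if root.tag != f"{{{PKG_NS}}}package":
        raise ValueError("root element must be package in the pkg namespace")
    name = (root.findtext("pkg:name", namespaces=NS) or "").strip()
    scheme = urlparse(name).scheme
    if not scheme or scheme == "file":
        raise ValueError(f"package name must be an absolute non-file URI: {name!r}")
    modules = root.findall("pkg:module", NS)
    if len(modules) != 1:
        raise ValueError("exactly one module element is required")
    module = modules[0]
    prefix = f"{{{PKG_NS}}}"
    # Standard components are the pkg-namespace children other than metadata;
    # processor-specific elements in other namespaces are skipped.
    components = [
        child.tag[len(prefix):]
        for child in module
        if child.tag.startswith(prefix) and child.tag[len(prefix):] not in MODULE_METADATA
    ]
    return {
        "name": name,
        "module": (module.findtext("pkg:name", namespaces=NS) or "").strip(),
        "components": components,
    }

SAMPLE = """<package xmlns="http://expath.org/ns/pkg">
  <name>http://example.org/lib/mylib</name>
  <module>
    <name>mylib</name>
    <version>0.1</version>
    <title>Example library</title>
    <xslt>
      <import-uri>http://example.org/lib/mylib.xsl</import-uri>
      <file>mylib.xsl</file>
    </xslt>
  </module>
</package>"""
print(read_descriptor(SAMPLE))
```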
The following is the description of the standard component kinds supported by this specification, and how they contribute to the package descriptor document type.
An XSLT file is associated with a public import URI, intended to be used in an xsl:import instruction (or by any other means) to import the XSLT file provided in the package. This association is configured with the element xslt.
The element file contains the path to the file within the package structure, relative to the package directory. Both elements import-uri and file are of type anyURI.
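Because file is relative to the package directory, a tool can map each public import URI to an entry in the ZIP file by prefixing the module name. The following is a minimal Python sketch of that mapping. The helper name and the sample URI are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

PKG = "{http://expath.org/ns/pkg}"

def xslt_entries(module: ET.Element, module_name: str) -> dict:
    """Map each xslt component's import-uri to its path inside the ZIP file."""
    entries = {}
    for xslt in module.findall(PKG + "xslt"):
        import_uri = (xslt.findtext(PKG + "import-uri") or "").strip()
        file_path = (xslt.findtext(PKG + "file") or "").strip()
        # file is relative to the package directory, named after the module.
        entries[import_uri] = posixpath.join(module_name, file_path)
    return entries

module = ET.fromstring(
    '<module xmlns="http://expath.org/ns/pkg"><name>mylib</name>'
    "<xslt><import-uri>http://example.org/lib/mylib.xsl</import-uri>"
    "<file>xsl/mylib.xsl</file></xslt></module>"
)
print(xslt_entries(module, "mylib"))
# {'http://example.org/lib/mylib.xsl': 'mylib/xsl/mylib.xsl'}
```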
An XQuery library module is referenced by its namespace URI. Thus the xquery element associates a namespace URI with an XQuery file. An importing module just needs to use an import statement of the form: import module namespace xx = "<namespace-uri>";
An XQuery main module is associated with a public URI. Usually an XQuery package will provide functions through library modules, but in some cases one may want to provide main modules as well.
Note that there is no way to set any location hint (such as the at clause in the import statement). To use this packaging system, an XQuery library module must be referenced by its target namespace.
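The following Python sketch shows that only the namespace URI of a module import matters here. It extracts the namespace from each import statement, ignores any at hint, and looks the namespace up in a hypothetical in-memory xquery URI space. The regular expression is deliberately simplified: it handles only double-quoted literals and no comments, so it is not a real XQuery parser.

```python
import re

# Simplified pattern: double-quoted literals only, no comments or entities.
MODULE_IMPORT = re.compile(
    r'import\s+module\s+namespace\s+([\w.-]+)\s*=\s*"([^"]*)"'
    r'(?:\s*at\s+"[^"]*"(?:\s*,\s*"[^"]*")*)?\s*;'
)

def imported_namespaces(query: str) -> list:
    """Return (prefix, namespace URI) pairs; location hints are discarded."""
    return [(m.group(1), m.group(2)) for m in MODULE_IMPORT.finditer(query)]

def resolve_imports(query: str, xquery_space: dict) -> dict:
    """Look each imported namespace up in the xquery URI space (None if absent)."""
    return {ns: xquery_space.get(ns) for _, ns in imported_namespaces(query)}

query = 'import module namespace lib = "http://example.org/ns/lib" at "lib.xq";\nlib:hello()'
space = {"http://example.org/ns/lib": "mylib/lib.xq"}
print(resolve_imports(query, space))  # {'http://example.org/ns/lib': 'mylib/lib.xq'}
```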
An XProc pipeline, like an XSLT stylesheet, is associated with a public import URI, intended to be used in a p:import statement.
An XML schema can be imported using its target namespace. As with XQuery, there is no way to use a schemaLocation hint instead. Nor is it possible to declare several files as separate sources for the schema: if the schema is spread over multiple files, there must be one top-level file that includes the other files. (TODO: Should we support schemas with an empty target namespace? I am not sure this is a good idea in a packaging system...) (TODO: This does not support xs:redefine, as it requires a location hint, not a target namespace.)
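Under these rules, only the top-level schema file is registered, keyed by its targetNamespace. Files it includes are reached through that file rather than through the repository. The following is a minimal Python sketch of reading that key. The helper name, paths and namespace are hypothetical. Rejecting a missing target namespace mirrors the open TODO above and is a choice, not a rule.

```python
import xml.etree.ElementTree as ET

XS_SCHEMA = "{http://www.w3.org/2001/XMLSchema}schema"

def xsd_entry(schema_text: str, path: str) -> tuple:
    """Return (target namespace, path) for the top-level file of a schema."""
    root = ET.fromstring(schema_text)
    if root.tag != XS_SCHEMA:
        raise ValueError("not a W3C XML Schema document")
    tns = root.get("targetNamespace")
    if not tns:
        # No-namespace schemas are an open question in this draft.
        raise ValueError("the top-level schema has no target namespace")
    return tns, path

top = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/ns/order">
  <xs:include schemaLocation="order-types.xsd"/>
</xs:schema>"""
print(xsd_entry(top, "mylib/order.xsd"))
# ('http://example.org/ns/order', 'mylib/order.xsd')
```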
A RelaxNG schema, like an XSLT stylesheet, is associated with a public import URI, intended to be used in an
A Schematron schema is associated with a public URI.
An NVDL script is associated with a public URI.
Documentation (such as the output of XSLStyle or xqDoc) is not taken into account in the packaging format, though IDEs could use it, for instance to show documentation for functions in an editor with live completion. Support for documentation can of course be added to the package descriptor as a product-specific feature.
A
The installed package list is implementation-defined. Each implementation (for a
specific processor) can define its own way to install and remove packages, as long as it
properly documents it. A processor should use, when appropriate, the
Whether or not such a repository exists (or several repositories), the implementation must define an installed packages list. How this is done is outside the scope of this spec. An XML IDE could provide a way to select packages to activate for a specific scenario, or a web server container could activate packages on a per-web application basis.
When a reference to a file of a specific kind is made via an absolute URI, a processor must look up this URI in the corresponding URI space in the repository. How the repository is made known to the processor is implementation-defined (a processor can also use a list of repositories, and enable or disable some libraries in any implementation-defined way).
The URI space to use is determined by the nature of the reference. For instance, the href attribute of an XSLT xsl:import instruction uses the xslt URI space, while xsl:import-schema uses the xsd space.
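The following minimal Python sketch models the repository as a dictionary of URI spaces, with a table that picks the space from the referencing construct. Only the xsl:import and xsl:import-schema rows come from the text above. The other rows, including the xproc space name, are assumptions for illustration. The usage shows why the space matters: the same absolute URI can name different components in different spaces.

```python
from typing import Optional

# (host language, construct) -> URI space. Only the first two rows are
# stated in the text; the rest are illustrative assumptions.
URI_SPACES = {
    ("xslt", "xsl:import"): "xslt",
    ("xslt", "xsl:import-schema"): "xsd",
    ("xslt", "xsl:include"): "xslt",
    ("xquery", "import module"): "xquery",
    ("xquery", "import schema"): "xsd",
    ("xproc", "p:import"): "xproc",
}

def lookup(repository: dict, host: str, construct: str, uri: str) -> Optional[str]:
    """Resolve uri only in the URI space selected by the referencing construct."""
    space = URI_SPACES[(host, construct)]
    return repository.get(space, {}).get(uri)

repo = {
    "xslt": {"http://example.org/lib": "mylib/lib.xsl"},
    "xsd": {"http://example.org/lib": "mylib/lib.xsd"},
}
print(lookup(repo, "xslt", "xsl:import", "http://example.org/lib"))         # mylib/lib.xsl
print(lookup(repo, "xslt", "xsl:import-schema", "http://example.org/lib"))  # mylib/lib.xsd
```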
An XProc processor in particular has to pay close attention to the space it uses, depending on the step being evaluated. Any xsl:import instruction encountered in a document on the stylesheet port of the p:xslt step has to be looked up in the xslt space (regardless of whether the stylesheet document is inlined in the pipeline, computed, loaded from the file system or retrieved from the Internet, or whether the containing stylesheet has itself been imported).
The XProc elements p:document and p:data, as well as the step p:load, are handled specially. They can be used to access any kind of resource, including but not limited to components in a repository. The user has to explicitly tell the processor what kind of component is being looked up by using the pkg:kind extension attribute. For instance, this attribute makes it possible to load a stylesheet from a repository as input to the xslt step.
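The following is a minimal Python sketch of the pkg:kind rule as a processor might apply it. When the attribute is present, its value is taken as the URI space to search in the repository. When it is absent, the href is left to ordinary resolution. Two points are assumptions: that pkg:kind values are URI space names such as xslt, and that p:load carries href as an attribute. The URIs and repository content are hypothetical.

```python
import xml.etree.ElementTree as ET

P = "{http://www.w3.org/ns/xproc}"
PKG_KIND = "{http://expath.org/ns/pkg}kind"
HANDLED = {P + "document", P + "data", P + "load"}

def resolve_documents(step: ET.Element, repository: dict) -> list:
    """Classify each p:document/p:data/p:load href under step."""
    results = []
    for elem in step.iter():
        if elem.tag not in HANDLED:
            continue
        href = elem.get("href")
        kind = elem.get(PKG_KIND)
        if kind is None:
            # No pkg:kind: leave the href to the processor's usual resolution.
            results.append(("plain", href))
        else:
            # pkg:kind names the URI space to search in the repository.
            results.append(("repository", repository.get(kind, {}).get(href)))
    return results

pipeline = ET.fromstring("""<p:xslt xmlns:p="http://www.w3.org/ns/xproc"
        xmlns:pkg="http://expath.org/ns/pkg">
  <p:input port="source"><p:document href="input.xml"/></p:input>
  <p:input port="stylesheet">
    <p:document href="http://example.org/lib/mylib.xsl" pkg:kind="xslt"/>
  </p:input>
</p:xslt>""")
repo = {"xslt": {"http://example.org/lib/mylib.xsl": "mylib/mylib.xsl"}}
print(resolve_documents(pipeline, repo))
# [('plain', 'input.xml'), ('repository', 'mylib/mylib.xsl')]
```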
This section defines a standard structure for on-disk repositories. An implementation may choose not to support this kind of repository and to define its own instead (or even not to document its repository publicly, and just provide a clearly documented way to install and remove packages). However, there are several advantages to supporting this structure, the most obvious being the ability to benefit from existing tools to manage such repositories, as well as existing libraries to access them.
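The resolving machinery is based on OASIS XML Catalogs. The repository contains a directory named .expath-pkg/ (also known as the admin directory), which is dedicated to storing working information about the installed packages, including the catalogs. Catalog files are named [space]-catalog.xml, where [space] is xslt, xquery, rnc, etc., one per URI space. The catalogs directly in the admin directory are called top-level catalogs.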
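The following is a minimal Python sketch of generating both levels of catalog for one URI space. It assumes two things: that package-level catalogs use OASIS uri entries, and that top-level catalogs chain to them with nextCatalog entries. The catalog namespace and both element names are standard OASIS XML Catalogs vocabulary, but this specification does not define the exact catalog content. The package name mylib is hypothetical. Targets are given relative to the repository and rewritten relative to the catalog's own directory, since OASIS resolves relative uri values against the catalog's base URI.

```python
import posixpath
import xml.etree.ElementTree as ET

CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CAT_NS)  # serialize catalogs with a default namespace

def package_catalog(catalog_path: str, entries: dict) -> str:
    """One <uri> entry per public URI; targets are repository-relative paths
    rewritten relative to the directory containing the catalog."""
    base = posixpath.dirname(catalog_path)
    cat = ET.Element(f"{{{CAT_NS}}}catalog")
    for name, target in sorted(entries.items()):
        ET.SubElement(cat, f"{{{CAT_NS}}}uri", name=name,
                      uri=posixpath.relpath(target, base))
    return ET.tostring(cat, encoding="unicode")

def top_level_catalog(package_catalogs: list) -> str:
    """One <nextCatalog> entry per installed package's catalog for this space."""
    cat = ET.Element(f"{{{CAT_NS}}}catalog")
    for path in package_catalogs:
        ET.SubElement(cat, f"{{{CAT_NS}}}nextCatalog", catalog=path)
    return ET.tostring(cat, encoding="unicode")

print(package_catalog(".expath-pkg/mylib/xslt-catalog.xml",
                      {"http://example.org/lib/mylib.xsl": "mylib/mylib.xsl"}))
print(top_level_catalog(["mylib/xslt-catalog.xml"]))
```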
[ ... TODO ... ]
This section provides a non-normative example to illustrate the concepts defined here.
Instead of using a
The first thing to do is to create a ZIP file with both of those components, alongside a
expath-pkg.xml
at the root of the package, 2/ the library content is
in a directory at the root of the package (aka the module
element, and must be a valid NCName. The structure (the content) of the package directory is entirely free. In our case, let's just put both component files directly in the package directory, and define the module name as functx.
The XQuery library module's target namespace is defined by the module itself. For the XSLT stylesheet, we have to define its public import URI, to be used in an xsl:import instruction (or by any other means, for instance within XProc or an IDE scenario system). Let's define it as http://www.functx.com/functx.xsl.
The package descriptor thus looks like the following:
We just have to create a ZIP file with this structure and content. The convention is to name this file functx-1.0.xar (that is, the module name, a dash, the version number, and the .xar extension).
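The following minimal Python sketch builds such a file with the standard zipfile module. The descriptor goes at the root, the components go under the functx/ package directory, and the result gets the conventional name. Several parts of the descriptor below are assumptions that follow the structure described earlier: the version, title and namespace element names, the package name URI, and the component file names. The component contents are placeholders. The import URI http://www.functx.com/functx.xsl is the one defined above.

```python
import io
import zipfile

# Hypothetical descriptor: element names other than package, name, module,
# xslt, xquery, import-uri and file are assumptions, as are the file names.
FUNCTX_DESCRIPTOR = """<package xmlns="http://expath.org/ns/pkg">
  <name>http://www.functx.com</name>
  <module>
    <name>functx</name>
    <version>1.0</version>
    <title>FunctX library</title>
    <xquery>
      <namespace>http://www.functx.com</namespace>
      <file>functx.xq</file>
    </xquery>
    <xslt>
      <import-uri>http://www.functx.com/functx.xsl</import-uri>
      <file>functx.xsl</file>
    </xslt>
  </module>
</package>"""

def build_xar(module_name: str, version: str, descriptor: str, files: dict) -> tuple:
    """Return (conventional file name, ZIP bytes) for a package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The descriptor goes at the root of the ZIP file...
        zf.writestr("expath-pkg.xml", descriptor)
        # ...and every component goes under the package directory.
        for rel_path, content in files.items():
            zf.writestr(f"{module_name}/{rel_path}", content)
    return f"{module_name}-{version}.xar", buf.getvalue()

name, data = build_xar("functx", "1.0", FUNCTX_DESCRIPTOR, {
    "functx.xq": "(: FunctX XQuery library module (placeholder) :)",
    "functx.xsl": "<!-- FunctX XSLT stylesheet (placeholder) -->",
})
print(name)  # functx-1.0.xar
```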
If the target processor supports the on-disk repository layout, here is what the repository could look like after the package has been installed:
The package directory has been copied verbatim, and two new catalogs have been created for the package, pointing to the components in the package directory. The top-level catalogs in the admin directory (one per URI space) just point to the package-specific catalogs. For instance, .expath-pkg/xslt-catalog.xml starts the catalog list for the XSLT URI space; in our case, it points to the XSLT catalog of the single installed package.
The package-level catalogs point to the actual components using a relative path within the repository. For example, the catalog .expath-pkg/functx/xslt-catalog.xml maps the public URI http://www.functx.com/functx.xsl to the stylesheet in the functx package directory.
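To show how a lookup travels through both levels, the following Python sketch resolves a URI by reading a top-level catalog and following its nextCatalog entries into the package catalogs. Relative paths are resolved against each catalog's own directory. The catalog contents are hypothetical: they assume uri and nextCatalog entries and use the paths of the FunctX example. The resolver is deliberately simplified. It handles only those two entry types, while real OASIS resolvers also support rewriteURI, delegation, and more.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Optional

CAT = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

def resolve(files: dict, catalog_path: str, uri: str) -> Optional[str]:
    """Resolve uri via catalog_path, following nextCatalog entries.

    files maps repository-relative paths to catalog text (an in-memory
    stand-in for the file system). Returns a repository-relative path or None.
    """
    root = ET.fromstring(files[catalog_path])
    base = posixpath.dirname(catalog_path)
    # uri entries in the current catalog are consulted first...
    for entry in root.iter(CAT + "uri"):
        if entry.get("name") == uri:
            return posixpath.normpath(posixpath.join(base, entry.get("uri")))
    # ...then each nextCatalog, in document order.
    for nxt in root.iter(CAT + "nextCatalog"):
        next_path = posixpath.normpath(posixpath.join(base, nxt.get("catalog")))
        found = resolve(files, next_path, uri)
        if found is not None:
            return found
    return None

files = {
    ".expath-pkg/xslt-catalog.xml":
        '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">'
        '<nextCatalog catalog="functx/xslt-catalog.xml"/></catalog>',
    ".expath-pkg/functx/xslt-catalog.xml":
        '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">'
        '<uri name="http://www.functx.com/functx.xsl" uri="../../functx/functx.xsl"/>'
        '</catalog>',
}
print(resolve(files, ".expath-pkg/xslt-catalog.xml", "http://www.functx.com/functx.xsl"))
# functx/functx.xsl
```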
A user can then import the FunctX library from either an XQuery module or an XSLT stylesheet by using respectively an import statement:
or an import instruction:
The package format defined in this specification is a complete system for packaging libraries, for the right definition of a library. But a specific kind of
Another extensibility point is the definition of additional component types. The component types defined here are the standard types, but an implementation may support additional implementation-defined types.
More generally, an implementation may define any extension element in the package descriptor to achieve its purposes, provided that the new elements it defines are neither in the EXPath Packaging namespace nor in the null namespace.
Should the package system define a set of XPath functions? Instead of just defining the package format and leaving everything else implementation-defined, should it also define a module of functions to install, delete, and more generally manage packages from within a processor?
Drawback: potential problems if the processor has to be stopped?
Advantages: enables writing tools on top of the system (a single graphical package manager for a given system, simply using the XPath functions, as well as easy integration within IDEs); other systems could also be built on top of it more easily, like a packaging system for XRX applications.
Should we add a generic "xml" URI space, for any XML document?
Interesting use case with XProc and NVDL: How to configure the XSLT processor in Calabash with the NVDL URI space for an NVDL implementation for XProc written using XSLT as a plain library step?
About the
Add an element to the package descriptor for a description, or at the very least a URL to the project's home page.
[ ... TODO ... ]