Copyright © 2011-2013 Florent Georges, published by the
This specification was published by the
This specification defines how to write web applications on the server side, using XML
technologies like XSLT, XQuery and XProc. It also defines their execution context, as
well as some functions they can use. Last but not least it defines how to package such
webapps, by using the
The webapp descriptor uses the namespace http://expath.org/ns/webapp
, as
the default namespace. This namespace is also used for the XML representation of the
HTTP requests and responses, and to define several functions provided by the webapp
container. In this document, the web
prefix, when used, is bound to this
namespace URI.
Error codes are defined in the namespace http://expath.org/ns/error
. In
this document, the err
prefix, when used, is bound to this namespace
URI.
A web application, or webapp, in this specification, is a set of components, implementing an application. The application responds to HTTP requests and runs in a webapp container. The container provides the context of execution for the webapps, provides them with some functions, and is responsible for translating from and to HTTP. When the container receives an HTTP request, it identifies the corresponding component to process it (e.g. based on the request URL), builds an XML representation of the request, and calls the component. The component receives the HTTP request as an XML document, and returns an XML description of the HTTP response to send back to the client. The container then translates this XML document and sends the corresponding HTTP response.
A component is a piece of XSLT, XQuery or XProc. Each type of component defines how the request is passed to and the response is returned by such a component. But the XML format of the requests and responses is always the same. The existing types of components are:
an XSLT stylesheet
an XSLT function
an XSLT template
an XQuery query
an XQuery function
an XProc pipeline
an XProc step
A webapp contains a descriptor, which defines how to dispatch a request to a specific
component. The dispatching mechanism is based on the request URL, by associating a URL
pattern to a component public URI (a component public URI is defined in the
A webapp is installed at a specific context root. The context root is a path prefix, and
all the requests received "below" that prefix are served by that webapp. For instance, a
webapp installed on example.org
at the context root /somewhere
will serve all requests with a URL starting with http://example.org/somewhere/
.
The part after that prefix is the path to the servlet.
The HTTP requests and responses are represented as XML documents. The request is built by the container to represent the actual HTTP request received. The component returns a representation of the response, used by the container to actually respond to the client over HTTP. A sample request:
This request can be built by a container listening at http://example.org/
,
when it receives a request to GET the resource at /myapp/some/resource
. We
can see the path has been pre-analyzed, and the request contains various HTTP
information like the method, the request URL (decomposed in different ways), and the
headers. The request might also contain an entity content, also known as the body of
the request (e.g. in case of a PUT request).
In response to the above request, the invoked component could return the following response to the container:
This tells the container to return the XML document
<hello>World!</hello>
to the client, using the Content-Type
"application/xml", with the HTTP status code 200, and an extra header.
An HTTP request is represented by a sequence. The first item is the web:request
element, and the remaining
items represent the entity content (there might be several of them in case of
multipart). The web:request
element is defined as follows:
A request contains the name of the matched servlet
, the request
path
, and its method
(also known as the HTTP verb, like
GET and POST) in lower case. The text element url
contains the original
request URL, including its query parameters if any. The URL also appears cut down
into several pieces. The element authority
is its first part, including
the URL scheme and the domain name, context-root
is the webapp context
root, path
is an alternative representation of the requested path (as in
the attribute path
), where some specific parts have been analyzed, and
the elements param
represent the query parameters.
Then come the HTTP header
elements, each with a name and a value. Then
the content of the HTTP entity (e.g. for a PUT), as an element body
or
multipart
, depending on the content type of the request.
The URL appears at several places in the request element, under different shapes. The
element url
is the original URL as typed by the user. Or at least it is
an educated guess of what it could be, as HTTP does not include the original URL in
the request (the port number for instance is not in the HTTP request).
The element authority
is the first part of the URL, including the URL
scheme and the domain name (up to the first slash, but not including it). The scheme
can be either "http:" or "https:". It is then followed by two slashes then the domain
name.
The element context-root
is the webapp context root on this server. It
is fixed for the webapp, and represents where the webapp has been "installed" on the
server (the webapp serves all requests coming to URLs "below" its context root).
After the element path
(see below), the query parameters, if any, are
represented each with an element param
, with an attribute
name
and an attribute value
.
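As an illustration only, the decomposition of a request URL into its authority, context root, path and query parameters could be sketched as follows in Python. The function name decompose_url and the dictionary layout are assumptions made for this example and are not defined by this specification. Percent-decoding of parameter values is also an assumption (it is simply what urllib.parse.parse_qsl does).

```python
from urllib.parse import urlsplit, parse_qsl


def decompose_url(url, context_root):
    """Split a request URL into the pieces carried by the request element.

    Hypothetical helper: `context_root` is the path prefix at which the
    webapp is installed, e.g. "/myapp".
    """
    parts = urlsplit(url)
    # Scheme, two slashes, then the domain name (and port, if any).
    authority = f"{parts.scheme}://{parts.netloc}"
    # The request must be "below" the context root.
    if parts.path != context_root and not parts.path.startswith(context_root + "/"):
        raise ValueError("URL is not below the context root")
    path = parts.path[len(context_root):]
    # One (name, value) pair per query parameter, in order.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "authority": authority,
        "context-root": context_root,
        "path": path,
        "params": params,
    }


req = decompose_url("http://example.org/myapp/search?q=brussels%20hotels", "/myapp")
# Concatenating the three pieces gives the URL without its query parameters.
rebuilt = req["authority"] + req["context-root"] + req["path"]
```

Here rebuilt is the original URL minus the query string, and the single query parameter is available separately in req["params"].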
Note that the XPath expression "fn:concat(authority, context-root,
path)
" gives the original URL except the query parameters.
The path is the part of the request URL that comes after the context root (excluding the query parameters). This is thus the part that can vary for a given webapp (for a given webapp, deployed on a specific server at a specific context root, everything up to and including the context root will be always the same). Servlets use regexes to match URLs, and they can give a name to some sub parts of the path matched by the regex (see the definition of the webapp descriptor for all details).
This is represented by having, in the element path
, a sequence of
elements part
and match
. The elements part
are
the non-matched parts of the URL, and the elements match
are the matched
parts of the URL. They appear in the same order as in the URL.
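The splitting of a path into part and match segments can be sketched as follows. This is a minimal Python sketch under stated assumptions: the helper split_path and its names argument (mapping a regex group number to a symbolic name) are hypothetical, Python's re module stands in for XML Schema regular expressions, and nested groups are not handled.

```python
import re


def split_path(pattern, path, names):
    """Return a list of (kind, text, name) tuples, kind being 'part' or 'match'.

    `names` maps a regex group number to the symbolic name given to it.
    """
    m = re.fullmatch(pattern, path)
    if m is None:
        return None
    segments, pos = [], 0
    # Walk the named groups in the order they appear in the path.
    for group in sorted(names, key=lambda g: m.start(g)):
        start, end = m.span(group)
        if start < 0:
            continue  # the group did not take part in the match
        if start < pos:
            raise ValueError("nested groups are not handled by this sketch")
        if start > pos:
            segments.append(("part", path[pos:start], None))
        segments.append(("match", path[start:end], names[group]))
        pos = end
    if pos < len(path):
        segments.append(("part", path[pos:], None))
    return segments


segments = split_path(r"/users/([a-z0-9]+)", "/users/fgeorges", {1: "id"})
# Concatenating all segments, in order, gives back the whole path.
whole = "".join(text for _, text, _ in segments)
```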
The entire path is the string value of the element path
. Put another
way, concatenating all the part
and match
elements, in
order, gives the value of the path. It is also available as the value of the
attribute path
on the element request
. See
The body of the request, in HTTP parlance the entity content, is not embedded in the
XML element representing the request (that is the web:request
element).
It is instead represented as a standalone item in the sequence representing the
request. There might be several items in case of multipart. Each item can be of one
of the following types:
a document node in case of an XML or HTML media type
an xs:string in case of a textual media type
an xs:base64Binary in case of any other media type
Selecting the media part is done based on the Content-Type header. In case of a multipart type, each part is represented by its own item. Each part's pseudo header Content-Type is used the same way that the header Content-Type is used for the entire content in case of a single part. The content type is used as follows:
An XML media type has a MIME type of text/xml
,
application/xml
, text/xml-external-parsed-entity
, or
application/xml-external-parsed-entity
, as defined in application/xml-dtd
is considered a
text media type). MIME types ending with +xml
are also XML media
types.
An HTML media type has a MIME type of text/html
. The precise
algorithm to parse HTML into an XDM document node is
implementation-defined.
Text media types are the remaining types beginning with text/
.
Binary types are all the other types. An implementation can treat some of those binary types as either an XML, HTML or text media type if it is more appropriate (this is implementation-defined).
If the content type is text/html
, the content is parsed into an XDM
document node (using some HTML parsing algorithm that is out of the scope of this
specification). If the content type is application/xml
, or
text/xml
, or ends with +xml
, the content is parsed into an
XDM document node. If it is a textual content type, it is converted into a text node
(at least all other types starting with text/
, but an implementation can
treat other types as well as being textual). All other content is converted into an
xs:base64Binary item.
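The classification rules above can be summarised by a short sketch. The Python function classify below is hypothetical and only covers the rules stated in this section; it does not model the implementation-defined freedom to treat additional binary types as XML, HTML or text.

```python
XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}


def classify(content_type):
    """Map a Content-Type value to 'xml', 'html', 'text' or 'binary'."""
    # Ignore parameters such as "; charset=utf-8".
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in XML_TYPES or mime.endswith("+xml"):
        return "xml"      # parsed into a document node
    if mime == "text/html":
        return "html"     # parsed into a document node (HTML parser)
    if mime.startswith("text/") or mime == "application/xml-dtd":
        return "text"     # textual content
    return "binary"       # xs:base64Binary
```

For a multipart request, the same function would be applied to the Content-Type pseudo header of each part.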
The content itself is represented as additional items in the request sequence, but
for each part, there is a web:body
element in the
web:request
element, describing the part. In case of multipart, the
web:body
elements are wrapped into a web:multipart
element.
When the request is not multipart, there is one web:body
element, direct child of
web:request
:
The attribute position
is the position of the body in a multipart (1, 2,
3...). If there is only a single part, it is always 1. The pseudo headers for each part in a
multipart are before the part they relate to. Their attribute body
also
contains the corresponding body position. It is therefore possible to use either
positional grouping (all the consecutive web:header
elements right
before a web:body
element relate to this one), or value-based grouping
(the web:header
elements relate to the web:body
element
with the same position number). The body position number is also the position of the
corresponding content in the request sequence (the web:request
element
is the first item, then comes the body #1, then the body #2, and so on).
Note that the web:header
elements that are children of a
web:multipart
have the same structure as when they are children of
web:request
, except they have the extra attribute body
.
When invoked, the top-level component (or group, or filter, if any) must return a
sequence. The first item is the web:response
element, the remaining items being the content of the response to
send back to the client.
The attribute status
is the HTTP status code, and the attribute
message
is the HTTP status message. The elements web:header
represent HTTP headers to set in the response, and look like the same elements in the
request.
The elements web:multipart
and web:body
represent the
content to send back to the client, in a similar way as they are represented by the
container in the request. The difference is that the pseudo-headers do not have an
attribute body
, the elements web:body
do not have an
attribute position
, and they might have an attribute src
,
an attribute charset
and an attribute item-position
, and
they might have content. The content, src
and item-position
are mutually exclusive. If none of them is present, the body is empty.
charset
is the character encoding, for non-binary media types, and is
UTF-8 by default. item-position
is the position within the response
sequence of the item representing the content of the corresponding part, after the
web:response
element (that is, the first item after the response
element, which is the second item in the sequence, has the position 1, the next one
the position 2, and so on). src
is the location of a file to return (it
is resolved against the base URI of the web:body
element).
For instance, the following example, when returned as the evaluation of a component
(either a servlet, or the top-level filter if there is any), will return a 200
- OK
to the client, with the header Content-Disposition: file;
filename="archive.zip"
, and the content is the file archive.zip
,
resolved next to the stylesheet that generated the response element (let us say it
has been generated by an XSLT stylesheet that kept the default base URI of the
stylesheet itself, so the file is part of the webapp package itself):
A component is a piece of code, written in one of the supported languages, that is
executed in some specific context, for instance in response to an HTTP request, or as the
action associated to a filter. A component must follow a few rules (for instance the
number or name of parameters), so the web container can call it. A component is declared
in the webapp descriptor. The format of the webapp descriptor is defined in the section
The webapp itself is packaged as a standard EXPath package (see
An XProc component can be a pipeline, identified by its public import URI:
An XProc component can also be a step type, identified by its QName:
The prefix used in the step
attribute must be declared in scope (in the
example above, the prefix app
must be bound to the namespace of the
step).
Both pipelines and steps must have one input port named source
, which
accepts a sequence of documents, and one output port named result
, which
might return a sequence of documents. When used as an error handler, they must have
an input port called user-data
, which must accept a sequence of
documents (only if the application uses the user data feature of error reporting, to
pass any data from where the error is thrown to the error handler, see
The following is a simple example of a pipeline, returning the XML representation of the HTTP request as the response itself:
An XQuery component can be a main module (also called a "query"), identified by its public import URI:
An XQuery component can also be a function, identified by its QName:
The prefix used in the function
attribute must be declared in scope (in
the example above, the prefix app
must be bound to the namespace of the
function). Because the XQuery library modules are identified by their target
namespace in the package descriptor, the name of the function itself is enough to
identify uniquely the function.
The request sequence (containing the web:request
element and the request
bodies) is passed to the XQuery function through its only parameter, and to the
XQuery module through the external variable named $web:input
(the
web:request
element is also set as the context element for evaluating
the query).
The following is a simple example of a function, returning the XML representation of the HTTP request as the response itself:
An XSLT component can be a stylesheet, identified by its public import URI:
An XSLT component can also be a named template, identified by its QName:
An XSLT component can also be a function, identified by its QName:
The prefix used in both the function
and template
attributes must be declared in scope (in both examples above, the prefix
app
must be bound to the namespace of the template, or of the function
resp.)
The request sequence (containing the web:request
element and the request
bodies) is passed to the XSLT function through its only parameter, to the named
template through the template parameter named web:input
, and to the
stylesheet through the global parameter named web:input
(the
web:request
element is also set as the context element for evaluating
the stylesheet).
The following is a simple example of a stylesheet, returning the XML representation of the HTTP request as the response itself:
An application (sometimes referred to as a webapp) is described by the webapp descriptor. It is composed of servlets, resources and filters, which can be grouped together; it declares error handlers and provides access to some configuration. An application is deployed within a web container, and has a dedicated context root.
When deployed into a web container, a webapp is assigned a context root. A context
root is an absolute URL, which is the root of all URLs served by the webapp. That is,
all URLs served by the webapp start with the same substring: the context root. For
instance, if a webapp is deployed at http://example.org/myapp/
, that
means that all URLs starting with the same string will be dispatched by the container
to that webapp, for instance:
http://example.org/myapp/
http://example.org/myapp/index.html
http://example.org/myapp/catalog/item/XYZ123
http://example.org/myapp/catalog/item/XYZ123/buy
http://example.org/myapp/search?q=brussels%20hotels
Until deployed, a webapp has no context root. All URLs defined within the webapp are
either relative, or absolute (starting with "/
"). When absolute, they
are relative to the context root. For instance, a webapp handling the above URLs will
define how to serve the following URLs:
/
/index.html
/catalog/item/XYZ123
/catalog/item/XYZ123/buy
/search?q=brussels%20hotels
If this webapp is deployed at the above context root, it will then serve the defined URLs. But it can also be deployed on localhost, for instance for test or dev purposes. The absolute URLs served by the webapp will then be different, but the URLs relative to the context root remain the same for a given webapp.
A servlet is the association of a URL pattern to a component. A servlet can be given a name as well, for documentation and error reporting purposes. A servlet is configured in the webapp descriptor like the following:
The attribute name
is the optional name of the servlet. The attribute
filters
is a space-separated list of filters, error handlers and/or
chains. See the following sections for more information on those objects. The order
of those names is important, as the objects are composed from the left-hand side (the
top-most object) to the right-hand side (the bottom-most object, directly around the
servlet). The pseudo element component
stands for any one of the elements xproc
, xquery
or xslt
(any of their variants).
The attribute pattern
must be a valid XML Schema regular expression (see
/
".
When the web container receives an HTTP request, it identifies the webapp to serve the request (depending on their context roots; that mechanism is implementation-defined). Then it tries to match the request URL against all URL patterns in the webapp descriptor, in the document order, and picks the first servlet with a URL pattern matching the request URL. The corresponding component is executed, by passing the request sequence, and the result of the servlet is used to return a response to the client.
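The "first match in document order" rule can be sketched as follows. This is a minimal Python illustration, not a normative algorithm: the function dispatch and the servlet tuples are hypothetical, the patterns are assumed to be matched against the path below the context root, and Python's re.fullmatch stands in for XML Schema regular expression matching.

```python
import re


def dispatch(servlets, path):
    """Return (name, component) for the first servlet whose pattern matches.

    `servlets` is a list of (name, pattern, component) tuples, in the order
    they appear in the webapp descriptor.
    """
    for name, pattern, component in servlets:
        if re.fullmatch(pattern, path):
            return name, component
    return None  # no servlet matches; what happens next is up to the container


servlets = [
    ("home", r"/home", "home.xproc"),
    ("user", r"/users/([a-z0-9]+)", "user.xq"),
    ("fallback", r"/.*", "fallback.xsl"),
]
```

With this list, dispatch(servlets, "/users/fgeorges") selects "user" even though the "fallback" pattern matches as well, because "user" comes first.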
The elements match
of the matching URL are linked to the regex groups in
the URL pattern. Because the URL pattern is a regex, matching against it can assign
values to groups (by using parentheses in the regex). The element match
assigns a symbolic name to a group number. The corresponding values are put into the
web:request
element, more precisely in its child element
web:path
. For example the following servlet definition, associating an
XSLT stylesheet component to a URL pattern, and naming some specific parts of the
URL:
will produce the following path
element when matching a URL ending with
/users/fgeorges
:
A resource is a special type of servlet, defined by the following element:
The pattern
attribute has the same meaning as for servlet
.
A resource is a servlet, so when the URL dispatcher looks sequentially through the
URL patterns for a match, that includes resources as well. If a resource matches the
URL and has a rewrite
attribute, this one is interpreted as a regex
replacement string, and the path used to be resolved is the result of evaluating the
following expression, where $url
is the actual URL being matched:
replace($url,@pattern,@rewrite)
.
The resulting path (either the resolved URL or the replaced string if there is a
rewrite
attribute) is then resolved into the application package. If the
path starts with a slash, the slash is removed; the path is then resolved against the
webapp package content directory. The resulting file is the body of the response, the
value of the media-type
attribute is the value of the response header
Content-Type, and the response status is "200 - OK
". If the file does
not exist, the response status is "404 - Not Found
" and the content of
the response is implementation-defined.
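The resolution of a resource can be sketched in Python as follows. The helpers to_python_replacement, resource_path and serve are hypothetical names introduced for this example; Python's re.sub stands in for the XPath replace() function, and the conversion only handles simple $N group references (no escaped "\$" or "\\" in the replacement string). The status returned for a missing file is the standard HTTP "Not Found" code, 404.

```python
import re
from pathlib import Path


def to_python_replacement(rewrite):
    # XPath replacement strings refer to groups as $1; Python uses \g<1>.
    return re.sub(r"\$(\d)", r"\\g<\1>", rewrite)


def resource_path(url_path, pattern, rewrite=None):
    """Return the path to look up in the webapp package content directory."""
    if rewrite is not None:
        url_path = re.sub(pattern, to_python_replacement(rewrite), url_path)
    # A leading slash is removed before resolving into the package.
    return url_path[1:] if url_path.startswith("/") else url_path


def serve(content_dir, relative_path, media_type):
    """Return (status, headers, body) for a resource."""
    file = Path(content_dir) / relative_path
    if not file.is_file():
        return 404, {}, b""  # the content of the response is implementation-defined
    return 200, {"Content-Type": media_type}, file.read_bytes()


rewritten = resource_path("/style/print", r"/style/([a-z]+)", "css/main-$1.css")
plain = resource_path("/images/logo.png", r"/images/.+\.png")
```

The pattern and rewrite strings used here are made up for the illustration; they are one possible way to map /style/print to css/main-print.css.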
For instance, the following definitions set the correct Content-Type for CSS
stylesheets and PNG images, which are resolved in the package in the sub-directories
style/
and images/
respectively:
The following example rewrites the URL so /style/print
will resolve in
the webapp package content dir as css/main-print.css
:
A filter has a name and two components, the "
When a filter is attached to a servlet, the input sequence flowing to the servlet
first passes to the filter "
There are several ways to set a filter around a servlet. Actually, filters can be
chained and mixed with error handlers, so a filter can be set around either a
component, a filter, or an error handler. The concept remains the same, the
Error handlers are a special sort of filter. They do not have any fn:error()
in XSLT and XQuery, and to p:error
in XProc, so
a real customized error reporting and handling strategy can be set up). An error
handler is defined by using the following element:
Like any filter, an error handler has a name, which is an NCName. It also has a catch
error list, the format of which is defined by the production rule
CatchErrorList
in
Some examples of catch error lists, given the corresponding prefixes are bound in-scope:
"app:XYZ001
": catch a specific error
"*
": catch all errors
"app:*
": catch all errors in the namespace bound to the prefix
"app"
"app:ABC067|app:XYZ001
": catch both errors, explicitly
"app:*|lib:*|err:XYZ001
": catch all errors in the namespace bound
to the prefix "app", and all the errors in the namespace bound to the prefix
"lib", and the error with the QName "err:XYZ001"
"'http://example.org/ns/error':XYZ001
": catch the error with local
name "XYZ001" in the namespace "http://example.org/ns/error"
"'http://example.org/ns/error':*
": catch all errors in the
namespace "http://example.org/ns/error"
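To make the examples above concrete, here is a small Python sketch that tests whether an error, given as a (namespace URI, local name) pair, is caught by a catch error list. It only covers the forms shown in the examples; the function names, the representation of errors as tuples and the handling of anything else (rejected with ValueError) are assumptions of this sketch, not the CatchErrorList production itself.

```python
def parse_catch_list(spec, namespaces):
    """Return a list of (namespace_uri, local_name) tests; None is a wildcard."""
    tests = []
    for item in spec.split("|"):
        item = item.strip()
        if item == "*":
            tests.append((None, None))
        elif item.startswith("'"):
            # Form 'uri':local or 'uri':*
            uri, quote, rest = item[1:].partition("'")
            if not quote or not rest.startswith(":"):
                raise ValueError(f"malformed item: {item}")
            local = rest[1:]
            tests.append((uri, None if local == "*" else local))
        else:
            # Form prefix:local or prefix:*
            prefix, colon, local = item.partition(":")
            if not colon:
                raise ValueError("unprefixed names are not covered by this sketch")
            tests.append((namespaces[prefix], None if local == "*" else local))
    return tests


def catches(spec, namespaces, error):
    """True if `error`, a (namespace_uri, local_name) pair, is caught by `spec`."""
    ns, local = error
    return any(
        (t_ns is None or t_ns == ns) and (t_local is None or t_local == local)
        for t_ns, t_local in parse_catch_list(spec, namespaces)
    )


# Hypothetical prefix bindings, for illustration only.
bindings = {
    "app": "http://example.org/ns/app",
    "lib": "http://example.org/ns/lib",
    "err": "http://expath.org/ns/error",
}
```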
A chain is a special type of filter, composing other filters. It is defined using the following element:
The elements error
and filter
, when inside a chain, are
either empty references to a global named error handler or filter, or are the
anonymous equivalent of their global representation (same format, without the
attribute name
). The element chain
, when appearing as a
child, must be a reference to a named chain (so one can compose chains). If the
attribute filters
is not present, there must be at least one child
element. If the attribute filters
is present, the chain
element must be empty. This attribute is a shortcut to set a chain by using only
named filters, and is constructed in the same way as the attribute filters
on the
element servlet
(left-hand name is the top-most filter).
Groups are a way to set the same filters to a group of servlets. A group is a lexical wrapper around several servlet definitions:
As we see, groups can be nested. They have a filter chain (using the same attribute
filters
as the element servlet
), and contain servlets and
other groups. The only effect is to add their filter chain to each object they
contain. Their filters are added right before the object's own filters (so after any
application filter chain, and after any parent group filter chain).
For instance, in the following example:
the servlet "un" is wrapped by the filter chain "first second third fourth" (in that order), "deux" is wrapped by "first second fifth", and "trois" by "first second fifth sixth".
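The way filter chains accumulate through nested groups can be sketched in Python as follows. The tree below is a hypothetical structure, chosen only because it yields the three chains listed above; it is not the descriptor from the original example. The function effective_chains and the dictionary representation are likewise assumptions of this sketch.

```python
def effective_chains(node, inherited=()):
    """Map each servlet name to its complete filter chain, outermost first."""
    # A node's own filters come right after those inherited from its ancestors.
    chain = tuple(inherited) + tuple(node.get("filters", "").split())
    if node["kind"] == "servlet":
        return {node["name"]: " ".join(chain)}
    result = {}
    for child in node["children"]:
        result.update(effective_chains(child, chain))
    return result


# Hypothetical application: filters set at the application level, then a group.
app = {
    "kind": "application",
    "filters": "first second",
    "children": [
        {"kind": "servlet", "name": "un", "filters": "third fourth"},
        {
            "kind": "group",
            "filters": "fifth",
            "children": [
                {"kind": "servlet", "name": "deux"},
                {"kind": "servlet", "name": "trois", "filters": "sixth"},
            ],
        },
    ],
}

chains = effective_chains(app)
```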
The following functions provide access to configuration parameters and documents. The webapp author can define any parameter and document they need, and access them by name.
The param $name
must be a lexical QName, resolved against the static
context. Depending on the webapp specific needs, the easiest way to define
configuration information might be using string parameters, or using documents. In
either case, the configuration is read-only. This mechanism does not replace a
database. The way the config parameters and documents are provided is
implementation-defined.
Configuration parameters and documents can be declared in the webapp descriptor. That is, the names of the parameters and documents the application expects to use at some point can be declared. Declaring their names can help detect typos in the code. If an application tries to access an undeclared configuration parameter or document, the implementation may throw an error (whether it does or not is implementation-defined). An implementation should provide a way to toggle this error throwing at the server level or on a per-webapp basis. Configuration parameters and documents can be declared using the following element in the webapp descriptor:
The application as a whole is represented using the following element:
This element provides a convenient way to set filters to all servlets and resources within the application (like an error-handler, a view layer, or an authentication filter).
This section defines the webapp descriptor expath-web.xml
. Only the overall
structure of the descriptor is given, referring to elements defined in the previous
section,
The root element of the descriptor is webapp
:
A webapp is identified by a name, an abbrev and a version number (all three following
the same rules as their corresponding attributes in spec
is the version of the packaging specification the package conforms to.
The current specification requires the package to use the spec number 1.0
(no forward compatibility rules are defined, a processor conforming to this
specification has to generate an error if the spec number is different than the string
1.0
). The title is a plain string, and the home is a URL pointing to the
webapp's or project's homepage or website. The rest of the webapp element is composed of
any combination of the elements application
, error
,
filter
, group
and servlet
.
The following is an example of a descriptor, for a simple webapp with 2 components: an
XProc pipeline and an XQuery query, respectively bound to the URL patterns /home
and /users/([a-z0-9]+)
, as well as a filter on the second one and an error
handler for the entire application (it also defines a few resource patterns):
A webapp is packaged using expath-web.xml
, at the root of the package.
For instance, the above descriptor example could define a webapp packaged like the following:
Note that the names of the files for the components do not necessarily match the
last part of their public URI. The components (including the servlets and the components
used in filters and error handlers) must be declared in the package descriptor, that is
expath-pkg.xml
.
Those functions are defined in the namespace web
.
A webapp can access and store values, attached to different contexts: the container, the webapp, the session, and the request. The request is the current HTTP request being served, the session is the session of the request, the webapp is the one containing the component serving the request, and the container represents the webapp container as a whole.
A field has a name (a string) and a value (any sequence). The names beginning with "web:" are reserved for this specification. Note that "web:" is not a namespace prefix, and thus does not need to be bound by any means; field names are simple strings.
Each set of fields has three corresponding functions: one to get the value of a field given its name, one to set its value, and one to enumerate the names of all existing fields.
For the request fields:
For the session fields:
For the webapp fields:
For the container fields:
Note that the container field values are sequences of strings, not arbitrary sequences. These fields can be used to exchange information across several applications, and different applications could be handled by different processors using incompatible data model implementations. Allowing arbitrary values could therefore make the implementation unnecessarily complex or greatly impact performance (for instance by requiring a whole document to be serialized when it is set as the value of a field, then parsed again every time it is retrieved).
The request contains the following field:
web:request-id
: a string identifying uniquely the request being
treated. The string matches the regex "[-_a-zA-Z0-9]+" and is therefore safe to
use in a file name. The strings are ordered through time, that is comparing two
request IDs using the code point collation must order them according to the
time they were received by the server. The precision is implementation-defined
(that is, an implementation might say that the order between 2 request IDs for
2 requests received within the same second is undefined).
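One possible way to generate such identifiers is sketched below in Python. This is only an illustration of the constraints (restricted alphabet, ordering by code point following the time of reception); the function new_request_id, the millisecond timestamp and the counter suffix are choices made for this sketch, and an implementation is free to use any other scheme. With this scheme, ordering is only guaranteed between requests received in different milliseconds.

```python
import itertools
import re
import time

ID_PATTERN = re.compile(r"[-_a-zA-Z0-9]+")
_counter = itertools.count()


def new_request_id(now_ms=None):
    """Build an ID made of a zero-padded timestamp and a counter suffix."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Fixed-width digits sort the same way numerically and by code point.
    request_id = f"{now_ms:013d}-{next(_counter) % 1_000_000:06d}"
    assert ID_PATTERN.fullmatch(request_id)
    return request_id


earlier = new_request_id(1_000)
later = new_request_id(2_000)
```

Python compares strings by code point, so earlier < later holds for these two identifiers.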
The container contains the following fields:
web:product
: the implementation (usually a product name and a
version number).
web:product-html
: an HTML formatted version of the property
web:product
. The HTML must be valid HTML to be the content of a
p
or a div
element (usually it is the same as
web:product
, with additional hyperlinks).
web:vendor
: the vendor of the product (usually a person, a company
or an open-source group).
web:vendor-html
: an HTML formatted version of the property
web:vendor
. The HTML must be valid HTML to be the content of a
p
or a div
element (usually it is the same as
web:vendor
, with additional hyperlinks).
This function parses the value of a structured HTTP header. It returns it in the form of the following element:
Some HTTP headers have values that can be decomposed into multiple elements. In order to be processed by this function, such headers must be in the following form:
Any amount of white space is allowed between any part of the header, element or param
and is ignored. The function returns the following format, representing elements and
params with an XML format. A missing value in any element or param will be returned
as an empty attribute value
; if the "=" is also missing, the attribute
value
is not generated at all.
For example, the following call:
returns the following element:
The user can then easily access the several elements (here the different MIME types), as well as their parameters (here their optional relative quality factors).
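The behaviour described above can be approximated by the following Python sketch. It rests on several assumptions that are not confirmed by the text of this section: elements are taken to be separated by commas, params by semicolons, and names from values by "="; quoted strings are not handled; and the result is a list of dictionaries rather than the XML element the function actually returns. A value of None stands for an attribute that is not generated, and an empty string for an empty attribute value.

```python
def _name_value(token):
    name, equals, value = token.partition("=")
    # No "=" at all: no value. "=" with nothing after it: empty value.
    return name.strip(), (value.strip() if equals else None)


def parse_header_value(header):
    """Split a structured header value into elements and their params."""
    elements = []
    for raw in header.split(","):
        if not raw.strip():
            continue  # empty elements are skipped
        first, *params = raw.split(";")
        name, value = _name_value(first)
        elements.append({
            "name": name,
            "value": value,
            "params": [_name_value(p) for p in params if p.strip()],
        })
    return elements


parsed = parse_header_value("text/html;q=0.9, application/xml, foo=")
```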
This function parses the value of a Basic Access Authentication header, as defined in
The value of the header is "Basic abc" where abc
is "user:password"
encoded using base64.
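The decoding itself is straightforward, as the following Python sketch shows. The function name parse_basic_auth and its return value (a pair of strings) are assumptions of this example; the actual web:parse-basic-auth() function and its result format are defined by this specification, not by this sketch. UTF-8 is assumed for the decoded credentials.

```python
import base64


def parse_basic_auth(header):
    """Return (user, password) from a header value like "Basic dXNl..."."""
    scheme, _, encoded = header.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authentication header")
    decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    # The password may itself contain ":", so split on the first one only.
    user, colon, password = decoded.partition(":")
    if not colon:
        raise ValueError("missing ':' between user and password")
    return user, password


header = "Basic " + base64.b64encode(b"user:password").decode("ascii")
credentials = parse_basic_auth(header)
```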
This section documents an optional feature: a set of functions to manage webapps (deploy a new webapp, remove a deployed webapp, configure a webapp, list all the deployed webapps in a web container, etc.)
TODO: ...
This section documents an optional feature: the format of an on-disk repository. It builds on the repository format from the packaging specification, by adding web-specific features.
TODO: ...
Definition: define the name
Definition: define the format of the several
The web:response
...)
The
In addition to web:parse-basic-auth()
, add a function to support
Define some user management mechanism (plus identification, authentication, etc.).