Web Applications w3c-designation EXPath Candidate Module nn Xxx 2013 XML Florent Georges H2O Consulting

Copyright © 2011-2013 Florent Georges, published by the EXPath Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.

This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

This specification defines how to write web applications on the server side, using XML technologies like XSLT, XQuery and XProc. It also defines their execution context, as well as some functions they can use. Last but not least, it defines how to package such webapps as EXPath packages.


Introduction

Namespace conventions

The webapp descriptor uses the namespace http://expath.org/ns/webapp, as the default namespace. This namespace is also used for the XML representation of the HTTP requests and responses, and to define several functions provided by the webapp container. In this document, the web prefix, when used, is bound to this namespace URI.

Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.

Containers and webapps

A web application, or webapp, in this specification, is a set of components, implementing an application. The application responds to HTTP requests and runs in a webapp container. The container provides the context of execution for the webapps, provides them with some functions, and is responsible for translating from and to HTTP. When the container receives an HTTP request, it identifies the corresponding component to process it (e.g. based on the request URL), builds an XML representation of the request, and calls the component. The component receives the HTTP request as an XML document, and returns an XML description of the HTTP response to send back to the client. The container then translates this XML document and sends the corresponding HTTP response.

A component is a piece of XSLT, XQuery or XProc. Each type of component defines how the request is passed to and the response is returned by such a component. But the XML format of the requests and responses is always the same. The existing types of components are:

an XSLT stylesheet

an XSLT function

an XSLT template

an XQuery query

an XQuery function

an XProc pipeline

an XProc step

A webapp contains a descriptor, which defines how to dispatch a request to a specific component. The dispatching mechanism is based on the request URL, by associating a URL pattern to a component public URI (a component public URI is defined in the , basically it is the component name, as an absolute URI). The association of a URL pattern to a component is called a servlet. Every servlet has a name.

A webapp is installed at a specific context root. The context root is a path prefix, and all the requests received "below" that prefix are served by that webapp. For instance, a webapp installed on example.org at the context root /somewhere will serve all requests with a URL starting with http://example.org/somewhere/. The part after that prefix is the path to the servlet.

Requests and responses

The HTTP requests and responses are represented as XML documents. The request is built by the container to represent the actual HTTP request received. The component returns a representation of the response, used by the container to actually respond to the client over HTTP. A sample request:

<request servlet="some" path="/some/resource" method="get"
         xmlns="http://expath.org/ns/webapp">
   <url>http://example.org/myapp/some/resource</url>
   <authority>http://example.org</authority>
   <context-root>/myapp</context-root>
   <path>
      <part>/some/</part>
      <match name="rsrc">resource</match>
   </path>
   <header name="host" value="example.org"/>
   <header name="user-agent" value="Firefox/7.0.1"/>
   <header name="accept" value="text/html,application/xml;q=0.9,*/*;q=0.8"/>
   <header name="accept-language" value="en-us,en;q=0.5"/>
</request>

This request can be built by a container listening at http://example.org/, when it receives a request to GET the resource at /myapp/some/resource. We can see the path has been pre-analyzed, and the request contains various HTTP information like the method, the request URL (decomposed in different ways), and the headers. The request might also contain an entity content, also known as the body of the request (e.g. in case of a PUT request).

In response to the above request, the invoked component could return the following response to the container:

<response status="200" message="Ok" xmlns="http://expath.org/ns/webapp">
   <header name="X-My-Header" value="Just an example."/>
   <body content-type="application/xml">
      <hello>World!</hello>
   </body>
</response>

This tells the container to return the XML document <hello>World!</hello> to the client, using the Content-Type "application/xml", with the HTTP status code 200, and an extra header.

Requests

An HTTP request is represented by a sequence, the request sequence. The first item in that sequence is an element web:request, and the remaining items represent the entity content (there might be several of them in case of multipart). The web:request element is defined as follows:

<request servlet = NCName
         path = string
         method = NCName>
   url, authority, context-root, path, param*, header*, (body | multipart)?
</request>

A request contains the name of the matched servlet, the request path, and its method (also known as the HTTP verb, like GET and POST) in lower case. The text element url contains the original request URL, including its query parameters if any. The URL also appears cut down into several pieces. The element authority is its first part, including the URL scheme and the domain name, context-root is the webapp context root, path is an alternative representation of the requested path (as in the attribute path), where some specific parts have been analyzed, and the elements param represent the query parameters.

Then come the HTTP header elements, each with a name and a value. Then the content of the HTTP entity (e.g. for a PUT), as an element body or multipart, depending on the content type of the request.

<header name = string
        value = string>
   empty
</header>

URL

The URL appears at several places in the request element, under different shapes. The element url is the original URL as typed by the user. Or at least it is an educated guess of what it could be, as HTTP does not include the original URL in the request (the port number for instance is not in the HTTP request).

The element authority is the first part of the URL, including the URL scheme and the domain name (up to the first slash of the path, but not including it). The scheme can be either "http:" or "https:". It is followed by two slashes, then the domain name.

The element context-root is the webapp context root on this server. It is fixed for the webapp, and represents where the webapp has been "installed" on the server (the webapp serves all requests coming to URLs "below" its context root).

After the element path (see below), the query parameters, if any, are represented each with an element param, with an attribute name and an attribute value.

<param name = string
       value = string>
   empty
</param>

Note that the XPath expression "fn:concat(authority, context-root, path)" gives the original URL except the query parameters.
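
The relation between url, authority, context-root and path can be illustrated with a minimal Python sketch of what a container might do. The function name split_request_url and the sample query string are assumptions made for this illustration; the specification does not define any such API.

```python
from urllib.parse import urlsplit

def split_request_url(url, context_root):
    """Split a request URL into (authority, context-root, path).

    Hypothetical helper: returns None when the URL is not "below"
    the given context root.
    """
    parts = urlsplit(url)
    # Scheme, two slashes, then the domain name (and port, if any).
    authority = f"{parts.scheme}://{parts.netloc}"
    root = context_root.rstrip("/")
    if parts.path != root and not parts.path.startswith(root + "/"):
        return None
    # urlsplit keeps the query out of .path, so it is excluded here.
    return authority, root, parts.path[len(root):]

url = "http://example.org/myapp/some/resource?lang=en"
authority, context_root, path = split_request_url(url, "/myapp")
# Mirrors fn:concat(authority, context-root, path): the URL minus its query.
rebuilt = authority + context_root + path
```

With the sample URL, the three pieces are the same as in the sample request shown earlier, and concatenating them gives the URL without its query parameters.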

Path

The path is the part of the request URL that comes after the context root (excluding the query parameters). This is thus the part that can vary for a given webapp (for a given webapp, deployed on a specific server at a specific context root, everything up to and including the context root will be always the same). Servlets use regexes to match URLs, and they can give a name to some sub parts of the path matched by the regex (see the definition of the webapp descriptor for all details).

This is represented by having, in the element path, a sequence of elements part and match. The elements part are the non-matched parts of the URL, and the elements match are the matched parts of the URL. They appear in the same order as in the URL.

<path>
   (part | match)+
</path>

<part>
   string
</part>

<match name = NCName>
   string
</match>

The entire path is the string value of the element path. Put another way, concatenating all the part and match elements, in order, gives the value of the path. It is also available as the value of the attribute path on the element request. See for an example.

Content

The body of the request, in HTTP parlance the entity content, is not embedded in the XML element representing the request (that is the web:request element). It is instead represented as a standalone item in the sequence representing the request. There might be several items in case of multipart. Each item can be of one of the following types:

a document node in case of an XML or HTML media type

an xs:string in case of a textual media type

an xs:base64Binary in case of any other media type

The type of item used to represent the content is selected based on the Content-Type header. In case of a multipart type, each part is represented by its own item. Each part's pseudo header Content-Type is used the same way that the header Content-Type is used for the entire content in case of a single part. The content type is used as follows:

An XML media type has a MIME type of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity, as defined in (except that application/xml-dtd is considered a text media type). MIME types ending by +xml are also XML media types.

An HTML media type has a MIME type of text/html. The precise algorithm to parse HTML into an XDM document node is implementation-defined.

Text media types are the remaining types beginning with text/.

Binary types are all the other types. An implementation can treat some of those binary types as either an XML, HTML or text media type if it is more appropriate (this is implementation-defined).

If the content type is text/html, the content is parsed into an XDM document node (using some HTML parsing algorithm that is out of the scope of this specification). If the content type is application/xml, or text/xml, or ends with +xml, the content is parsed into an XDM document node. If it is a textual content type, it is converted into a text node (this covers at least all other types starting with text/, but an implementation can treat other types as being textual as well). All other content is converted into an xs:base64Binary item.
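
The classification rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name classify_media_type and its four return labels are invented here, and the implementation-defined extensions (treating some binary types as XML, HTML or text) are left out.

```python
def classify_media_type(content_type):
    """Return 'xml', 'html', 'text' or 'binary' for a Content-Type value.

    Hypothetical helper, not part of the specification.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    xml_types = {
        "text/xml",
        "application/xml",
        "text/xml-external-parsed-entity",
        "application/xml-external-parsed-entity",
    }
    if mime in xml_types or mime.endswith("+xml"):
        return "xml"      # parsed into a document node
    if mime == "text/html":
        return "html"     # parsed into a document node (HTML parser)
    # application/xml-dtd is explicitly considered a text media type.
    if mime.startswith("text/") or mime == "application/xml-dtd":
        return "text"     # textual item
    return "binary"       # xs:base64Binary item
```

The XML test comes first so that types such as application/xhtml+xml are handled as XML, and the text/ test comes after the HTML test so that text/html is not treated as plain text.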

The content itself is represented as additional items in the request sequence, but for each part, there is a web:body element in the web:request element, describing the part. In case of multipart, the web:body elements are wrapped into a web:multipart element. If the content is not multipart, there is one web:body element, direct child of web:request:

<multipart>
   (header*, body)+
</multipart>

<header body = integer
        name = string
        value = string>
   empty
</header>

<body position = integer
      content-type = string>
   empty
</body>

The attribute position is the position of the body in a multipart (1, 2, 3...). If there is only a single part, it is always 1. The pseudo headers for each part in a multipart come before the part they relate to. Their attribute body also contains the corresponding body position. It is therefore possible to use either positional grouping (all the consecutive web:header elements right before a web:body element relate to that element), or value-based grouping (the web:header elements relate to the web:body element with the same position number). The body position number is also the position of the corresponding content in the request sequence (the web:request element is the first item, then comes the body #1, then the body #2, and so on).
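
Value-based grouping can be sketched in Python with the standard ElementTree module. The sample web:multipart element, its header values and the helper name headers_by_body are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://expath.org/ns/webapp}"

# Hypothetical two-part description, as a container might build it.
MULTIPART = """
<multipart xmlns="http://expath.org/ns/webapp">
  <header body="1" name="content-type" value="application/xml"/>
  <body position="1" content-type="application/xml"/>
  <header body="2" name="content-type" value="image/png"/>
  <header body="2" name="content-id" value="logo"/>
  <body position="2" content-type="image/png"/>
</multipart>
"""

def headers_by_body(multipart):
    """Map each body position to its list of (name, value) pseudo headers."""
    grouped = {int(b.get("position")): [] for b in multipart.findall(NS + "body")}
    for header in multipart.findall(NS + "header"):
        # The attribute body carries the position of the related web:body.
        grouped[int(header.get("body"))].append(
            (header.get("name"), header.get("value")))
    return grouped

grouped = headers_by_body(ET.fromstring(MULTIPART))
```

Because the web:request element is the first item of the request sequence, the content described by the body at position n is the item at zero-based index n of that sequence.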

Note that the web:header elements that are children of a web:multipart have the same structure as when they are children of web:request, except they have the extra attribute body.

Responses

When invoked, the top-level component (or group, or filter, if any) must return a sequence, the response sequence. The first item must be an element web:response, the remaining items being the content of the response to send back to the client.

<response status = integer message = string> header*, (body| multipart)? </response>

The attribute status is the HTTP status code, and the attribute message is the HTTP status message. The elements web:header represent HTTP headers to set in the response, and look like the same elements in the request.

<multipart>
   (header*, body)+
</multipart>

<body item-position = integer?
      src = uri?
      charset = string?
      content-type = string>
   optional content
</body>

The elements web:multipart and web:body represent the content to send back to the client, in a similar way as they are represented by the container in the request. The difference is that the pseudo-headers do not have an attribute body, the elements web:body do not have an attribute position, and they might have an attribute src, an attribute charset and an attribute item-position, and they might have content. The content, src and item-position are mutually exclusive. If none of them is present, the body is empty. charset is the character encoding, for non-binary media types, and is UTF-8 by default. item-position is the position within the response sequence of the item representing the content of the corresponding part, after the web:response element (that is, the first item after the response element, which is the second item in the sequence, has the position 1, the next one the position 2, and so on). src is the location of a file to return (it is resolved against the base URI of the web:body element).
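
The way a container might decide where the content of a response body comes from can be sketched as below. The function body_source, its argument names and its return labels are hypothetical; only the mutual exclusivity rule and the item-position numbering come from the text above.

```python
def body_source(content, src, item_position, response_sequence):
    """Return a (kind, value) pair describing where a body's content is.

    Hypothetical helper: each of content, src and item_position is None
    when the corresponding content or attribute is absent on web:body.
    """
    given = [x is not None for x in (content, src, item_position)]
    if sum(given) > 1:
        raise ValueError("content, src and item-position are mutually exclusive")
    if content is not None:
        return "inline", content
    if src is not None:
        return "file", src
    if item_position is not None:
        # Index 0 is the web:response element, so item-position 1 is
        # the item at index 1 of the response sequence.
        return "item", response_sequence[item_position]
    return "empty", None

# Placeholder response sequence: the response element, then two content items.
sequence = ["<web:response/>", "first content", "second content"]
```

For instance, body_source(None, None, 2, sequence) selects the third item of the sequence, and a body with neither content nor any of the two attributes is empty.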

For instance, the following example, when returned as the evaluation of a component (either a servlet, or the top-level filter if there is any), will return a 200 - Ok to the client, with the header Content-Disposition: file; filename="archive.zip", and the content is the file archive.zip, resolved next to the stylesheet that generated the response element (let us say it has been generated by an XSLT stylesheet that kept the default base URI of the stylesheet itself, so the file is part of the webapp package itself):

<response status="200" message="Ok">
   <header name="Content-Disposition" value="file; filename=&quot;archive.zip&quot;"/>
   <body content-type="application/zip" src="archive.zip"/>
</response>

Components

A component is a piece of code, written in one of the supported languages, that is executed in some specific context, for instance in response to an HTTP request, or as the action associated to a filter. A component must follow a few rules (for instance the number or name of parameters), so the web container can call it. A component is declared in the webapp descriptor. The format of the webapp descriptor is defined in the section Webapp descriptor, but the corresponding config for each single type of component is described here in more detail.

The webapp itself is packaged as a standard EXPath package (see ). Each component must be identifiable through the package descriptor (e.g. an XSLT stylesheet must have a public import URI configured in the package, or the namespace of the XQuery library module containing the function used as a component must be declared in the package as well).

XProc

An XProc component can be a pipeline, identified by its public import URI:

<xproc uri="http://example.org/ns/my-webapp/pipeline.xproc"/>

An XProc component can also be a step type, identified by its QName:

<xproc uri="http://example.org/ns/my-webapp/step-library.xproc" step="app:my-step"/>

The prefix used in the step attribute must be declared in scope (in the example above, the prefix app must be bound to the namespace of the step).

Both pipelines and steps must have one input port named source, which accepts a sequence of documents, and one output port named result, which might return a sequence of documents. When used as an error handler, they must have an input port called user-data, which must accept a sequence of documents (only if the application uses the user data feature of error reporting, to pass any data from where the error is thrown to the error handler, see for details).

The following is a simple example of a pipeline, returning the XML representation of the HTTP request as the response itself:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:web="http://expath.org/ns/webapp"
                xmlns:app="http://example.org/ns/my-webapp"
                xmlns:pkg="http://expath.org/ns/pkg"
                pkg:import-uri="http://example.org/my-webapp/echo.xproc"
                name="main"
                version="1.0">

   <p:input port="source" primary="true" sequence="true"/>
   <p:output port="result" primary="true" sequence="true">
      <p:pipe step="response" port="result"/>
      <p:pipe step="main" port="source"/>
   </p:output>

   <!-- Echoes the web:request element, ignores the bodies if any. -->
   <p:identity name="response">
      <p:input port="source">
         <p:inline>
            <web:response status="200" message="Ok">
               <web:body content-type="application/xml"/>
            </web:response>
         </p:inline>
      </p:input>
   </p:identity>

</p:declare-step>

XQuery

An XQuery component can be a main module (also called a "query"), identified by its public import URI:

<xquery uri="http://example.org/ns/my-webapp/query.xq"/>

An XQuery component can also be a function, identified by its QName:

<xquery function="app:my-function"/>

The prefix used in the function attribute must be declared in scope (in the example above, the prefix app must be bound to the namespace of the function). Because the XQuery library modules are identified by their target namespace in the package descriptor, the name of the function itself is enough to identify uniquely the function.

The request sequence (containing the web:request element and the request bodies) is passed to the XQuery function through its only parameter, and to the XQuery module through the external variable named $web:input (the web:request element is also set as the context element for evaluating the query).

The following is a simple example of a function, returning the XML representation of the HTTP request as the response itself:

xquery version "1.0";

module namespace app = "http://example.org/ns/my-webapp";

(: The prefix web must be declared to be used in the element constructors. :)
declare namespace web = "http://expath.org/ns/webapp";

(:~
 : Echoes the web:request element, ignores the bodies if any.
 :)
declare function app:echo($input as item()+) as element()+
{
   <web:response status="200" message="Ok">
      <web:body content-type="application/xml"/>
   </web:response>
   ,
   $input[1]
};

XSLT

An XSLT component can be a stylesheet, identified by its public import URI:

<xslt uri="http://example.org/ns/my-webapp/stylesheet.xsl"/>

An XSLT component can also be a named template, identified by its QName:

<xslt uri="http://example.org/ns/my-webapp/stylesheet.xsl" template="app:my-template"/>

An XSLT component can also be a function, identified by its QName:

<xslt uri="http://example.org/ns/my-webapp/stylesheet.xsl" function="app:my-function"/>

The prefix used in both the function and template attributes must be declared in scope (in both examples above, the prefix app must be bound to the namespace of the template, or of the function resp.)

The request sequence (containing the web:request element and the request bodies) is passed to the XSLT function through its only parameter, to the named template through the template parameter named web:input, and to the stylesheet through the global parameter named web:input (the web:request element is also set as the context element for evaluating the stylesheet).

The following is a simple example of a stylesheet, returning the XML representation of the HTTP request as the response itself:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:pkg="http://expath.org/ns/pkg"
                xmlns:web="http://expath.org/ns/webapp"
                exclude-result-prefixes="#all"
                version="2.0">

   <pkg:import-uri>http://example.org/ns/my-webapp/echo.xsl</pkg:import-uri>

   <xsl:template match="/">
      <web:response status="200" message="Ok">
         <web:body content-type="application/xml"/>
      </web:response>
      <xsl:sequence select="."/>
   </xsl:template>

</xsl:stylesheet>

Webapps

An application (sometimes referred to as a webapp) is described by the webapp descriptor. It is composed of servlets, resources and filters, which can be grouped together; it declares error handlers, and provides access to some configuration. An application is deployed within a web container, and has a dedicated context root.

Context root

When deployed into a web container, a webapp is assigned a context root. A context root is an absolute URL, which is the root of all URLs served by the webapp. That is, all URLs served by the webapp start with the same substring: the context root. For instance, if a webapp is deployed at http://example.org/myapp/, that means that all URLs starting with the same string will be dispatched by the container to that webapp, for instance:

http://example.org/myapp/

http://example.org/myapp/index.html

http://example.org/myapp/catalog/item/XYZ123

http://example.org/myapp/catalog/item/XYZ123/buy

http://example.org/myapp/search?q=brussels%20hotels

Until deployed, a webapp has no context root. All URLs defined within the webapp are either relative, or absolute (starting with "/"). When absolute, they are relative to the context root. For instance, a webapp handling the above URLs will define how to serve the following URLs:

/

/index.html

/catalog/item/XYZ123

/catalog/item/XYZ123/buy

/search?q=brussels%20hotels

If this webapp is deployed at the above context root, it will then serve the URLs listed above. But it can also be deployed on localhost, for instance for test or dev purposes. The set of absolute URLs served by the webapp will then be different, but the set of URLs served, relative to the context root, is always the same for a given webapp.

Servlets

A servlet is the association of a URL pattern to a component. A servlet can be given a name as well, for documentation and error reporting purposes. A servlet is configured in the webapp descriptor like the following:

<servlet name = NCName? filters = NCNames?> component, url </servlet> <url pattern = string> match* </url> <match group = integer name = NCName> empty </match>

The attribute name is the optional name of the servlet. The attribute filters is a space-separated list of filters, error handlers and/or chains. See the following sections for more information on those objects. The order of those names is important, as the objects are composed from the left-hand side (the top-most object) to the right-hand side (the bottom-most object, directly around the servlet). The pseudo element component is any of the existing component types defined in , that is either xproc, xquery or xslt (any of their variants). The attribute pattern must be a valid XML Schema regular expression (see ), matching a string starting with "/".

When the web container receives an HTTP request, it identifies the webapp to serve the request (depending on their context roots; that mechanism is implementation-defined). Then it tries to match the request URL against all URL patterns in the webapp descriptor, in the document order, and picks the first servlet with a URL pattern matching the request URL. The corresponding component is executed, by passing the request sequence, and the result of the servlet is used to return a response to the client.
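
The "first match in document order" rule can be sketched in Python. This is an approximation under stated assumptions: Python's re module stands in for XML Schema regular expressions, the pattern is assumed to have to match the whole path (hence fullmatch), and the servlets named users and fallback, as well as the function dispatch, are invented for the example.

```python
import re

# (servlet name, URL pattern) pairs, in descriptor document order.
SERVLETS = [
    ("user", r"/users/([a-z0-9]+)"),
    ("users", r"/users/?"),
    ("fallback", r"/.*"),
]

def dispatch(servlets, path):
    """Return the name of the first servlet whose pattern matches path."""
    for name, pattern in servlets:
        if re.fullmatch(pattern, path):
            return name
    return None  # no servlet matches: what happens then is up to the container
```

Here /users/fgeorges is served by the servlet user, even though the pattern of fallback matches it too, because user comes first in the list.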

The elements match of the matching URL are linked to the regex groups in the URL pattern. Because the URL pattern is a regex, matching against it can assign values to groups (by using parentheses in the regex). The element match assigns a symbolic name to a group number. The corresponding values are put into the web:request element, more precisely in its child element web:path. For example the following servlet definition, associating an XSLT stylesheet component to a URL pattern, and naming some specific parts of the URL:

<servlet name="user">
   <xslt uri="http://example.org/myapp/users.xsl"/>
   <url pattern="/users/([a-z0-9]+)">
      <match group="1" name="id"/>
   </url>
</servlet>

will produce the following path element when matching a URL ending with /users/fgeorges:

<path>
   <part>/users/</part>
   <match name="id">fgeorges</match>
</path>
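
How the part and match children of the path element can be derived from the regex groups is sketched below in Python. It is a sketch under assumptions: Python's re module approximates XML Schema regular expressions, the named groups are assumed not to be nested, and the function decompose_path with its tuple-based result is hypothetical.

```python
import re

def decompose_path(pattern, names, path):
    """Split path into ('part' | 'match', name, text) pieces.

    names maps a regex group number to its symbolic name, as the
    match elements of a servlet do. Returns None if path does not match.
    """
    m = re.fullmatch(pattern, path)
    if m is None:
        return None
    pieces = []
    cursor = 0
    for group in sorted(names, key=m.start):
        start, end = m.span(group)
        if start < cursor:
            # Group that did not take part in the match (-1), or that
            # overlaps a previous one: ignored in this sketch.
            continue
        if start > cursor:
            pieces.append(("part", None, path[cursor:start]))
        pieces.append(("match", names[group], path[start:end]))
        cursor = end
    if cursor < len(path):
        pieces.append(("part", None, path[cursor:]))
    return pieces

pieces = decompose_path(r"/users/([a-z0-9]+)", {1: "id"}, "/users/fgeorges")
```

Concatenating the text of all pieces, in order, gives back the original path, which is the property required of the path element.
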
Resources

A resource is a special type of servlet, defined by the following element:

<resource pattern = string rewrite = string? media-type = string> empty </resource>

The pattern attribute has the same meaning as for servlet. A resource is a servlet, so when the URL dispatcher looks sequentially through the URL patterns for a match, that includes resources as well. If a resource matches the URL and has a rewrite attribute, that attribute is interpreted as a regex replacement string, and the path to resolve is the result of evaluating the following expression, where $url is the actual URL being matched: replace($url, @pattern, @rewrite).

The resulting path (either the original URL, or the replaced string if there is a rewrite attribute) is then resolved into the application package. If the path starts with a slash, that slash is removed, then the path is resolved against the webapp package content directory. The resulting file is the body of the response, the value of the media-type attribute is the value of the response header Content-Type, and the response status is "200 - OK". If the file does not exist, the response status is "404 - Not Found" and the content of the response is implementation-defined.

For instance, the following definitions set the correct Content-Type for CSS stylesheets and PNG images, which are resolved in the package in the sub-directories style/ and images/ respectively:

<resource pattern="/style/.+\.css" media-type="text/css"/>
<resource pattern="/images/.+\.png" media-type="image/png"/>

The following example rewrites the URL so /style/print will resolve in the webapp package content dir as css/main-print.css:

<resource pattern="/style/(.+)" rewrite="css/main-$1.css" media-type="text/css"/>
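
The rewrite step can be approximated in Python. The sketch below is an illustration only: the function resource_path is hypothetical, Python's re.sub stands in for the XPath function replace, and only the $N group references of the replacement string are translated (other escapes allowed in XPath replacement strings are not handled).

```python
import re

def resource_path(url, pattern, rewrite=None):
    """Return the path to resolve in the package content directory."""
    if rewrite is None:
        path = url
    else:
        # Translate XPath-style "$1" references into Python's "\g<1>".
        template = re.sub(r"\$(\d+)", r"\\g<\1>", rewrite)
        path = re.sub(pattern, template, url)
    # A leading slash is removed before resolving in the content dir.
    return path[1:] if path.startswith("/") else path

rewritten = resource_path("/style/print", r"/style/(.+)", "css/main-$1.css")
plain = resource_path("/images/logo.png", r"/images/.+\.png")
```

With the resource definitions shown above, the first call gives css/main-print.css, and the second one, which has no rewrite attribute, gives images/logo.png.
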
Filters

A filter has a name and two components, the "in" and the "out" component. Both components are optional, but at least one is required:

<filter name = NCName> in?, out? </filter> <in> component </in> <out> component </out>

When a filter is attached to a servlet, the input sequence flowing to the servlet first passes to the filter "in" component if any. The input sequence becomes the input sequence of the "in" component, and the result of evaluating it becomes the input sequence of the servlet component. The result of evaluating the servlet component passes then through the filter "out" component, if any. The result of the servlet becomes the "out" component input sequence, and the result of evaluating it becomes the result returned to the container.
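
The flow of sequences through a filter can be sketched with plain Python functions, each component being modelled as a function from a sequence (a list) to a sequence. All names below (apply_filter, servlet, tag_request, tag_response) are invented for the illustration.

```python
def apply_filter(servlet, in_component=None, out_component=None):
    """Wrap a servlet with the optional "in" and "out" components of a filter."""
    def wrapped(request_sequence):
        if in_component is not None:
            # The result of "in" becomes the input of the wrapped object.
            request_sequence = in_component(request_sequence)
        response_sequence = servlet(request_sequence)
        if out_component is not None:
            # The result of the wrapped object becomes the input of "out".
            response_sequence = out_component(response_sequence)
        return response_sequence
    return wrapped

def servlet(request_sequence):
    return ["response for " + request_sequence[0]]

def tag_request(request_sequence):
    return [request_sequence[0] + "+in"] + request_sequence[1:]

def tag_response(response_sequence):
    return response_sequence + ["out"]

filtered = apply_filter(servlet, tag_request, tag_response)
result = filtered(["request"])
```

Since the wrapped function has the same shape as a servlet, it can itself be wrapped again, which is how several filters can be stacked around one servlet.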

There are several ways to set a filter around a servlet. Actually, filters can be chained and mixed with error handlers, so a filter can be set around either a component, a filter, or an error handler. The concept remains the same, the in and out components are executed before and after the wrapped object.

Error handling

Error handlers are a special sort of filter. They do not have any in nor out component, so they act as a pass-through with regard to the inputs and outputs. However, if an error is thrown in the wrapped object, an error handler offers a way to catch it and to compute an alternative result sequence. The implementation of an error handler is a user-provided component, which is passed several pieces of information about the error (the user has the opportunity to pass all sorts of data to fn:error() in XSLT and XQuery, and to p:error in XProc, so a real customized error reporting and handling strategy can be set up). An error handler is defined by using the following element:

<error name = NCName catch = CatchErrorList> component </error>

Like any filter, an error handler has a name, which is an NCName. It also has a catch error list, the format of which is defined by the production rule CatchErrorList in . The catch error list is a '|'-separated list of QName wildcards, that is, QNames where you can substitute either the prefix, the local name or the whole name by a '*' (meaning "any value here"). In addition it allows using EQNames for explicitly setting the namespace URI in the expression, without having to bind it to a prefix. Such a list therefore identifies a set of matching QNames. When an error is thrown by a component and goes up the stack through the error handler, it is caught if its code matches the catch error list. In that case, the component is executed, and its result becomes the result sequence of the error handler.
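
The matching of an error code against a catch error list can be sketched in Python. This is a simplified illustration, not the grammar referenced above: the functions parse_name_test and catches are hypothetical, EQNames are assumed to use the 'uri':local form, namespace URIs are assumed not to contain the character '|', and the app and lib namespace URIs are made up for the example.

```python
def parse_name_test(test, namespaces):
    """Return (namespace URI, local name); None stands for "any value"."""
    if test == "*":
        return None, None
    if test.startswith("'"):
        # EQName: the namespace URI is given literally, between quotes.
        uri, _, local = test[1:].partition("':")
    elif ":" in test:
        prefix, _, local = test.partition(":")
        uri = None if prefix == "*" else namespaces[prefix]
    else:
        uri, local = "", test  # unprefixed name: no namespace
    return uri, (None if local == "*" else local)

def catches(catch_list, error_uri, error_local, namespaces):
    """True if the error QName matches one entry of the catch error list."""
    for test in catch_list.split("|"):
        uri, local = parse_name_test(test.strip(), namespaces)
        if uri in (None, error_uri) and local in (None, error_local):
            return True
    return False

# Hypothetical in-scope namespace bindings.
NAMESPACES = {
    "app": "http://example.org/ns/my-webapp",
    "lib": "http://example.org/ns/my-lib",
    "err": "http://expath.org/ns/error",
}
```

For instance, with these bindings, the list "app:*|lib:*|err:XYZ001" catches the error app:ABC067, whereas the list "app:XYZ001" does not.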

Some examples of catch error lists, given the corresponding prefixes are bound in-scope:

"app:XYZ001": catch a specific error

"*": catch all errors

"app:*": catch all errors in the namespace bound to the prefix "app"

"app:ABC067|app:XYZ001": catch both errors, explicitly

"app:*|lib:*|err:XYZ001": catch all errors in the namespace bound to the prefix "app", and all the errors in the namespace bound to the prefix "lib", and the error with the QName "err:XYZ001"

"'http://example.org/ns/error':XYZ001": catch the error with local name "XYZ001" in the namespace "http://example.org/ns/error"

"'http://example.org/ns/error':*": catch all errors in the namespace "http://example.org/ns/error"

Chains

A chain is a special type of filter, composing other filters. It is defined using the following element:

<chain name = NCName filters = NCNames?> (chain |error |filter)* </chain> <chain ref = NCName> empty </chain> <error ref = NCName> empty </error> <error catch = CatchErrorList> component </error> <filter ref = NCName> empty </filter> <filter> in?, out? </filter>

The elements error and filter, when inside a chain, are either empty references to a global named error handler or filter, or are the anonymous equivalent of their global representation (same format, without the attribute name). The element chain, when appearing as a child, must be a reference to a named chain (so one can compose chains). If the attribute filters is not present, there must be at least one child element. If the attribute filters is present, the chain element must be empty. This attribute is a shortcut to set a chain by using only named filters, and is constructed like the attribute filters on the element servlet (the left-hand name is the top-most filter).

Groups

Groups are a way to set the same filters to a group of servlets. A group is a lexical wrapper around several servlet definitions:

<group filters = NCNames?> (group |servlet)+ </group>

As we see, groups can be nested. They have a filter chain (using the same attribute filters as the element servlet), and contain servlets and other groups. The only effect is to add their filter chain to each object they contain. Their filters are added right before the object's own filters (so after any application filter chain, and after any parent group filter chain).

For instance, in the following example:

<group filters="first second">
   <servlet name="un" filters="third fourth">
      ...
   </servlet>
   <group filters="fifth">
      <servlet name="deux">
         ...
      </servlet>
      <servlet name="trois" filters="sixth">
         ...
      </servlet>
   </group>
</group>

the servlet "un" is wrapped by the filter chain "first second third fourth" (in that order), "deux" is wrapped by "first second fifth", and "trois" by "first second fifth sixth".
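
The way group filters accumulate can be sketched in Python with the standard ElementTree module. The function effective_filters is hypothetical, and the descriptor fragment below restates the example with empty servlet elements (and without namespace, as in the example) so that it can be parsed on its own.

```python
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<group filters="first second">
  <servlet name="un" filters="third fourth"/>
  <group filters="fifth">
    <servlet name="deux"/>
    <servlet name="trois" filters="sixth"/>
  </group>
</group>
"""

def effective_filters(element, inherited=()):
    """Map each servlet name to its complete filter chain, top-most first."""
    # Filters from the enclosing groups come before the element's own ones.
    chain = inherited + tuple(element.get("filters", "").split())
    if element.tag == "servlet":
        return {element.get("name"): list(chain)}
    result = {}
    for child in element:
        result.update(effective_filters(child, chain))
    return result

chains = effective_filters(ET.fromstring(DESCRIPTOR))
```

The resulting mapping associates "un" with first, second, third, fourth; "deux" with first, second, fifth; and "trois" with first, second, fifth, sixth, in that order.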

TODO: Introduce a diagram to depict the example.

Config

The following functions provide access to configuration parameters and documents. The webapp author can define any parameter and document he/she needs, and access them by name.

web:config-param($name as xs:string) as xs:string?
web:config-param($name as xs:string, $default as xs:string?) as xs:string?
web:config-doc($name as xs:string) as document-node()?

The parameter $name must be a lexical QName, resolved against the static context. Depending on the specific needs of the webapp, the easiest way to define configuration information might be to use string parameters, or to use documents. In either case, the configuration is read-only. This mechanism does not replace a database. The way the configuration parameters and documents are provided is implementation-defined.

Configuration parameters and documents can be declared in the webapp descriptor. That is, the names of the parameters and documents the application expects to use at some point can be declared. Declaring their names can help detect typos in the code. If an application tries to access an undeclared configuration parameter or document, the implementation may throw an error (whether it does or not is implementation-defined). An implementation should provide a way to toggle this error at the server level or on a per-webapp basis. Configuration parameters and documents can be declared using the following element in the webapp descriptor:

<config>
   <param name="my:header-color"/>
   <param name="my:header-size"/>
   <doc name="my:footer"/>
</config>
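
As an illustration of the optional check on undeclared names, here is a minimal Python sketch of a read-only parameter store with a strictness toggle. It is not a normative model: the class Config, the error type UndeclaredConfigError and the flag strict are hypothetical, and names are compared as plain strings instead of being resolved as QNames against a static context.

```python
from typing import Dict, Iterable, Optional


class UndeclaredConfigError(LookupError):
    """Hypothetical error type; the specification does not define its name."""


class Config:
    """Read-only view over configuration parameters (no setter on purpose)."""

    def __init__(self, declared: Iterable[str], values: Dict[str, str],
                 strict: bool = False) -> None:
        self._declared = frozenset(declared)
        self._values = dict(values)
        self._strict = strict  # the toggle for errors on undeclared names

    def param(self, name: str, default: Optional[str] = None) -> Optional[str]:
        # Simplification: names are plain strings here, not resolved QNames.
        if self._strict and name not in self._declared:
            raise UndeclaredConfigError(name)
        return self._values.get(name, default)


declared = ["my:header-color", "my:header-size"]
values = {"my:header-color": "navy"}

lenient = Config(declared, values)
strict = Config(declared, values, strict=True)
```

With the lenient store, a misspelled name such as "my:heade-color" silently yields no value; with the strict store, the same lookup raises an error, which is the typo detection the declarations are meant to enable.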

Applications

The application as a whole is represented using the following element:

<application filters = NCNames?> empty </application>

This element provides a convenient way to apply filters to all servlets and resources within the application (such as an error handler, a view layer, or an authentication filter).

Webapp descriptor

This section defines the webapp descriptor expath-web.xml. Only the overall structure of the descriptor is given, with references to the elements defined in the previous section. See the section Webapp schema for a formal schema of the entire descriptor format.

The root element of the descriptor is webapp:

<webapp name = uri
        abbrev = NCName
        version = string
        spec = string>
   title,
   home?,
   (application |chain |config |error |filter |group |resource |servlet)+
</webapp>

<title>
   string
</title>

<home>
   uri
</home>

A webapp is identified by a name, an abbrev and a version number (all three following the same rules as their corresponding attributes in the Packaging System specification). The attribute spec is the version of the packaging specification the package conforms to. The current specification requires the package to use the spec number 1.0. No forward compatibility rules are defined: a processor conforming to this specification must generate an error if the spec number is anything other than the string 1.0. The title is a plain string, and the home is a URL pointing to the homepage or website of the webapp or project. The rest of the webapp element is composed of any combination of the elements application, chain, config, error, filter, group, resource and servlet.

The following is an example of a descriptor for a simple webapp with two components: an XProc pipeline and an XQuery query, bound respectively to the URL patterns /home and /users/([a-z0-9]+). It also defines a filter on the second component, an error handler for the entire application, and a few resource patterns:

<webapp xmlns="http://expath.org/ns/webapp"
        xmlns:app="http://example.org/ns/someapp"
        name="http://example.org/someapp"
        abbrev="someapp"
        version="2.6.0">

   <title>Example website</title>

   <resource pattern="/style/.+\.css" media-type="text/css"/>
   <resource pattern="/images/.+\.png" media-type="image/png"/>

   <application filters="errors"/>

   <filter name="authenticate">
      <in>
         <xslt uri="http://example.org/someapp/authenticate.xsl"/>
      </in>
   </filter>

   <error name="errors" catch="*">
      <xproc uri="http://example.org/someapp/errors.xproc"/>
   </error>

   <!-- The homepage. -->
   <servlet name="home">
      <xproc uri="http://example.org/someapp/home.xproc"/>
      <url pattern="/home"/>
   </servlet>

   <!-- The users. The user ID in the URL is set to the param "user". -->
   <servlet name="users" filters="authenticate">
      <xquery uri="http://example.org/someapp/users.xq"/>
      <url pattern="/users/([a-z0-9]+)">
         <match group="1" name="user"/>
      </url>
   </servlet>

</webapp>
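
To make the role of the url and match elements concrete, the following Python sketch dispatches a request path against the two servlet patterns of the descriptor above and extracts the named parameter. It rests on several assumptions that this section does not state: that a pattern must match the whole path, that the first matching servlet wins, and that Python regular expressions behave like the regex flavour used by a container for these simple patterns. The names ROUTES and dispatch are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# (servlet name, URL pattern, {regex group number: parameter name}),
# transcribed by hand from the example descriptor.
ROUTES: List[Tuple[str, str, Dict[int, str]]] = [
    ("home", r"/home", {}),
    ("users", r"/users/([a-z0-9]+)", {1: "user"}),
]


def dispatch(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (servlet name, parameters) for the first matching route."""
    for name, pattern, groups in ROUTES:
        # Assumption: the pattern has to match the entire path.
        match = re.fullmatch(pattern, path)
        if match:
            params = {param: match.group(n) for n, param in groups.items()}
            return name, params
    return None  # no servlet is bound to this path
```

For instance, dispatch("/users/fgeorges") selects the servlet users with the parameter user set to "fgeorges", while dispatch("/users/") matches nothing.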

Packaging

A webapp is packaged using the Packaging System. It is therefore a regular package and can benefit from all the tools supporting that specification. In addition to the standard package format, a webapp contains the webapp descriptor in a file called expath-web.xml, at the root of the package.

For instance, the above descriptor example could define a webapp packaged like the following:

expath-pkg.xml
expath-web.xml
content/
   filters/
      auth.xsl
      errors.xproc
   images/
      logo.png
   servlets/
      home.xproc
      users.xq
   style/
      style.css

Note that the names of the files for the components do not necessarily match the last part of their public URI. The components (including the servlets and the components used in filters and error handlers) must be declared in the package descriptor, that is expath-pkg.xml.

Webapp functions

These functions are defined in the namespace http://expath.org/ns/webapp, bound to the prefix web in this document.

Fields

A webapp can access and store values, attached to different contexts: the container, the webapp, the session, and the request. The request is the current HTTP request being served, the session is the session of the request, the webapp is the one containing the component serving the request, and the container represents the webapp container as a whole.

A field has a name (a string) and a value (any sequence). The names beginning with "web:" are reserved for this specification. Note that "web:" is not a namespace prefix here, and therefore does not need to be bound by any means: field names are simple strings.

Each set of fields has three corresponding functions: one to get the value of a field given its name, one to set its value, and one to enumerate the names of all existing fields.

For the request fields:

web:get-request-field($name as xs:string) as item()*
web:set-request-field($name as xs:string, $value as item()*) as empty-sequence()
web:get-request-field-names() as xs:string*

For the session fields:

web:get-session-field($name as xs:string) as item()*
web:set-session-field($name as xs:string, $value as item()*) as empty-sequence()
web:get-session-field-names() as xs:string*

For the webapp fields:

web:get-webapp-field($name as xs:string) as item()*
web:set-webapp-field($name as xs:string, $value as item()*) as empty-sequence()
web:get-webapp-field-names() as xs:string*

For the container fields:

web:get-container-field($name as xs:string) as xs:string*
web:set-container-field($name as xs:string, $value as xs:string*) as empty-sequence()
web:get-container-field-names() as xs:string*

Note that the container field values are sequences of strings, not arbitrary sequences. Container fields can be used to exchange information across several applications, and different applications could be handled by different processors using incompatible data model implementations. Allowing arbitrary items would therefore either make the implementation unnecessarily complex or greatly impact performance (for example by requiring a whole document to be serialized when it is set as the value of a field, then parsed again every time it is retrieved).
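
The get, set and enumerate triplet, and the restriction on container fields, can be modelled with a short Python sketch. It is only a mental model under stated assumptions: the classes FieldScope and ContainerScope are hypothetical, Python lists stand in for XDM sequences, and reading a field that was never set is assumed to yield the empty sequence.

```python
from typing import Any, Dict, List, Sequence


class FieldScope:
    """One set of fields (request, session or webapp): name -> sequence."""

    def __init__(self) -> None:
        self._fields: Dict[str, List[Any]] = {}

    def get(self, name: str) -> List[Any]:
        # Assumption: an unset field yields the empty sequence.
        return list(self._fields.get(name, []))

    def set(self, name: str, value: Sequence[Any]) -> None:
        self._fields[name] = list(value)

    def names(self) -> List[str]:
        return list(self._fields)


class ContainerScope(FieldScope):
    """Container fields only accept sequences of strings."""

    def set(self, name: str, value: Sequence[Any]) -> None:
        if any(not isinstance(item, str) for item in value):
            raise TypeError("container field values must be sequences of strings")
        super().set(name, value)


session = FieldScope()
session.set("cart", [42, {"sku": "A-1"}])      # any items are accepted

container = ContainerScope()
container.set("shared-hosts", ["a.example.org", "b.example.org"])
```

Passing anything other than strings to container.set raises a TypeError, which mirrors the narrower xs:string* type in the container function signatures.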

The request contains the following field:

web:request-id: a string uniquely identifying the request being processed. The string matches the regex "[-_a-zA-Z0-9]+" and is therefore safe to use in a file name. The strings are ordered through time: comparing two request IDs using the code point collation must order them according to the time the requests were received by the server. The precision is implementation-defined (that is, an implementation might say that the order between two request IDs for two requests received within the same second is undefined).
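
One possible way to satisfy both constraints (the restricted alphabet and the ordering under code point comparison) is to concatenate fixed-width, zero-padded numbers. The Python sketch below is an assumption about how an implementation might do it, not something this specification mandates; the function new_request_id and its millisecond-plus-counter layout are hypothetical.

```python
import itertools
import re
import time
from typing import Optional

REQUEST_ID_REGEX = re.compile(r"[-_a-zA-Z0-9]+")

# Process-wide counter, used to order requests received in the same millisecond.
_counter = itertools.count()


def new_request_id(now_ms: Optional[int] = None) -> str:
    """Build an ID whose code point order follows the reception order."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Fixed-width, zero-padded fields: comparing the strings character by
    # character then gives the same result as comparing the numbers.
    return "%015d-%010d" % (now_ms, next(_counter))
```

Python compares strings by code point, so the ordering requirement can be checked directly with the < operator on two generated IDs.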

The container contains the following fields:

web:product: the implementation (usually a product name and a version number).

web:product-html: an HTML formatted version of the property web:product. The HTML must be valid as the content of a p or a div element (usually it is the same as web:product, with additional hyperlinks).

web:vendor: the vendor of the product (usually a person, a company or an open-source group).

web:vendor-html: an HTML formatted version of the property web:vendor. The HTML must be valid as the content of a p or a div element (usually it is the same as web:vendor, with additional hyperlinks).

Parse HTTP

web:parse-header-value($header as xs:string) as element(web:header)

This function parses the value of a structured HTTP header. It returns the result as a web:header element, whose format is described below.

Some HTTP headers have values that can be decomposed into multiple elements. In order to be processed by this function, such headers must be in the following form:

header        = [ element ] *( "," [ element ] )
element       = name [ "=" [ value ] ] *( ";" [ param ] )
param         = name [ "=" [ value ] ]
name          = token
value         = ( token | quoted-string )
token         = 1*<any char except "=", ",", ";", <"> and white space>
quoted-string = <"> *( text | quoted-char ) <">
text          = any char except <">
quoted-char   = "\" char

Any amount of white space is allowed between the parts of a header, element or param, and is ignored. The function returns the elements and params as XML, in the format below. A missing value in any element or param is returned as an empty attribute value; if the "=" is also missing, the attribute value is not generated at all.

<header>
   element+
</header>

<element name = string>
   param*
</element>

<param name = string
       value = string>
   empty
</param>

For example, the following call:

web:parse-header-value('text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')

returns the following element:

<header>
   <element name="text/html"/>
   <element name="application/xhtml+xml"/>
   <element name="application/xml">
      <param name="q" value="0.9"/>
   </element>
   <element name="*/*">
      <param name="q" value="0.8"/>
   </element>
</header>

The user can then easily access the several elements (here the different MIME types), as well as their parameters (here their optional relative quality factors).
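
The grammar above is small enough to be followed by hand. The Python sketch below is one possible reading of it, meant as a study aid and not as a reference implementation. Its assumptions: the result is built with ElementTree and without the web namespace, parsing stops silently at the first character that does not fit the grammar, and a value given directly on an element is stored in a value attribute as the prose about missing values suggests. The names _Scanner, _name_value and parse_header_value are hypothetical.

```python
import xml.etree.ElementTree as ET

SEPARATORS = '=,;"'


class _Scanner:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        """Skip white space and return the next char ('' at the end)."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def take(self, char):
        """Consume char if it is the next non-space char."""
        if self.peek() == char:
            self.pos += 1
            return True
        return False

    def token(self):
        self.peek()  # skip leading white space
        start = self.pos
        while (self.pos < len(self.text)
               and self.text[self.pos] not in SEPARATORS
               and not self.text[self.pos].isspace()):
            self.pos += 1
        return self.text[start:self.pos]

    def value(self):
        if self.peek() != '"':
            return self.token()  # may be empty: a missing value
        self.pos += 1  # opening quote
        chars = []
        while self.pos < len(self.text) and self.text[self.pos] != '"':
            if self.text[self.pos] == "\\" and self.pos + 1 < len(self.text):
                self.pos += 1  # quoted-char: keep the char after the backslash
            chars.append(self.text[self.pos])
            self.pos += 1
        self.pos += 1  # closing quote
        return "".join(chars)


def _name_value(scanner, tag, parent):
    """Parse name [ "=" [ value ] ]; return the new node, or None if empty."""
    name = scanner.token()
    if not name:
        return None
    node = ET.SubElement(parent, tag, {"name": name})
    if scanner.take("="):
        node.set("value", scanner.value())
    return node


def parse_header_value(header):
    scanner = _Scanner(header)
    root = ET.Element("header")
    while scanner.peek():
        element = _name_value(scanner, "element", root)
        while element is not None and scanner.take(";"):
            _name_value(scanner, "param", element)
        if not scanner.take(","):
            break  # end of input, or something the grammar does not allow
    return root
```

Calling parse_header_value on the Accept value used in the example above yields four element children, the last two each carrying a single q param.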

web:parse-basic-auth($header as xs:string) as element(web:basic-auth)

This function parses the value of a Basic Access Authentication header, as defined in RFC 2617. It returns the result in the form of the following element:

<basic-auth username = string password = string> empty </basic-auth>

The value of the header is "Basic abc" where abc is "user:password" encoded using base64.
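
The decoding step can be sketched in Python as follows. This is an illustration under assumptions, not the definition of web:parse-basic-auth: the sketch returns a named tuple instead of a web:basic-auth element, it decodes the credentials as UTF-8 (the character encoding is not specified here), and it reports malformed input with ValueError. The names BasicAuth and parse_basic_auth are hypothetical.

```python
import base64
import binascii
from typing import NamedTuple


class BasicAuth(NamedTuple):
    username: str
    password: str


def parse_basic_auth(header: str) -> BasicAuth:
    """Decode a header value of the form 'Basic <base64 of user:password>'."""
    scheme, _, encoded = header.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded.strip():
        raise ValueError("not a Basic authentication header value")
    try:
        # Assumption: credentials are UTF-8 once base64-decoded.
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("invalid base64 credentials") from exc
    # Split on the first colon only: a password may itself contain colons.
    username, separator, password = decoded.partition(":")
    if not separator:
        raise ValueError("credentials lack the ':' separator")
    return BasicAuth(username, password)
```

For instance, the header value "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" (the example used in RFC 2617) decodes to the username "Aladdin" and the password "open sesame".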

Webapp management

This section documents an optional feature: a set of functions to manage webapps (deploy a new webapp, remove a deployed webapp, configure a webapp, list all the deployed webapps in a web container, etc.)

TODO: ...

Repository

This section documents an optional feature: the format of an on-disk repository. It builds on the repository format from the packaging specification, by adding web-specific features.

TODO: ...

<webapps>
   <webapp root="myapp">
      <package name="http://example.org/my/webapp"/>
   </webapp>
   <webapp root="catalog">
      <package name="http://example.com/catalog-project/application"/>
   </webapp>
   ...
</webapps>

Webapp schema

References

Packaging System. Florent Georges, editor. EXPath. 9 May 2012.

RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee, editors. Internet Engineering Task Force. June, 1999.

RFC 2617: HTTP Authentication: Basic and Digest Access Authentication. J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen and L. Stewart, editors. Internet Engineering Task Force. June, 1999.

RFC 3023: XML Media Types. M. Murata, S. St. Laurent and D. Kohn, editors. Internet Engineering Task Force. January, 2001.

XML Schema Part 2: Datatypes Second Edition. P. V. Biron and A. Malhotra, editors. W3C. October, 2004.

XQuery 3.0: An XML Query Language. J. Robie, D. Chamberlin, M. Dyck and J. Snelson, editors. W3C. XXX, 2013. (TODO: Update the publication date once XQuery 3.0 has been published)

TODO list

Definition: define the name accessor, to use when I use "URL" as "URL relatively to the context root", for instance for matching against patterns. And use it consistently across the spec.

Definition: define the format of the several names (servlets, url matches, filters, chains, error handlers, etc.) And use it consistently across the spec. Or do we just use NCName all the time?

The filters mechanism is not satisfactory. It does not allow a filter to decide whether or not (and when) the wrapped component is called. The minimal use case to accommodate is an authentication filter returning a "401 Unauthorized" when the user is not connected, without actually calling the wrapped component at all. An idea could be to bypass the component if the filter returns a valid response sequence (starting with an element web:response...)

The config mechanism is probably under-specified. We should probably define how a webapp provides default values (where in the package), as well as how this is represented in an on-disk repository (this is optional, but this is a way to align several implementations if they choose the same kind of approach).

In addition to web:parse-basic-auth(), add a function to support Digest Access Authentication (see RFC 2617).

Define some user management mechanism (plus identification, authentication, etc.).