HTTP Model

EXPath Candidate Module 11 January 2014

This version:
http://expath.org/spec/http/editor
Latest version:
http://expath.org/spec/http
Editor:
Florent Georges, H2O Consulting

This document is also available in these non-normative formats: XML.


Abstract

This module defines an XML format for HTTP requests and responses, as well as a set of functions to access specific information in those requests and responses.

Table of Contents

1 Namespace conventions
2 Introduction
3 Requests and responses
4 Functions
    4.1 http:header
    4.2 http:header-names
    4.3 http:param
    4.4 http:param-names
    4.5 http:is-request
    4.6 http:is-response
    4.7 http:method
    4.8 http:href
    4.9 http:scheme
    4.10 http:authority
    4.11 http:port
    4.12 http:uri-part
    4.13 http:code
    4.14 http:status
    4.15 http:bodies
    4.16 http:body
    4.17 http:get-body
    4.18 http:id
    4.19 http:request


1 Namespace conventions

This module defines an XML format for HTTP requests and responses, as well as a set of functions to access specific information in those requests and responses. Elements and functions defined by this module are in the namespace http://expath.org/ns/http. In this document, the http prefix, when used, is bound to this namespace URI.

2 Introduction

One of the obvious uses of XML is to transfer information between different systems, as a universally supported information representation language. In that regard, it is often used together with HTTP, as a universally supported transfer protocol. XPath and associated programming languages, like XQuery and XSLT, are therefore often used to create or consume XML documents received or to be sent through HTTP. But they have no (standard) way to access or represent the information in the HTTP layer itself.

This specification defines an XML format to represent HTTP requests and responses, which carry all the semantics of the HTTP layer. They are low-level formats that aim to map the HTTP specification as closely as possible, so that all the power and expressiveness of HTTP is available to XPath. Libraries or other specifications can then use those requests and responses to provide higher-level features on top of them. This specification does not define any way to send a request (as a client) or to receive a request and serve the corresponding response (as a server). Those features would have to be defined by other specifications or by specific products.

Here is an example of an HTTP request:

<http:request ...>
   ...
</http:request>

The corresponding response could look like:

<http:response ...>
   ...
</http:response>

Accessing the value of the header Content-Type in the response (given it is bound to the variable $response) could be done in XPath as follows:

$response/http:header[@name eq 'content-type']/string(.)

The exact same result can be achieved by using the function http:header defined in this specification:

http:header($response, 'Content-Type')

This specification provides, in the appendices, the XML Schema for the XML format, as well as an XQuery library module and an XSLT stylesheet defining the functions as standard XQuery and XSLT functions. This is only informative, and an implementation can provide built-in functions as long as they provide the same results.

3 Requests and responses

TODO: Do we use the elements http:request and http:response, or a single element like http:entity (with an attribute or element telling whether it is, or is intended to be, a request or a response)? This would allow defining functions that accept either of them (like http:header() for instance). Another possibility to solve this problem is to define a base type in XML Schema and use it in function signatures...

The request and response are defined by the following elements:

<http:request
   method = method
   href = anyURI >
    (http:uri?,
     http:param*,
     http:header*,
     (http:body
      |http:multipart)?)
</http:request>

<http:response
   code = code
   status = string? >
    (http:header*,
     (http:body
      |http:multipart)?)
</http:response>

TODO: Is it possible to have query parameters in @href? It would be useful in the context of an HTTP client. In the context of a server-side container, it would be less of a problem, as the user would mostly access the information, and we could provide different functions to access different "flavours" of the URI (with or without the query parameters, with or without the fragment identifier, etc.) The same is true for the element http:uri: it is useful mostly as a convenience to access the "parsed" URI, but this is not a big deal if we provide accessor functions (and we can also provide a function to construct the full URI based on an element http:uri).

TODO: Define that http:response/@status is optional in a server context (when the user constructs the response, so the implementation can set the standard status phrase for the given response code), but is mandatory in a client context (the implementation must then provide the user with the status phrase returned to the client). Same for http:request/http:uri, mandatory in a server context (but certainly not in a client context).

The method name and the header names are always in lower case. Because case is not significant, the user can then compare them (for instance to check the method or to select a specific header) by simple string comparison, instead of having to handle case insensitivity (the corresponding accessor functions do provide case-insensitive lookup, though).

The element http:uri in the request is defined as follows:

<http:uri>
    (http:scheme,
     http:part,
     http:authority,
     (http:part,
      http:port)?,
     http:part*)
</http:uri>

TODO: Include the query parameters and the fragment identifier as well?

The element http:scheme is the HTTP scheme, either http or https. The element http:authority is the machine name, either an IP address or a domain name. The element http:port is, unsurprisingly, the port number, as an integer. The elements http:part contain any other text part of the URI, as a string. The first one always has the value "://". The one before the port, if any, always has the value ":". The following elements http:part can present the rest of the URI split into any number of pieces, as defined by the implementation. The rule is that concatenating the values of all the elements in document order reconstructs the same URI as in http:request/@href. The element http:part is defined as:

<http:part
   name = token? >
    text
</http:part>

In particular, note the optional attribute @name. A server-side implementation can provide a way to "parse" the request URI, for instance for RESTful services, and break it down into pieces, annotating some of them with names. The following URI for instance:

http://example.org/users/fgeorges

could result in the following element http:uri:

<http:uri>
   <http:scheme>http</http:scheme>
   <http:part>://</http:part>
   <http:authority>example.org</http:authority>
   <http:part>/users/</http:part>
   <http:part name="id">fgeorges</http:part>
</http:uri>

The way names are attributed is implementation-defined. With the above http:uri element, the user can then easily retrieve the ID of the user from the URI with:

$request/http:uri/http:part[@name eq 'id']/string(.)

Note: Informally, the intent is to give server-context libraries, frameworks and specifications the ability to "parse" the URI and name some specific parts. This is useful in a RESTful environment for instance, as the above example suggests. If a RESTful specification defines a way to map requests to implementations based on URI path patterns, the above example could be generated by the following pattern (on example.org then): "/users/{id}".
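As a non-normative illustration, the two rules above (concatenating all children of http:uri yields the original URI, and a named part can be looked up by its @name) can be sketched outside XPath. The following Python sketch uses only the standard library; the helper names rebuild_href and named_part are hypothetical and are not defined by this specification.

```python
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module

# The http:uri element from the example above.
URI_XML = f"""<http:uri xmlns:http="{HTTP_NS}">
   <http:scheme>http</http:scheme>
   <http:part>://</http:part>
   <http:authority>example.org</http:authority>
   <http:part>/users/</http:part>
   <http:part name="id">fgeorges</http:part>
</http:uri>"""


def rebuild_href(uri):
    """Concatenate the text of every child element, in document order."""
    return "".join(child.text or "" for child in uri)


def named_part(uri, name):
    """Return the text of the first http:part whose @name equals name, else None."""
    for part in uri.findall(f"{{{HTTP_NS}}}part"):
        if part.get("name") == name:
            return part.text or ""
    return None


uri = ET.fromstring(URI_XML)
print(rebuild_href(uri))      # http://example.org/users/fgeorges
print(named_part(uri, "id"))  # fgeorges
```

The name comparison here is a plain string equality, which corresponds to a codepoint comparison.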

The elements http:param and http:header are defined as follows:

<http:param
   name = token? >
    text
</http:param>

<http:header
   name = token? >
    text
</http:header>

The elements http:body and http:multipart are defined as follows:

<http:multipart
   content-type = string
   separator = string? >
    (http:header*,
     http:body)+
</http:multipart>

<http:body
   content-type = string
   position = integer? >
    (text
     |document-node)?
</http:body>

The attribute content-type is the content type of the multipart or single body. The attribute position is used on the body elements when they are children of a multipart entity. The attribute separator is the multipart separator string. It is optional so that, when it is used on a response in a server-side context or on a request in a client-side context, the implementation can compute it automatically; it must be present on a request in a server-side context and on a response in a client-side context.

The content of a body element is optional. Besides the fact that the content of a body can be empty, this is also a way to let a specific library or specification define a way to pass the content of a body alongside the HTTP element, instead of embedding it (this is outside the scope of this specification). For instance, an HTTP client could accept an HTTP request element and a sequence of items representing the content of the bodies (or it could require the content to be embedded in the http:body elements all the time).

4 Functions

Most of the functions defined in this specification are convenience functions to access specific information from the request and response elements. They take such a request or response as parameter (except http:request), and as such could be written as standard user functions in XQuery or XSLT. Defining those functions in this specification achieves three different goals: 1) providing the user with those functions even in XPath, which does not allow defining user functions, 2) providing the user with those functions without requiring them to be written over and over again in XQuery and XSLT, or requiring libraries to be imported, and 3) providing the implementation with optimization opportunities.

The function http:request is different in this regard, as it does not return information from a request or a response, but returns the "current in-scope request". All other functions are functional, in the sense that they return values based on their parameters (the returned values depend only on the parameters, and are always the same for the same parameters). But the function http:request does not take any parameter; it depends on the dynamic context when it is called, and so makes the function it is called from not functional.

The use of the function http:request is discouraged as much as possible. In a server-side environment, the specification or the product defining the HTTP layer should provide a way to pass the HTTP request to the user code, e.g. through a stylesheet or a query parameter.

The other functions take an HTTP entity as first parameter. Some functions expect a request, some expect a response, and others expect any of them.

Note:

TODO: Describe the advantages and drawbacks of having 2 different names for http:request and http:response (probably deriving from the same common XML Schema base type), versus having one element http:entity with an attribute telling whether it is a request or a response.

4.1 http:header

http:header($entity as element(*, http:entity),
            $name   as xs:string) as xs:string*

This function returns the values of the headers with the name $name in the HTTP entity $entity. A header can appear several times in HTTP, so the return value is a sequence of strings. If the header does not appear in the entity, the empty sequence is returned.

The names of the headers are case-insensitive. The following calls therefore all return the same values:

http:header($response, 'Content-Type')
http:header($response, 'content-type')
http:header($response, 'CoNtEnT-tYpE')
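A non-normative sketch of this behaviour in Python follows. It assumes that the header value is the text content of http:header, as in the element definition in section 3, and that @name is stored in lower case; the function name header and the sample response are illustrative only.

```python
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module


def header(entity, name):
    """Return all values of the header called name, ignoring case."""
    wanted = name.lower()
    return [
        h.text or ""
        for h in entity.findall(f"{{{HTTP_NS}}}header")
        if (h.get("name") or "").lower() == wanted
    ]


# Hypothetical response, with one header repeated.
RESPONSE_XML = f"""<http:response xmlns:http="{HTTP_NS}" code="200" status="OK">
   <http:header name="content-type">text/html</http:header>
   <http:header name="set-cookie">a=1</http:header>
   <http:header name="set-cookie">b=2</http:header>
</http:response>"""

response = ET.fromstring(RESPONSE_XML)
print(header(response, "Content-Type"))  # ['text/html']
print(header(response, "SET-COOKIE"))    # ['a=1', 'b=2']
print(header(response, "etag"))          # []
```

An empty list plays the role of the empty sequence here.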

4.2 http:header-names

http:header-names($entity as element(*, http:entity)) as xs:string*

This function returns the names of all the headers in $entity. If a header appears more than once, its name is returned only once. The order of the names is undefined, but is stable across several calls on the same element.
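One possible way to satisfy both requirements (each name once, stable order) is to return the names in order of first appearance. This is only one choice among many, since the order itself is undefined; the Python sketch below is non-normative, and the function name header_names and the sample data are illustrative.

```python
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module


def header_names(entity):
    """Return each distinct header name once, in order of first appearance."""
    seen = {}  # a dict keeps insertion order, which makes the result stable
    for h in entity.findall(f"{{{HTTP_NS}}}header"):
        seen.setdefault((h.get("name") or "").lower(), None)
    return list(seen)


# Hypothetical response, with one header repeated.
RESPONSE_XML = f"""<http:response xmlns:http="{HTTP_NS}" code="200">
   <http:header name="content-type">text/html</http:header>
   <http:header name="set-cookie">a=1</http:header>
   <http:header name="set-cookie">b=2</http:header>
</http:response>"""

response = ET.fromstring(RESPONSE_XML)
print(header_names(response))  # ['content-type', 'set-cookie']
```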

4.3 http:param

http:param($entity as element(*, http:entity),
           $name   as xs:string) as xs:string*

This function returns the values of the parameters with the name $name in the HTTP entity $entity. If the parameter does not appear in the entity, the empty sequence is returned. A parameter is either a query parameter from the URI, or a parameter from the body (TODO: define exactly under which conditions, which should match the HTML specification, like application/x-www-form-urlencoded parameters, or something like that).

4.4 http:param-names

http:param-names($entity as element(*, http:entity)) as xs:string*

This function returns the names of all the parameters in $entity. Parameters are defined in the definition of the function http:param. The order of the names is undefined, but is stable across several calls on the same element.

4.5 http:is-request

http:is-request($entity as element(*, http:entity)) as xs:boolean

This function returns true if $entity is an HTTP request, false if it is a response.

4.6 http:is-response

http:is-response($entity as element(*, http:entity)) as xs:boolean

This function returns true if $entity is an HTTP response, false if it is a request.
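With the two-element design (http:request and http:response), both predicates reduce to a test on the element name. The following non-normative Python sketch assumes that design; it would not apply if a single http:entity element were chosen instead. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module


def is_request(entity):
    """True if the element is named http:request."""
    return entity.tag == f"{{{HTTP_NS}}}request"


def is_response(entity):
    """True if the element is named http:response."""
    return entity.tag == f"{{{HTTP_NS}}}response"


# Hypothetical minimal entities.
request = ET.fromstring(
    f'<http:request xmlns:http="{HTTP_NS}" method="get" href="http://example.org/"/>'
)
response = ET.fromstring(f'<http:response xmlns:http="{HTTP_NS}" code="200"/>')

print(is_request(request), is_response(request))    # True False
print(is_request(response), is_response(response))  # False True
```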

4.7 http:method

http:method($request as element(http:request)) as xs:string

This function returns the method of $request. The method name is always in lower case.

4.8 http:href

http:href($request as element(http:request)) as xs:string?

This function returns the href of $request.

TODO: Define what it contains (fragment identifier?, query parameters?, etc.)

4.9 http:scheme

http:scheme($request as element(http:request)) as xs:string?

This function returns the URI scheme of $request. The scheme is either 'http' or 'https'.

4.10 http:authority

http:authority($request as element(http:request)) as xs:string?

This function returns the URI authority of $request.

4.11 http:port

http:port($request as element(http:request)) as xs:integer

This function returns the port number of $request. This is the actual port a request has to be sent to (in a client context) or has been received on (in a server context), regardless of whether it has been explicitly written in the request URI or not.
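As a non-normative illustration, the sketch below resolves the port from the http:uri element of a request: it uses http:port when present, and otherwise falls back to the default port of the scheme (80 for http, 443 for https). It assumes the request carries an http:uri child, and that the implicit port is the scheme default; a server implementation would rather report the port the request was actually received on. The function name port and the sample requests are illustrative.

```python
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module
DEFAULT_PORTS = {"http": 80, "https": 443}  # well-known defaults per scheme


def port(request):
    """Return the explicit port if any, else the default port of the scheme."""
    uri = request.find(f"{{{HTTP_NS}}}uri")  # assumption: http:uri is present
    explicit = uri.findtext(f"{{{HTTP_NS}}}port")
    if explicit is not None:
        return int(explicit)
    return DEFAULT_PORTS[uri.findtext(f"{{{HTTP_NS}}}scheme")]


# Hypothetical requests: one without and one with an explicit port.
implicit = ET.fromstring(f"""<http:request xmlns:http="{HTTP_NS}"
      method="get" href="https://example.org/users">
   <http:uri>
      <http:scheme>https</http:scheme>
      <http:part>://</http:part>
      <http:authority>example.org</http:authority>
      <http:part>/users</http:part>
   </http:uri>
</http:request>""")

explicit = ET.fromstring(f"""<http:request xmlns:http="{HTTP_NS}"
      method="get" href="http://example.org:8080/users">
   <http:uri>
      <http:scheme>http</http:scheme>
      <http:part>://</http:part>
      <http:authority>example.org</http:authority>
      <http:part>:</http:part>
      <http:port>8080</http:port>
      <http:part>/users</http:part>
   </http:uri>
</http:request>""")

print(port(implicit))  # 443
print(port(explicit))  # 8080
```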

4.12 http:uri-part

http:uri-part($request as element(http:request),
              $name    as xs:string) as xs:string?

This function returns the value of the URI part with the same name as $name. The name comparison uses the codepoint collation. The way some parts are named is implementation-defined (see also the definition of [http:uri]).

4.13 http:code

http:code($response as element(http:response)) as xs:integer

This function returns the result code of $response. For instance 200 for a successful response, or 404 for a "Not Found" error.

4.14 http:status

http:status($response as element(http:response)) as xs:string

This function returns the status phrase of $response. In a server context, if $response has not been given an explicit status phrase, then this function returns the default value the implementation will use, given the status code of $response (as defined in the HTTP specification, e.g. "OK" for 200 or "Not Found" for 404).
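The fallback to a default phrase can be sketched as follows (non-normative). The Python standard library exposes the standard reason phrases through http.HTTPStatus, which is used here as a stand-in for whatever table an implementation would use; the function name status and the sample responses are illustrative.

```python
from http import HTTPStatus
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module


def status(response):
    """Return @status if present, else the standard phrase for @code."""
    explicit = response.get("status")
    if explicit is not None:
        return explicit
    # HTTPStatus raises ValueError for a code it does not know.
    return HTTPStatus(int(response.get("code"))).phrase


# Hypothetical responses: one with and two without an explicit status.
custom = ET.fromstring(
    f'<http:response xmlns:http="{HTTP_NS}" code="200" status="Fine"/>'
)
ok = ET.fromstring(f'<http:response xmlns:http="{HTTP_NS}" code="200"/>')
missing = ET.fromstring(f'<http:response xmlns:http="{HTTP_NS}" code="404"/>')

print(status(custom))   # Fine
print(status(ok))       # OK
print(status(missing))  # Not Found
```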

4.15 http:bodies

http:bodies($entity as element(*, http:entity)) as element(http:body)*

This function returns the body descriptors of $entity. There is one optional body descriptor (that is one element http:body) for single part requests and responses, but there can be several body descriptors in case of multipart requests or responses (one of them for each part in the multipart entity).

4.16 http:body

http:body($entity as element(*, http:entity)) as element(http:body)?
http:body($entity as element(*, http:entity),
          $pos    as xs:integer) as element(http:body)?

This function returns a specific body descriptor of $entity. The first arity can be used only for single part requests and responses; it throws the error http:not-a-single-part if $entity has a multipart content type. The second arity returns the body descriptor at position $pos, in case of multipart (the position numbering starts at 1). If $pos is 1 and $entity has a single part content type, it returns the same result as the first arity. If $pos is greater than 1 and $entity has a single part content type, it throws the error http:not-multipart. The second arity throws the error http:out-of-bound if $pos is lower than 1. If there is no body corresponding to position $pos (or if the single part has no body for the first arity), the empty sequence is returned.
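The error rules above can be summarised by the following non-normative Python sketch. It makes two simplifying assumptions that are not stated by this specification: an entity is treated as multipart when it has an http:multipart child (rather than by inspecting its content type), and positions follow document order (the @position attribute is ignored). The class HttpError and the function names are hypothetical; None stands for the empty sequence.

```python
import xml.etree.ElementTree as ET

HTTP_NS = "http://expath.org/ns/http"  # namespace defined by this module


class HttpError(Exception):
    """Carries one of the error names used in this section."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def q(local):
    """Build the qualified ElementTree name for a local name in HTTP_NS."""
    return f"{{{HTTP_NS}}}{local}"


def bodies(entity):
    """Return all body descriptors, whether single part or multipart."""
    multipart = entity.find(q("multipart"))
    return (multipart if multipart is not None else entity).findall(q("body"))


def body(entity, pos=None):
    """pos=None models the first arity, an integer models the second."""
    multipart = entity.find(q("multipart")) is not None
    if pos is None:
        if multipart:
            raise HttpError("http:not-a-single-part")
        pos = 1
    elif pos < 1:
        raise HttpError("http:out-of-bound")
    elif pos > 1 and not multipart:
        raise HttpError("http:not-multipart")
    found = bodies(entity)
    return found[pos - 1] if pos <= len(found) else None


# Hypothetical single part and multipart responses.
single = ET.fromstring(f"""<http:response xmlns:http="{HTTP_NS}" code="200">
   <http:body content-type="text/plain">hello</http:body>
</http:response>""")

multi = ET.fromstring(f"""<http:response xmlns:http="{HTTP_NS}" code="200">
   <http:multipart content-type="multipart/mixed" separator="xyz">
      <http:body content-type="text/plain" position="1">first</http:body>
      <http:body content-type="text/plain" position="2">second</http:body>
   </http:multipart>
</http:response>""")

print(body(single).text)    # hello
print(body(multi, 2).text)  # second
print(body(multi, 3))       # None
```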

4.17 http:get-body

http:get-body($body as element(http:body)) as item()
http:get-body($body   as element(http:body),
              $bodies as item()*) as item()

This function returns the actual content associated with a body descriptor. The first arity gets the content from the body descriptor only, whilst the second arity can use the body content list passed as $bodies. In some cases, a server context implementation can provide the user code with an http:request and a body content list, so the body content can be lazily evaluated more easily, without being embedded in http:request. More fundamentally, if the body content is XML, we really want a separate XML document node to be returned, instead of an element node that is a child of http:body (traversing the parent or ancestor axis would then be surprising, as it would return the http:body or http:request, which is really not what we want from a body content).

TODO: Should we define the body content list more precisely? Should we use a map? Should we then impose such a separate list? (which makes a lot of sense for XML document bodies, to avoid them being polluted by their http:body parent, like in SOAP envelopes). But then, it would make sense to make the map itself represent the whole request or response, and so to make it the parameter of most functions here. So it would not be possible to implement it without map support... Which really is an acceptable strategy (isn't it what the Binary Module is doing?)

4.18 http:id

http:id($entity as element(*, http:entity)) as xs:string

This function returns a string which is unique to $entity. The context of uniqueness is implementation-defined, but is loosely defined to be a single web server or client, not restricted to one XPath, XQuery or XSLT evaluation.

Note:

For instance, on a server, this can be used to log each request received in a separate file, as the returned ID will be different for all http:request elements, but always the same for the same request. This is guaranteed by the implementation even between different evaluations.

4.19 http:request

http:request() as element(http:request)

This function returns the HTTP request from the context, in a server context.

It is the only function in this specification that depends on the context; all other functions get an explicit HTTP request or response as parameter. An implementation might choose not to support it (in which case it throws the error http:not-supported). If an implementation chooses to support this function, it is encouraged to provide the user with the ability to enable or disable this support (e.g. through some implementation-defined configuration mechanism). A server context library, framework or specification using this module is then encouraged to define another way to pass the HTTP request to the user code (e.g. through external parameters for an XQuery main module), so the user always has a way not to use this function. In any case, code that does not use the execution context is better and easier to test.

...