The presentation of this document has been augmented to identify changes from a previous version. Three kinds of changes are highlighted: new, added text, changed text, and deleted text.
This document is also available in these non-normative formats: XML and Revision markup.
This proposal provides an HTTP client interface for XPath 2.0. It defines one extension function to perform HTTP requests, and has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage.
1 Introduction
1.1 Namespace conventions
1.2 Error management
2 The http:send-request function
3 Sending a request
3.1 The request elements
3.2 Serializing the request content
3.3 Authentication
4 Dealing with the response
4.1 The result element
4.2 Representing the result content
5 Content types handling
TODO: Add @href to http:response, taking redirects into account.
The module defined by this document defines one function in the namespace http://expath.org/ns/http-client. In this document, the http prefix, when used, is bound to this namespace URI.
Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.
Error conditions are identified by a code (a QName). When such an error condition is reached during the execution of the function, a dynamic error is thrown, with the corresponding error code (as if the standard XPath function error had been called). TODO: Have not been defined yet.
Error codes are defined throughout the spec. The HTTP protocol layer can raise an error for more reasons than can be enumerated here. In this case, if the error condition is not mentioned explicitly in the spec, the implementation must raise an error with an appropriate message [err:HC001].
2 The http:send-request function
This module defines an XPath extension function that sends an HTTP request and returns the corresponding response. It supports HTTP multi-part messages. Here is the signature of this function:
http:send-request($request as element(http:request)?,
                  $href as xs:string?,
                  $content as item()?,
                  $bodies as item()*) as item()+
$request contains the various parameters of the request, for instance the HTTP method to use or the HTTP headers. Among other things, it can also contain the values of the other parameters: the URI and the bodies. If they are not passed as parameters to the function, their values in $request, if any, are used instead. See the following section for the detailed definition of the http:request element. If the parameter does not follow the grammar defined in this spec, this is an error [err:HC005].
$href is the HTTP or HTTPS URI to send the request to. It is an xs:anyURI, but is declared as a string so that literal strings can be passed (without having to cast them explicitly to xs:anyURI).
$bodies is the request body content, for HTTP methods that can contain a body in the request (e.g. POST). It is an error if this parameter is not the empty sequence for methods whose request must be empty (e.g. DELETE). The details of the methods are defined in their respective specs (e.g. [RFC 2616] or [RFC 4918]). In a multipart request, it can be a sequence of several items, each one being the body of the corresponding body descriptor in $request. See below for details.
$content is the request body content, for HTTP methods that can contain a body in the request (POST and PUT). It is an error if this parameter is not the empty sequence for other methods (DELETE, GET, HEAD and OPTIONS).
$serial defines the serialization option used to serialize the content to the HTTP request. It can be either a serialization method (a string, either 'xml', 'html', 'xhtml' or 'text'), the name of an output definition (a string, which is the name of a named xsl:output instruction), or an xsl:output element itself. The content is then serialized according to the chosen method or xsl:output, as defined in [Serialization].
Besides the 3-params signature above, there are 2 other signatures that are convenient shortcuts (corresponding to the full version in which corresponding params have been set to the empty sequence). They are:
http:send-request($request as element(http:request)) as item()+
http:send-request($request as element(http:request)?,
                  $href as xs:string?) as item()+
http:send-request($uri as xs:string?,
                  $request as element(http:request)?,
                  $content as item()?) as item()+
The functions defined in this module allow one to send a request to an HTTP server and receive the corresponding response. This section describes how the request is represented by the parameters to this function, and how they are used to generate the actual HTTP request to send.
The http:request element represents all the information needed to send the HTTP request. It is therefore always possible to create such an element that carries all the information needed for a particular request. For some of those values, though, an additional parameter can be used instead. For instance, some signatures define the parameter $href. If the value of this parameter is not the empty sequence, it is used instead of the value of the attribute href on the http:request element.
<http:request method = ncname
href? = uri
status-only? = boolean
username? = string
password? = string
auth-method? = string
send-authorization? = boolean
override-media-type? = string
follow-redirect? = boolean
timeout? = integer>
(http:header*,
(http:multipart|
http:body)?)
</http:request>
method is the HTTP verb to use, such as GET, POST, etc. It is case insensitive.
href is the URI the request has to be sent to. It can be overridden by the parameter $href.
status-only controls what the response looks like; if it is true, only the status code and the headers are returned, and the content is not (neither http:body nor http:multipart, nor the interpreted additional value in the returned sequence, see hereafter).
username, password, auth-method and send-authorization are used for authentication (see section below).
override-media-type is a MIME type. It can be used only with http:request, and will override the Content-Type header returned by the server.
follow-redirect controls whether an HTTP redirect is automatically followed or not. If it is false, the HTTP redirect is returned as the response. If it is true (the default), the function tries to follow the redirect by sending the same request to the new address (including body, headers, and authentication credentials). At most one redirect is followed (there is no attempt to follow a redirect received in response to following a first redirect).
timeout is the maximum number of seconds to wait for the server to respond. If this duration is reached, an error is thrown [err:HC006]. (TODO: Allow one to ask for an empty sequence instead?)
http:header represents an HTTP header, either in the http:request or in the http:response elements, as defined below.
http:multipart represents a multi-part body, either in a request or a response, as defined below.
http:body represents the body, either of a request or a response, as defined below. It can be overridden by the parameter $content (the way $content is used to build the body can be controlled by the parameter $serial; see section below for details).
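The follow-redirect rule described above (at most one redirect is followed) can be illustrated with a minimal Python sketch. The `send` callable is a hypothetical transport standing in for the real HTTP layer, and both the set of status codes treated as redirects and the use of the `Location` header are assumptions; this document does not list them.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes a URI, returns (status, headers).
Send = Callable[[str], Tuple[int, Dict[str, str]]]

# Assumption: which status codes count as redirects is not stated in the spec.
REDIRECT_STATUSES = {301, 302, 303, 307}


def send_following_redirect(send: Send, href: str, follow_redirect: bool = True):
    """Send the request once and, if allowed, follow at most one redirect."""
    status, headers = send(href)
    if follow_redirect and status in REDIRECT_STATUSES and "Location" in headers:
        # The same request is sent again to the new address. A redirect
        # received in response to this second request is returned as-is.
        status, headers = send(headers["Location"])
    return status, headers
```

With such a sketch, a chain of two redirects yields the second redirect as the response, since only the first one is followed.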
The http:header element represents an HTTP header, either in a request or in a response:
<http:header name = string value = string/>
The http:body element represents the body of either an HTTP request or an HTTP response (in multipart requests and responses, it represents the body of a single part):
<http:body id? = string
description? = string
media-type = string
src? = uri
method? = "xml" | "html" | "xhtml" | "text" | "binary" | qname-but-not-ncname
byte-order-mark? = "yes" | "no"
cdata-section-elements? = qnames
doctype-public? = string
doctype-system? = string
encoding? = string
escape-uri-attributes? = "yes" | "no"
indent? = "yes" | "no"
normalization-form? = "NFC" | "NFD" | "NFKC" | "NFKD" | "fully-normalized" | "none" | nmtoken
omit-xml-declaration? = "yes" | "no"
standalone? = "yes" | "no" | "omit"
suppress-indentation? = qnames
undeclare-prefixes? = "yes" | "no"
output-version? = nmtoken>
any*
</http:body>
The media-type attribute is the MIME media type of the body part. It is mandatory. In a request it is given by the user and is the default value of the Content-Type header if it is not set explicitly. In a response, it is set by the implementation from the Content-Type header returned by the server. The src attribute can be used in a request to set the body content as the content of the linked resource instead of using the children of the http:body element. When this attribute is used, the media-type attribute must also be present, and the http:body element can have neither content nor any other attribute; otherwise this is an error [err:HC004].
All the attributes, except src, are used to set the corresponding serialization parameter defined in [Serialization], as defined for the XPath 2.1 function fn:serialize() [F&O 1.1]. One difference is that the serialization parameter include-content-type does not make sense here, so it is not available on the http:body element (its value is always "yes"). These attributes can be given by the user on a request to control the way a part body is serialized. In the response, the implementation can, but is not required to, provide some of them if it has the corresponding information (some of them do not make any sense in a response, and will therefore never appear on a response element, for instance output-version).
The content-type and encoding attributes are used to control the way the content of this element is used to create the HTTP request (how it is serialized to the request content). See section below for details.
The id attribute specifies the value of the HTTP header Content-ID and description the value of the HTTP header Content-Description. The href attribute can be used in a request to set the body content as the content of the linked resource instead of using the children of the http:body element (children of this element and the href attribute are mutually exclusive).
The http:multipart element represents an HTTP multi-part request or response:
<http:multipart media-type = string
boundary? = string>
(http:header*,
http:body)+
</http:multipart>
The media-type attribute is the media type of the whole request or response, and has to be a multipart media type (that is, its main type must be multipart). The boundary attribute is the boundary marker used to separate the several parts in the message (the value of the attribute is prefixed with "--" to form the actual boundary marker in the request; conversely, this prefix is removed from the boundary marker in the response to set the value of the attribute).
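The relation between the boundary attribute and the marker that appears on the wire can be sketched in Python as follows. The function names are hypothetical and only mirror the prefix rule stated above.

```python
def boundary_marker(attribute_value: str) -> str:
    """Request side: prefix the boundary attribute value with "--"."""
    return "--" + attribute_value


def boundary_attribute(marker: str) -> str:
    """Response side: strip the "--" prefix from the marker found in the message."""
    if not marker.startswith("--"):
        raise ValueError("not a boundary marker: %r" % marker)
    return marker[2:]
```

The two functions are inverses of each other, so a boundary value taken from a response can be reused unchanged in a later request.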
If the request can have content (one body or several body parts), it can be specified by the http:multipart element, the http:body element, and/or the parameter $bodies. If $content is not the empty sequence, it replaces the value of the http:body element (in multipart, if there are several bodies, exactly one http:body must be empty). For each body, the content of the HTTP body is generated as follows.
Except when its attribute src is present, an http:body element can have several attributes representing serialization parameters, as defined in [Serialization]. In addition, this spec defines the method 'binary'; in this case the body content must be either an xs:hexBinary or an xs:base64Binary item, and no other serialization parameter can be set besides media-type.
The default value of the serialization method depends on the media-type: it is 'xml' if it is an XML media type, 'html' if it is an HTML media type, 'xhtml' if it is application/xhtml+xml, 'text' if it is a textual media type, and 'binary' for any other case.
When a body element has empty content (i.e. it has no child node at all), its content is given by the parameter $bodies. In a single part request, this parameter must have at most one item. If the body is empty, the parameter cannot be the empty sequence. In a multipart request, $bodies must have as many items as there are empty body elements. If there are three empty body elements, the content of the first of them is $bodies[1], and so on. The number of empty body elements must be equal to the number of items in $bodies.
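The pairing of empty body elements with the items of $bodies can be sketched in Python with the standard ElementTree module. This is an illustration only: `pair_bodies` is a hypothetical helper, the items of $bodies are modelled as plain Python values, and skipping bodies that carry a src attribute is an assumption this document does not state.

```python
import xml.etree.ElementTree as ET
from typing import Any, List, Sequence, Tuple

HTTP_NS = "http://expath.org/ns/http-client"


def is_empty(body: ET.Element) -> bool:
    """True when the element has no child node at all (no element, no text)."""
    return len(body) == 0 and not body.text


def pair_bodies(request: ET.Element, bodies: Sequence[Any]) -> List[Tuple[ET.Element, Any]]:
    """Pair each empty http:body, in document order, with the item at the same position."""
    empty = [
        b for b in request.iter("{%s}body" % HTTP_NS)
        # Assumption: a body using src takes its content from the linked resource.
        if is_empty(b) and b.get("src") is None
    ]
    if len(empty) != len(bodies):
        raise ValueError("number of empty bodies and number of items differ")
    return list(zip(empty, bodies))
```

In a multipart request with three parts where only the first and third are empty, a two-item sequence fills those two parts and the second keeps its inline content.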
The parameter $serial is used to control the way the content is serialized. This parameter can be an xsl:output element, as defined in [XSLT 2.0], and the serialization is defined in [Serialization]. $serial can also be a string, either 'xml', 'html', 'xhtml' or 'text' (other values are implementation-defined, as explained in the above mentioned recommendations). (Note: $serial should be able to be a function item too, once EXPath has defined the corresponding module.) If $serial is the empty sequence, the default value for this parameter depends on the content-type of the body: it is 'xml' if it is an XML media type, 'html' if it is an HTML media type, 'xhtml' if it is application/xhtml+xml, or 'text' for any other case.
HTTP authentication when sending a request is controlled by the attributes username, password, auth-method and send-authorization on the element http:request.
If username has a value, password and auth-method must have a value too. And if any one of the three other attributes has been set, username must be set too.
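The two constraints above can be checked as in the following Python sketch. The function name is hypothetical, the attributes are modelled as a plain mapping, and the use of ValueError is an assumption, since this document does not assign an error code to this condition.

```python
from typing import Mapping


def check_auth_attributes(attrs: Mapping[str, str]) -> None:
    """Raise ValueError when the authentication attributes are inconsistent."""
    if "username" in attrs:
        # username requires both password and auth-method.
        missing = [n for n in ("password", "auth-method") if n not in attrs]
        if missing:
            raise ValueError("username is set but missing: " + ", ".join(missing))
    elif any(n in attrs for n in ("password", "auth-method", "send-authorization")):
        # Any of the three other attributes requires username.
        raise ValueError("authentication attributes are set without username")
```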
The attribute auth-method can be either "Basic" or "Digest", but other values can also be used, in an implementation-defined way. The handling of those attributes must be done in conformance with [RFC 2617]. If send-authorization is true (the default value is false) and the authentication method supports generating the header Authorization without a challenge, the request contains this header. The default behaviour is to send a non-authenticated request and, only if the response is an authentication challenge, to send the credentials in a second message.
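Basic is an authentication method for which the Authorization header can be generated without a challenge. A minimal Python sketch of that header value, following the Basic scheme of [RFC 2617], is shown here; the function name is hypothetical and encoding the credentials as UTF-8 is an assumption.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of the Authorization header for the Basic scheme."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")  # assumption: UTF-8
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

With the user name "Aladdin" and the password "open sesame", the result is "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==". Digest, by contrast, needs values supplied by the server's challenge, so it does not fit this shortcut.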
After having sent the request to the HTTP server, the function waits for the response. It analyses it and returns a sequence representing this response. This sequence has an http:response element as first item, which is followed by an additional item for each body or body part in the response.
<http:response status = integer
message = string>
(http:header*,
(http:multipart|
http:body)?)
</http:response>
This is the first item returned by the function defined in this module. The status attribute is the HTTP status code returned by the server, and message is the message that comes with the status on the status line. The http:header elements are as defined for the request, but represent the response headers instead. The http:body and http:multipart elements are also as in the request, but http:body elements must be empty.
Instead of being inserted within the http:response element, the content of each body is returned as a single item in the return sequence. The items appear (after the http:response element) in the same order as the http:body elements. For each body, the way this item is built from the HTTP response is as follows.
If the status-only attribute has the value true (the default is false), the returned sequence contains only the http:response element (with the headers, and also the empty http:body or http:multipart elements, as if status-only were false); the following items, representing the content of the bodies, are not generated from the HTTP response.
For each body that has to be interpreted, the following rules apply in order to build the corresponding item. If the body media type is a text media type, the item is a string containing the body content. If the media type is an XML media type, the content is parsed and the item is the resulting document node. If the media type is an HTML type, the content is tidied up and parsed (this process is implementation-dependent) and the item is the resulting document node. If it is a binary media type, the content is returned as a base64Binary item. From these rules, a result item can be either a document node (from XML or HTML), a string or a base64Binary.
When the type of a part is either XML or HTML, its body has to be parsed into a document node. If there is any error when parsing the content, an error is raised with an appropriate message [err:HC002].
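The interpretation rules above can be sketched in Python as follows. This is only an approximation: the `kind` argument is assumed to be the result of an earlier media type classification, the XML case returns an ElementTree root element rather than an XPath document node, the default encoding is an assumption, and HTML tidying is left out because it is implementation-dependent.

```python
import base64
import xml.etree.ElementTree as ET


def interpret_body(kind: str, raw: bytes, encoding: str = "utf-8"):
    """Build the result item for one response body from its raw bytes."""
    if kind == "text":
        return raw.decode(encoding)
    if kind == "xml":
        try:
            return ET.fromstring(raw)
        except ET.ParseError as exc:
            # Corresponds to the parsing error condition [err:HC002].
            raise ValueError("err:HC002: " + str(exc))
    if kind == "html":
        raise NotImplementedError("HTML tidying is implementation-dependent")
    # Binary: returned in base64 form, standing in for a base64Binary item.
    return base64.b64encode(raw).decode("ascii")
```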
If the attribute override-media-type is set on the request, its value is used instead of the Content-Type returned by the HTTP server. If the Content-Type of the response is a multipart type, the value of override-media-type can only be a multipart type, or application/octet-stream (to get the raw entity as a binary item). If it is not, this is an error [err:HC003]. (TODO: how does it fit with multipart responses?)
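The constraint on override-media-type can be expressed as the following Python sketch. The function name is hypothetical, and ignoring media type parameters (such as a boundary or charset) during the comparison is an assumption.

```python
from typing import Optional


def effective_media_type(response_type: str, override: Optional[str]) -> str:
    """Return the media type used to interpret the response entity."""
    if override is None:
        return response_type

    def bare(media_type: str) -> str:
        # Drop parameters such as "; boundary=..." before comparing.
        return media_type.split(";", 1)[0].strip().lower()

    def is_multipart(media_type: str) -> bool:
        return bare(media_type).startswith("multipart/")

    if is_multipart(response_type):
        if not is_multipart(override) and bare(override) != "application/octet-stream":
            raise ValueError("err:HC003: invalid override for a multipart response")
    return override
```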
In both requests and responses, MIME type strings are used to choose the way the entity content has to be respectively serialized or parsed. Four different kinds of type are defined here, which are used in the above text about sending request and receiving response. The intent is to provide the spirit of the entity content handling regarding its content type, but an implementation is encouraged to deviate from those rules if it is obvious that a particular type should be treated in a specific way (normally, that would be the case only to treat a binary type as another type).
An XML media type has a MIME type of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity, as defined in [RFC 3023] (except that application/xml-dtd is considered a text media type). MIME types ending with +xml are also XML media types.
An HTML media type has a MIME type of text/html.
Text media types are the remaining types beginning with text/.
Binary types are all the other types. An implementation can treat some of those binary types as either an XML, HTML or text media type if it is more appropriate (this is implementation-defined).
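The four kinds of media type can be told apart as in this Python sketch. The function name is hypothetical, and stripping parameters and lowercasing the type before comparison are assumptions; an implementation remains free to reclassify some binary types, as noted above.

```python
XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}


def media_kind(mime_type: str) -> str:
    """Classify a MIME type string as 'xml', 'html', 'text' or 'binary'."""
    mt = mime_type.split(";", 1)[0].strip().lower()
    if mt == "application/xml-dtd":
        return "text"  # explicit exception: treated as a text media type
    if mt in XML_TYPES or mt.endswith("+xml"):
        return "xml"
    if mt == "text/html":
        return "html"
    if mt.startswith("text/"):
        return "text"
    return "binary"
```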
The structure of most of the elements and most of the attributes used in this candidate is inspired by the corresponding step in [XProc].