Copyright ©2013 Jirka Kosek, published by the
This specification was published by the
This proposal provides an API for XPath 2.0 to handle binary data. It defines extension functions to read binary files and to perform basic binary operations on data in memory, as well as a new serialization method. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage.
This document is in an early draft stage. Comments are welcomed
at
The module defined by this document defines several functions and a serialization method, all contained in the namespace http://expath.org/ns/binary. In this document, the bin prefix, when used, is bound to this namespace URI.
Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.
Development of this specification was driven strictly by requirements that XML developers regularly face.
Some typical use cases:
Getting dimensions of an image file.
Extracting image metadata.
Processing images embedded and base64-encoded inside a SOAP message.
Processing a legacy text file that uses several different encodings in different places.
The bin:unparsed-binary function reads an external resource (for example, an image file) and returns the binary data of the resource.
The $href
argument
If the value of the $href argument is an empty sequence, the function returns an empty sequence.
The result of the function is a binary value of the resource retrieved using the URI.
Converting external images in an HTML document into inline data: URIs:
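As a rough illustration of the conversion itself, the following Python sketch builds a data: URI from raw octets. It models the idea outside XPath; the function name to_data_uri and the media-type parameter are assumptions made for this example and are not part of the module.

```python
import base64


def to_data_uri(data: bytes, media_type: str) -> str:
    """Build a base64 data: URI from raw octets (hypothetical helper)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"


# In practice the octets would come from the image file, for example
# open(path, "rb").read(); a literal keeps the sketch self-contained.
print(to_data_uri(b"abc", "text/plain"))  # data:text/plain;base64,YWJj
```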
The bin:binary-subsequence function returns a specified part of binary data.
Returns the part of the original binary data starting at $offset. The size of the returned data is $size octets.
The $offset is zero-based.
The value of $offset
argument
It is a dynamic error if $offset + $size is larger than the size of the binary data passed in the $in argument.
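The bounds rule can be modelled with a short Python sketch. This is an illustration of the described behaviour, not the module's implementation: the function name is hypothetical, ValueError stands in for the dynamic error (whose code this draft does not yet define), and rejecting negative arguments is an assumption.

```python
def binary_subsequence(data: bytes, offset: int, size: int) -> bytes:
    """Return `size` octets of `data` starting at zero-based `offset`."""
    # Assumption: negative values are rejected; the draft does not say so here.
    if offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    # Rule from the text: offset + size must not exceed the data length.
    if offset + size > len(data):
        raise ValueError("offset + size exceeds the size of the binary data")
    return data[offset:offset + size]


print(binary_subsequence(bytes.fromhex("0102030405"), 1, 3).hex())  # 020304
```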
Testing whether the $data variable contains the content of a PDF file:
25504446 is the magic number for PDF files; it is the US-ASCII encoding of %PDF.
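The same check can be sketched in Python for readers who want to verify the magic number. The helper name looks_like_pdf is an assumption made for this example.

```python
PDF_MAGIC = bytes.fromhex("25504446")  # US-ASCII octets of "%PDF"


def looks_like_pdf(data: bytes) -> bool:
    """Compare the first four octets with the PDF magic number."""
    return data[:4] == PDF_MAGIC


print(looks_like_pdf(b"%PDF-1.4\n"))  # True
print(looks_like_pdf(b"GIF89a"))      # False
```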
The bin:binary-length function returns the size of binary data.
Returns the size of the binary data in octets.
Returns binary data created by concatenating the binary data items in a sequence.
The function returns an xs:hexBinary created by concatenating the items in the sequence $in, in order.
If the value of $in is the empty sequence, the function returns zero-length binary data.
Returns binary data as a sequence of octets.
If $in is zero-length binary data, the empty sequence is returned.
Octets are returned as integers from 0 to 255.
Converts a sequence of octets into binary data.
Octets are integers from 0 to 255.
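The two conversions are inverses of each other, which a small Python model makes easy to see. The function names are hypothetical. Python raises ValueError for integers outside 0 to 255; the draft itself does not yet say what happens in that case.

```python
def binary_to_octets(data: bytes) -> list:
    """Return the octets of `data` as integers from 0 to 255."""
    return list(data)


def octets_to_binary(octets: list) -> bytes:
    """Build binary data from integers; Python rejects values outside 0..255."""
    return bytes(octets)


octets = binary_to_octets(bytes.fromhex("00ff10"))
print(octets)                          # [0, 255, 16]
print(octets_to_binary(octets).hex())  # 00ff10
```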
Decodes binary data as a string in a given encoding.
The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.
Decodes a chunk of binary data at a specified offset as a string in a given encoding.
If $size is greater than 0, this function is identical to calling bin:decode-string(bin:binary-subsequence($in, $offset, $size), $encoding).
If $size is zero, all non-zero octets starting at $offset up to the first zero octet are extracted and then decoded. This way zero-terminated strings can be easily decoded.
The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.
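The two modes of the offset-based decoding described above (fixed size, and zero-terminated when the size is zero) can be sketched in Python as follows. The function name and parameter order are assumptions. Treating a missing terminator as "read to the end of the data" is also an assumption, since the draft does not cover that case.

```python
def decode_string_at(data: bytes, offset: int, size: int,
                     encoding: str = "utf-8") -> str:
    """Decode a chunk of `data`; size == 0 means 'up to the first zero octet'."""
    if size > 0:
        chunk = data[offset:offset + size]
    else:
        end = data.find(b"\x00", offset)
        if end == -1:
            end = len(data)  # assumption: no terminator, read to the end
        chunk = data[offset:end]
    return chunk.decode(encoding)


record = b"AB\x00hello\x00tail"
print(decode_string_at(record, 0, 2))  # AB
print(decode_string_at(record, 3, 0))  # hello
```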
Encodes a string into binary data using a given encoding.
The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.
Extracts the double value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the float value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the long (64-bit signed integer) value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the unsignedLong (64-bit unsigned integer) value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the int (32-bit signed integer) value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the unsignedInt (32-bit unsigned integer) value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the short (16-bit signed integer) value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the unsignedShort (16-bit unsigned integer) value stored at a particular offset in binary data.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
The $offset is zero-based.
Extracts the byte (8-bit signed integer) value stored at a particular offset in binary data.
The $offset is zero-based.
Extracts the unsignedByte (8-bit unsigned integer) value stored at a particular offset in binary data.
The $offset is zero-based.
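The extraction functions above share one pattern: read a fixed-width value at a zero-based offset, little-endian by default. A minimal Python sketch of that pattern, using the standard struct module, is shown below. The single generic function unpack_value and its kind parameter are assumptions made to keep the example short; the draft defines one function per type.

```python
import struct

# struct format characters for the value types named in the text
_FORMATS = {
    "double": "d", "float": "f",
    "long": "q", "unsignedLong": "Q",
    "int": "i", "unsignedInt": "I",
    "short": "h", "unsignedShort": "H",
    "byte": "b", "unsignedByte": "B",
}


def unpack_value(data: bytes, offset: int, kind: str, bigendian: bool = False):
    """Read one value of type `kind` at zero-based `offset` (hypothetical helper)."""
    byte_order = ">" if bigendian else "<"  # little-endian unless requested
    (value,) = struct.unpack_from(byte_order + _FORMATS[kind], data, offset)
    return value


data = bytes.fromhex("0100000002")
print(unpack_value(data, 0, "unsignedInt"))                  # 1
print(unpack_value(data, 0, "unsignedInt", bigendian=True))  # 16777216
print(unpack_value(b"\xff", 0, "byte"))                      # -1
```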
Returns the binary representation of a double value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of a float value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of a long (64-bit signed integer) value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of an unsignedLong (64-bit unsigned integer) value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of an int (32-bit signed integer) value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of an unsignedInt (32-bit unsigned integer) value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of a short (16-bit signed integer) value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of an unsignedShort (16-bit unsigned integer) value.
Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
Returns the binary representation of a byte (8-bit signed integer) value.
Returns the binary representation of an unsignedByte (8-bit unsigned integer) value.
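The packing direction can be modelled in the same way. The sketch below uses Python's struct module with a single hypothetical helper, pack_value; the draft defines a separate function for each type.

```python
import struct

# struct format characters for the value types named in the text
_FORMATS = {
    "double": "d", "float": "f",
    "long": "q", "unsignedLong": "Q",
    "int": "i", "unsignedInt": "I",
    "short": "h", "unsignedShort": "H",
    "byte": "b", "unsignedByte": "B",
}


def pack_value(value, kind: str, bigendian: bool = False) -> bytes:
    """Return the octets representing `value` as type `kind` (hypothetical helper)."""
    byte_order = ">" if bigendian else "<"  # little-endian unless requested
    return struct.pack(byte_order + _FORMATS[kind], value)


print(pack_value(1, "int").hex())                  # 01000000
print(pack_value(1, "int", bigendian=True).hex())  # 00000001
print(pack_value(258, "unsignedShort").hex())      # 0201
```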
Returns the "bitwise or" of the arguments.
Returns the "bitwise or" of $a and $b.
If $a and $b do not have the same length, the shorter one is padded with zero octets to match the size of the longer argument.
Returns the "bitwise xor" of the arguments.
Returns the "bitwise exclusive or" of $a and $b.
If $a and $b do not have the same length, the shorter one is padded with zero octets to match the size of the longer argument.
Returns the "bitwise and" of the arguments.
Returns the "bitwise and" of $a and $b.
If $a and $b do not have the same length, the shorter one is padded with zero octets to match the size of the longer argument.
Returns the "bitwise not" of the argument.
Returns the "bitwise not" of the $in argument.
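The bitwise operations, including the padding rule, can be sketched in Python as below. The text does not say on which side the shorter argument is padded, so this sketch assumes padding on the left (the most significant side) and exposes that choice as a hypothetical pad_left parameter. All function names are illustrative.

```python
def _pad(a: bytes, b: bytes, pad_left: bool):
    """Pad the shorter argument with zero octets to the longer one's size."""
    width = max(len(a), len(b))
    if pad_left:  # assumption: the draft does not specify the padding side
        return a.rjust(width, b"\x00"), b.rjust(width, b"\x00")
    return a.ljust(width, b"\x00"), b.ljust(width, b"\x00")


def binary_or(a: bytes, b: bytes, pad_left: bool = True) -> bytes:
    a, b = _pad(a, b, pad_left)
    return bytes(x | y for x, y in zip(a, b))


def binary_xor(a: bytes, b: bytes, pad_left: bool = True) -> bytes:
    a, b = _pad(a, b, pad_left)
    return bytes(x ^ y for x, y in zip(a, b))


def binary_and(a: bytes, b: bytes, pad_left: bool = True) -> bytes:
    a, b = _pad(a, b, pad_left)
    return bytes(x & y for x, y in zip(a, b))


def binary_not(data: bytes) -> bytes:
    return bytes(x ^ 0xFF for x in data)


print(binary_or(bytes.fromhex("0f"), bytes.fromhex("f000")).hex())  # f00f
print(binary_not(bytes.fromhex("00ff")).hex())                      # ff00
```

With pad_left=False the first call would instead yield ff00, which shows why an implementation needs to settle the padding side.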
Shifts bits in binary data.
If $by is positive, bits are shifted $by times to the left.
If $by is negative, bits are shifted -$by times to the right.
If $by is zero, the result is identical to the $in argument.
The result always has the same size as the $in argument.
The shift is logical: bits shifted out are discarded and the vacated positions are filled with zeros.
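A short Python model of the shift rules follows. It assumes the binary data is treated as one contiguous bit string with the first octet most significant, so bits cross octet boundaries; the text does not state this explicitly. The function name is hypothetical.

```python
def binary_shift(data: bytes, by: int) -> bytes:
    """Logical shift: left for positive `by`, right for negative `by`."""
    nbits = len(data) * 8
    value = int.from_bytes(data, "big")  # assumption: first octet is most significant
    if by >= 0:
        # Mask off bits shifted past the left edge so the size stays the same.
        value = (value << by) & ((1 << nbits) - 1)
    else:
        value >>= -by
    return value.to_bytes(len(data), "big")


print(binary_shift(bytes.fromhex("0001"), 4).hex())    # 0010
print(binary_shift(bytes.fromhex("8000"), -15).hex())  # 0001
print(binary_shift(bytes.fromhex("ff"), 4).hex())      # f0
```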
A new serialization method, bin:binary, is defined.
It can serialize a sequence containing only items of type xs:hexBinary or xs:base64Binary. Such a sequence is turned into one block of binary data using bin:binary-join and written out to the specified location.
Joining several blobs of data into a single file:
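The effect of the serialization method (join the items in order, then write a single block) can be modelled in Python as below. This is only a sketch of the described behaviour: binary items are represented as Python bytes objects, the function name serialize_binary is hypothetical, and an in-memory buffer stands in for the output location.

```python
import io


def serialize_binary(items, out) -> None:
    """Join binary items in order and write them as one block to `out`."""
    items = list(items)
    for item in items:
        # Only binary items are allowed, mirroring the restriction in the text.
        if not isinstance(item, (bytes, bytearray)):
            raise TypeError("only binary items can be serialized")
    out.write(b"".join(items))


buffer = io.BytesIO()  # stand-in for a file opened with open(path, "wb")
serialize_binary([bytes.fromhex("0102"), b"", bytes.fromhex("03")], buffer)
print(buffer.getvalue().hex())  # 010203
```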
Template for extracting HTML images represented with the data: URI scheme into separate external image files:
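The core step of such an extraction is recovering the octets from a data: URI. The following Python sketch shows that step for base64-encoded URIs only; the helper name decode_data_uri is an assumption, and writing the result to a file is left to the caller.

```python
import base64


def decode_data_uri(uri: str):
    """Split a base64 data: URI into (media_type, octets)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data: URIs are handled here")
    media_type = header[:-len(";base64")]
    return media_type, base64.b64decode(payload)


media_type, octets = decode_data_uri("data:text/plain;base64,YWJj")
print(media_type, octets)  # text/plain b'abc'
```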
Proper error codes and conditions will be defined in the next version of this draft.