This document is also available in these non-normative formats: XML.
Copyright ©2013 Jirka Kosek, published by the EXPath Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.
This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
This proposal provides an API for handling binary data in XPath 2.0. It defines extension functions for reading binary files and performing basic operations on binary data in memory, as well as a new serialization method. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as with any other host language of XPath 2.0.
1 Status of this document
2 Introduction
2.1 Namespace Conventions
3 Use cases
4 Loading binary data
4.1 bin:unparsed-binary
5 Basic operations
5.1 bin:binary-subsequence
5.2 bin:binary-length
5.3 bin:binary-join
5.4 bin:binary-to-octets
5.5 bin:octets-to-binary
6 Text decoding and encoding
6.1 bin:decode-string
6.2 bin:unpack-string
6.3 bin:encode-string
7 Packing and unpacking of encoded numeric values
7.1 bin:unpack-double
7.2 bin:unpack-float
7.3 bin:unpack-long
7.4 bin:unpack-unsignedLong
7.5 bin:unpack-int
7.6 bin:unpack-unsignedInt
7.7 bin:unpack-short
7.8 bin:unpack-unsignedShort
7.9 bin:unpack-byte
7.10 bin:unpack-unsignedByte
7.11 bin:pack-double
7.12 bin:pack-float
7.13 bin:pack-long
7.14 bin:pack-unsignedLong
7.15 bin:pack-int
7.16 bin:pack-unsignedInt
7.17 bin:pack-short
7.18 bin:pack-unsignedShort
7.19 bin:pack-byte
7.20 bin:pack-unsignedByte
8 Bitwise operations
8.1 bin:binary-or
8.2 bin:binary-xor
8.3 bin:binary-and
8.4 bin:binary-not
8.5 bin:binary-shift
9 Serialization
1 Status of this document
This document is an early draft. Comments are welcome on the public-expath@w3.org mailing list (archive).
2 Introduction
2.1 Namespace Conventions
The module defined by this document defines several functions and a serialization method, all contained in the namespace http://expath.org/ns/binary. In this document, the bin prefix, when used, is bound to this namespace URI.
Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.
3 Use cases
Development of this specification was driven strictly by requirements that XML developers regularly face. Some typical use cases:
Getting the dimensions of an image file.
Extracting image metadata.
Processing base64-encoded images embedded inside SOAP messages.
Processing legacy text files that use several different encodings in different places.
4 Loading binary data
4.1 bin:unparsed-binary
The bin:unparsed-binary function reads an external resource (for example, an image file) and returns the binary data of that resource.
bin:unparsed-binary($href as xs:string?) as xs:hexBinary?
The $href argument must be a string in the form of a URI reference. It must contain no fragment identifier and must identify a resource for which a representation can be retrieved. If the URI is a relative URI reference, it is resolved relative to the Static Base URI property from the static context.
If the value of the $href argument is an empty sequence, the function returns an empty sequence.
The result of the function is the binary value of the resource retrieved using the URI.
Example: converting external JPEG images referenced from an HTML document into inline data: URIs. Relative src values are resolved against the base URI of the img element:
  <xsl:template match="img[ends-with(@src, '.jpg')]">
    <xsl:copy>
      <xsl:copy-of select="@* except @src"/>
      <xsl:attribute name="src">
        <xsl:text>data:image/jpeg;base64,</xsl:text>
        <xsl:value-of select="xs:base64Binary(bin:unparsed-binary(resolve-uri(@src, base-uri(.))))"/>
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>
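The loading rules above can be modelled in a few lines of Python. This is a minimal sketch and rests on several assumptions:

- The function name unparsed_binary is hypothetical.
- Binary values are represented as Python bytes, and the empty sequence as None.
- Only file: URIs are supported. That restriction belongs to this sketch, not to the specification.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname


def unparsed_binary(href: Optional[str], static_base_uri: str) -> Optional[bytes]:
    """Hypothetical model of bin:unparsed-binary, limited to file: URIs."""
    if href is None:
        # Empty sequence in, empty sequence out.
        return None
    if "#" in href:
        raise ValueError("$href must not contain a fragment identifier")
    # A relative reference is resolved against the static base URI.
    absolute = urljoin(static_base_uri, href)
    parsed = urlparse(absolute)
    if parsed.scheme != "file":
        raise ValueError("this sketch only handles file: URIs")
    return Path(url2pathname(parsed.path)).read_bytes()
```

Passing the base URI explicitly makes the dependency on the static context visible. In a real processor this value comes from the query or stylesheet rather than from the caller.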
5 Basic operations
5.1 bin:binary-subsequence
The bin:binary-subsequence function returns the specified part of binary data.
bin:binary-subsequence($in as xs:hexBinary, $offset as xs:integer, $size as xs:integer) as xs:hexBinary
Returns the part of the original binary data starting at $offset. The size of the returned data is $size octets.
The $offset is zero-based.
The value of the $offset argument must be a non-negative integer.
It is a dynamic error if $offset + $size is larger than the size of the binary data passed in the $in argument.
Testing whether the $data variable contains the content of a PDF file:
  bin:binary-subsequence($data, 0, 4) eq xs:hexBinary("25504446")
25504446 is the magic number for PDF files; it is the US-ASCII encoding of %PDF.
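The following sketch models bin:binary-subsequence, assuming binary values are Python bytes. The function name is hypothetical. The draft does not say what a negative $size means, so this sketch treats it as an error by assumption.

```python
def binary_subsequence(data: bytes, offset: int, size: int) -> bytes:
    """Hypothetical model of bin:binary-subsequence (zero-based offset)."""
    if offset < 0:
        raise ValueError("$offset must be a non-negative integer")
    if size < 0:
        # Assumption: the draft leaves negative $size unspecified.
        raise ValueError("$size must be non-negative")
    if offset + size > len(data):
        # Dynamic error required by the specification.
        raise ValueError("$offset + $size is larger than the size of $in")
    return data[offset:offset + size]


# The PDF magic-number check from the example above, on a made-up header
header = b"%PDF-1.4\n"
print(binary_subsequence(header, 0, 4) == bytes.fromhex("25504446"))  # True
```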
5.2 bin:binary-length
The bin:binary-length function returns the size of binary data.
bin:binary-length($in as xs:hexBinary) as xs:integer
Returns the size of the binary data in octets.
5.3 bin:binary-join
Returns binary data created by concatenating the binary data items in a sequence.
bin:binary-join($in as xs:hexBinary*) as xs:hexBinary
The function returns an xs:hexBinary value created by concatenating the items in the sequence $in, in order.
If the value of $in is the empty sequence, the function returns zero-length binary data.
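Modelling binary values as Python bytes, bin:binary-length and bin:binary-join reduce to built-ins. The sketch below assumes that representation, and both function names are hypothetical.

```python
from typing import Iterable


def binary_length(data: bytes) -> int:
    """Hypothetical model of bin:binary-length: size in octets."""
    return len(data)


def binary_join(items: Iterable[bytes]) -> bytes:
    """Hypothetical model of bin:binary-join."""
    # Items are concatenated in order; an empty sequence yields b"".
    return b"".join(items)
```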
5.4 bin:binary-to-octets
Returns binary data as a sequence of octets.
bin:binary-to-octets($in as xs:hexBinary) as xs:integer*
If $in is zero-length binary data, the empty sequence is returned.
Octets are returned as integers from 0 to 255.
5.5 bin:octets-to-binary
Converts a sequence of octets into binary data.
bin:octets-to-binary($in as xs:integer*) as xs:hexBinary
Octets are integers from 0 to 255.
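The sketch below models the two octet conversions with hypothetical Python functions. The draft states the 0 to 255 range but not what happens outside it, so rejecting out-of-range integers is an assumption of this sketch.

```python
from typing import Iterable, List


def binary_to_octets(data: bytes) -> List[int]:
    """Hypothetical model of bin:binary-to-octets."""
    # Zero-length data gives the empty list.
    return list(data)


def octets_to_binary(octets: Iterable[int]) -> bytes:
    """Hypothetical model of bin:octets-to-binary."""
    values = list(octets)
    for octet in values:
        if not 0 <= octet <= 255:
            # Assumption: out-of-range values are treated as an error.
            raise ValueError(f"{octet} is not an octet (0-255)")
    return bytes(values)
```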
6 Text decoding and encoding
6.1 bin:decode-string
Decodes binary data as a string in a given encoding.
bin:decode-string($in as xs:hexBinary, $encoding as xs:string) as xs:string
The $encoding argument is the name of an encoding. Its values follow the same rules as the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.
6.2 bin:unpack-string
Decodes a chunk of binary data at a specified offset as a string in a given encoding.
bin:unpack-string($in as xs:hexBinary, $offset as xs:integer, $size as xs:integer, $encoding as xs:string) as xs:string
If $size is greater than 0, this function is identical to calling bin:decode-string(bin:binary-subsequence($in, $offset, $size), $encoding).
If $size is zero, all non-zero octets starting at $offset up to the first zero octet are extracted and then decoded. This makes zero-terminated strings easy to decode.
The $encoding argument is the name of an encoding. Its values follow the same rules as the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.
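Here is a sketch of bin:decode-string and bin:unpack-string using hypothetical Python functions. It relies on two assumptions that are not in the draft:

- A zero-terminated string with no terminator is read to the end of the data.
- A negative $offset or $size is an error.

Because the zero-size rule scans octets, it fits single-octet encodings and UTF-8 well. UTF-16 text usually contains zero octets, so it does not.

```python
def decode_string(data: bytes, encoding: str) -> str:
    """Hypothetical model of bin:decode-string."""
    return data.decode(encoding)


def unpack_string(data: bytes, offset: int, size: int, encoding: str) -> str:
    """Hypothetical model of bin:unpack-string."""
    if offset < 0 or size < 0:
        # Assumption: negative values are errors.
        raise ValueError("negative $offset or $size")
    if size > 0:
        # Same as decode-string(binary-subsequence(...)).
        if offset + size > len(data):
            raise ValueError("$offset + $size is larger than the size of $in")
        return decode_string(data[offset:offset + size], encoding)
    # size == 0: take octets up to (not including) the first zero octet.
    end = data.find(b"\x00", offset)
    if end == -1:
        # Assumption: no terminator means "read to the end".
        end = len(data)
    return decode_string(data[offset:end], encoding)
```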
6.3 bin:encode-string
Encodes a string into binary data using a given encoding.
bin:encode-string($in as xs:string, $encoding as xs:string) as xs:hexBinary
The $encoding argument is the name of an encoding. Its values follow the same rules as the encoding attribute in an XML declaration. The only values that every implementation is required to recognize are utf-8 and utf-16.
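A minimal Python model of bin:encode-string, where the function name is hypothetical, highlights one detail the draft leaves open: whether utf-16 output starts with a byte order mark. Python's own codecs differ on this point, as the comments note.

```python
def encode_string(text: str, encoding: str) -> bytes:
    """Hypothetical model of bin:encode-string."""
    return text.encode(encoding)


print(encode_string("%PDF", "utf-8").hex())  # 25504446
# Python's "utf-16" codec prepends a byte order mark in native byte order;
# "utf-16-be" and "utf-16-le" do not.
print(len(encode_string("A", "utf-16")), len(encode_string("A", "utf-16-be")))  # 4 2
```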
7 Packing and unpacking of encoded numeric values
7.1 bin:unpack-double
Extracts the double value stored at a particular offset in binary data.
bin:unpack-double($in as xs:hexBinary, $offset as xs:integer) as xs:double
bin:unpack-double($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:double
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
7.2 bin:unpack-float
Extracts the float value stored at a particular offset in binary data.
bin:unpack-float($in as xs:hexBinary, $offset as xs:integer) as xs:float
bin:unpack-float($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:float
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
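The two floating-point unpack functions can be sketched with Python's struct module. The sketch assumes both of the following:

- Values use IEEE 754 binary64 (double) and binary32 (float) layouts, which matches the XSD definitions of xs:double and xs:float.
- A negative offset is an error.

All function names are hypothetical.

```python
import struct


def _unpack_ieee(fmt: str, data: bytes, offset: int, bigendian: bool) -> float:
    if offset < 0:
        raise ValueError("$offset must be non-negative")
    # "<" = little-endian (the default), ">" = big-endian. Both use standard
    # sizes, so struct.error is raised when the data is too short.
    return struct.unpack_from((">" if bigendian else "<") + fmt, data, offset)[0]


def unpack_double(data: bytes, offset: int, bigendian: bool = False) -> float:
    """Hypothetical model of bin:unpack-double (8 octets)."""
    return _unpack_ieee("d", data, offset, bigendian)


def unpack_float(data: bytes, offset: int, bigendian: bool = False) -> float:
    """Hypothetical model of bin:unpack-float (4 octets)."""
    return _unpack_ieee("f", data, offset, bigendian)
```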
7.3 bin:unpack-long
Extracts the long (64-bit signed integer) value stored at a particular offset in binary data.
bin:unpack-long($in as xs:hexBinary, $offset as xs:integer) as xs:long
bin:unpack-long($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:long
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
7.4 bin:unpack-unsignedLong
Extracts the unsignedLong (64-bit unsigned integer) value stored at a particular offset in binary data.
bin:unpack-unsignedLong($in as xs:hexBinary, $offset as xs:integer) as xs:unsignedLong
bin:unpack-unsignedLong($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:unsignedLong
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
7.5 bin:unpack-int
Extracts the int (32-bit signed integer) value stored at a particular offset in binary data.
bin:unpack-int($in as xs:hexBinary, $offset as xs:integer) as xs:int
bin:unpack-int($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:int
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
7.6 bin:unpack-unsignedInt
Extracts the unsignedInt (32-bit unsigned integer) value stored at a particular offset in binary data.
bin:unpack-unsignedInt($in as xs:hexBinary, $offset as xs:integer) as xs:unsignedInt
bin:unpack-unsignedInt($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:unsignedInt
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
7.7 bin:unpack-short
Extracts the short (16-bit signed integer) value stored at a particular offset in binary data.
bin:unpack-short($in as xs:hexBinary, $offset as xs:integer) as xs:short
bin:unpack-short($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:short
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
7.8 bin:unpack-unsignedShort
Extracts the unsignedShort (16-bit unsigned integer) value stored at a particular offset in binary data.
bin:unpack-unsignedShort($in as xs:hexBinary, $offset as xs:integer) as xs:unsignedShort
bin:unpack-unsignedShort($in as xs:hexBinary, $offset as xs:integer, $bigendian as xs:boolean) as xs:unsignedShort
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
The $offset is zero-based.
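The multi-octet integer unpack functions differ only in width (8, 4 or 2 octets) and signedness, so one generic model covers them. This sketch makes the following assumptions:

- unpack_integer and unpack_unsigned_int are hypothetical names.
- Reading past the end of the data is an error, which the draft does not state.

The usage lines apply the model to the first use case, image dimensions. In a PNG file, width and height are stored as big-endian 32-bit unsigned integers at offsets 16 and 20, inside the IHDR chunk.

```python
def unpack_integer(data: bytes, offset: int, size: int, signed: bool,
                   bigendian: bool = False) -> int:
    """Generic model of bin:unpack-(unsigned)long/int/short (size 8, 4 or 2)."""
    if offset < 0 or offset + size > len(data):
        # Assumption: reading past the end is a dynamic error.
        raise ValueError("not enough octets at $offset")
    order = "big" if bigendian else "little"  # little-endian by default
    return int.from_bytes(data[offset:offset + size], order, signed=signed)


def unpack_unsigned_int(data: bytes, offset: int, bigendian: bool = False) -> int:
    """Hypothetical counterpart of bin:unpack-unsignedInt."""
    return unpack_integer(data, offset, 4, False, bigendian)


# Start of a PNG file: signature, IHDR length and type, width, height
png_start = (b"\x89PNG\r\n\x1a\n" + (13).to_bytes(4, "big") + b"IHDR"
             + (640).to_bytes(4, "big") + (480).to_bytes(4, "big"))
print(unpack_unsigned_int(png_start, 16, True),
      unpack_unsigned_int(png_start, 20, True))  # 640 480
```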
7.9 bin:unpack-byte
Extracts the byte (8-bit signed integer) value stored at a particular offset in binary data.
bin:unpack-byte($in as xs:hexBinary, $offset as xs:integer) as xs:byte
The $offset is zero-based.
7.10 bin:unpack-unsignedByte
Extracts the unsignedByte (8-bit unsigned integer) value stored at a particular offset in binary data.
bin:unpack-unsignedByte($in as xs:hexBinary, $offset as xs:integer) as xs:unsignedByte
The $offset is zero-based.
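Single-octet values have no byte order, which is why these two functions take no $bigendian argument. The Python sketch below uses hypothetical names and treats an offset outside the data as an error, which is an assumption.

```python
def unpack_unsigned_byte(data: bytes, offset: int) -> int:
    """Hypothetical model of bin:unpack-unsignedByte (0..255)."""
    if not 0 <= offset < len(data):
        # Assumption: out-of-range offsets are errors.
        raise ValueError("$offset is outside the binary data")
    return data[offset]


def unpack_byte(data: bytes, offset: int) -> int:
    """Hypothetical model of bin:unpack-byte (-128..127, two's complement)."""
    octet = unpack_unsigned_byte(data, offset)
    return octet - 256 if octet > 127 else octet
```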
7.11 bin:pack-double
Returns the binary representation of a double value.
bin:pack-double($in as xs:double) as xs:hexBinary
bin:pack-double($in as xs:double, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
7.12 bin:pack-float
Returns the binary representation of a float value.
bin:pack-float($in as xs:float) as xs:hexBinary
bin:pack-float($in as xs:float, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
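Packing is the inverse of the unpack sketch for floating-point values. This Python model assumes IEEE 754 layouts and uses hypothetical function names. Since Python floats are double precision, the "f" format rounds to the nearest single-precision value. That rounding is a no-op for values that are already xs:float.

```python
import struct


def pack_double(value: float, bigendian: bool = False) -> bytes:
    """Hypothetical model of bin:pack-double (8 octets)."""
    return struct.pack((">" if bigendian else "<") + "d", value)


def pack_float(value: float, bigendian: bool = False) -> bytes:
    """Hypothetical model of bin:pack-float (4 octets)."""
    return struct.pack((">" if bigendian else "<") + "f", value)
```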
7.13 bin:pack-long
Returns the binary representation of a long (64-bit signed integer) value.
bin:pack-long($in as xs:long) as xs:hexBinary
bin:pack-long($in as xs:long, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
7.14 bin:pack-unsignedLong
Returns the binary representation of an unsignedLong (64-bit unsigned integer) value.
bin:pack-unsignedLong($in as xs:unsignedLong) as xs:hexBinary
bin:pack-unsignedLong($in as xs:unsignedLong, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
7.15 bin:pack-int
Returns the binary representation of an int (32-bit signed integer) value.
bin:pack-int($in as xs:int) as xs:hexBinary
bin:pack-int($in as xs:int, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
7.16 bin:pack-unsignedInt
Returns the binary representation of an unsignedInt (32-bit unsigned integer) value.
bin:pack-unsignedInt($in as xs:unsignedInt) as xs:hexBinary
bin:pack-unsignedInt($in as xs:unsignedInt, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
7.17 bin:pack-short
Returns the binary representation of a short (16-bit signed integer) value.
bin:pack-short($in as xs:short) as xs:hexBinary
bin:pack-short($in as xs:short, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
7.18 bin:pack-unsignedShort
Returns the binary representation of an unsignedShort (16-bit unsigned integer) value.
bin:pack-unsignedShort($in as xs:unsignedShort) as xs:hexBinary
bin:pack-unsignedShort($in as xs:unsignedShort, $bigendian as xs:boolean) as xs:hexBinary
Little-endian representation is assumed unless the $bigendian argument is specified and is true().
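Like the unpack functions, the multi-octet integer pack functions differ only in width and signedness. This generic Python model uses a hypothetical name, pack_integer. Python's int.to_bytes raises OverflowError for values that do not fit, a case the typed XPath arguments (xs:long, xs:int, and so on) already exclude.

```python
def pack_integer(value: int, size: int, signed: bool,
                 bigendian: bool = False) -> bytes:
    """Generic model of bin:pack-(unsigned)long/int/short (size 8, 4 or 2)."""
    order = "big" if bigendian else "little"  # little-endian by default
    return value.to_bytes(size, order, signed=signed)


print(pack_integer(-2, 4, True).hex())           # fefffffff as int: fffffffe in LE
print(pack_integer(640, 4, False, True).hex())   # 00000280
```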
7.19 bin:pack-byte
Returns the binary representation of a byte (8-bit signed integer) value.
bin:pack-byte($in as xs:byte) as xs:hexBinary
7.20 bin:pack-unsignedByte
Returns the binary representation of an unsignedByte (8-bit unsigned integer) value.
bin:pack-unsignedByte($in as xs:unsignedByte) as xs:hexBinary
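The single-octet pack functions need no byte-order argument. In the Python sketch below, the function names are hypothetical, and the explicit range checks mirror the xs:byte and xs:unsignedByte value ranges.

```python
def pack_byte(value: int) -> bytes:
    """Hypothetical model of bin:pack-byte."""
    if not -128 <= value <= 127:
        raise ValueError("not an xs:byte value")
    # Two's complement representation in a single octet.
    return bytes([value & 0xFF])


def pack_unsigned_byte(value: int) -> bytes:
    """Hypothetical model of bin:pack-unsignedByte."""
    if not 0 <= value <= 255:
        raise ValueError("not an xs:unsignedByte value")
    return bytes([value])
```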
8 Bitwise operations
8.1 bin:binary-or
Returns the bitwise OR of the arguments.
bin:binary-or($a as xs:hexBinary, $b as xs:hexBinary) as xs:hexBinary
Returns the bitwise OR applied to $a and $b.
If $a and $b do not have the same length, the shorter one is padded with zero octets to the length of the longer one.
8.2 bin:binary-xor
Returns the bitwise XOR of the arguments.
bin:binary-xor($a as xs:hexBinary, $b as xs:hexBinary) as xs:hexBinary
Returns the bitwise exclusive OR applied to $a and $b.
If $a and $b do not have the same length, the shorter one is padded with zero octets to the length of the longer one.
8.3 bin:binary-and
Returns the bitwise AND of the arguments.
bin:binary-and($a as xs:hexBinary, $b as xs:hexBinary) as xs:hexBinary
Returns the bitwise AND applied to $a and $b.
If $a and $b do not have the same length, the shorter one is padded with zero octets to the length of the longer one.
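The three binary bitwise functions share one padding rule, so the Python sketch below factors it out. The function names are hypothetical. The draft does not say at which end the shorter operand is padded. This sketch assumes the end, and padding at the start would give different results for operands of unequal length.

```python
from itertools import zip_longest
from typing import Callable


def _bitwise(a: bytes, b: bytes, op: Callable[[int, int], int]) -> bytes:
    # Assumption: the shorter operand is padded with zero octets at its end.
    return bytes(op(x, y) for x, y in zip_longest(a, b, fillvalue=0))


def binary_or(a: bytes, b: bytes) -> bytes:
    """Hypothetical model of bin:binary-or."""
    return _bitwise(a, b, lambda x, y: x | y)


def binary_xor(a: bytes, b: bytes) -> bytes:
    """Hypothetical model of bin:binary-xor."""
    return _bitwise(a, b, lambda x, y: x ^ y)


def binary_and(a: bytes, b: bytes) -> bytes:
    """Hypothetical model of bin:binary-and."""
    return _bitwise(a, b, lambda x, y: x & y)
```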
8.4 bin:binary-not
Returns the bitwise NOT of the argument.
bin:binary-not($in as xs:hexBinary) as xs:hexBinary
Returns the bitwise NOT applied to the $in argument.
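Bitwise NOT inverts every bit and leaves the length unchanged. A one-function Python model, with a hypothetical name:

```python
def binary_not(data: bytes) -> bytes:
    """Hypothetical model of bin:binary-not."""
    # ~octet is negative in Python; masking keeps the low 8 bits.
    return bytes(~octet & 0xFF for octet in data)
```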
8.5 bin:binary-shift
Shifts the bits in binary data.
bin:binary-shift($in as xs:hexBinary, $by as xs:integer) as xs:hexBinary
If $by is positive, the bits are shifted $by times to the left.
If $by is negative, the bits are shifted -$by times to the right.
If $by is zero, the result is identical to the $in argument.
The result always has the same size as the $in argument.
The shift is logical: bits shifted out of the data are discarded, and the vacated positions are filled with zeros.
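A logical shift over the whole buffer is easy to model by treating the octets as one unsigned integer and masking the result back to the original size. The Python sketch below rests on two points:

- binary_shift is a hypothetical name.
- "Left" is assumed to mean towards the first octet, as in the hexadecimal notation. The draft does not define the bit order explicitly.

```python
def binary_shift(data: bytes, by: int) -> bytes:
    """Hypothetical model of bin:binary-shift (logical shift, same size)."""
    size = len(data)
    if size == 0 or by == 0:
        return bytes(data)
    # Read the octets as one big-endian unsigned bit string.
    value = int.from_bytes(data, "big")
    mask = (1 << (8 * size)) - 1
    if by > 0:
        # Left shift: bits pushed past the first octet are dropped.
        value = (value << by) & mask
    else:
        # Right shift: zeros enter from the left.
        value >>= -by
    return value.to_bytes(size, "big")
```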
9 Serialization
A new serialization method, bin:binary, is defined. It can serialize a sequence containing only items of type xs:hexBinary or xs:base64Binary.
Such a sequence is turned into one block of binary data using bin:binary-join and written to the specified location.
Joining several blobs of data into a single file:
  <xsl:result-document href="image.png" method="bin:binary">
    <xsl:sequence select="$image-header"/>
    <xsl:sequence select="$image-data"/>
  </xsl:result-document>
Template for extracting HTML images embedded as data: URIs into separate external image files:
  <xsl:template match="img[starts-with(@src, 'data:image/png;base64,')]">
    <xsl:copy>
      <xsl:copy-of select="@* except @src"/>
      <xsl:attribute name="src" select="concat(generate-id(), '.png')"/>
      <xsl:result-document href="{generate-id()}.png" method="bin:binary">
        <xsl:sequence select="xs:base64Binary(substring-after(@src, 'data:image/png;base64,'))"/>
      </xsl:result-document>
    </xsl:copy>
  </xsl:template>
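The bin:binary output method can be modelled as "validate, join, write". This Python sketch makes two assumptions:

- serialize_binary is a hypothetical name, and the target location is a local file path.
- xs:hexBinary and xs:base64Binary share a value space (finite sequences of octets) and differ only in lexical form, so both are represented as bytes.

```python
from pathlib import Path
from typing import Iterable


def serialize_binary(items: Iterable[object], path: str) -> None:
    """Hypothetical model of the bin:binary serialization method."""
    blocks = []
    for item in items:
        # Only binary items may appear in the sequence.
        if not isinstance(item, (bytes, bytearray)):
            raise TypeError("bin:binary can only serialize binary items")
        blocks.append(bytes(item))
    # The bin:binary-join step, then one write of the whole block.
    Path(path).write_bytes(b"".join(blocks))
```

Rejecting non-binary items up front, instead of converting them, mirrors the rule that the sequence may contain only xs:hexBinary or xs:base64Binary items.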