Binary Module

EXPath Candidate Module 12 March 2013

This version:
http://expath.org/spec/binary/20130312
Latest version:
http://expath.org/spec/binary
Editor:
Jirka Kosek <jirka@kosek.cz>

This document is also available in these non-normative formats: XML.


Abstract

This proposal provides an API for XPath 2.0 to handle binary data. It defines extension functions to read binary files and to perform basic binary operations on the data in memory, and it defines a new serialization method. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage.

Table of Contents

1 Status of this document
2 Introduction
    2.1 Namespace Conventions
3 Use cases
4 Loading binary data
    4.1 bin:unparsed-binary
5 Basic operations
    5.1 bin:binary-subsequence
    5.2 bin:binary-length
    5.3 bin:binary-join
    5.4 bin:binary-to-octets
    5.5 bin:octets-to-binary
6 Text decoding and encoding
    6.1 bin:decode-string
    6.2 bin:unpack-string
    6.3 bin:encode-string
7 Packing and unpacking of encoded numeric values
    7.1 bin:unpack-double
    7.2 bin:unpack-float
    7.3 bin:unpack-long
    7.4 bin:unpack-unsignedLong
    7.5 bin:unpack-int
    7.6 bin:unpack-unsignedInt
    7.7 bin:unpack-short
    7.8 bin:unpack-unsignedShort
    7.9 bin:unpack-byte
    7.10 bin:unpack-unsignedByte
    7.11 bin:pack-double
    7.12 bin:pack-float
    7.13 bin:pack-long
    7.14 bin:pack-unsignedLong
    7.15 bin:pack-int
    7.16 bin:pack-unsignedInt
    7.17 bin:pack-short
    7.18 bin:pack-unsignedShort
    7.19 bin:pack-byte
    7.20 bin:pack-unsignedByte
8 Bitwise operations
    8.1 bin:binary-or
    8.2 bin:binary-xor
    8.3 bin:binary-and
    8.4 bin:binary-not
    8.5 bin:binary-shift
9 Serialization

Appendices

A References
B Summary of Error Conditions


1 Status of this document

This document is at an early draft stage. Comments are welcome on the public-expath@w3.org mailing list (archive).

2 Introduction

2.1 Namespace Conventions

This module defines several functions and a serialization method, all contained in the namespace http://expath.org/ns/binary. In this document, the bin prefix, when used, is bound to this namespace URI.

Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.

3 Use cases

Development of this specification was driven strictly by requirements that XML developers regularly face.

Some typical use cases:

4 Loading binary data

4.1 bin:unparsed-binary

Summary

The bin:unparsed-binary function reads an external resource (for example, an image file) and returns the content of the resource as binary data.

Signature

bin:unparsed-binary($href as xs:string?) as xs:hexBinary?

Rules

The $href argument must be a string in the form of a URI reference, which must contain no fragment identifier, and must identify a resource for which a binary representation is available. If the URI is a relative URI reference, then it is resolved relative to the Static Base URI property from the static context.

If the value of the $href argument is an empty sequence, the function returns an empty sequence.

The result of the function is the binary content of the resource retrieved using the URI.

Examples

Converting external images in an HTML document into inline data: URIs:

<xsl:template match="img[ends-with(@src, '.jpg')]">
   <xsl:copy>
      <xsl:copy-of select="@* except @src"/>
      <xsl:attribute name="src">
         <xsl:text>data:image/jpeg;base64,</xsl:text>
         <xsl:value-of select="xs:base64Binary(bin:unparsed-binary(resolve-uri(@src)))"/>
      </xsl:attribute>
   </xsl:copy>
</xsl:template>
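
As a non-normative illustration, the data: URI construction performed by the template above can be modeled outside XSLT. The following Python sketch assumes the image octets are already in memory, standing in for the result of bin:unparsed-binary. The name to_data_uri is hypothetical and not part of this module.

```python
import base64


def to_data_uri(content: bytes, media_type: str = "image/jpeg") -> str:
    """Build a data: URI from raw octets (hypothetical helper).

    Mirrors the template above: a fixed "data:<type>;base64," prefix
    followed by the base64 form of the binary content.
    """
    encoded = base64.b64encode(content).decode("ascii")
    return "data:" + media_type + ";base64," + encoded


# The first three octets of a JPEG file (FF D8 FF) serve as sample content.
sample_uri = to_data_uri(bytes.fromhex("FFD8FF"))
```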

5 Basic operations

5.1 bin:binary-subsequence

Summary

The bin:binary-subsequence function returns a specified part of binary data.

Signature

bin:binary-subsequence($in     as xs:hexBinary,
                       $offset as xs:integer,
                       $size   as xs:integer) as xs:hexBinary

Rules

Returns the part of the original binary data starting at $offset. The size of the returned data is $size octets.

The $offset is zero-based.

The value of the $offset argument must be a non-negative integer.

It is a dynamic error if $offset + $size is larger than the size of the binary data passed in the $in argument.

Examples

Testing whether $data variable contains content of PDF file:

bin:binary-subsequence($data, 0, 4) eq xs:hexBinary("25504446")

25504446 is the magic number for PDF files; it is the US-ASCII encoding of %PDF.
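
The offset and size rules can also be modeled outside XPath. The following Python sketch is non-normative: binary_subsequence is a hypothetical stand-in for bin:binary-subsequence, ValueError stands in for the dynamic error (whose code this draft does not yet define), and rejecting a negative $size is an assumption, since the rules above do not address it.

```python
def binary_subsequence(data: bytes, offset: int, size: int) -> bytes:
    """Return `size` octets of `data` starting at zero-based `offset`."""
    if offset < 0:
        raise ValueError("offset must be a non-negative integer")
    if size < 0:
        # Assumption: the draft does not say how a negative size is handled.
        raise ValueError("size must be a non-negative integer")
    if offset + size > len(data):
        raise ValueError("offset + size exceeds the size of the data")
    return data[offset:offset + size]


# "%PDF-1.4" in US-ASCII, as found at the start of a PDF file.
pdf_header = bytes.fromhex("255044462D312E34")
is_pdf = binary_subsequence(pdf_header, 0, 4) == bytes.fromhex("25504446")
```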

5.2 bin:binary-length

Summary

The bin:binary-length function returns the size of binary data.

Signature

bin:binary-length($in as xs:hexBinary) as xs:integer

Rules

Returns the size of the binary data in octets.

5.3 bin:binary-join

Summary

Returns binary data created by concatenating the binary data items in a sequence.

Signature

bin:binary-join($in as xs:hexBinary*) as xs:hexBinary

Rules

The function returns an xs:hexBinary created by concatenating the items in the sequence $in, in order.

Notes

If the value of $in is the empty sequence, the function returns the zero-length binary data.
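
A minimal, non-normative Python model of this behavior, including the empty-sequence case, might look as follows. The name binary_join is a hypothetical stand-in for bin:binary-join.

```python
from typing import Iterable


def binary_join(items: Iterable[bytes]) -> bytes:
    """Concatenate the items in order; an empty input yields zero-length data."""
    return b"".join(items)


joined = binary_join([bytes.fromhex("0102"), b"", bytes.fromhex("03")])
```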

5.4 bin:binary-to-octets

Summary

Returns binary data as a sequence of octets.

Signature

bin:binary-to-octets($in as xs:hexBinary) as xs:integer*

Rules

If $in is zero-length binary data, then the empty sequence is returned.

Octets are returned as integers from 0 to 255.

5.5 bin:octets-to-binary

Summary

Converts a sequence of octets into binary data.

Signature

bin:octets-to-binary($in as xs:integer*) as xs:hexBinary

Rules

Octets are integers from 0 to 255.
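
The two conversions are inverses of each other. The following non-normative Python sketch models both. The function names are hypothetical, and raising ValueError for an integer outside 0 to 255 is an assumption, because this draft does not yet define error conditions.

```python
from typing import Iterable, List


def binary_to_octets(data: bytes) -> List[int]:
    """Return each octet as an integer from 0 to 255 (empty list for empty data)."""
    return list(data)


def octets_to_binary(octets: Iterable[int]) -> bytes:
    """Build binary data from integers, rejecting values outside 0..255."""
    values = list(octets)
    for value in values:
        if not 0 <= value <= 255:
            # Assumption: out-of-range octets are treated as an error.
            raise ValueError("octet out of range: %d" % value)
    return bytes(values)


octets = binary_to_octets(bytes.fromhex("25504446"))
round_trip = octets_to_binary(octets)
```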

6 Text decoding and encoding

6.1 bin:decode-string

Summary

Decodes binary data as a string in a given encoding.

Signature

bin:decode-string($in       as xs:hexBinary,
                  $encoding as xs:string) as xs:string

Rules

The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values which every implementation is required to recognize are utf-8 and utf-16.

6.2 bin:unpack-string

Summary

Decodes a chunk of binary data at a specified offset as a string in a given encoding.

Signature

bin:unpack-string($in       as xs:hexBinary,
                  $offset   as xs:integer,
                  $size     as xs:integer,
                  $encoding as xs:string) as xs:string

Rules

If $size is greater than 0, this function is identical to calling bin:decode-string(bin:binary-subsequence($in, $offset, $size), $encoding).

If $size is zero, then all non-zero octets starting at $offset, up to the first zero octet, are extracted and then decoded. This way, zero-terminated strings can be easily decoded.

The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values which every implementation is required to recognize are utf-8 and utf-16.
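
The two modes (fixed size and zero-terminated) can be sketched in Python as follows. This is a non-normative model under stated assumptions: unpack_string is a hypothetical stand-in for bin:unpack-string, Python codec names are used in place of XML encoding names, and reading to the end of the data when no zero octet is found is an assumption, since the rules above do not cover that case.

```python
def unpack_string(data: bytes, offset: int, size: int, encoding: str) -> str:
    """Decode a chunk of `data` at zero-based `offset` as a string."""
    if offset < 0 or size < 0 or offset + size > len(data):
        raise ValueError("offset/size outside the binary data")
    if size > 0:
        # Fixed-size mode: same as decoding the corresponding subsequence.
        chunk = data[offset:offset + size]
    else:
        # Zero-terminated mode: take octets up to the first zero octet.
        end = data.find(b"\x00", offset)
        if end == -1:
            end = len(data)  # Assumption: no terminator means "to the end".
        chunk = data[offset:end]
    return chunk.decode(encoding)


record = b"\x01\x02hello\x00world"
name = unpack_string(record, 2, 0, "utf-8")   # zero-terminated string
tail = unpack_string(record, 8, 5, "utf-8")   # fixed-size string
```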

6.3 bin:encode-string

Summary

Encodes a string into binary data using a given encoding.

Signature

bin:encode-string($in       as xs:string,
                  $encoding as xs:string) as xs:hexBinary

Rules

The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values which every implementation is required to recognize are utf-8 and utf-16.

7 Packing and unpacking of encoded numeric values

7.1 bin:unpack-double

Summary

Extracts the double value stored at a particular offset in binary data.

Signatures

bin:unpack-double($in     as xs:hexBinary,
                  $offset as xs:integer) as xs:double

bin:unpack-double($in        as xs:hexBinary,
                  $offset    as xs:integer,
                  $bigendian as xs:boolean) as xs:double

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.
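
The effect of the $bigendian parameter can be illustrated with a non-normative Python sketch. It assumes that the stored value uses the 8-octet IEEE 754 double format, which this draft does not state explicitly. The name unpack_double is a hypothetical stand-in for bin:unpack-double.

```python
import struct


def unpack_double(data: bytes, offset: int, bigendian: bool = False) -> float:
    """Read an 8-octet double at zero-based `offset`; little-endian by default."""
    fmt = ">d" if bigendian else "<d"
    return struct.unpack_from(fmt, data, offset)[0]


# 1.0 stored little-endian at offset 0.
little = unpack_double(bytes.fromhex("000000000000F03F"), 0)
# 1.0 stored big-endian after two leading octets.
big = unpack_double(bytes.fromhex("FFFF3FF0000000000000"), 2, True)
```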

7.2 bin:unpack-float

Summary

Extracts the float value stored at a particular offset in binary data.

Signatures

bin:unpack-float($in     as xs:hexBinary,
                 $offset as xs:integer) as xs:float

bin:unpack-float($in        as xs:hexBinary,
                 $offset    as xs:integer,
                 $bigendian as xs:boolean) as xs:float

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.3 bin:unpack-long

Summary

Extracts the long (64-bit signed integer) value stored at a particular offset in binary data.

Signatures

bin:unpack-long($in     as xs:hexBinary,
                $offset as xs:integer) as xs:long

bin:unpack-long($in        as xs:hexBinary,
                $offset    as xs:integer,
                $bigendian as xs:boolean) as xs:long

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.4 bin:unpack-unsignedLong

Summary

Extracts the unsignedLong (64-bit unsigned integer) value stored at a particular offset in binary data.

Signatures

bin:unpack-unsignedLong($in     as xs:hexBinary,
                        $offset as xs:integer) as xs:unsignedLong

bin:unpack-unsignedLong($in        as xs:hexBinary,
                        $offset    as xs:integer,
                        $bigendian as xs:boolean) as xs:unsignedLong

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.5 bin:unpack-int

Summary

Extracts the int (32-bit signed integer) value stored at a particular offset in binary data.

Signatures

bin:unpack-int($in     as xs:hexBinary,
               $offset as xs:integer) as xs:int

bin:unpack-int($in        as xs:hexBinary,
               $offset    as xs:integer,
               $bigendian as xs:boolean) as xs:int

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.6 bin:unpack-unsignedInt

Summary

Extracts the unsignedInt (32-bit unsigned integer) value stored at a particular offset in binary data.

Signatures

bin:unpack-unsignedInt($in     as xs:hexBinary,
                       $offset as xs:integer) as xs:unsignedInt

bin:unpack-unsignedInt($in        as xs:hexBinary,
                       $offset    as xs:integer,
                       $bigendian as xs:boolean) as xs:unsignedInt

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.7 bin:unpack-short

Summary

Extracts the short (16-bit signed integer) value stored at a particular offset in binary data.

Signatures

bin:unpack-short($in     as xs:hexBinary,
                 $offset as xs:integer) as xs:short

bin:unpack-short($in        as xs:hexBinary,
                 $offset    as xs:integer,
                 $bigendian as xs:boolean) as xs:short

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.8 bin:unpack-unsignedShort

Summary

Extracts the unsignedShort (16-bit unsigned integer) value stored at a particular offset in binary data.

Signatures

bin:unpack-unsignedShort($in     as xs:hexBinary,
                         $offset as xs:integer) as xs:unsignedShort

bin:unpack-unsignedShort($in        as xs:hexBinary,
                         $offset    as xs:integer,
                         $bigendian as xs:boolean) as xs:unsignedShort

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

The $offset is zero-based.

7.9 bin:unpack-byte

Summary

Extracts the byte (8-bit signed integer) value stored at a particular offset in binary data.

Signature

bin:unpack-byte($in     as xs:hexBinary,
                $offset as xs:integer) as xs:byte

Rules

The $offset is zero-based.

7.10 bin:unpack-unsignedByte

Summary

Extracts the unsignedByte (8-bit unsigned integer) value stored at a particular offset in binary data.

Signature

bin:unpack-unsignedByte($in     as xs:hexBinary,
                        $offset as xs:integer) as xs:unsignedByte

Rules

The $offset is zero-based.
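
The integer unpacking functions in this section differ only in width, signedness, and byte order. The following non-normative Python sketch models them with a single hypothetical helper, unpack_integer. It assumes two's-complement encoding for signed values, which this draft does not state explicitly, and uses ValueError in place of the still undefined error codes.

```python
def unpack_integer(data: bytes, offset: int, size: int,
                   signed: bool, bigendian: bool = False) -> int:
    """Read a `size`-octet integer at zero-based `offset` (hypothetical helper)."""
    if offset < 0 or offset + size > len(data):
        raise ValueError("value does not fit inside the binary data")
    byteorder = "big" if bigendian else "little"
    return int.from_bytes(data[offset:offset + size], byteorder, signed=signed)


data = bytes.fromhex("FEFF0100")
as_short = unpack_integer(data, 0, 2, signed=True)             # models bin:unpack-short
as_ushort = unpack_integer(data, 0, 2, signed=False)           # models bin:unpack-unsignedShort
as_uint_be = unpack_integer(data, 0, 4, signed=False, bigendian=True)
as_byte = unpack_integer(data, 0, 1, signed=True)              # models bin:unpack-byte
```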

7.11 bin:pack-double

Summary

Returns the binary representation of a double value.

Signatures

bin:pack-double($in as xs:double) as xs:hexBinary

bin:pack-double($in        as xs:double,
                $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
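
As a non-normative illustration, the two byte orders produce the following octets for the value 1.0. The Python sketch assumes the 8-octet IEEE 754 double format, and pack_double is a hypothetical stand-in for bin:pack-double.

```python
import struct


def pack_double(value: float, bigendian: bool = False) -> bytes:
    """Return the 8-octet representation of `value`; little-endian by default."""
    return struct.pack(">d" if bigendian else "<d", value)


little = pack_double(1.0).hex().upper()
big = pack_double(1.0, True).hex().upper()
```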

7.12 bin:pack-float

Summary

Returns the binary representation of a float value.

Signatures

bin:pack-float($in as xs:float) as xs:hexBinary

bin:pack-float($in        as xs:float,
               $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

7.13 bin:pack-long

Summary

Returns the binary representation of a long (64-bit signed integer) value.

Signatures

bin:pack-long($in as xs:long) as xs:hexBinary

bin:pack-long($in        as xs:long,
              $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

7.14 bin:pack-unsignedLong

Summary

Returns the binary representation of an unsignedLong (64-bit unsigned integer) value.

Signatures

bin:pack-unsignedLong($in as xs:unsignedLong) as xs:hexBinary

bin:pack-unsignedLong($in        as xs:unsignedLong,
                      $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

7.15 bin:pack-int

Summary

Returns the binary representation of an int (32-bit signed integer) value.

Signatures

bin:pack-int($in as xs:int) as xs:hexBinary

bin:pack-int($in        as xs:int,
             $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().
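
A non-normative Python sketch of the same rule for a 32-bit signed integer follows. It assumes two's-complement encoding, and pack_int is a hypothetical stand-in for bin:pack-int.

```python
def pack_int(value: int, bigendian: bool = False) -> bytes:
    """Return the 4-octet representation of `value`; little-endian by default.

    Python raises OverflowError if `value` does not fit into 32 signed bits.
    """
    return value.to_bytes(4, "big" if bigendian else "little", signed=True)


one_le = pack_int(1).hex().upper()
minus_two_be = pack_int(-2, True).hex().upper()
```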

7.16 bin:pack-unsignedInt

Summary

Returns the binary representation of an unsignedInt (32-bit unsigned integer) value.

Signatures

bin:pack-unsignedInt($in as xs:unsignedInt) as xs:hexBinary

bin:pack-unsignedInt($in        as xs:unsignedInt,
                     $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

7.17 bin:pack-short

Summary

Returns the binary representation of a short (16-bit signed integer) value.

Signatures

bin:pack-short($in as xs:short) as xs:hexBinary

bin:pack-short($in        as xs:short,
               $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

7.18 bin:pack-unsignedShort

Summary

Returns the binary representation of an unsignedShort (16-bit unsigned integer) value.

Signatures

bin:pack-unsignedShort($in as xs:unsignedShort) as xs:hexBinary

bin:pack-unsignedShort($in        as xs:unsignedShort,
                       $bigendian as xs:boolean) as xs:hexBinary

Rules

Little-endian number representation is assumed unless the $bigendian parameter is specified and has the value true().

7.19 bin:pack-byte

Summary

Returns the binary representation of a byte (8-bit signed integer) value.

Signature

bin:pack-byte($in as xs:byte) as xs:hexBinary

Rules

7.20 bin:pack-unsignedByte

Summary

Returns the binary representation of an unsignedByte (8-bit unsigned integer) value.

Signature

bin:pack-unsignedByte($in as xs:unsignedByte) as xs:hexBinary

Rules

8 Bitwise operations

8.1 bin:binary-or

Summary

Returns the "bitwise or" of the arguments.

Signature

bin:binary-or($a as xs:hexBinary,
              $b as xs:hexBinary) as xs:hexBinary

Rules

Returns the "bitwise or" of $a and $b.

If $a and $b do not have the same length, then the shorter one is padded with zero octets to match the size of the longer argument.

8.2 bin:binary-xor

Summary

Returns the "bitwise xor" of the arguments.

Signature

bin:binary-xor($a as xs:hexBinary,
               $b as xs:hexBinary) as xs:hexBinary

Rules

Returns the "bitwise exclusive or" of $a and $b.

If $a and $b do not have the same length, then the shorter one is padded with zero octets to match the size of the longer argument.

8.3 bin:binary-and

Summary

Returns the "bitwise and" of the arguments.

Signature

bin:binary-and($a as xs:hexBinary,
               $b as xs:hexBinary) as xs:hexBinary

Rules

Returns the "bitwise and" of $a and $b.

If $a and $b do not have the same length, then the shorter one is padded with zero octets to match the size of the longer argument.
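
The padding rule shared by bin:binary-or, bin:binary-xor, and bin:binary-and can be sketched in Python as follows. This is a non-normative model, and it rests on one important assumption: this draft does not say at which end the shorter argument is padded, so the sketch appends the zero octets at the end. All function names are hypothetical.

```python
import operator
from typing import Callable


def _bitwise(a: bytes, b: bytes, op: Callable[[int, int], int]) -> bytes:
    """Apply `op` octet by octet after padding the shorter argument."""
    size = max(len(a), len(b))
    # Assumption: zero octets are appended at the END of the shorter value.
    a = a + bytes(size - len(a))
    b = b + bytes(size - len(b))
    return bytes(op(x, y) for x, y in zip(a, b))


def binary_or(a: bytes, b: bytes) -> bytes:
    return _bitwise(a, b, operator.or_)


def binary_xor(a: bytes, b: bytes) -> bytes:
    return _bitwise(a, b, operator.xor)


def binary_and(a: bytes, b: bytes) -> bytes:
    return _bitwise(a, b, operator.and_)


or_result = binary_or(bytes.fromhex("F0"), bytes.fromhex("0F0F")).hex().upper()
xor_result = binary_xor(bytes.fromhex("FF00"), bytes.fromhex("0F0F")).hex().upper()
and_result = binary_and(bytes.fromhex("FF"), bytes.fromhex("0F0F")).hex().upper()
```

With padding at the start instead, the results for arguments of unequal length would differ, so an implementation needs to settle this point.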

8.4 bin:binary-not

Summary

Returns the "bitwise not" of the argument.

Signature

bin:binary-not($in as xs:hexBinary) as xs:hexBinary

Rules

Returns the "bitwise not" of the $in argument.
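
A minimal, non-normative Python model inverts every bit of every octet. The name binary_not is a hypothetical stand-in for bin:binary-not.

```python
def binary_not(data: bytes) -> bytes:
    """Invert all bits; the result has the same size as the input."""
    return bytes(octet ^ 0xFF for octet in data)


inverted = binary_not(bytes.fromhex("00FF0F")).hex().upper()
```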

8.5 bin:binary-shift

Summary

Shifts bits in binary data.

Signature

bin:binary-shift($in as xs:hexBinary,
                 $by as xs:integer) as xs:hexBinary

Rules

If $by is positive, then bits are shifted $by times to the left.

If $by is negative, then bits are shifted -$by times to the right.

If $by is zero, the result is identical to the $in argument.

The result always has the same size as the $in argument.

The shift is logical: zeros are shifted into the vacated bit positions.
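
The shift rules can be modeled with the following non-normative Python sketch. It assumes that the binary data is treated as one continuous bit string whose first octet is the most significant, so that bits move across octet boundaries; this draft does not state this explicitly. The name binary_shift is a hypothetical stand-in for bin:binary-shift.

```python
def binary_shift(data: bytes, by: int) -> bytes:
    """Logical shift: left for positive `by`, right for negative `by`."""
    size = len(data)
    if size == 0:
        return data
    # Assumption: the first octet holds the most significant bits.
    value = int.from_bytes(data, "big")
    if by >= 0:
        # Drop bits shifted past the left edge so the size is preserved.
        value = (value << by) & ((1 << (8 * size)) - 1)
    else:
        # Python's >> on a non-negative int fills with zeros from the left.
        value >>= -by
    return value.to_bytes(size, "big")


left = binary_shift(bytes.fromhex("00FF"), 4).hex().upper()
right = binary_shift(bytes.fromhex("00FF"), -4).hex().upper()
overflow = binary_shift(bytes.fromhex("FF00"), 4).hex().upper()
```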

9 Serialization

A new serialization method, bin:binary, is defined. It can serialize a sequence containing only items of type xs:hexBinary or xs:base64Binary. Such a sequence is turned into one block of binary data using bin:binary-join and written out to the specified location.

Examples

Joining several blobs of data into a single file:

<xsl:result-document href="image.png" method="bin:binary">
   <xsl:sequence select="$image-header"/>
   <xsl:sequence select="$image-data"/>
</xsl:result-document>

Template for extracting HTML images embedded as data: URIs into separate external image files:

<xsl:template match="img[starts-with(@src, 'data:image/png;base64,')]">
   <xsl:copy>
      <xsl:copy-of select="@* except @src"/>
      <xsl:attribute name="src" select="concat(generate-id(), '.png')"/>
      <xsl:result-document href="{generate-id()}.png" method="bin:binary">
         <xsl:sequence select="xs:base64Binary(substring-after(@src, 'data:image/png;base64,'))"/>
      </xsl:result-document>
   </xsl:copy>
</xsl:template>

A References

Serialization
XSLT 2.0 and XQuery 1.0 Serialization. Scott Boag, Michael Kay, Joanne Tong, Norman Walsh, and Henry Zongaro, editors. W3C Recommendation. 23 January 2007.
F&O 1.1
XPath and XQuery Functions and Operators 1.1. Michael Kay, editor. W3C Working Draft. 15 January 2009.
XSLT 2.0
XSL Transformations (XSLT) Version 2.0. Michael Kay, editor. W3C Recommendation. 23 January 2007.

B Summary of Error Conditions

Note:

Proper error codes and conditions will be defined in the next version of this draft.