This document is also available in these non-normative formats: XML.
Copyright © 2014 John Lumley, Christian Grün, Matthias Brantner and Florent Georges, published by the EXPath Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.
This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
This proposal provides an API for XPath 2.0 and XPath 3.0 to handle archive data (i.e. collected and possibly compressed sets of files and directories). It defines extension functions to process data from and to such archive files, including creation, determining and setting properties, listing and extracting contents, and adding and updating entries. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage. Some additional features for use in XPath 3.0 are also defined.
1 Status of this document
2 Introduction
2.1 Namespace conventions
2.2 Error management
2.3 Archive representation
2.4 Archive types
2.5 Optional interfaces
3 Use cases
3.1 Creating a simple EPUB document
3.2 Examining a JAR file
4 Describing archives and entries
4.1 Archive properties and options
4.2 Entry descriptions
5 Loading and saving archives
6 Information about an archive and its contents
6.1 arch:options
6.2 arch:entry-names
6.3 arch:entries
7 Extracting entries from an archive
7.1 arch:extract-binary
7.2 arch:extract-text
8 Updating entries in an archive
8.1 arch:delete
8.2 arch:update
9 Creating an archive
9.1 arch:create
10 Creating and extracting complete archives from and to file systems
10.1 arch:from-files
10.2 arch:to-files
11 Convenience functions
11.1 arch:text
11.2 arch:xml
12 Functions using XSLT3.0 map() type
12.1 Using map types to describe entries and options
12.1.1 Archive property maps
12.1.2 Entry property maps
12.2 arch:options-map
12.3 arch:entries-map
12.4 arch:extract-map
12.5 arch:extract-binary-map
12.6 arch:extract-text-map
12.7 arch:create-map
12.8 arch:update-map
12.9 arch:delete-map
This document is in an interim draft stage. Comments are welcomed at the public-expath@w3.org mailing list (archive).
The module defined by this document defines several functions, all contained in the namespace http://expath.org/ns/archive. In this document, the arch prefix, when used, is bound to this namespace URI.
Error codes are defined in the same namespace (http://expath.org/ns/archive), and in this document are displayed with the same prefix, arch.
Note:
This follows the suggestion (in late August 2013) for a coherent naming standard in EXPath modules.
Binary file I/O, to read and write complete archives to files, uses facilities defined in [EXPath File], which defines functions in the namespace http://expath.org/ns/file. In this document, the file prefix, when used, is bound to this namespace URI.
Manipulation of binary data itself can employ functions from [EXPath Binary], which defines functions in the namespace http://expath.org/ns/binary. In this document, the bin prefix, when used, is bound to this namespace URI.
Error conditions are identified by a code (a QName). When such an error condition is reached in the evaluation of an expression, a dynamic error is thrown, with the corresponding error code (as if the standard XPath function error() had been called). The namespace of the code follows that of the module within whose processing the error occurs, i.e. http://expath.org/ns/archive for errors in archive manipulation, http://expath.org/ns/file for errors in file operations and http://expath.org/ns/binary for errors in processing binary data.
Archives in this module are represented principally as items of type xs:base64Binary, i.e. in their basic binary (byte sequence) forms.
Archives are treated as being arranged structurally as a description of overall options of the archive and a sequence of named entries. Each entry has:
A name, which is treated as a sequence of Unicode characters. In many cases the solidus character (/) is used to imply that the entries are logically arranged in positions within a directory tree, but this is not mandatory.
A set of properties, denoting at least the uncompressed size of the entry, archive internal properties for the entry, such as the compression method used on the stored data and other indications such as the date of last modification.
Data, treated as (possibly null) binary data.
It is most common that archives are considered to be arranged logically as directories, using the entry names to denote paths and file names (e.g. tests/qt3/archive/main.xml). In such circumstances, archives may contain entries to represent the directories themselves (e.g. tests/qt3/archive/), presumably with no data. [This could be used such that full extraction of an archive to a file system generates empty output directories, for example.] This specification makes no distinction between these two cases: if an archive has an empty 'directory' entry it will be treated similarly to any other 'file' entry. Semantic interpretation of entry names as files in directory trees is an application issue.
Note:
Behaviour when entries with duplicate names are detected in an archive is implementation dependent. Nevertheless, if an error is not thrown, only one entry should be returned when reading. Implementations must not write duplicate entries in result archives.
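The entry model described above (a name, a set of properties, and binary data) can be illustrated with a minimal sketch. It uses Python's standard zipfile module purely as a stand-in for an implementation of this module; the function names make_sample_archive and describe_entries are hypothetical and are not defined by this specification.

```python
import io
import zipfile
from datetime import datetime
from typing import Dict, List


def make_sample_archive() -> bytes:
    """Build a small in-memory ZIP, used only for illustration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # A 'directory' entry: just a name ending in '/', with no data.
        zf.writestr("tests/qt3/archive/", b"", compress_type=zipfile.ZIP_STORED)
        zf.writestr("tests/qt3/archive/main.xml", b"<main/>")
    return buf.getvalue()


def describe_entries(archive: bytes) -> List[Dict[str, object]]:
    """Return name, properties and data for every entry, in archive order."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            {
                "name": info.filename,
                "size": info.file_size,
                "compressed-size": info.compress_size,
                "last-modified": datetime(*info.date_time).isoformat(),
                "data": zf.read(info),
            }
            for info in zf.infolist()
        ]
```

Note that the 'directory' entry is handled exactly like the 'file' entry; only its name and empty data distinguish it.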
The module is designed to be able to support a number of different types of archive, providing a coherent access mechanism.
The following archive types are required to be supported:
[ZIP]: (which also covers derivative archive formats, such as JAR or OpenDocument.)
[GZIP] : A compressed archive of a sequence of files
Note:
Within GZIP names of entries (original file names) are optional, on a per-file basis, so special measures may need to be taken to handle 'unnamed' sections.
Specific issues arise from i) archives used in streaming situations, where the internal manifests of the archives cannot be completed until all data is written, ii) archives where the order of entries is important, such as JAR, where the manifest entries need to be first.
Note:
Currently there are no proposals within this module to cover encrypted archives.
This module defines two distinctly different interface schemes for reporting on and manipulating archive data. The first uses XML-structured trees to describe entries, their names and their properties, leaving (binary) data described in separate arguments to or results from the functions defined. All conformant implementations must support this interface.
An alternative interface, using the proposed XPath3.0 map() type (see 12 Functions using XSLT3.0 map() type), may be supported by an implementation. This significantly increases the coherence of the connection between entries and their data (as binary data can be the 'value' of a map entry), at the minor cost of having to specify entry order for those archive usages which are order sensitive (e.g. EPUB). This map interface can co-exist with the XML-structured one.
Development of this specification was driven by requirements which some XML developers regularly encounter in examining or generating data which is presented in archival forms. Some typical use cases include:
Manipulating EPUB documents.
Examining Java classes and resources stored in JAR formats.
An [EPUB] document is a collection of content sections, written in XHTML, with a metadata descriptor (usually the content.opf file) and a navigation description (usually the toc.ncx file), all collected together and potentially compressed in a ZIP format. A simple example of creating such a document in XQuery is:
arch:create(
  ( "mimetype", "META-INF/container.xml", "OEBPS/content.opf",
    "OEBPS/Text/title.xhtml", "OEBPS/Text/chap01.xhtml", "OEBPS/toc.ncx" ),
  ( content:mimetype(), content:metainf(), content:oebps-content(),
    content:title(), content:chapter(), content:toc() )
)
The user-supplied XQuery function content:mimetype() returns the appropriate mimetype description for the EPUB document as a base64-encoding of a string ("application/epub+zip"). Each of the other content:*() functions generates a serialized form of the appropriate XML structure, again in a base64 encoding, e.g.:
declare function content:title() as xs:base64Binary {
  bin:encode-string(fn:serialize(
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Title Page</title>
      </head>
      <body>
        <div>
          <h2 id="heading_id_2">Sample Book</h2>
          <h2 id="heading_id_3">A Sample .epub Book</h2>
          <h3 id="heading_id_4">Title Page</h3>
        </div>
      </body>
    </html>
  ))
};
For an EPUB document the mimetype entry must be uncompressed (so effectively it can be read by simple string searching), but other entries may be compressed.
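The requirement that the mimetype entry be uncompressed while other entries may be compressed can be sketched as follows. This is an illustration in Python using the standard zipfile module, not an implementation of arch:create; the function name create_epub and the placeholder entry contents are assumptions made for the example.

```python
import io
import zipfile
from typing import List, Tuple


def create_epub(entries: List[Tuple[str, bytes]]) -> bytes:
    """Write entries in the given order; 'mimetype' is stored uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()


# Placeholder content only: these are not valid EPUB documents.
epub = create_epub([
    ("mimetype", b"application/epub+zip"),
    ("META-INF/container.xml", b"<container/>"),
    ("OEBPS/Text/title.xhtml", b"<html xmlns='http://www.w3.org/1999/xhtml'/>"),
])
```

Because the first entry is stored rather than deflated, the string application/epub+zip appears literally in the archive bytes, which is what allows detection by simple string searching.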
JAR files contain class code and definitions for Java classes, in entries whose names are path/classname.class. Local classes (classes defined within a class) have separate code entries with a classname outerclass$innerclass.
To find all the main package-qualified classes the following XPath should suffice:
for $e in arch:entry-names(file:read-binary("lib/saxon9-sql.jar"))
            [ends-with(.,'.class') and not(contains(.,'$'))]
return replace(replace($e,'\.class$',''),'/','.')
=> "net.sf.saxon.option.sql.SQLClose", "net.sf.saxon.option.sql.SQLColumn", "net.sf.saxon.option.sql.SQLConnect", ...., "net.sf.saxon.option.sql.SQLUpdate"
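For comparison, the same filtering and renaming logic can be sketched in Python over the standard zipfile module. The function name main_class_names is hypothetical; the sketch assumes the JAR is supplied as bytes, mirroring the xs:base64Binary argument used in this module.

```python
import io
import re
import zipfile
from typing import List


def main_class_names(jar: bytes) -> List[str]:
    """Package-qualified names of top-level classes, in archive order."""
    with zipfile.ZipFile(io.BytesIO(jar)) as zf:
        return [
            # Drop the '.class' suffix, then turn path separators into dots.
            re.sub(r"\.class$", "", name).replace("/", ".")
            for name in zf.namelist()
            # Skip non-class entries and local (outer$inner) classes.
            if name.endswith(".class") and "$" not in name
        ]
```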
The properties of overall archives and individual entries at the XDM level are described by small structured elements, with optional information attached. In common with the description of serialization parameters, these i) use child elements as the property key and ii) place scalar values as the @value attribute of that child.
Archive options and properties are described as a structured element (element(arch:options)) with the following child elements, all of whose values are described in their @value attribute:
arch:format: the type of the archive, e.g. "zip". This is mandatory.
arch:algorithm: the default compression used in the archive, e.g. "deflate".
Other attributes may be dependent upon the type of the archive and the implementation.
Entries within the archive can be accessed by name (xs:string) or a structured element (element(arch:entry)). In the latter case the entry name is the value of the @value attribute of the arch:name child.
When describing an existing entry in an archive, element(arch:entry) may be returned with the following (optional) children, all of whose values are described in the @value attribute:
arch:name: the (path) name of the entry. REQUIRED
arch:size: the original file size of the entry.
arch:compressed-size: the compressed file size of the entry, i.e. the number of bytes it occupies in the archive.
arch:last-modified: the date of last modification of this entry, in xs:dateTime notation.
arch:compression-level: an indicator of the level of (lossless?) compression.
When used to create or update an entry in an archive, element(arch:entry)
may also have the following (optional) children:
arch:name: the (path) name of the entry. REQUIRED
arch:last-modified: the date of last modification to be written on this entry, in xs:dateTime notation.
arch:compression-level: the level of (lossless?) compression to be used in writing the entry into the archive.
arch:encoding: the encoding to be used for converting textual items to a byte sequence, prior to possible compression and writing to the archive. The only values which every implementation is required to recognize are utf-8 and utf-16.
(In writing actions, unknown children are ignored. In the case of duplicate children, the value of the first child is taken.)
This module defines no specific functions for reading and writing archives from files, as distinct from their binary data. The EXPath File Module [EXPath File] provides two suitable functions to do this:
file:read-binary($file as xs:string) as xs:base64Binary. Returns the content of a file in its Base64 representation.
file:write-binary($file as xs:string, $value as xs:base64Binary) as empty-sequence(). Writes a Base64 item as binary to a file.
Note:
The functions detailed in 10.2 arch:to-files and 10.1 arch:from-files may be used to transfer between file system directory trees and archives in a single operation.
Returns a description of the type and properties of a given archive.
arch:options($archive as xs:base64Binary) as element(arch:options)*
The description is returned as an element <arch:options> with an unordered sequence of child elements describing the details. The following are currently supported:
arch:format: format of this archive
arch:algorithm: the compression algorithm that was used.
If the archive format supports a compression algorithm varying on a per-entry basis, and more than one algorithm has been used in the archive, mixed is returned for arch:algorithm.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Finding the properties of the archive stored in a file located at $uri:
arch:options(file:read-binary($uri)) =>
<arch:options>
  <arch:format value="ZIP"/>
  <arch:algorithm value="deflate"/>
</arch:options>
Returns the entry names for all the entries found within the archive as a sequence of string values in the order in which they appear in the archive.
arch:entry-names($archive as xs:base64Binary) as xs:string*
Returns the entry names for all the entries found within the archive as a sequence of string values in the order in which they appear in the archive.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Returns the set of entry descriptors for all the entries found within the archive.
arch:entries($archive as xs:base64Binary) as element(arch:entry)*
Each descriptor is an element <arch:entry> whose arch:name child gives the path of the file within the archive. For more details of this structure see 4.2 Entry descriptions.
The entries are returned in the order in which they are encountered serially within the archive.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
There may be a case for providing a sorted version, probably using some form of collation.
Finding the entries of the archive stored in a file located at $uri:
arch:entries(file:read-binary($uri)) =>
<arch:entry>
  <arch:name value="build.xml"/>
  <arch:size value="2194"/>
  <arch:compressed-size value="652"/>
  <arch:last-modified value="2013-07-18T11:22:12"/>
</arch:entry>
<arch:entry>
  <arch:name value="lumley.jpg"/>
  <arch:size value="84983"/>
  <arch:compressed-size value="84872"/>
  <arch:last-modified value="2009-03-23T11:15:06"/>
</arch:entry>
<arch:entry>
  <arch:name value="tests/qt3/binary/binary.xml"/>
  <arch:size value="10058"/>
  <arch:compressed-size value="1381"/>
  <arch:last-modified value="2013-08-06T13:14:08"/>
</arch:entry>
Summing the size of the apparent XML files in the previous example:
sum(arch:entries(file:read-binary($uri))[ends-with(arch:name/@value,'.xml')]/arch:size/@value)
The module does not attempt to discern the 'type' of an entry (such as 'text', 'XML', 'raw-binary'), leaving that to the programmer. Two forms of reading result are supported: raw binary (xs:base64Binary) and decoded text (xs:string).
Returns the sequence of requested entries from the archive as binary data.
arch:extract-binary($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary*
Returns as binary data each entry in the archive $archive that corresponds to the entry name input, in sequence.
The entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
Multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
There have been suggestions for a signature arch:extract-binary($archive as xs:base64Binary) returning all the entries. In the absence of maps in the return type, this does not make sense, since the entries are totally unlabelled, and to get anything meaningful, a parallel call on arch:entries() would be required.
Returning the binary data for an entry in the archive stored in a file located at $uri:
arch:extract-binary(file:read-binary($uri),'build.xml') => stuff
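The ordering and duplication rules for arch:extract-binary can be made concrete with a small sketch. It is written in Python over the standard zipfile module as an illustration only; the function name extract_binary and the exception class UnknownEntry (standing in for the arch:unknown-entry error) are hypothetical.

```python
import io
import zipfile
from typing import List


class UnknownEntry(KeyError):
    """Hypothetical stand-in for the arch:unknown-entry error."""


def extract_binary(archive: bytes, entries: List[str]) -> List[bytes]:
    """Return the data of each requested entry, in request order."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        known = set(zf.namelist())
        result = []
        for name in entries:              # request order, not archive order
            if name not in known:
                raise UnknownEntry(name)
            result.append(zf.read(name))  # a repeated name yields a repeated copy
        return result
```

Requesting ("b", "a", "b") from an archive holding a then b therefore yields three items: the data of b, then a, then b again.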
Returns the sequence of requested entries from the archive as strings. If $encoding is specified the strings are decoded appropriately, otherwise UTF-8 encoding is assumed.
arch:extract-text($archive as xs:base64Binary, $entries as xs:string*) as xs:string*
arch:extract-text($archive as xs:base64Binary, $entries as xs:string*, $encoding as xs:string) as xs:string*
Returns as a string each entry in the archive $archive that corresponds to the entry name input, in sequence.
If $encoding is specified the strings are decoded appropriately, otherwise UTF-8 encoding is assumed.
The entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
Multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:unknown-encoding] is raised if the encoding requested is unknown or unsupported.
[arch:decoding-error] is raised if there was an error in decoding the entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
This function should be equivalent to the use of arch:extract-binary() and the function bin:decode-string() from [EXPath Binary]:
arch:extract-binary($archive,$entries) ! bin:decode-string(.,$encoding) [XPath 3.0]
for $b in arch:extract-binary($archive,$entries) return bin:decode-string($b,$encoding) [XPath 2.0]
Further conversion into XML can be achieved using the XPath 3.0 function fn:parse-xml() on each of the returned strings.
There have been suggestions for a signature arch:extract-text($archive as xs:base64Binary) returning all the entries. In the absence of maps in the return type, this does not make sense, since the entries are totally unlabelled, and to get anything meaningful, a parallel call on arch:entries() would be required.
Returning the text data for an entry in the archive stored in a file located at $uri:
arch:extract-text(file:read-binary($uri),'build.xml','UTF-8') => stuff
There are two atomic actions available to change entries within an archive: complete deletion of an entry, or complete updating (overwriting) of that entry. The latter adds a new entry when the given name does not already exist in the archive.
Returns an archive with the given entries deleted.
arch:delete($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary
Returns an archive of the same format as $archive with all the entries named in $entries deleted.
The relative order of the remaining entries within the archive is preserved.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as those for those entries before deletion. Compressed sizes may alter.
Duplicate entries in $entries are ignored.
If $entries is the empty sequence, the original archive shall be returned.
[arch:unknown-entry] is raised if an entry requested for deletion does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Whilst the uncompressed entries remaining after deletion should of course be the same size and content as those before deletion, depending upon the (lossless) compression algorithm used, the compressed sizes and content might not be. In the absence of a special check, in these circumstances $archive may not be identical to arch:delete($archive,()). This needs discussion.
Deleting the entries of the archive stored in a file located at $uri:
arch:entries(arch:delete(file:read-binary($uri),'lumley.jpg')) =>
<arch:entry>
  <arch:name value="build.xml"/>
  <arch:size value="2194"/>
  <arch:compressed-size value="652"/>
  <arch:last-modified value="2013-07-18T11:22:12"/>
</arch:entry>
<arch:entry>
  <arch:name value="tests/qt3/binary/binary.xml"/>
  <arch:size value="10058"/>
  <arch:compressed-size value="1381"/>
  <arch:last-modified value="2013-08-06T13:14:08"/>
</arch:entry>
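The rules for arch:delete (order preserved, duplicate requests ignored, empty request returns the original, unknown names raise an error) can be sketched as below. This is an illustrative Python model over the standard zipfile module, not a definition of the function; delete_entries is a hypothetical name, and KeyError stands in for arch:unknown-entry.

```python
import io
import zipfile
from typing import List


def delete_entries(archive: bytes, entries: List[str]) -> bytes:
    """Return a copy of the archive without the named entries."""
    if not entries:
        return archive                     # empty request: original archive returned
    doomed = set(entries)                  # duplicate names in the request are ignored
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src:
        missing = doomed - set(src.namelist())
        if missing:
            raise KeyError(sorted(missing)[0])   # stands in for arch:unknown-entry
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():    # archive order is preserved
                if info.filename not in doomed:
                    # Reusing the ZipInfo keeps the date and compression method;
                    # the data is recompressed, so compressed sizes may differ.
                    dst.writestr(info, src.read(info))
    return out.getvalue()
```

The short-circuit for an empty request is one way to guarantee that the result is identical to the input in that case, which rewriting the archive would not necessarily ensure.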
Returns an archive with each of the given entries in $entries updated to the corresponding values in the sequence $new. If an entry is not found, a new entry is added to the end of the archive.
arch:update($archive as xs:base64Binary, $entries as xs:string*, $new as xs:base64Binary*) as xs:base64Binary
arch:update($archive as xs:base64Binary, $entries as xs:string*, $new as xs:base64Binary*, $last-modified as xs:dateTime) as xs:base64Binary
Returns an archive of the same format as $archive with each of the given entries in $entries updated to the corresponding value in the sequence $new. If an entry is not found, a new entry for it is added to the end of the archive.
The relative order of all the existing and replaced entries within the archive is preserved. New entries appear at the end of the archive in the order in which they were specified in the call.
If specified, and the format supports it, the last-modified date for each of the updated entries will be set to $last-modified. In the absence of such a parameter, it is implementation-dependent whether last-modified information will be written on the updated entries. If such default last-modification is written, it should be comparable to the value of fn:current-dateTime() in an XSLT environment.
The uncompressed content, size and last-modified date of the remaining (non-updated) entries shall be the same as those for those entries before the update. Compressed sizes may alter.
The compression methods of the updated entries shall be preserved.
When duplicate names appear in the entry list, the value of the entry in the resulting archive will be that of the value of $new corresponding to the last matching entry name.
[arch:entry-data-mismatch] is raised if count($entries) ne count($new).
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
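A minimal sketch of the arch:update rules follows, again using Python's standard zipfile module as a stand-in. The function name update_entries is hypothetical, ValueError stands in for arch:entry-data-mismatch, and the optional $last-modified argument is deliberately left out of the sketch.

```python
import io
import zipfile
from typing import List


def update_entries(archive: bytes, entries: List[str], new: List[bytes]) -> bytes:
    """Replace named entries in place; append names not already present."""
    if len(entries) != len(new):
        raise ValueError("entry/data count mismatch")  # arch:entry-data-mismatch
    # For a repeated name the dict keeps the LAST value supplied.
    replacement = dict(zip(entries, new))
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename in replacement:
                    # Replaced at its existing position, keeping the
                    # compression method of the entry being overwritten.
                    dst.writestr(info.filename, replacement.pop(info.filename),
                                 compress_type=info.compress_type)
                else:
                    dst.writestr(info, src.read(info))
            # Whatever is left did not exist before: append in call order.
            for name, data in replacement.items():
                dst.writestr(name, data)
    return out.getvalue()
```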
New archives can be created in empty or filled states.
Returns a new archive with each of the given entries in $entries set to the corresponding values in the sequence $new.
arch:create($entries as xs:string*, $new as xs:base64Binary*) as xs:base64Binary
arch:create($entries as xs:string*, $new as xs:base64Binary*, $options as element(arch:options)) as xs:base64Binary
Returns an archive of format specified by $options with each of the given entries in $entries set to the corresponding value in the sequence $new.
The relative order of new entries within the archive follows that of the input.
Content provided for any entry considered to be a directory is ignored.
When duplicate names appear in the entry list, the value of the entry in the resulting archive will be that of the value of $new corresponding to the last matching entry name.
[arch:entry-data-mismatch] is raised if count($entries) ne count($new).
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
Collects all the binary file contents from $files and writes them into a new archive which is returned.
arch:from-files($files as xs:string*) as xs:base64Binary
Collects all the binary file contents from $files and writes them into a new archive which is returned.
All file content is collected in binary mode, with no attempt at any conversion or decoding.
File and directory path names are normalized to use the solidus ('/') path separator.
Directories are written as empty entries.
Error conditions from [EXPath File] may be raised if there are problems on reading from the filesystem, most notably:
This function should be equivalent to the following XSLT function:
<xsl:function name="arch:from-files" as="xs:base64Binary">
  <xsl:param name="files" as="xs:string*"/>
  <xsl:variable name="all" as="xs:string*"
    select="for $f in $files return
              if(file:is-dir($f)) then (for $f1 in file:list($f,true()) return concat($f,$f1)) else $f"/>
  <xsl:variable name="normalized.names" select="for $n in $all return replace($n,'\\','/')"/>
  <xsl:variable name="content" as="xs:base64Binary*"
    select="for $f in $normalized.names return
              if(file:is-dir($f)) then xs:base64Binary('') else file:read-binary($f)"/>
  <xsl:sequence select="arch:create($normalized.names,$content)"/>
</xsl:function>
This function may be provided by an XSLT package (which will probably use functions from [EXPath File], and from which appropriate error conditions may be propagated, or caught within the package) or by a purpose-built extension function that may be able to support such an operation within a context of streaming processing.
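As a further illustration of the behaviour described for arch:from-files (binary reads, solidus-normalized names, directories written as empty entries), here is a sketch in Python using only the standard library. The name from_files is hypothetical, and the choice to visit directory children in sorted order is an assumption of the sketch, not something this specification requires.

```python
import io
import os
import zipfile
from typing import List


def from_files(paths: List[str]) -> bytes:
    """Collect files and directory trees into a new in-memory archive."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:

        def add(path: str) -> None:
            name = path.replace("\\", "/")       # normalize to the solidus separator
            if os.path.isdir(path):
                # Directories become empty entries whose names end in '/'.
                zf.writestr(name.rstrip("/") + "/", b"")
                for child in sorted(os.listdir(path)):
                    add(os.path.join(path, child))
            else:
                with open(path, "rb") as fh:     # binary mode, no decoding
                    zf.writestr(name, fh.read())

        for p in paths:
            add(p)
    return out.getvalue()
```

Entry names are taken directly from the supplied paths, so relative paths give relative entry names.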
Extracts all the entries from $archive and writes them into an equivalent tree of directories and files in the filesystem at the current directory.
arch:to-files($archive as xs:base64Binary) as empty-sequence()
Extracts all the entries from $archive and writes them into an equivalent tree of directories and files in the filesystem at the current directory.
All entries are written in binary mode, with no attempt at any conversion or decoding.
Entry names are considered as file paths, with '/' and '\' separators normalized to the path separator for the execution operating system.
Necessary intermediate directories are created.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Error conditions from [EXPath File] may be raised if there are problems on writing to the filesystem, most notably:
This function should be equivalent to the following XSLT function:
<xsl:function name="arch:to-files">
  <xsl:param name="archive" as="xs:base64Binary"/>
  <!-- Entry names are taken as strings, so the filters below do not depend on the shape of arch:entry -->
  <xsl:variable name="entries" as="xs:string*" select="arch:entry-names($archive)"/>
  <xsl:variable name="dirs" as="xs:string*" select="$entries[ends-with(.,'/')]"/>
  <xsl:variable name="files" as="xs:string*" select="$entries[not(ends-with(.,'/'))]"/>
  <xsl:variable name="required.dirs" as="xs:string*"
    select="distinct-values(for $r in $files return replace($r,'/[^/]+$','/'))[ends-with(.,'/')]"/>
  <xsl:sequence select="for $d in distinct-values(($required.dirs,$dirs)) return file:create-dir(replace($d,'/$',''))"/>
  <xsl:sequence select="for $f in $files return file:write-binary($f,arch:extract-binary($archive,$f))"/>
</xsl:function>
This function may be provided by an XSLT package (which will probably use functions from [EXPath File], and from which appropriate error conditions may be propagated, or caught within the package) or by a purpose-built extension function that may be able to support such an operation within a context of streaming processing.
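The extraction behaviour (binary writes, separator normalization, creation of intermediate directories) can likewise be sketched in Python with the standard library. The name to_files and its target_dir parameter are assumptions of the sketch; the function defined in this specification always writes at the current directory. The rejection of '..' path segments is an added safety measure, not a requirement stated here.

```python
import io
import os
import zipfile


def to_files(archive: bytes, target_dir: str = ".") -> None:
    """Write every entry of the archive below target_dir."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            # Treat both '/' and '\' as separators, then rebuild the path
            # with the separator of the executing operating system.
            parts = [p for p in info.filename.replace("\\", "/").split("/") if p]
            if ".." in parts:
                raise ValueError("entry escapes target directory: " + info.filename)
            path = os.path.join(target_dir, *parts)
            if info.filename.endswith(("/", "\\")):
                os.makedirs(path, exist_ok=True)          # explicit directory entry
            else:
                os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
                with open(path, "wb") as fh:              # binary mode, no decoding
                    fh.write(zf.read(info))
```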
A small number of convenience functions are defined for common cases of content, specifically to ensure that 'empty' entries (empty binary data) are produced for empty sequences, to ensure coherence between members of the parallel entry name and entry content sequences.
Encodes a string into binary data using a given encoding, suitable for content data for an entry.
arch:text($in as xs:string*) as xs:base64Binary
arch:text($in as xs:string*, $encoding as xs:string) as xs:base64Binary
The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values which every implementation is required to recognize are utf-8 and utf-16.
If $encoding is omitted, utf-8 encoding is assumed.
If the value of $in is the empty sequence, the function returns empty binary data. This is unlike bin:encode-string(), which will return an empty sequence.
[arch:unknown-encoding] is raised if $encoding is invalid or not supported by the implementation.
[error.encoding] is raised if there is an error or malformed input while encoding the string. Additional information about the error may be passed through suitable error reporting mechanisms; this is implementation-dependent.
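The distinguishing behaviour of arch:text, namely that an empty input yields empty binary data rather than nothing, can be shown with a short Python sketch. The name text_entry is hypothetical; the sketch models the input as a single optional string (None standing for the empty sequence), and ValueError stands in for arch:unknown-encoding.

```python
import codecs
from typing import Optional


def text_entry(value: Optional[str], encoding: str = "utf-8") -> bytes:
    """Encode a string as entry content; None models the empty sequence."""
    try:
        codecs.lookup(encoding)
    except LookupError as exc:
        raise ValueError("unknown encoding: " + encoding) from exc
    if value is None:
        return b""          # an 'empty' entry, so names and contents stay aligned
    return value.encode(encoding)
```

Returning empty bytes keeps the parallel sequences of entry names and entry contents the same length, which a missing item would not.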
Encodes the serialization of an XML tree into binary data using a given encoding, suitable for content data for an entry.
arch:xml($args as item()*) as xs:base64Binary
arch:xml($args as item()*, $params as element(output:serialization-parameters)?) as xs:base64Binary
arch:xml($args as item()*, $params as element(output:serialization-parameters)?, $encoding as xs:string) as xs:base64Binary
The single-argument version of this function has the same effect as the two-argument version called with $params set to an empty sequence. This in turn is the same as the effect of passing an output:serialization-parameters element with no child elements.
The $params argument is used to identify a set of serialization parameters. These are supplied in the form of an output:serialization-parameters element, having the format described in Section 3.1 Setting Serialization Parameters by Means of a Data Model Instance.
The $encoding argument is the name of an encoding. The values for this argument follow the same rules as for the encoding attribute in an XML declaration. The only values which every implementation is required to recognize are utf-8 and utf-16.
If $encoding is omitted, utf-8 encoding is assumed.
[arch:unknown-encoding] is raised if $encoding is invalid or not supported by the implementation.
[error.encoding] is raised if there is an error or malformed input while encoding the string. Additional information about the error may be passed through suitable error reporting mechanisms; this is implementation-dependent.
This function is equivalent to arch:text(fn:serialize($args,$params),$encoding).
The map type (map(xs:untypedAtomic,item()*)) proposed for XSLT3.0 can increase the coherence of the functions in this module significantly, mainly by retaining the structured connection between the entry name and its properties and content. In addition the properties of the overall archive (and its defaults for new entries) can similarly be defined in a single map.
This section defines optional parallel functions to those above using maps for arguments or results. In general these functions have separate names (e.g. arch:entries-map()) derived from a consistent suffix ('-map') attached to the standard, element-based form.
Note:
map:keys($map as map(*)) as xs:anyAtomicType* returns the keys that are present in a map, in unpredictable order. This means that if order within an archive is important (either in extraction or updating) other mechanisms, such as the position property, are needed to track or set that order.
Note:
It should be possible to implement all the functions in this section as user-defined XSLT3.0 functions using the library described above.
Note:
FOR DISCUSSION. In general when using maps for denoting the entries to be manipulated, the arguments could be considered to be a (possibly empty) sequence of maps that are treated as if concatenated. [THIS NEEDS THOUGHT ABOUT OVERWRITING/MERGING COMMON KEYS]. In this draft the arguments are single maps.
An archive is described as a map name -> properties, where the properties of each entry themselves are represented as a further map. The 'content', i.e. the real data, of an archive entry is described by the content property of that map. Thus a set of archive entries has type map(xs:string, map(xs:string,item()*)).
Support for similar approaches using other map representations, such as [JSONiq] objects may be implementation dependent.
The properties of an archive itself, as opposed to its entries, can be described or defined with a map with the following entries:
Property | Type | Meaning |
---|---|---|
format | xs:string | The format of this archive |
compression | xs:string | The compression algorithm used for compressing the archive. |
Note:
Using a reserved name within an overall map (such as arch:options) would allow the options/properties for an archive to be stored alongside the entries themselves.
Entries within the archive can also be accessed or described by entries in a map (map(xs:string,map(xs:string,item()*))). In this case the map key gives the (path)name of the archive entry (e.g. build/build-j.xml) and the value is a map of the properties of that entry.
The keys are described in the following table, and specific use is described under each of the functions:
Property | Type | Meaning |
---|---|---|
size | xs:integer | The original file size of the entry |
compressed-size | xs:integer | The compressed file size of the entry, i.e. the number of bytes it occupies in the archive |
last-modified | xs:dateTime | The date of last modification of this entry |
compression-level | xs:string | An indicator of the level of (lossless?) compression |
content | xs:base64Binary or xs:string | The value of the entry read from the archive. This will only be set from arch:entries-map() if $return-content is requested in the call. |
encoding | xs:string | The encoding to be used for converting textual items to or from a byte sequence. The absence of such an entry implies binary content. The only values which every implementation is required to recognize are utf-8 and utf-16. |
position | xs:integer | The position of the entry in the archive, starting at 1. |
Returns a description of the type and properties of a given archive as a map.
arch:options-map($archive as xs:base64Binary) as map(xs:string,item()?)
The description is returned as a map of type map(xs:string,item()?) with entries describing the details. The properties currently supported are format and compression (see 12.1.1 Archive property maps).
If the archive format supports a compression algorithm varying on a per-entry basis, and more than one algorithm has been used in the archive, mixed is returned for the compression entry.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Finding the properties of the archive stored in a file located at $uri:
arch:options-map(file:read-binary($uri)) => map {'format' : 'zip', 'compression' : 'deflate'}
Returns the entry descriptors for all the entries found within the archive as a map, optionally each with their content.
arch:entries-map($archive as xs:base64Binary) as map(xs:string,map(xs:string,item()*))
arch:entries-map($archive as xs:base64Binary, $return-content as xs:boolean) as map(xs:string,map(xs:string,item()*))
Keys to the returned map are the entry (path) names.
The value for each map entry is a map describing the properties of that entry. For more details of this structure see 12.1.2 Entry property maps. Among the properties returned, content is present only if $return-content is defined and equals true(); its type will be xs:base64Binary.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
As the order of keys returned from map:keys() is not defined and can be implementation-dependent, the results of the function arch:entry-names(xs:base64Binary) as xs:string* can be used as a key sequence to iterate through this map; alternatively, the keys can be sorted on the position property.
Using $return-content makes it possible to return a complete archive in a single call. Archive options can be added through a compound map, as shown in the examples.
Finding the entries of the archive stored in a file located at $uri:
arch:entries-map(file:read-binary($uri)) => map{ "build.xml" : map{ "size" : 2194, "compressed-size" : 652, "last-modified" : "2013-07-18T11:22:12"}, "lumley.jpg" : map{ "size" : 84983, "compressed-size" : 84872, "last-modified" : "2009-03-23T11:15:06"}, "tests/qt3/binary/binary.xml" : map{ "size" : 10058, "compressed-size" : 1381, "last-modified" : "2013-08-06T13:14:08"}}
Counting the number of apparent XML files in the previous example:
count(map:keys(arch:entries-map(file:read-binary($uri)))[ends-with(.,'.xml')]) => 2
Returning an archive complete with options:
map:new((map:entry('arch:options',arch:options-map($archive)),arch:entries-map($archive))) => map{ "arch:options" : map{ "format" : "ZIP", "compression" : "flat" }, "build.xml" : map{ "size" : 2194, "compressed-size" : 652, "last-modified" : "2013-07-18T11:22:12"}, "lumley.jpg" : map{ "size" : 84983, "compressed-size" : 84872, "last-modified" : "2009-03-23T11:15:06"}, "tests/qt3/binary/binary.xml" : map{ "size" : 10058, "compressed-size" : 1381, "last-modified" : "2013-08-06T13:14:08"}}
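The shape of the map returned by arch:entries-map() can be modelled outside XPath. The following is a minimal sketch in Python, assuming a ZIP archive held in memory and using the standard zipfile module. The names entries_map and make_demo_archive are hypothetical, Python dictionaries stand in for XPath maps, and including position in the result is an assumption rather than something this draft mandates.

```python
import io
import zipfile
from datetime import datetime


def entries_map(archive: bytes, return_content: bool = False) -> dict:
    """Model arch:entries-map() for a ZIP archive held in memory (assumption: ZIP only)."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # infolist() follows the order of the entries in the archive.
        for position, info in enumerate(zf.infolist(), start=1):
            props = {
                "size": info.file_size,
                "compressed-size": info.compress_size,
                "last-modified": datetime(*info.date_time),
                "position": position,  # assumption: position is reported
            }
            if return_content:
                # Content is only included on request, always as binary data.
                props["content"] = zf.read(info.filename)
            result[info.filename] = props
    return result


def make_demo_archive() -> bytes:
    """Build a small two-entry ZIP archive in memory for the usage example."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("build.xml", "<project/>")
        zf.writestr("notes.txt", "hello")
    return buffer.getvalue()


if __name__ == "__main__":
    entries = entries_map(make_demo_archive())
    # Count the apparent XML files, as in the XPath example above.
    print(len([name for name in entries if name.endswith(".xml")]))
```

Python dictionaries preserve insertion order, so this sketch happens to return keys in archive order; that is a property of the model and not of map:keys().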
Returns a copy of $entries with the content entries set to binary or decoded string data for the appropriate entry in the archive.
arch:extract-map($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as map(xs:string,map(xs:string,item()?))
Returns a copy of $entries with the content property of each entry set to binary or decoded string data for the appropriate entry in the archive.
The map entries in $entries define whether binary or decoded string data is to be returned. (For details of properties see 12.1.2 Entry property maps.) The only relevant property is encoding: if present, the content is returned as an xs:string decoded according to the named encoding. If absent, then the type will be xs:base64Binary.
The value for each map entry in the return is the original entry from $entries plus an additional or replaced property, content, of type xs:string or xs:base64Binary dependent upon the presence of the encoding property.
The behaviour of this function is defined by equivalent XPath:
map:new(for $k in map:keys($entries) return let $a := $entries($k), $text := map:contains($a,'encoding'), $encoding := ($a('encoding'),'UTF-8')[1], $data := arch:extract-binary($archive,$k) (: error if not found :) return map:entry($k, map:new(($a, map:entry('content', if ($text) then bin:decode-string($data,$encoding) else $data)))))
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:decoding-error] is raised if there was an error in decoding an entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
As the original $entries are returned in the result map, with content added, other information, such as position, is retained.
To collect all the XML entries as XML:
let $archive := file:read-binary($uri), $entries := arch:entries-map($archive), $xml-names := map:keys($entries)[ends-with(.,'.xml')], $get := map:new($xml-names ! map:entry(.,map:entry('encoding','UTF-8'))), $content := arch:extract-map($archive,$get) return $xml-names ! fn:parse-xml($content(.)('content'))
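The rule that arch:extract-map() returns each requested entry map with a content property added or replaced can be illustrated with a small model. This is a sketch in Python under stated assumptions: the archive is a ZIP held in memory, dictionaries stand in for XPath maps, and the names extract_map and UnknownEntryError are hypothetical stand-ins for the function and for the arch:unknown-entry error.

```python
import io
import zipfile


class UnknownEntryError(KeyError):
    """Hypothetical stand-in for the arch:unknown-entry error."""


def extract_map(archive: bytes, entries: dict) -> dict:
    """Model arch:extract-map() for a ZIP archive held in memory."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = set(zf.namelist())
        for name, props in entries.items():
            if name not in names:
                raise UnknownEntryError(name)
            data = zf.read(name)
            if "encoding" in props:
                # Text was requested: decode with the named encoding.
                content = data.decode(props["encoding"])
            else:
                # No encoding property: return the raw binary data.
                content = data
            # Copy the request map, adding or replacing 'content'.
            result[name] = {**props, "content": content}
    return result
```

Because the request map is copied into the result, any other properties supplied by the caller (such as position) survive unchanged, which mirrors the note above.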
Returns the sequence of requested entries from the archive as binary data.
arch:extract-binary-map($archive as xs:base64Binary, $entries as map(xs:string,item()*)) as xs:base64Binary*
Returns as binary data each entry in the archive $archive that corresponds to map:keys($entries), in sequence.
Any information in the values of each entry of $entries is ignored.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Collection of all the entries as binary data can also be accomplished using arch:entries-map($archive,true()) and collecting the 'content' entry from each of the returned maps.
Returns the sequence of requested entries from the archive as decoded string data.
arch:extract-text-map($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as xs:string*
arch:extract-text-map($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?)), $encoding as xs:string) as xs:string*
Returns as decoded string data each entry in the archive $archive that corresponds to map:keys($entries), in sequence.
If $encoding is specified, or the property encoding appears in the entry in $entries, the strings are decoded according to that encoding, otherwise UTF-8 encoding is assumed.
The behaviour of this function is defined by equivalent XPath:
for $k in map:keys($entries) return let $a := $entries($k), $thisEncoding := ($a('encoding'),$encoding,'UTF-8')[1], $data := arch:extract-binary($archive,$k) (: error if not found :) return bin:decode-string($data,$thisEncoding)
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:unknown-encoding] is raised if an encoding requested is unknown or unsupported.
[arch:decoding-error] is raised if there was an error in decoding an entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
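The encoding precedence used by arch:extract-text-map() (the entry's own encoding property, then $encoding, then UTF-8) can be sketched as follows. This is an illustrative Python model only, assuming a ZIP archive in memory; extract_text_map is a hypothetical name, and Python's KeyError, LookupError and UnicodeDecodeError loosely play the roles of arch:unknown-entry, arch:unknown-encoding and arch:decoding-error.

```python
import io
import zipfile
from typing import Optional


def extract_text_map(archive: bytes, entries: dict,
                     encoding: Optional[str] = None) -> list:
    """Model arch:extract-text-map() for a ZIP archive held in memory."""
    texts = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name, props in entries.items():
            # Precedence: the entry's own 'encoding', then the
            # function-level encoding, then UTF-8.
            this_encoding = props.get("encoding") or encoding or "utf-8"
            # zf.read() raises KeyError if the entry does not exist.
            texts.append(zf.read(name).decode(this_encoding))
    return texts
```

The result follows the iteration order of the dictionary. With XPath maps the order of map:keys() is not predictable, so callers that need a particular order would have to arrange it themselves.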
Returns a new archive with each of the given entries named as a key in $entries set to the corresponding value in $entries($key)('content').
arch:create-map($entries as map(xs:string,map(xs:string,item()*))) as xs:base64Binary
arch:create-map($entries as map(xs:string,map(xs:string,item()*)), $options as map(xs:string,item()*)) as xs:base64Binary
Returns an archive of format specified by $options with each of the given entries named as a key in $entries set to the corresponding value in $entries($key)('content'), and with other properties defined by $entries($key)(*) or $options.
The map $options can contain properties both for the archive itself, and defaults for each entry. Relevant properties for the archive are those described in 12.1.1 Archive property maps.
Relevant properties for entries (see also 12.1.2 Entry property maps) are encoding and content. If encoding is present, content of type xs:string is converted to binary according to the named encoding. If absent, then content is assumed to be of type xs:base64Binary. The only values which every implementation is required to recognize are utf-8 and utf-16. The content property holds the data for the entry, as xs:string or xs:base64Binary, dependent upon encoding.
The relative order of entries within the archive follows that of the position property, if specified, followed by all those lacking such a property, in an implementation-dependent order. The specific ordering is equivalent to:
<xsl:variable name="keys" select="map:keys($entries)"/> <xsl:variable name="positioned" as="xs:string*"> <xsl:perform-sort select="$keys[map:contains($entries(.),'position')]"> <xsl:sort select="$entries(.)('position')"/> </xsl:perform-sort> </xsl:variable> <xsl:for-each select="$positioned, $keys[not(.=$positioned)]"> .... process .... </xsl:for-each>
If $options is specified, the overall archive properties (and defaults for the entries) are set to those specified in the map.
[arch:read-error] is raised if there was an unspecified problem in creating the archive.
[arch:duplicate-position] is raised if two or more entries request the same position in the archive.
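The ordering rule for arch:create-map(), together with the duplicate-position check, can be sketched in a few lines. This is a Python model under stated assumptions: dictionaries stand in for XPath maps, and entry_order and DuplicatePositionError are hypothetical names, the latter standing in for arch:duplicate-position.

```python
class DuplicatePositionError(ValueError):
    """Hypothetical stand-in for the arch:duplicate-position error."""


def entry_order(entries: dict) -> list:
    """Return entry names in the order they would appear in the new archive."""
    positioned = [name for name, props in entries.items() if "position" in props]
    positions = [entries[name]["position"] for name in positioned]
    if len(set(positions)) != len(positions):
        raise DuplicatePositionError("two or more entries request the same position")
    # Entries with a position come first, sorted on that property.
    positioned.sort(key=lambda name: entries[name]["position"])
    # The rest follow; here in dictionary order, which the draft
    # leaves implementation-dependent.
    unpositioned = [name for name, props in entries.items() if "position" not in props]
    return positioned + unpositioned
```

For example, entries keyed "c.txt" (no position), "a.txt" (position 2) and "b.txt" (position 1) would be written in the order b.txt, a.txt, c.txt.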
Returns an archive with each of the given entries in the keys of $entries updated to the corresponding values in the $entries($key)('content') and with other properties defined by $entries($key)(*). If an entry is not found, a new entry is added to the end of the archive.
arch:update-map($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()*))) as xs:base64Binary
arch:update-map($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()*)), $default.options as map(xs:string,item()*)) as xs:base64Binary
Returns an archive of the same format as $archive with each of the given entries in the keys of $entries updated to the corresponding values in the $entries($key)('content') and with other properties defined by $entries($key)(*) or $default.options. If an entry is not found, a new entry is added to the end of the archive. Relevant properties (see also 12.1.2 Entry property maps) are encoding and content. If encoding is present, content of type xs:string is converted to binary according to the named encoding. If absent, then content is assumed to be of type xs:base64Binary. The only values which every implementation is required to recognize are utf-8 and utf-16. The content property holds the data for the entry, as xs:string or xs:base64Binary, dependent upon encoding.
If $default.options is specified, its values will be used as the default properties for each entry, which may be overridden by the property map for each individual entry.
The relative order of all the existing and replaced entries within the archive is preserved. New entries appear at the end of the archive: any which have a position property specified, are ordered according to that property, followed by any others in an implementation-dependent order.
The uncompressed content, size and last-modified date of the entries that are not updated shall be the same as they were before the update. Compressed sizes may alter.
The compression methods of the updated entries shall be preserved.
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
[arch:duplicate-position] is raised if two or more entries request the same position in the archive.
Using the $default.options map, a common compression method, last-modification date and similar can be set for a set of entries, whose minimal map entries are map{"content" : $content}
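How defaults combine with per-entry properties in arch:update-map() can be illustrated with a small model. This is a sketch in Python, with dictionaries standing in for XPath maps; effective_properties and apply_defaults are hypothetical helper names, and the draft itself does not prescribe this implementation.

```python
from typing import Optional


def effective_properties(entry: dict, defaults: Optional[dict] = None) -> dict:
    """Combine the default options with one entry's own property map."""
    # Later mappings win, so the entry's own properties override the defaults.
    return {**(defaults or {}), **entry}


def apply_defaults(entries: dict, defaults: Optional[dict] = None) -> dict:
    """Apply the same defaults to every entry in a map of entries."""
    return {name: effective_properties(props, defaults)
            for name, props in entries.items()}
```

With defaults of {"last-modified": "2013-07-18T11:22:12"}, a minimal entry {"content": ...} gains that date, while an entry that already carries its own last-modified keeps it.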
Returns an archive with the given entries deleted.
arch:delete-map($archive as xs:base64Binary, $entries as map(xs:string,item()*)) as xs:base64Binary
Returns an archive of the same format as $archive with all the entries named in map:keys($entries) deleted.
The relative order of the remaining entries within the archive is preserved.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as they were before deletion. Compressed sizes may alter.
If $entries is an empty map, the original archive shall be returned.
Any information in the values of each entry of $entries is ignored.
[arch:unknown-entry] is raised if an entry requested for deletion does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Whilst the uncompressed entries remaining after deletion should of course be the same size and content as those before deletion, depending upon the (lossless) compression algorithm used, the compressed sizes and content might not be. In the absence of a special check, implied in the rules, $archive may not be identical to arch:delete-map($archive,map:new()).
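The deletion rules, including the special check for an empty map, can be modelled as follows. This is a minimal Python sketch, assuming a ZIP archive held in memory and the standard zipfile module; delete_map and UnknownEntryError are hypothetical names, the latter standing in for arch:unknown-entry.

```python
import io
import zipfile


class UnknownEntryError(KeyError):
    """Hypothetical stand-in for the arch:unknown-entry error."""


def delete_map(archive: bytes, entries: dict) -> bytes:
    """Model arch:delete-map() for a ZIP archive held in memory."""
    if not entries:
        # Special check: an empty map returns the original archive as is.
        return archive
    buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as source:
        missing = set(entries) - set(source.namelist())
        if missing:
            raise UnknownEntryError(sorted(missing)[0])
        with zipfile.ZipFile(buffer, "w") as target:
            # infolist() preserves the relative order of remaining entries.
            for info in source.infolist():
                if info.filename not in entries:
                    # Reusing the ZipInfo keeps the name, the modification
                    # date and the compression method of the entry.
                    target.writestr(info, source.read(info.filename))
    return buffer.getvalue()
```

Only the keys of the map are consulted, so a call such as delete_map(archive, {"b.txt": None}) is sufficient. Without the empty-map check the archive would be rewritten, and the resulting bytes could differ from the input even though no entry was removed.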
Errors possibly generated by code executed from module [EXPath File]: