This document is also available in these non-normative formats: XML.
Copyright © 2013 John Lumley, Christian Grün, Matthias Brantner and Florent Georges, published by the EXPath Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.
This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
This proposal provides an API for XPath 2.0 and XPath 3.0 to handle archive data (i.e. collected and possibly compressed sets of files and directories). It defines extension functions to process data from and to such archive files, including creating archives, determining and setting properties, listing and extracting contents, and adding and updating entries. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage. Some additional features for use in XPath 3.0 are also defined.
1 Status of this document
2 Introduction
2.1 Namespace conventions
2.2 Error management
2.3 Archive representation
2.4 Archive types
3 Use cases
3.1 Creating a simple EPUB document
3.2 Examining a JAR file
3.3 Extracting a ZIP archive to a file system
4 Describing archives and entries
4.1 Archive properties and options
4.2 Entry descriptions
4.3 Using map types to describe entries and options
5 Loading and saving archives
6 Information about an archive and its contents
6.1 arch:options
6.2 arch:entries
7 Extracting entries from an archive
7.1 arch:extract-binary
7.2 arch:extract-text
8 Updating entries in an archive
8.1 arch:delete
8.2 arch:update
9 Creating an archive
9.1 arch:create
10 Functions using XPath3.0 map() type
10.1 Archive property maps
10.2 Entry property maps
10.3 archM:options
10.4 archM:entries
10.5 archM:entry-names
10.6 archM:extract
10.7 archM:extract-binary
10.8 archM:extract-text
10.9 archM:create
10.10 archM:update
10.11 archM:delete
This document is in an interim draft stage. Comments are welcome on the public-expath@w3.org mailing list (archive).
The module defined by this document defines several functions, all contained in the namespace http://expath.org/ns/archive. In this document, the arch prefix, when used, is bound to this namespace URI.
Alternative versions of these functions using the proposed XPath 3.0 map() type (see 4.3 Using map types to describe entries and options) are defined in the namespace http://expath.org/ns/archive-map. In this document, the archM prefix, when used, is bound to this namespace URI.
Error codes are defined in the same namespace (http://expath.org/ns/archive), and in this document are displayed with the same prefix, arch.
Note:
This follows the suggestion (in late August 2013) for a coherent naming standard in EXPath modules.
Binary file I/O, used to read and write complete archives to files, relies on facilities defined in [EXPath File], which defines functions in the namespace http://expath.org/ns/file. In this document, the file prefix, when used, is bound to this namespace URI.
Manipulation of binary data itself can employ functions from [EXPath Binary], which defines functions in the namespace http://expath.org/ns/binary. In this document, the bin prefix, when used, is bound to this namespace URI.
Error conditions are identified by a code (a QName). When such an error condition is reached in the evaluation of an expression, a dynamic error is thrown with the corresponding error code (as if the standard XPath function error() had been called).
Archives in this module are represented principally as items of type xs:base64Binary, i.e. in their basic binary (byte sequence) form.
Archives are treated as being arranged structurally as a description of overall options of the archive and a sequence of named entries. Each entry has:
A name, which is treated as a sequence of Unicode characters. In many cases the solidus character (/) is used to imply that the entries are logically arranged in positions within a directory tree, but this is not mandatory.
A set of properties, denoting at least the uncompressed size of the entry, together with archive-internal properties such as the compression method used on the stored data and other indications such as the date of last modification.
Data, treated as (possibly null) binary data.
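The following Python sketch makes this three-part entry model concrete. It reads a ZIP archive held in memory as a byte string, the analogue of an xs:base64Binary item, using the standard zipfile module, and exposes each entry's name, a few properties and its data. It illustrates the model only and is not an implementation of this module. The Entry class and the read_entries function are hypothetical names.

```python
import io
import zipfile
from dataclasses import dataclass

# Hypothetical helpers: a sketch of the entry model, not this module's API.


@dataclass
class Entry:
    """One archive entry: a name, some properties, and (possibly empty) data."""
    name: str             # e.g. "tests/qt3/archive/main.xml"
    size: int             # uncompressed size in bytes
    compressed_size: int  # bytes the entry occupies inside the archive
    method: int           # zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED, ...
    data: bytes           # b"" for a 'directory' entry such as "tests/"


def read_entries(archive: bytes) -> list[Entry]:
    """Decode an in-memory ZIP archive into Entry records, in stored order."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            Entry(info.filename, info.file_size, info.compress_size,
                  info.compress_type, zf.read(info))
            for info in zf.infolist()
        ]
```

A 'directory' entry comes back like any other entry, with empty data. This matches the rule below that such entries are not treated specially.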
Archives are most commonly considered to be arranged logically as directories, using the entry names to denote paths and file names (e.g. tests/qt3/archive/main.xml). In such circumstances, archives may contain entries to represent the directories themselves (e.g. tests/qt3/archive/), presumably with no data. [This could be used, for example, so that full extraction of an archive to a file system generates empty output directories.] This specification makes no distinction between these two cases: an empty 'directory' entry is treated like any other 'file' entry. Semantic interpretation of entry names as files in directory trees is an application issue.
Note:
Behaviour when entries with duplicate names are detected in an archive is implementation dependent. Nevertheless, if an error is not thrown, only one entry should be returned when reading. Implementations must not write duplicate entries in result archives.
The module is designed to be able to support a number of different types of archive, providing a coherent access mechanism.
The following archive types are required to be supported:
[ZIP] (which also covers derivative archive formats, such as JAR or OpenDocument)
[GZIP]: a compressed archive of a sequence of files
Note:
Within GZIP names of entries (original file names) are optional, on a per-file basis, so special measures may need to be taken to handle 'unnamed' sections.
Specific issues arise from (i) archives used in streaming situations, where the internal manifests of the archives cannot be completed until all data is written, and (ii) archives where the order of entries is important, such as JAR, where the manifest entries need to come first.
Note:
Currently there are no proposals within this module to cover encrypted archives.
Development of this specification was driven by requirements which some XML developers regularly encounter in examining or generating data which is presented in archival forms. Some typical use cases include:
Manipulating EPUB documents.
Examining Java classes and resources stored in JAR formats.
An [EPUB] document is a collection of content sections written in XHTML, with a metadata descriptor (usually the content.opf file) and a navigation description (usually the toc.ncx file), all collected together and potentially compressed in a ZIP format. A simple example of creating such a document in XQuery is:
arch:create(
  ( { "name" : "mimetype", "compression" : "store" },
    "META-INF/container.xml",
    "OEBPS/content.opf",
    "OEBPS/Text/title.xhtml",
    "OEBPS/Text/chap01.xhtml",
    "OEBPS/toc.ncx" ),
  ( content:mimetype(),
    content:metainf(),
    content:oebps-content(),
    content:title(),
    content:chapter(),
    content:toc() )
)
The user-supplied XQuery function content:mimetype() returns the appropriate mimetype description for the EPUB document as a string ("application/epub+zip"). Each of the other content:*() functions generates a serialized form of the appropriate XML structure, e.g.:
declare function content:title() as xs:string {
  fn:serialize(
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Title Page</title>
      </head>
      <body>
        <div>
          <h2 id="heading_id_2">Sample Book</h2>
          <h2 id="heading_id_3">A Sample .epub Book</h2>
          <h3 id="heading_id_4">Title Page</h3>
        </div>
      </body>
    </html>
  )
};
Using a map structure to define an entry enables properties such as compression to be altered on an entry-by-entry basis. For an EPUB document the mimetype entry must be uncompressed (so that it can be read by simple string searching), but other entries may be compressed.
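The per-entry compression rule can also be illustrated outside XQuery. The following Python sketch uses only the standard zipfile module. It writes the mimetype entry first and uncompressed, and deflates everything else, which is the layout the map-based entry description in the example above is meant to produce. The create_epub function is a hypothetical name, and the part names in the usage are placeholders.

```python
import io
import zipfile

# Hypothetical helper: a sketch of per-entry compression, not this module's API.


def create_epub(parts: dict[str, bytes]) -> bytes:
    """Build an EPUB-style ZIP: 'mimetype' first and stored, other parts deflated."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Per-entry override: the mimetype entry is written without compression,
        # so its bytes appear verbatim near the start of the file.
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in parts.items():   # dicts keep insertion order
            zf.writestr(name, data)        # archive default method: deflate
    return out.getvalue()
```

Because the stored entry comes first, the string application/epub+zip can be found by simple byte searching at the head of the file.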
JAR files contain class code and definitions for Java classes, in entries whose names are path/classname.class. Local classes (classes defined within a class) have separate code entries with a classname outerclass$innerclass. To find all the main package-qualified classes the following XPath should suffice:
for $e in arch:entries(file:read-binary("lib/saxon9-sql.jar"))
          [ends-with(.,'.class') and not(contains(.,'$'))]
return replace(replace($e,'\.class$',''),'/','.')
=> "net.sf.saxon.option.sql.SQLClose", "net.sf.saxon.option.sql.SQLColumn", "net.sf.saxon.option.sql.SQLConnect", ...., "net.sf.saxon.option.sql.SQLUpdate"
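For comparison, the same filter can be sketched in Python over the entry names returned by the standard zipfile module. The main_classes function is a hypothetical helper. Like the XPath, it assumes that any entry name containing $ belongs to a nested class.

```python
import io
import re
import zipfile

# Hypothetical helper: mirrors the XPath filter above, not this module's API.


def main_classes(jar: bytes) -> list[str]:
    """Package-qualified names of top-level classes stored in a JAR."""
    with zipfile.ZipFile(io.BytesIO(jar)) as zf:
        names = zf.namelist()                         # entries in archive order
    return [
        re.sub(r"\.class$", "", n).replace("/", ".")  # a/b/C.class -> a.b.C
        for n in names
        if n.endswith(".class") and "$" not in n      # skip Outer$Inner.class
    ]
```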
Assuming the ZIP file in question has (empty) entries denoting any directories required, the following XSLT will unzip an archive to the current directory, using the file writing functions of [EXPath File]:
<xsl:variable name="arch" select="file:read-binary($uri)"/>
<xsl:variable name="entries" select="arch:entries($arch)"/>
<xsl:variable name="dirs" select="$entries[ends-with(.,'/')]"/>
<xsl:variable name="required.dirs"
    select="distinct-values(for $r in ($entries except $dirs)
                            return replace($r,'/[^/]+$','/'))[ends-with(.,'/')]"/>
<xsl:sequence select="for $d in distinct-values(($required.dirs,$dirs))
                      return file:create-dir(replace($d,'/$',''))"/>
<xsl:sequence select="for $f in ($entries except $dirs)
                      return file:write-binary($f,arch:extract-binary($arch,$f))"/>
(file:create-dir() creates necessary intermediate directories, so $dirs does not need to be in a sorted order. If the ZIP archive does not have entries for all directories, further intermediate code is required to identify those missing.)
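The same extraction logic can be sketched in Python with the standard zipfile and os modules. The unzip_to function is a hypothetical helper. Like the XSLT, it derives any missing parent directories from the file entry names. The call os.makedirs(..., exist_ok=True) plays the role of file:create-dir(), because it also creates intermediate directories. The sketch does not guard against hostile entry names such as ../x, which a real extractor should reject.

```python
import io
import os
import zipfile

# Hypothetical helper: a sketch of the XSLT above, not this module's API.
# No protection against entry names that escape 'target' (e.g. "../x").


def unzip_to(archive: bytes, target: str) -> None:
    """Write every file entry of a ZIP archive under the directory 'target'."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()
        dirs = {n for n in names if n.endswith("/")}
        files = [n for n in names if not n.endswith("/")]
        # Parent directory of each file, mirroring replace($r,'/[^/]+$','/').
        required = {n.rsplit("/", 1)[0] + "/" for n in files if "/" in n}
        for d in dirs | required:
            # Creates intermediate directories too, so no sorting is needed.
            os.makedirs(os.path.join(target, d.rstrip("/")), exist_ok=True)
        for f in files:
            with open(os.path.join(target, f), "wb") as out:
                out.write(zf.read(f))
```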
The properties of overall archives and individual entries at the XDM level are described by small structured elements, with optional information attached. In this proposal this information is attached as attributes.
Note:
Parallels with XPath 3.0 serialization parameters, which are now sets of (element) nodes, become awkward. In arch:entry we would need to add an element arch:name to hold the name of an entry, rather than rely on the string value. The major point in favour of using elements rather than attributes would be where we need to read or set complex structured parameters, such as character maps. This needs discussion.
Archive options and properties are described as a structured element (element(arch:options)) with the following attributes:
format: the type of the archive, e.g. "zip". This is mandatory.
algorithm: the default compression used in the archive, e.g. "deflate".
Other attributes may be dependent upon the type of the archive and the implementation.
Entries within the archive can be accessed by name (xs:string) or by a structured element (element(arch:entry)). In the latter case the entry name is the string value of the element.
When describing an existing entry in an archive, element(arch:entry) may be returned with the following optional attributes:
size: the original file size of the entry.
compressed-size: the compressed file size of the entry, i.e. the number of bytes it occupies in the archive.
last-modified: the date of last modification of this entry, in xs:dateTime notation.
compression-level: an indicator of the level of (lossless?) compression.
When used to create or update an entry in an archive, element(arch:entry) may also have the following optional attributes:
last-modified: the date of last modification to be written on this entry, in xs:dateTime notation.
compression-level: the level of (lossless?) compression to be used in writing the entry into the archive.
encoding: the encoding to be used for converting textual items to a byte sequence, prior to possible compression and writing to the archive.
(In writing actions, unknown attributes are ignored.)
Proposals in XPath 3.0 have been made for a type map(xs:untypedAtomic,item()*), which could be exploited beneficially for manipulating archives, using the entry name as the key and the (xs:base64Binary) value of the entry as the corresponding value in the map. These maps could be used both as output (in arch:entries() and arch:extract-[text|binary]()) and as input (in arch:update() and arch:create()). Equally, such maps could be used in arch:delete(), where only their keys would be read.
An attractive alternative would be for each entry itself to be a map($property as xs:string, $value as item()*), with suitable keys, e.g. content -> xs:base64Binary. Thus the entry 'set' can be a map map(xs:string, map(xs:string,item()*)).
Note:
Functions using such maps for arguments and results could either have separate names (e.g. arch:entries-as-map()) or be defined in a separate namespace (archM:entries()); in this current draft the second is used. Details are discussed in 10 Functions using XPath3.0 map() type.
Support for similar approaches using other map representations, such as [JSONiq] objects may be implementation dependent.
This module defines no specific functions for reading and writing archives from files, as distinct from their binary data. The EXPath File Module [EXPath File] provides two suitable functions to do this:
file:read-binary($file as xs:string) as xs:base64Binary. Returns the content of a file in its Base64 representation.
file:write-binary($file as xs:string, $value as xs:base64Binary) as empty-sequence(). Writes a Base64 item as binary to a file.
Note:
There may be some desire for convenience functions, such as arch:write($file as xs:string,....) as empty-sequence(), which perform creation and file writing as one action.
Returns a description of the type and properties of a given archive.
arch:options($archive as xs:base64Binary) as element(arch:options)*
The description is returned as an element <arch:options> with an unordered sequence of child elements describing the details. The following are currently supported:
arch:format: format of this archive
arch:algorithm: the compression algorithm that was used.
If the archive format supports a compression algorithm varying on a per-entry basis, and more than one algorithm has been used in the archive, mixed is returned for arch:algorithm.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Finding the properties of the archive stored in a file located at $uri:
arch:options(file:read-binary($uri))
=> <arch:options>
     <arch:format>ZIP</arch:format>
     <arch:algorithm>deflate</arch:algorithm>
   </arch:options>
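The following Python sketch, built on the standard zipfile module, illustrates how format and algorithm might be derived, including the mixed case. It handles ZIP only. Three points are assumptions of the sketch rather than part of this specification: the method-name table, the choice to omit algorithm for an empty archive, and the function name archive_options. A zipfile.BadZipFile exception corresponds roughly to [arch:read-error].

```python
import io
import zipfile

# Hypothetical helper: a sketch of arch:options semantics for ZIP only.
# Labels reported for the compression methods zipfile understands (assumed).
METHOD_NAMES = {
    zipfile.ZIP_STORED: "store",
    zipfile.ZIP_DEFLATED: "deflate",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def archive_options(archive: bytes) -> dict[str, str]:
    """Report the archive format and the compression algorithm(s) in use."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        methods = {METHOD_NAMES.get(i.compress_type, str(i.compress_type))
                   for i in zf.infolist()}
    options = {"format": "ZIP"}
    if methods:
        # More than one per-entry method in use: report "mixed".
        options["algorithm"] = "mixed" if len(methods) > 1 else methods.pop()
    return options
```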
Returns the set of entry descriptors for all the entries found within the archive.
arch:entries($archive as xs:base64Binary) as element(arch:entry)*
Each descriptor is an element <arch:entry> whose text value is the path of the file within the archive. For more details of this structure see 4.2 Entry descriptions.
The entries are returned in the order in which they are encountered serially within the archive.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
There may be a case for providing a sorted version, probably using some form of collation.
Finding the entries of the archive stored in a file located at $uri:
arch:entries(file:read-binary($uri))
=> <arch:entry size="2194" compressed-size="652" last-modified="2013-07-18T11:22:12">build.xml</arch:entry>
   <arch:entry size="84983" compressed-size="84872" last-modified="2009-03-23T11:15:06">lumley.jpg</arch:entry>
   <arch:entry size="10058" compressed-size="1381" last-modified="2013-08-06T13:14:08">tests/qt3/binary/binary.xml</arch:entry>
Counting the number of apparent XML files in the previous example:
count(arch:entries(file:read-binary($uri))[ends-with(.,'.xml')]) => 2
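The descriptor attributes shown above map directly onto fields of zipfile.ZipInfo in Python's standard library. This sketch returns descriptors in archive order. The function name entries is hypothetical, and dictionaries stand in for arch:entry elements. ZIP timestamps have two-second resolution and no time zone, so the sketch renders last-modified as an xs:dateTime-style string without a zone.

```python
import datetime
import io
import zipfile

# Hypothetical helper: dictionaries stand in for arch:entry descriptors.


def entries(archive: bytes) -> list[dict]:
    """One descriptor per entry, in the order the entries occur in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            {
                "name": i.filename,                  # string value of arch:entry
                "size": i.file_size,                 # uncompressed size
                "compressed-size": i.compress_size,  # bytes stored in the archive
                "last-modified": datetime.datetime(*i.date_time).isoformat(),
            }
            for i in zf.infolist()
        ]
```

Counting apparent XML files then reduces to filtering on the name suffix, as in the count() example above.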
The module does not attempt to discern the 'type' of an entry (such as 'text', 'XML', 'raw-binary'), leaving that to the programmer. Two forms of reading result are supported: raw binary (xs:base64Binary) and decoded text (xs:string).
Returns the sequence of requested entries from the archive as binary data.
arch:extract-binary($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary*
Returns as binary data each entry in the archive $archive that corresponds to an entry name in $entries, in sequence.
The entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
Multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
There have been suggestions for a signature arch:extract-binary($archive as xs:base64Binary) returning all the entries. In the absence of maps in the return type, this does not make sense, since the entries are totally unlabelled, and to get anything meaningful, a parallel call on arch:entries() would be required.
Returning the binary data for an entry in the archive stored in a file located at $uri:
arch:extract-binary(file:read-binary($uri),'build.xml') => stuff
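The ordering and duplication rules above can be sketched in Python with the standard zipfile module. UnknownEntry and extract_binary are hypothetical names. The sketch checks every requested name before reading anything, so that a missing entry is reported as a single error in the style of [arch:unknown-entry].

```python
import io
import zipfile

# Hypothetical names: a sketch of arch:extract-binary semantics, not its API.


class UnknownEntry(KeyError):
    """Stands in for the [arch:unknown-entry] error."""


def extract_binary(archive: bytes, names: list[str]) -> list[bytes]:
    """Entry data in request order; repeated names yield repeated copies."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        present = set(zf.namelist())
        missing = [n for n in names if n not in present]
        if missing:
            raise UnknownEntry(missing[0])
        return [zf.read(n) for n in names]   # order follows 'names', not the archive
```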
Returns the sequence of requested entries from the archive as strings. If $encoding is specified the strings are decoded appropriately, otherwise UTF-8 encoding is assumed.
arch:extract-text($archive as xs:base64Binary, $entries as xs:string*) as xs:string*
arch:extract-text($archive as xs:base64Binary, $entries as xs:string*, $encoding as xs:string) as xs:string*
Returns as a string each entry in the archive $archive that corresponds to an entry name in $entries, in sequence.
If $encoding is specified the strings are decoded appropriately, otherwise UTF-8 encoding is assumed.
The entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
Multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:unknown-encoding] is raised if the encoding requested is unknown or unsupported.
[arch:decoding-error] is raised if there was an error in decoding the entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
This function should be equivalent to the use of arch:extract-binary() and the function bin:decode-string() from [EXPath Binary]:
arch:extract-binary($archive,$entries) ! bin:decode-string(.,$encoding) [XPath 3.0]
for $b in arch:extract-binary($archive,$entries) return bin:decode-string($b,$encoding) [XPath 2.0]
Further conversion into XML can be achieved using the XPath 3.0 function fn:parse-xml() on each of the returned strings.
There have been suggestions for a signature arch:extract-text($archive as xs:base64Binary) returning all the entries. In the absence of maps in the return type, this does not make sense, since the entries are totally unlabelled, and to get anything meaningful, a parallel call on arch:entries() would be required.
Returning the text data for an entry in the archive stored in a file located at $uri:
arch:extract-text(file:read-binary($uri),'build.xml','UTF-8') => stuff
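The stated equivalence with arch:extract-binary() followed by decoding can be sketched in Python. The function name extract_text is hypothetical, and Python's exception types stand in for the module's errors:

- KeyError from zipfile stands in for [arch:unknown-entry].
- LookupError from codecs.lookup stands in for [arch:unknown-encoding].
- UnicodeDecodeError stands in for [arch:decoding-error].

Encoding labels are resolved by Python's codec registry, which accepts names such as UTF-8 and ISO-8859-1.

```python
import codecs
import io
import zipfile

# Hypothetical helper: extract the raw bytes, then decode, as described above.


def extract_text(archive: bytes, names: list[str], encoding: str = "UTF-8") -> list[str]:
    """Decode each requested entry as text, in request order (UTF-8 by default)."""
    codecs.lookup(encoding)                   # unknown encoding -> LookupError
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        raw = [zf.read(n) for n in names]     # missing entry -> KeyError
    return [b.decode(encoding) for b in raw]  # bad bytes -> UnicodeDecodeError
```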
There are two atomic actions available to change entries within an archive: complete deletion of an entry, or complete updating (overwriting) of that entry. The latter adds a new entry when the given name does not already exist in the archive.
Returns an archive with the given entries deleted.
arch:delete($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary
Returns an archive of the same format as $archive with all the entries named in $entries deleted.
The relative order of the remaining entries within the archive is preserved.
The uncompressed content, size and last-modified date of each remaining entry shall be the same as before the deletion. Compressed sizes may alter.
Duplicate entries in $entries are ignored.
If $entries is the empty sequence, the original archive shall be returned.
[arch:unknown-entry] is raised if an entry requested for deletion does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Whilst the uncompressed entries remaining after deletion should of course have the same size and content as before deletion, the compressed sizes and content might not, depending upon the (lossless) compression algorithm used. In the absence of a special check, in these circumstances $archive may not be identical to arch:delete($archive,()). This needs discussion.
Deleting an entry from the archive stored in a file located at $uri:
arch:entries(arch:delete(file:read-binary($uri),'lumley.jpg'))
=> <arch:entry size="2194" compressed-size="652" last-modified="2013-07-18T11:22:12">build.xml</arch:entry>
   <arch:entry size="10058" compressed-size="1381" last-modified="2013-08-06T13:14:08">tests/qt3/binary/binary.xml</arch:entry>
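The deletion rules can be sketched in Python by rebuilding the archive with the standard zipfile module. The function name delete is hypothetical. Reusing each surviving ZipInfo keeps its name, timestamp and compression method. The data is recompressed, however, so compressed sizes may differ, which is the caveat discussed above.

```python
import io
import zipfile

# Hypothetical helper: a sketch of arch:delete semantics for ZIP, not its API.


def delete(archive: bytes, names: list[str]) -> bytes:
    """Return a copy of a ZIP archive without the named entries."""
    if not names:
        return archive                         # nothing to delete: original returned
    doomed = set(names)                        # duplicate requests collapse
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src:
        missing = doomed - set(src.namelist())
        if missing:                            # [arch:unknown-entry] analogue
            raise KeyError(sorted(missing)[0])
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():        # surviving entries keep their order
                if info.filename not in doomed:
                    # Reusing the ZipInfo keeps name, timestamp and method;
                    # the data is recompressed, so compressed sizes may differ.
                    dst.writestr(info, src.read(info))
    return out.getvalue()
```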
Returns an archive with each of the given entries in $entries updated to the corresponding values in the sequence $new. If an entry is not found, a new entry is added to the end of the archive.
arch:update($archive as xs:base64Binary, $entries as xs:string*, $new as xs:base64Binary*) as xs:base64Binary
arch:update($archive as xs:base64Binary, $entries as xs:string*, $new as xs:base64Binary*, $last-modified as xs:dateTime) as xs:base64Binary
Returns an archive of the same format as $archive with each of the given entries in $entries updated to the corresponding value in the sequence $new. If an entry is not found, a new entry for it is added to the end of the archive.
The relative order of all the existing and replaced entries within the archive is preserved. New entries appear at the end of the archive in the order in which they were specified in the call.
If specified, and the format supports it, the last-modified date for each of the updated entries will be set to $last-modified. In the absence of such a parameter, it is implementation-dependent whether last-modified information will be written on the updated entries. If such default last-modification is written, it should be comparable to the value of fn:current-dateTime() in an XSLT environment.
The uncompressed content, size and last-modified date of the entries that are not updated shall be the same as before the update. Compressed sizes may alter.
The compression methods of the updated entries shall be preserved.
When duplicate names appear in the entry list, the value of the entry in the resulting archive will be that of the value of $new corresponding to the last matching entry name.
[arch:entry-data-mismatch] is raised if count($entries) ne count($new).
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
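A Python sketch of these rules, using only the standard zipfile module, follows. The function name update is hypothetical. Its defaults are choices of the sketch, not of this specification:

- Without a last-modified value, replaced entries keep their old timestamps.
- Appended entries receive the current local time, which is zipfile's behaviour for plain names.
- Appended entries use the stored method.

```python
import datetime
import io
import zipfile
from typing import Optional

# Hypothetical helper: a sketch of arch:update semantics for ZIP, not its API.


def update(archive: bytes, names: list[str], new: list[bytes],
           last_modified: Optional[datetime.datetime] = None) -> bytes:
    """Replace or append entries; the last value given for a repeated name wins."""
    if len(names) != len(new):               # [arch:entry-data-mismatch] analogue
        raise ValueError("entry-data-mismatch")
    replacement = dict(zip(names, new))      # later duplicates overwrite earlier ones
    stamp = last_modified.timetuple()[:6] if last_modified else None
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        existing = set(src.namelist())
        for info in src.infolist():          # existing entries keep their positions
            if info.filename in replacement:
                data = replacement[info.filename]
                if stamp:
                    info.date_time = stamp   # otherwise keep the old timestamp
            else:
                data = src.read(info)
            dst.writestr(info, data)         # each entry keeps its method
        for name, data in replacement.items():   # new entries go at the end
            if name not in existing:
                zi = zipfile.ZipInfo(name, date_time=stamp) if stamp else name
                dst.writestr(zi, data)
    return out.getvalue()
```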
It must also be possible to create new archives from scratch.
Returns a new archive with each of the given entries in $entries set to the corresponding values in the sequence $new.
arch:create($entries as xs:string*, $new as xs:base64Binary*) as xs:base64Binary
arch:create($entries as xs:string*, $new as xs:base64Binary*, $options as element(arch:options)) as xs:base64Binary
Returns an archive of the format specified by $options with each of the given entries in $entries set to the corresponding value in the sequence $new.
The relative order of new entries within the archive follows that of the input.
When duplicate names appear in the entry list, the value of the entry in the resulting archive will be that of the value of $new corresponding to the last matching entry name.
[arch:entry-data-mismatch] is raised if count($entries) ne count($new).
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
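A corresponding Python sketch of arch:create(), with the same duplicate-name rule, follows. The function name create is hypothetical. Deflate is an assumed default method, since the two-argument form leaves the format unspecified here. A duplicated name keeps the position of its first occurrence, which is one reading of the ordering rule above.

```python
import io
import zipfile

# Hypothetical helper: a sketch of arch:create semantics for ZIP, not its API.


def create(names: list[str], new: list[bytes],
           method: int = zipfile.ZIP_DEFLATED) -> bytes:
    """Build a new ZIP archive from parallel name and content sequences."""
    if len(names) != len(new):              # [arch:entry-data-mismatch] analogue
        raise ValueError("entry-data-mismatch")
    entries: dict[str, bytes] = {}
    for name, data in zip(names, new):
        entries[name] = data                # last value for a repeated name wins
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=method) as zf:
        for name, data in entries.items():  # input order (first occurrence)
            zf.writestr(name, data)
    return out.getvalue()
```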
Maps proposed for XPath 3.0 can increase the coherence of the functions in the module, mainly by retaining the structured connection between the entry name and its properties and content. In addition, the properties of the overall archive (and its defaults for new entries) can similarly be defined in a single map.
This section proposes parallel functions to those above using maps.
Note:
map:keys($map as map(*)) as xs:anyAtomicType* returns the keys that are present in a map, in unpredictable order. This means that if order within an archive is important (either in extraction or updating) other mechanisms may be needed.
In general, when maps are used to denote the entries to be manipulated, the arguments might be considered to be a (possibly empty) sequence of maps that are treated as if concatenated. [THIS NEEDS THOUGHT ABOUT OVERWRITING/MERGING COMMON KEYS]
Using a reserved name within the overall map (such as arch:options) would allow the options/properties for an archive to be stored alongside the entries.
Entries within the archive can also be accessed or described by entries in a map (map(xs:string,map(xs:string,item()*))). In this case the map key gives the (path)name of the archive entry (e.g. build/build-j.xml) and the value is a map of the properties of that entry.
The following keys are provided when reporting on entries:
size: the original file size of the entry as xs:integer.
compressed-size: the compressed file size of the entry as xs:integer, i.e. the number of bytes it occupies in the archive.
last-modified: the date of last modification of this entry, in xs:dateTime notation.
compression-level: an indicator of the level of (lossless?) compression.
content: the value of the entry read from the archive, as xs:base64Binary. This will only be set if $return-content is requested in the call to archM:entries().
When used to extract an entry from an archive, this map may have the following optional key/value pair:
encoding: the encoding to be used for converting textual items from a byte sequence.
When used to create or update an entry in an archive, this map may have the following optional key/value pairs:
content: the value of the entry to be written in the archive, either as xs:base64Binary or, when encoding is set, as xs:string.
Note:
This is awkward: why not just insist on xs:base64Binary and let the programmer encode?
last-modified: the date of last modification to be written on this entry, in xs:dateTime notation.
compression-level: the level of (lossless?) compression to be used in writing the entry into the archive.
encoding: the encoding to be used for converting textual items to a byte sequence, prior to possible compression and writing to the archive.
Returns a description of the type and properties of a given archive as a map.
archM:options($archive as xs:base64Binary) as map(xs:string,item()?)
The description is returned as a map map(xs:string,item()?) with entries describing the details. The following are currently supported:
format: format of this archive
compression: the compression algorithm that was used.
If the archive format supports a compression algorithm varying on a per-entry basis, and more than one algorithm has been used in the archive, mixed is returned for the compression entry.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Finding the properties of the archive stored in a file located at $uri:
archM:options(file:read-binary($uri)) => {'format' :'zip', 'compression' : 'deflate'}
Returns the entry descriptors for all the entries found within the archive as a map, optionally each with their content.
archM:entries($archive as xs:base64Binary) as map(xs:string,map(xs:string,item()*))
archM:entries($archive as xs:base64Binary, $return-content as xs:boolean) as map(xs:string,map(xs:string,item()*))
Keys to the returned map are the entry (path) names.
The value for each map entry is a map describing the properties of that entry. For more details of this structure see 10.2 Entry property maps.
If $return-content is defined and equals true(), then the content for each entry is returned as the content entry in the property map, as an xs:base64Binary item.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
As the returned order of keys from map:keys() is not defined and can be implementation-dependent, there may be a need for a simple function (archM:entry-names(xs:base64Binary) as xs:string*) which returns purely the names in the order in which they appear in the archive.
Using $return-content makes it possible to return a complete archive in a single call. (What about the archive options?)
Finding the entries of the archive stored in a file located at $uri:
archM:entries(file:read-binary($uri))
=> map{ "build.xml" := map{ "size":=2194, "compressed-size":=652, "last-modified":="2013-07-18T11:22:12"},
        "lumley.jpg" := map{ "size":=84983, "compressed-size":=84872, "last-modified":="2009-03-23T11:15:06"},
        "tests/qt3/binary/binary.xml" := map{ "size":=10058, "compressed-size":=1381, "last-modified":="2013-08-06T13:14:08"}}
Counting the number of apparent XML files in the previous example:
count(map:keys(archM:entries(file:read-binary($uri)))[ends-with(.,'.xml')]) => 2
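In Python, a dictionary of dictionaries plays the role of map(xs:string,map(xs:string,item()*)). The sketch below is built on the standard zipfile module, and entries_map is a hypothetical name. It adds content only when return_content is true. Unlike XPath maps, Python dictionaries preserve insertion order, so archive order happens to be kept here. Code written against archM:entries() should not rely on that.

```python
import datetime
import io
import zipfile

# Hypothetical helper: a sketch of archM:entries semantics, not its API.


def entries_map(archive: bytes, return_content: bool = False) -> dict[str, dict]:
    """Map each entry name to a dictionary of its properties (and optionally content)."""
    result: dict[str, dict] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for i in zf.infolist():
            props = {
                "size": i.file_size,
                "compressed-size": i.compress_size,
                "last-modified": datetime.datetime(*i.date_time),
            }
            if return_content:
                props["content"] = zf.read(i)   # raw bytes, like xs:base64Binary
            result[i.filename] = props
    return result
```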
Returns the entry names for all the entries found within the archive as a sequence of string values in the order in which they appear in the archive.
archM:entry-names($archive as xs:base64Binary) as xs:string*
Returns the entry names for all the entries found within the archive as a sequence of string values in the order in which they appear in the archive.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Returns a copy of $entries with the content entries set to binary or decoded string data for the appropriate entry in the archive.
archM:extract($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as map(xs:string,map(xs:string,item()?))
The map entries in $entries define whether binary or decoded string data is to be returned.
The behaviour of this function is defined by equivalent XPath:
map:new(
  for $k in map:keys($entries)
  return
    let $a := $entries($k),
        $text := map:contains($a,'encoding'),
        $encoding := ($a('encoding'),'UTF-8')[1],
        $data := arch:extract-binary($archive,$k) (: error if not found :)
    return map:entry($k,
             map:new(($a,
                      map:entry('content',
                                if ($text) then bin:decode-string($data,$encoding)
                                else $data))))
)
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
To collect all the XML entries as XML:
let $archive := file:read-binary($uri),
    $entries := archM:entries($archive),
    $xml-names := map:keys($entries)[ends-with(.,'.xml')],
    $get := map:new($xml-names ! map:entry(., map:entry('encoding','UTF-8'))),
    $content := archM:extract($archive,$get)
return $xml-names ! fn:parse-xml($content(.)('content'))
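The equivalent-XPath definition of archM:extract() translates fairly directly into Python. In this sketch, extract_map is a hypothetical name. Each property dictionary is copied and content is added as bytes. The content is decoded to a string only when an encoding key is present, mirroring $text in the XPath. zipfile's KeyError stands in for [arch:unknown-entry].

```python
import io
import zipfile

# Hypothetical helper: a sketch of archM:extract semantics, not its API.


def extract_map(archive: bytes, wanted: dict[str, dict]) -> dict[str, dict]:
    """Copy each property map in 'wanted', adding the entry's content."""
    result: dict[str, dict] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name, props in wanted.items():
            data = zf.read(name)                       # KeyError if not present
            if "encoding" in props:                    # text requested
                data = data.decode(props["encoding"])
            result[name] = {**props, "content": data}  # input map is not modified
    return result
```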
Returns the sequence of requested entries from the archive as binary data.
archM:extract-binary($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as xs:base64Binary*
archM:extract-binary($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary*
Returns as binary data each entry in the archive $archive that corresponds to an entry name in $entries (or to a key in map:keys($entries)), in sequence.
When $entries has type xs:string*, the entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
When $entries has type xs:string*, multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Collection of all the entries as binary data can be accomplished using archM:entries($archive,true()) and collecting the 'content' entry from each of the returned maps.
The signature with $entries as xs:string* is equivalent to arch:extract-binary().
Returns the sequence of requested entries from the archive as decoded string data.
archM:extract-text($archive as xs:base64Binary,
                   $entries as map(xs:string, map(xs:string, item()?))) as xs:string*
archM:extract-text($archive as xs:base64Binary,
                   $entries as map(xs:string, map(xs:string, item()?)),
                   $encoding as xs:string) as xs:string*
archM:extract-text($archive as xs:base64Binary,
                   $entries as xs:string*) as xs:string*
archM:extract-text($archive as xs:base64Binary,
                   $entries as xs:string*,
                   $encoding as xs:string) as xs:string*
Returns as decoded string data each entry in $archive that corresponds to an entry name in $entries, or in map:keys($entries), in sequence.
When $entries has type xs:string*, the entries must be returned in the order in which they are requested in $entries, not in the order in which they may exist in the archive.
When $entries has type xs:string*, multiple requests for the same entry will be honoured, with copies of the entry appearing at the corresponding positions in the output sequence.
If $encoding is specified, or the field 'encoding' appears in the entry's property map in $entries, the strings are decoded according to that encoding; otherwise UTF-8 is assumed.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:unknown-encoding] is raised if an encoding requested is unknown or unsupported.
[arch:decoding-error] is raised if there was an error in decoding an entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
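The following Python sketch models the decoding rules of archM:extract-text for a ZIP archive held as bytes. Like the archM:extract definition, it reads a per-entry 'encoding' field. The rules above do not say which encoding wins when both $encoding and an entry's own field are supplied, so the sketch assumes the explicit argument takes precedence. The names extract_text and ArchiveError are hypothetical. codecs.lookup stands in for the check behind [arch:unknown-encoding].

```python
import codecs
import io
import zipfile


class ArchiveError(Exception):
    """Hypothetical stand-in for the arch:* error codes."""


def extract_text(archive: bytes, entries, encoding=None) -> list:
    """Model of archM:extract-text.

    `entries` is either a list of names or a dict of name -> property map.
    Assumption: an explicit `encoding` argument takes precedence over an
    entry's own 'encoding' field; UTF-8 is used when neither is given.
    """
    if isinstance(entries, dict):
        requests = list(entries.items())
    else:
        requests = [(name, {}) for name in entries]
    result = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name, props in requests:
            enc = encoding or props.get("encoding") or "UTF-8"
            try:
                codecs.lookup(enc)
            except LookupError:
                raise ArchiveError(f"arch:unknown-encoding: {enc}") from None
            try:
                data = zf.read(name)
            except KeyError:
                raise ArchiveError(f"arch:unknown-entry: {name}") from None
            try:
                result.append(data.decode(enc))
            except UnicodeDecodeError:
                raise ArchiveError(f"arch:decoding-error: {name}") from None
    return result
```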
The signatures with $entries instance of xs:string* are equivalent to arch:extract-text().
Returns a new archive with each of the given entries named as a key in $entries set to the corresponding value in $entries($key)('content').
archM:create($entries as map(xs:string, map(xs:string, item()?))*) as xs:base64Binary
archM:create($entries as map(xs:string, map(xs:string, item()?))*,
             $options as map(xs:string, item())) as xs:base64Binary
Returns an archive of the format specified by $options, with each of the given entries named as a key in $entries set to the corresponding value in $entries($key)('content').
The relative order of new entries within the archive follows that of the input.
If $options is specified, the overall archive properties (and defaults for the entries) are set to those specified in the map.
[arch:read-error] is raised if there was an unspecified problem in creating the archive.
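The next block is a minimal Python model of archM:create, restricted to ZIP output. The 'compression' property and its values 'stored' and 'deflate' are hypothetical placeholders for whatever property names the archive and entry property maps actually define. The sketch illustrates two rules: entries are written in input order, and option values act as defaults that an entry's own map can override.

```python
import io
import zipfile

# Hypothetical values for a 'compression' property (assumption).
COMPRESSION = {"stored": zipfile.ZIP_STORED, "deflate": zipfile.ZIP_DEFLATED}


def create(entry_maps, options=None) -> bytes:
    """Model of archM:create producing a ZIP archive.

    `entry_maps` is a list of dicts mapping entry name -> property map with
    a 'content' field. Entries are written in input order; values in
    `options` act as defaults that each entry's own map may override.
    """
    defaults = dict(options or {})
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for entries in entry_maps:
            for name, props in entries.items():
                merged = {**defaults, **props}
                method = COMPRESSION[merged.get("compression", "deflate")]
                zf.writestr(name, merged["content"], compress_type=method)
    return buf.getvalue()
```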
Returns an archive with each of the given entries in the keys of $entries updated to the corresponding value in $entries($key)('content') and with other properties defined by $entries($key)(*). If an entry is not found, a new entry is added to the end of the archive.
archM:update($archive as xs:base64Binary,
             $entries as map(xs:string, map(xs:string, item()?))) as xs:base64Binary
archM:update($archive as xs:base64Binary,
             $entries as map(xs:string, map(xs:string, item()?)),
             $default as map(xs:string, item())) as xs:base64Binary
If $default is specified, its values are used as the default properties for each entry; these may be overridden by the property map of each individual entry.
The relative order of all the existing and replaced entries within the archive is preserved. New entries appear at the end of the archive in the order in which they were specified in the call.
The uncompressed content, size and last-modified date of the entries that are not updated shall be the same as before the update. Compressed sizes may change.
The compression methods of the updated entries shall be preserved.
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
Using the $default map, a common compression method, last-modification date and similar properties can be set for a set of entries, whose individual maps then need only contain map{"content":=$content}.
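The update rules can be modelled in Python by rebuilding a ZIP archive. As before, the sketch assumes the archive is held as bytes in memory and uses a hypothetical 'compression' property. Existing entries stay in place with their timestamps and compression methods. New names are appended in the order given, taking defaults from the equivalent of $default. For simplicity, the sketch ignores properties other than 'content' on entries that already exist. The name update is hypothetical.

```python
import io
import zipfile

# Hypothetical values for a 'compression' property (assumption).
COMPRESSION = {"stored": zipfile.ZIP_STORED, "deflate": zipfile.ZIP_DEFLATED}


def update(archive: bytes, entries: dict, default=None) -> bytes:
    """Model of archM:update over a ZIP archive.

    Existing entries keep their position, timestamp and compression method;
    those named in `entries` get new content. Names not yet in the archive
    are appended in the order given, with `default` supplying properties
    that each entry's own map may override.
    """
    defaults = dict(default or {})
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as zin:
        with zipfile.ZipFile(buf, "w") as zout:
            seen = set()
            for info in zin.infolist():
                seen.add(info.filename)
                # Fresh ZipInfo copying name, timestamp and compression method.
                kept = zipfile.ZipInfo(info.filename, date_time=info.date_time)
                kept.compress_type = info.compress_type
                if info.filename in entries:
                    data = entries[info.filename]["content"]
                else:
                    data = zin.read(info)
                zout.writestr(kept, data)
            for name, props in entries.items():
                if name not in seen:
                    merged = {**defaults, **props}
                    method = COMPRESSION[merged.get("compression", "deflate")]
                    zout.writestr(name, merged["content"], compress_type=method)
    return buf.getvalue()
```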
Returns an archive with the given entries deleted.
archM:delete($archive as xs:base64Binary,
             $entries as xs:string*) as xs:base64Binary
archM:delete($archive as xs:base64Binary,
             $entries as map(xs:string, map(xs:string, item()))*) as xs:base64Binary
Returns an archive of the same format as $archive with all the entries named in $entries or $entries!map:keys(.) deleted.
The relative order of the remaining entries within the archive is preserved.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as those for those entries before deletion. Compressed sizes may alter.
Duplicate entries in $entries are ignored.
If $entries is the empty sequence, or an empty map, the original archive shall be returned.
[arch:unknown-entry] is raised if an entry requested for deletion does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
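The following is a Python sketch of archM:delete under the same assumptions: the ZIP archive is held as bytes, delete is a hypothetical name, and LookupError stands in for [arch:unknown-entry]. An empty request returns the input bytes untouched. Any other request rebuilds the archive, and a rebuild is where the compressed bytes may come to differ from the original.

```python
import io
import zipfile


def delete(archive: bytes, entries: list) -> bytes:
    """Model of archM:delete over a ZIP archive.

    `entries` is a list of entry names, or a list of dicts whose keys are
    entry names. Duplicate names are ignored; an empty request returns the
    original bytes unchanged. Remaining entries keep their order, timestamp
    and compression method.
    """
    names = set()
    for item in entries:
        names.update(item.keys() if isinstance(item, dict) else [item])
    if not names:
        return archive
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as zin:
        unknown = names - set(zin.namelist())
        if unknown:
            # Analogue of [arch:unknown-entry].
            raise LookupError(f"arch:unknown-entry: {sorted(unknown)[0]}")
        with zipfile.ZipFile(buf, "w") as zout:
            for info in zin.infolist():
                if info.filename not in names:
                    kept = zipfile.ZipInfo(info.filename, date_time=info.date_time)
                    kept.compress_type = info.compress_type
                    zout.writestr(kept, zin.read(info))
    return buf.getvalue()
```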
Whilst the uncompressed entries remaining after deletion should of course be the same size and content as those before deletion, depending upon the (lossless) compression algorithm used, the compressed sizes and content might not be. In the absence of a special check, in these circumstances $in may not be identical to arch:delete($in,()). This needs discussion.
The signature with $entries as xs:string* is defined as a convenience, to avoid the creation of a simple map. Otherwise it is completely analogous to arch:delete(xs:base64Binary,xs:string*).