This document is also available in these non-normative formats: XML.
Copyright © 2013 John Lumley, Christian Grün, Matthias Brantner and Florent Georges, published by the EXPath Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.
This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
This proposal provides an API for XPath 2.0 and XPath 3.0 to handle archive data (i.e. collected and possibly compressed sets of files and directories). It defines extension functions to process data from and to such archive files, including creation, determining and setting properties, listing and extracting contents, and adding and updating entries. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage. Some additional features for use in XPath 3.0 are also defined.
1 Status of this document
2 Introduction
    2.1 Namespace conventions
    2.2 Error management
    2.3 Archive representation
    2.4 Archive types
3 Use cases
    3.1 Creating a simple EPUB document
    3.2 Examining a JAR file
    3.3 Extracting a ZIP archive to a file system
4 Describing archives and entries
    4.1 Archive properties and options
    4.2 Entry descriptions
    4.3 Using map types to describe entries and options
5 Loading and saving archives
6 Information about an archive and its contents
    6.1 arch:options
    6.2 arch:entries
7 Extracting entries from an archive
    7.1 arch:extract-binary
    7.2 arch:extract-text
8 Updating entries in an archive
    8.1 arch:delete
    8.2 arch:update
9 Creating an archive
    9.1 arch:create
10 Functions using XPath3.0 map() type
    10.1 Archive property maps
    10.2 Entry property maps
    10.3 archM:options
    10.4 archM:entries
    10.5 archM:entry-names
    10.6 archM:extract
    10.7 archM:extract-binary
    10.8 archM:extract-text
    10.9 archM:create
    10.10 archM:update
    10.11 archM:delete
This document is in an interim draft stage. Comments are welcomed at the public-expath@w3.org mailing list (archive).
The module defined by this document defines several functions, all contained in the
               namespace http://expath.org/ns/archive. In this document, the
                  arch prefix, when used, is bound to this namespace URI.
Alternative versions of these functions using the proposed XPath3.0
                  map() type (see 4.3 Using map types to describe entries and options) are defined in the
               namespace http://expath.org/ns/archive-map. In this document, the
                  archM prefix, when used, is bound to this namespace URI. 
Error codes are defined in the same namespace
                  (http://expath.org/ns/archive), and in this document are displayed
               with the same prefix, arch.
Note:
This follows the suggestion (in late August 2013) for a coherent naming standard in EXPath modules.
Binary file I/O, to read and write complete archives to files, uses facilities
               defined in [EXPath File], which defines functions in the namespace
               http://expath.org/ns/file. In this document, the file prefix,
               when used, is bound to this namespace URI.
Manipulation of binary data itself can employ functions from [EXPath Binary], which defines functions in the namespace
               http://expath.org/ns/binary. In this document, the bin prefix,
               when used, is bound to this namespace URI.
Error conditions are identified by a code (a QName). When such an error
               condition is reached in the evaluation of an expression, a dynamic error is thrown,
               with the corresponding error code (as if the standard XPath function
                  error() had been called).
Archives in this module are represented principally as items of type
                  xs:base64Binary, i.e. in their basic binary (byte sequence)
               forms.
Archives are treated as being arranged structurally as a description of overall options of the archive and a sequence of named entries. Each entry has:
A name, which is treated as a sequence of Unicode characters. In many
                     cases the solidus character (/) is used to imply the entries being
                     logically arranged in positions within a directory tree, but this is not
                     mandatory.
A set of properties, denoting at least the uncompressed size of the entry, together with archive-internal properties for the entry (such as the compression method used on the stored data) and other indications such as the date of last modification.
Data, treated as (possibly null) binary data.
It is most common that archives are considered to be arranged logically as
               directories, using the entry names to denote paths and file names (e.g.
                  tests/qt3/archive/main.xml). In such circumstances, archives may
               contain entries to represent the directories themselves (e.g.
                  tests/qt3/archive/), presumably with no data. [This could be used such
               that full extraction of an archive to a file system generates empty output
               directories, for example.] This specification makes no distinction between these two
               cases: if an archive has an empty 'directory' entry it will be treated similarly to
               any other 'file' entry. Semantic interpretation of entry names as files in
                  directory trees is an application issue.
Note:
Behaviour when entries with duplicate names are detected in an archive is implementation dependent. Nevertheless, if an error is not thrown, only one entry should be returned when reading. Implementations must not write duplicate entries in result archives.
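The entry model described above (a name, a set of properties and data) can be illustrated outside XPath with a minimal sketch in Python, using only the standard zipfile module. This is an illustration under stated assumptions, not part of the module: ZIP is assumed as the archive format, and the helper name describe_entries and the key is-directory-like are hypothetical.

```python
import io
import zipfile
from datetime import datetime

def describe_entries(archive: bytes) -> list[dict]:
    """List each entry's name and basic properties, in archive order."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            {
                "name": info.filename,
                "size": info.file_size,                  # uncompressed size
                "compressed-size": info.compress_size,   # bytes occupied in the archive
                "last-modified": datetime(*info.date_time).isoformat(),
                # Only a naming convention: the entry is otherwise like any other.
                "is-directory-like": info.filename.endswith("/"),
            }
            for info in zf.infolist()
        ]

# Build a small in-memory archive: one 'file' entry and one empty 'directory' entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("tests/qt3/archive/main.xml", "<a/>" * 100)
    zf.writestr("tests/qt3/archive/", b"")
sample = describe_entries(buf.getvalue())
```

Both entries are reported in the same way; whether the second one denotes a directory is left to the application, as the text above states.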
The module is designed to be able to support a number of different types of archive, providing a coherent access mechanism.
The following archive types are required to be supported:
[ZIP]: (which also covers derivative archive formats, such as JAR or OpenDocument.)
[GZIP] : A compressed archive of a sequence of files
Note:
Within GZIP, names of entries (original file names) are optional, on a per-file basis, so special measures may need to be taken to handle 'unnamed' sections.
Specific issues arise from i) archives used in streaming situations, where the internal manifests of the archives cannot be completed until all data is written, ii) archives where the order of entries is important, such as JAR, where the manifest entries need to be first.
Note:
Currently there are no proposals within this module to cover encrypted archives.
Development of this specification was driven by requirements which some XML developers regularly encounter in examining or generating data which is presented in archival forms. Some typical use cases include:
Manipulating EPUB documents.
Examining Java classes and resources stored in JAR formats.
An [EPUB] document is a collection of content sections, written in
               XHTML, with a metadata descriptor (usually the content.opf file) and a
               navigation description (usually the toc.ncx file), all collected
               together and potentially compressed in a ZIP format. A simple example of creating
               such a document in XQuery is:
arch:create(
    (
      { "name" : "mimetype", "compression" : "store" },
      "META-INF/container.xml",
      "OEBPS/content.opf",
      "OEBPS/Text/title.xhtml",
      "OEBPS/Text/chap01.xhtml",
      "OEBPS/toc.ncx"
    ),
    (
      content:mimetype(),
      content:metainf(),
      content:oebps-content(),
      content:title(),
      content:chapter(),
      content:toc()
    )
  )
The user-supplied XQuery function content:mimetype() returns the
               appropriate mimetype description for the EPUB document as a string
               ("application/epub+zip"). Each of the other content:*() functions
               generates a serialized form of the appropriate XML structure, e.g.:
declare function content:title() as xs:string
{
  fn:serialize(
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>Title Page</title>
    </head>
    
    <body>
      <div>
        <h2 id="heading_id_2">Sample Book</h2>
    
        <h2 id="heading_id_3">A Sample .epub Book</h2>
    
        <h3 id="heading_id_4">Title Page</h3>
      </div>
    </body>
    </html>
  )
};
Using a map structure to define an entry enables properties such as compression to be altered on an entry-by-entry basis. For an EPUB document the mimetype entry must be uncompressed (so that it can effectively be read by simple string searching), but other entries may be compressed.
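The effect of per-entry compression can be sketched in Python with the standard zipfile module. This is only an illustration of the idea, not the proposed API: the helper name build_epub_like is hypothetical and the XHTML content is a placeholder.

```python
import io
import zipfile

def build_epub_like(entries: list[tuple[str, bytes]]) -> bytes:
    """Write entries in order, storing 'mimetype' uncompressed and deflating the rest."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()

epub = build_epub_like([
    ("mimetype", b"application/epub+zip"),
    # Placeholder content, for illustration only.
    ("OEBPS/Text/title.xhtml", b"<html xmlns='http://www.w3.org/1999/xhtml'/>"),
])

# Because the mimetype entry is stored, its bytes appear verbatim in the archive
# and can be found by a plain byte search.
found_verbatim = b"application/epub+zip" in epub
```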
JAR files contain class code and definitions for Java classes, in entries whose names
               are path/classname.class. Local classes (classes defined
               within a class) have separate code entries with a classname
                     outerclass$innerclass. To find all the
               main package-qualified classes the following XPath should suffice: 
for $e in arch:entries(file:read-binary("lib/saxon9-sql.jar"))[ends-with(.,'.class') and not(contains(.,'$'))] 
  return replace(replace($e,'\.class$',''),'/','.')
=> 
   "net.sf.saxon.option.sql.SQLClose", 
   "net.sf.saxon.option.sql.SQLColumn", 
   "net.sf.saxon.option.sql.SQLConnect",
   ....,
   "net.sf.saxon.option.sql.SQLUpdate" 
Assuming the ZIP file in question has (empty) entries denoting any directories required, the following XSLT will unzip an archive to the current directory, using the file writing functions of [EXPath File]:
<xsl:variable name="arch" select="file:read-binary($uri)"/>
<xsl:variable name="entries" select="arch:entries($arch)"/>
<xsl:variable name="dirs" select="$entries[ends-with(.,'/')]"/>
<xsl:variable name="required.dirs"
            select="distinct-values(for $r in ($entries except $dirs) return
            replace($r,'/[^/]+$','/'))[ends-with(.,'/')]"/>
<xsl:sequence select="for $d in distinct-values(($required.dirs,$dirs))
         return file:create-dir(replace($d,'/$',''))"/>
<xsl:sequence select="for $f in ($entries except $dirs) 
        return file:write-binary($f,arch:extract-binary($arch,$f))"/>
(file:create-dir() creates necessary intermediate directories, so
                  $dirs does not need to be in a sorted order. If the ZIP archive does
                  not have entries for all directories, further intermediate code is
               required to identify those missing.)
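One way to identify directories that have no entry of their own is to derive every ancestor path from the entry names. The following is a minimal sketch in Python rather than XSLT; the function name required_dirs is hypothetical and the entry names are those used in examples elsewhere in this document.

```python
from pathlib import PurePosixPath

def required_dirs(names: list[str]) -> list[str]:
    """Return every directory path implied by the entry names, each ending in '/'."""
    dirs = set()
    for name in names:
        # Every ancestor of the entry is a directory that must exist.
        for parent in PurePosixPath(name).parents:
            if str(parent) != ".":
                dirs.add(str(parent) + "/")
        # An explicit 'directory' entry is also required as such.
        if name.endswith("/"):
            dirs.add(name)
    return sorted(dirs)

implied = required_dirs(["build.xml", "lumley.jpg", "tests/qt3/binary/binary.xml"])
```

Sorting places each parent before its children, although, as noted above, a directory-creation function that creates intermediate directories does not depend on that order.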
The properties of overall archives and individual entries at the XDM level are described by small structured elements, with optional information attached. In this proposal this information is attached as attributes.
Note:
Parallels with XPath 3.0 serialization parameters, which are now sets of (element)
               nodes, become awkward. In arch:entry we would need to add an element
                  arch:name to hold the name of an entry, rather than rely on the
               string value. The major point in favour of using elements rather than attributes
               would be where we need to read or set complex structured parameters, such as
               character maps. This needs discussion.
Archive options and properties are described as a structured element
                  (element(arch:options)) with the following attributes:
format: the type of the archive, e.g. "zip". This is
                     mandatory.
algorithm: the default compression used in the archive, e.g.
                     "deflate".
Other attributes may be dependent upon the type of the archive and the implementation.
Entries within the archive can be accessed by name (xs:string) or a
               structured element (element(arch:entry)). In the latter case the entry
               name is the string value of the element.
When describing an existing entry in an archive, element(arch:entry) may
               be returned with the following optional attributes:
size: the original file size of the entry.
compressed-size: the compressed file size of the entry, i.e. the
                     number of bytes it occupies in the archive.
last-modified: the date of last modification of this entry, in
                        xs:dateTime notation.
compression-level: an indicator of the level of (lossless?)
                     compression.
When used to create or update an entry in an archive,
                  element(arch:entry) may also have the following optional
               attributes:
last-modified: the date of last modification to be written on this
                     entry, in xs:dateTime notation.
compression-level: the level of (lossless?) compression to be used
                     in writing the entry into the archive.
encoding: the encoding to be used for converting textual items to
                     a byte sequence, prior to possible compression and writing to the archive.
(In writing actions, unknown attributes are ignored.)
Proposals in XPath 3.0 have been made for a type
                  map(xs:untypedAtomic,item()*), which could be exploited beneficially
               for manipulating archives, using the entry name as the key and the
                  (xs:base64Binary) value of the entry as the corresponding value in
               the map. These maps could be used both as output (in arch:entries() and
                  arch:extract-[text|binary]()) or for input (in
                  arch:update() and arch:create()). Equally well such maps
               can be used, reading keys only in arch:delete().
An attractive alternative would be for each entry itself to be a map($property
                  as xs:string, $value as item()*), with suitable keys, e.g.
                  content -> xs:base64Binary. Thus the entry 'set' can be
               a map map(xs:string, map(xs:string,item()*)).
Note:
Functions using such maps for arguments and results could either have separate
                  names (e.g. arch:entries-as-map()) or be defined in a separate
                  namespace (archM:entries()) - in this current draft the second is
                  used. Details are discussed in 10 Functions using XPath3.0 map() type.
Support for similar approaches using other map representations, such as [JSONiq] objects may be implementation dependent.
This module defines no specific functions for reading and writing archives from files, as distinct from their binary data. The EXPath File Module [EXPath File] provides two suitable functions to do this:
                  file:read-binary($file as xs:string) as
                     xs:base64Binary. Returns the content of a file in its Base64
                  representation.
                  file:write-binary($file as xs:string,
                  $value as xs:base64Binary) as
                     empty-sequence(). Writes a Base64 item as binary to a file.
               
Note:
There may be some desire for a convenience function arch:write($file as
               xs:string,....) as empty-sequence() which does
               creation and file writing as one action.
Returns a description of the type and properties of a given archive.
arch:options($archive as xs:base64Binary) as element(arch:options)*
The description is returned as an element <arch:options> with an
            unordered sequence of child elements describing the details. The following are currently
            supported:
arch:format: format of this archive
arch:algorithm: the compression algorithm that was used.
If the archive format supports a compression algorithm varying on a per-entry basis, and
            more than one algorithm has been used in the archive, mixed is returned for
               arch:algorithm.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
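The rule for reporting mixed can be modelled with a short Python sketch over the standard zipfile module. This is a rough model only: the function name archive_algorithm is hypothetical, the labels "store" and "deflate" follow values used in this document, and the fallback label "other" (also returned for an archive with no entries) is an assumption of the sketch.

```python
import io
import zipfile

# Labels for the methods handled by this sketch; "other" below is an assumption.
LABELS = {zipfile.ZIP_STORED: "store", zipfile.ZIP_DEFLATED: "deflate"}

def archive_algorithm(archive: bytes) -> str:
    """Report the single compression method used, or 'mixed' if entries differ."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        methods = {info.compress_type for info in zf.infolist()}
    if len(methods) > 1:
        return "mixed"
    return LABELS.get(next(iter(methods), None), "other")

def make(entries):
    """Build an in-memory ZIP from (name, data, method) triples."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data, method in entries:
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()

uniform = archive_algorithm(make([("a.xml", b"<a/>", zipfile.ZIP_DEFLATED)]))
varied = archive_algorithm(make([
    ("mimetype", b"application/epub+zip", zipfile.ZIP_STORED),
    ("a.xml", b"<a/>", zipfile.ZIP_DEFLATED),
]))
```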
Finding the properties of the archive stored in a file located at
               $uri:
arch:options(file:read-binary($uri))
=> <arch:options>
     <arch:format>ZIP</arch:format>
     <arch:algorithm>deflate</arch:algorithm>
   </arch:options>
Returns the set of entry descriptors for all the entries found within the archive.
arch:entries($archive as xs:base64Binary) as element(arch:entry)*
Each descriptor is an element <arch:entry> whose text value is the
            path of the file within the archive. For more details of this structure see 4.2 Entry descriptions.
The entries are returned in the order in which they are encountered serially within the archive.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
There may be a case for providing a sorted version, probably using some form of collation.
Finding the entries of the archive stored in a file located at $uri:
arch:entries(file:read-binary($uri))
=> <arch:entry size="2194" compressed-size="652" last-modified="2013-07-18T11:22:12">build.xml</arch:entry>
   <arch:entry size="84983" compressed-size="84872" last-modified="2009-03-23T11:15:06">lumley.jpg</arch:entry>
   <arch:entry size="10058" compressed-size="1381" last-modified="2013-08-06T13:14:08">tests/qt3/binary/binary.xml</arch:entry>
     Counting the number of apparent XML files in the previous example:
count(arch:entries(file:read-binary($uri))[ends-with(.,'.xml')])
=> 2
     The module does not attempt to discern the 'type' of an entry (such as 'text', 'XML',
            'raw-binary'), leaving that to the programmer. Two forms of reading result are
            supported: raw binary (xs:base64Binary) and decoded text
               (xs:string). 
Returns the sequence of requested entries from the archive as binary data.
arch:extract-binary($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary*
Returns as binary data each entry in the archive $archive that is named in
            $entries, in sequence.
The entries must be returned in the order corresponding to that of
            the entries requested in $entries, not in the order in which they may exist
            in the archive.
Multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
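The ordering and error rules above can be modelled in a few lines of Python using the standard zipfile module. This is a sketch of the intended behaviour under the assumption of a ZIP archive, not an implementation of the module; the names extract_binary and UnknownEntry are hypothetical stand-ins for arch:extract-binary() and the arch:unknown-entry error.

```python
import io
import zipfile

class UnknownEntry(KeyError):
    """Stands in for the arch:unknown-entry error."""

def extract_binary(archive: bytes, entries: list[str]) -> list[bytes]:
    """Return the data of each requested entry, in request order."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        present = set(zf.namelist())
        result = []
        for name in entries:              # request order, not archive order
            if name not in present:
                raise UnknownEntry(name)
            result.append(zf.read(name))  # repeated names yield repeated copies
        return result

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("build.xml", b"<project/>")
    zf.writestr("lumley.jpg", b"\xff\xd8\xff")
demo = extract_binary(buf.getvalue(), ["lumley.jpg", "build.xml", "lumley.jpg"])
```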
There have been suggestions for a signature arch:extract-binary($archive as
               xs:base64Binary) returning all the entries. In the absence of maps in the
            return type, this does not make sense, since the entries are totally unlabelled, and to
            get anything meaningful, a parallel call on arch:entries() would be
            required.
Returning the binary data for an entry in the archive stored in a file located at
                  $uri:
arch:extract-binary(file:read-binary($uri),'build.xml')
=> stuff
     Returns the sequence of requested entries from the archive as strings. If
               $encoding is specified the strings are decoded appropriately, otherwise
            UTF-8 encoding is assumed.
arch:extract-text($archive as xs:base64Binary, $entries as xs:string*) as xs:string*
arch:extract-text($archive as xs:base64Binary, $entries as xs:string*, $encoding as xs:string) as xs:string*
Returns as a string each entry in the archive $archive that is named in
            $entries, in sequence.
If $encoding is specified the strings are decoded appropriately, otherwise
            UTF-8 encoding is assumed.
The entries must be returned in the order corresponding to that of
            the entries requested in $entries, not in the order in which they may exist
            in the archive.
Multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:unknown-encoding] is raised if the encoding requested is unknown or unsupported.
[arch:decoding-error] is raised if there was an error in decoding the entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
This function should be equivalent to the use of arch:extract-binary() and
            the function bin:decode-string() from [EXPath Binary]:
arch:extract-binary($archive,$entries) ! bin:decode-string(.,$encoding) [XPath 3.0]
for $b in arch:extract-binary($archive,$entries) return bin:decode-string($b,$encoding) [XPath 2.0]
Further conversion into XML can be achieved using the XPath 3.0 function
               fn:parse-xml() on each of the returned strings.
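The same composition (binary extraction followed by decoding, with UTF-8 as the default) can be modelled in Python with the standard zipfile module. This is an illustrative sketch assuming a ZIP archive; the function name extract_text is hypothetical. In this model Python's LookupError plays the role of arch:unknown-encoding, UnicodeDecodeError that of arch:decoding-error, and the KeyError raised by zf.read() for a missing name that of arch:unknown-entry.

```python
import io
import zipfile

def extract_text(archive: bytes, entries: list[str], encoding: str = "UTF-8") -> list[str]:
    """Extract each requested entry as bytes, then decode it to a string."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # zf.read() is the binary extraction step; .decode() is the decoding step.
        return [zf.read(name).decode(encoding) for name in entries]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("utf8.txt", "café".encode("UTF-8"))
    zf.writestr("latin1.txt", "café".encode("ISO-8859-1"))
archive = buf.getvalue()

as_utf8 = extract_text(archive, ["utf8.txt"])                    # default encoding
as_latin1 = extract_text(archive, ["latin1.txt"], "ISO-8859-1")  # explicit encoding
```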
There have been suggestions for a signature arch:extract-text($archive as
               xs:base64Binary) returning all the entries. In the absence of maps in the
            return type, this does not make sense, since the entries are totally unlabelled, and to
            get anything meaningful, a parallel call on arch:entries() would be
            required.
Returning the text data for an entry in the archive stored in a file located at
                  $uri:
arch:extract-text(file:read-binary($uri),'build.xml','UTF-8')
=> stuff
     There are two atomic actions available to change entries within an archive: complete deletion of an entry, or complete updating (overwriting) of that entry. The latter adds new entries when the given name does not already exist in the archive.
Returns an archive with the given entries deleted.
arch:delete($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary
Returns an archive of the same format as $archive with all the entries named in
               $entries deleted.
The relative order of the remaining entries within the archive is preserved.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as those for those entries before deletion. Compressed sizes may alter.
Duplicate entries in $entries are ignored.
If $entries is the empty sequence, the original archive shall be
            returned.
[arch:unknown-entry] is raised if an entry requested for deletion does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Whilst the uncompressed entries remaining after deletion should of course be the same
            size and content as those before deletion, depending upon the (lossless) compression
            algorithm used, the compressed sizes and content might not be. In the absence of a
            special check, in these circumstances $archive may not be identical to
               arch:delete($archive,()). This needs discussion. 
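The deletion rules (order preserved, duplicate names ignored, the original returned for an empty request, an error for unknown names) can be sketched in Python with the standard zipfile module. This is a model under the assumption of a ZIP archive, not the proposed implementation; the function name delete_entries is hypothetical and KeyError stands in for arch:unknown-entry.

```python
import io
import zipfile

def delete_entries(archive: bytes, entries: list[str]) -> bytes:
    """Return a copy of the archive without the named entries."""
    doomed = set(entries)                     # duplicate names are ignored
    if not doomed:
        return archive                        # empty request: original archive returned
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src:
        missing = doomed - set(src.namelist())
        if missing:
            raise KeyError(sorted(missing))   # stands in for arch:unknown-entry
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():       # archive order is preserved
                if info.filename not in doomed:
                    # Reusing the ZipInfo keeps the name, date and compression method.
                    dst.writestr(info, src.read(info.filename))
    return out.getvalue()

stamp = (2013, 7, 18, 11, 22, 12)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("build.xml", "lumley.jpg", "tests/qt3/binary/binary.xml"):
        zf.writestr(zipfile.ZipInfo(name, date_time=stamp), name.encode())
original = buf.getvalue()
trimmed = delete_entries(original, ["lumley.jpg", "lumley.jpg"])
```

Returning the input unchanged for an empty request sidesteps the recompression question raised above, since no entry is rewritten in that case.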
Deleting an entry from the archive stored in a file located at
               $uri:
arch:entries(arch:delete(file:read-binary($uri),'lumley.jpg'))
=> <arch:entry size="2194" compressed-size="652" last-modified="2013-07-18T11:22:12">build.xml</arch:entry>
   <arch:entry size="10058" compressed-size="1381" last-modified="2013-08-06T13:14:08">tests/qt3/binary/binary.xml</arch:entry>
     Returns an archive with each of the given entries in $entries updated to
            the corresponding values in the sequence $new. If an entry is not found, a
            new entry is added to the end of the archive.
arch:update($archive as xs:base64Binary, $entries as xs:string*, $new as xs:base64Binary*) as xs:base64Binary
arch:update($archive as xs:base64Binary, $entries as xs:string*, $new as xs:base64Binary*, $last-modified as xs:dateTime) as xs:base64Binary
Returns an archive of the same format as $archive with each of the given entries
            in $entries updated to the corresponding value in the sequence
               $new. If an entry is not found, a new entry for it is added to the end
            of the archive.
The relative order of all the existing and replaced entries within the archive is preserved. New entries appear at the end of the archive in the order in which they were specified in the call.
If specified, and the format supports it, the last-modified date for each of the updated
            entries will be set to $last-modified. In the absence of such a parameter,
            it is implementation-dependent whether last-modified information will be written on the
            updated entries. If such default last-modification is written, it should be comparable
            to the value of fn:current-dateTime() in an XSLT environment.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as those for those entries before the update. Compressed sizes may alter.
The compression methods of the updated entries shall be preserved.
When duplicate names appear in the entry list, the value of the entry in the resulting
            archive will be that of the value of $new corresponding to the
               last matching entry name. 
[arch:entry-data-mismatch] is raised if count($entries) ne
               count($new).
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
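The update rules can be modelled with a short Python sketch over the standard zipfile module. It is a rough model assuming a ZIP archive, not the module's implementation: the function name update_entries is hypothetical, ValueError stands in for arch:entry-data-mismatch, and the choice of deflate for brand-new entries is an assumption, since the text does not fix it.

```python
import io
import zipfile

def update_entries(archive: bytes, entries: list[str], new: list[bytes]) -> bytes:
    """Replace named entries in place and append names that were not present."""
    if len(entries) != len(new):
        raise ValueError("count of entries and data differ")  # arch:entry-data-mismatch
    replacement = dict(zip(entries, new))       # for duplicate names the last value wins
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, zipfile.ZipFile(out, "w") as dst:
        existing = set(src.namelist())
        for info in src.infolist():             # existing entries keep their position
            if info.filename in replacement:
                # Replaced entry: same compression method, timestamp set to now.
                dst.writestr(info.filename, replacement[info.filename],
                             compress_type=info.compress_type)
            else:
                # Untouched entry: copied with its original properties.
                dst.writestr(info, src.read(info.filename))
        for name, data in replacement.items():  # new names appended, in call order
            if name not in existing:
                # Assumption: deflate is used for entries that did not exist before.
                dst.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("build.xml", b"old build")
    zf.writestr("lumley.jpg", b"old image")
updated = update_entries(buf.getvalue(),
                         ["lumley.jpg", "notes.txt", "lumley.jpg"],
                         [b"first", b"notes", b"second"])
with zipfile.ZipFile(io.BytesIO(updated)) as zf:
    names = zf.namelist()
    image = zf.read("lumley.jpg")
```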
new archives need to be created
Returns a new archive with each of the given entries in $entries set to the
            corresponding values in the sequence $new.
arch:create($entries as xs:string*, $new as xs:base64Binary*) as xs:base64Binary
arch:create($entries as xs:string*, $new as xs:base64Binary*, $options as element(arch:options)) as xs:base64Binary
Returns an archive of format specified by $options with each of the given
            entries in $entries set to the corresponding value in the sequence
               $new.
The relative order of new entries within the archive follows that of the input.
When duplicate names appear in the entry list, the value of the entry in the resulting
            archive will be that of the value of $new corresponding to the
               last matching entry name. 
[arch:entry-data-mismatch] is raised if count($entries) ne
               count($new).
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
Maps proposed for XPath3.0 can increase the coherence of the functions in the module, mainly by retaining the structured connection between the entry name and its properties and content. In addition the properties of the overall archive (and its defaults for new entries) can similarly be defined in a single map.
This section proposes parallel functions to those above using maps.
Note:
map:keys($map as map(*)) as xs:anyAtomicType* returns the keys that are
               present in a map, in unpredictable order. This means that if order within an archive
               is important (either in extraction or updating) other mechanisms may be needed.
In general when using maps for denoting the entries to be manipulated, the arguments might be considered to be a (possibly empty) sequence of maps that are treated as if concatenated. [THIS NEEDS THOUGHT ABOUT OVERWRITING/MERGING COMMON KEYS]
Using a reserved name within the overall map (such as arch:options)
               would allow the options/properties for an archive to be stored alongside the
               entries.
Entries within the archive can also be accessed or described by entries in a map
                  (map(xs:string,map(xs:string,item()*))). In this case the map key
               gives the (path)name of the archive entry (e.g. build/build-j.xml) and
               the value is a map of the properties of that entry.
The following keys are provided when reporting on entries:
size: the original file size of the entry as
                        xs:integer
compressed-size: the compressed file size of the entry as
                        xs:integer, i.e. the number of bytes it occupies in the
                     archive.
last-modified: the date of last modification of this entry, in
                        xs:dateTime notation
compression-level: an indicator of the level of (lossless?)
                     compression.
content: the value of the entry read from the archive, as
                        xs:base64Binary. This will only be set if
                        $return-content is requested in the call to
                        archM:entries().
When used to extract an entry from an archive, this map may have the following optional key/value pairs:
encoding: the encoding to be used for converting textual items
                     from a byte sequence.
When used to create or update an entry in an archive, this map may have the following optional key/value pairs:
content: the value of the entry to be written in the archive,
                     either as xs:base64Binary or, when encoding is set,
                     as xs:string.
Note:
This is awkward - why not just insist on xs:base64Binary
                           and let the programmer encode?
last-modified: the date of last modification to be written on this
                     entry, in xs:dateTime notation
compression-level: the level of (lossless?) compression to be used
                     in writing the entry into the archive.
encoding: the encoding to be used for converting textual items to
                     a byte sequence, prior to possible compression and writing to the archive.
Returns a description of the type and properties of a given archive as a map.
archM:options($archive as xs:base64Binary) as map(xs:string,item()?)
The description is returned as a map map(xs:string,item()?) with entries
            describing the details. The following are currently supported:
format: format of this archive
compression: the compression algorithm that was used.
If the archive format supports a compression algorithm varying on a per-entry basis, and
            more than one algorithm has been used in the archive, mixed is returned for
            the compression entry.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Finding the properties of the archive stored in a file located at
               $uri:
archM:options(file:read-binary($uri))
=> {'format' :'zip', 'compression' : 'deflate'}
Returns the entry descriptors for all the entries found within the archive as a map, optionally each with their content.
archM:entries($archive as xs:base64Binary) as map(xs:string,map(xs:string,item()*))
archM:entries($archive as xs:base64Binary, $return-content as xs:boolean) as map(xs:string,map(xs:string,item()*))
Keys to the returned map are the entry (path) names.
The value for each map entry is a map describing the properties of that entry. For more details of this structure see 10.2 Entry property maps.
If $return-content is defined and equals true(), then the
            content for each entry is returned as the content entry in the property
            map, as a xs:base64Binary item.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
As the returned order of keys from map:keys() is not defined and can be
            implementation-dependent, there may be a need for a simple function
               (archM:entry-names(xs:base64Binary) as xs:string*) which returns purely
            the names in the order in which they appear in the archive.
Using $return-content makes it possible to return a complete archive in a
            single call. (What about the archive options?)
Finding the entries of the archive stored in a file located at $uri:
archM:entries(file:read-binary($uri))
=> map{ 
  "build.xml" := map{ "size":=2194, "compressed-size":=652, "last-modified":="2013-07-18T11:22:12"},
  "lumley.jpg" := map{ "size":=84983, "compressed-size":=84872, "last-modified":="2009-03-23T11:15:06"},
  "tests/qt3/binary/binary.xml" := map{ "size":=10058, "compressed-size":=1381, "last-modified":="2013-08-06T13:14:08"}}
Counting the number of apparent XML files in the previous example:
count(map:keys(archM:entries(file:read-binary($uri)))[ends-with(.,'.xml')])
=> 2
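The shape of the returned map can be illustrated, non-normatively, with a small model in Python using the standard zipfile module (ZIP data only). The function names entries and make_demo_zip are hypothetical; the property keys follow the example above. Note that a Python dict preserves insertion order, whereas the order of map:keys() is not defined.

```python
import io
import zipfile
from datetime import datetime


def entries(data: bytes, return_content: bool = False) -> dict:
    """Model of archM:entries: entry name -> property map."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            props = {
                "size": info.file_size,
                "compressed-size": info.compress_size,
                "last-modified": datetime(*info.date_time).isoformat(),
            }
            if return_content:
                # Corresponds to the 'content' entry holding binary data.
                props["content"] = zf.read(info)
            result[info.filename] = props
    return result


def make_demo_zip() -> bytes:
    """Build a two-entry archive with fixed timestamps (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("build.xml", date_time=(2013, 7, 18, 11, 22, 12)), "<project/>")
        zf.writestr(zipfile.ZipInfo("notes.txt", date_time=(2009, 3, 23, 11, 15, 6)), "plain text")
    return buf.getvalue()


# Counting the apparent XML files, as in the example above.
xml_count = sum(1 for name in entries(make_demo_zip()) if name.endswith(".xml"))
```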
Returns the entry names for all the entries found within the archive as a sequence of string values in the order in which they appear in the archive.
archM:entry-names($archive as xs:base64Binary) as xs:string*
Returns the entry names for all the entries found within the archive as a sequence of string values in the order in which they appear in the archive.
[arch:read-error] is raised if there is an unspecified problem in reading the archive.
Returns a copy of $entries with the content entries set to binary or decoded string data for the appropriate entry in the archive.
archM:extract($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as map(xs:string,map(xs:string,item()?))
The map entries in $entries define whether binary or decoded string data is to be returned.
The behaviour of this function is defined by the equivalent XPath:
map:new(for $k in map:keys($entries) 
   return 
     let $a := $entries($k),
         $text := map:contains($a,'encoding'),
         $encoding := ($a('encoding'),'UTF-8')[1],
         $data := arch:extract-binary($archive,$k) (: error if not found :)
     return 
         map:entry($k,
             map:new(($a,
               map:entry('content',if ($text) then bin:decode-string($data,$encoding) else $data)
               ))
       )
   )
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
To collect all the XML entries as XML:
let $archive := file:read-binary($uri),
    $entries := archM:entries($archive),
    $xml-names := map:keys($entries)[ends-with(.,'.xml')],
    $get := map:new($xml-names ! map:entry(.,map:entry('encoding','UTF-8'))),
    $content := archM:extract($archive,$get)
return
    $xml-names ! fn:parse-xml($content(.)('content'))
Returns the sequence of requested entries from the archive as binary data.
archM:extract-binary($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as xs:base64Binary*
archM:extract-binary($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary*
Returns as binary data each entry in the archive $archive that corresponds to an entry name given in $entries, or in map:keys($entries), in sequence.
When $entries has type xs:string*, the entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
When $entries has type xs:string*, multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Collection of all the entries as binary data can be accomplished using archM:entries($archive,true()) and collecting the 'content' entry from each of the returned maps.
The signature with $entries instance of xs:string* is equivalent to arch:extract-binary().
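The ordering and duplicate rules for the xs:string* signature can be illustrated, non-normatively, with the following Python model built on the standard zipfile module. The names extract_binary, make_zip and UnknownEntry are hypothetical; UnknownEntry merely stands in for the arch:unknown-entry error.

```python
import io
import zipfile


class UnknownEntry(Exception):
    """Stands in for the arch:unknown-entry error."""


def extract_binary(data: bytes, names: list) -> list:
    """Return entry contents in request order, honouring repeated names."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        available = set(zf.namelist())
        result = []
        for name in names:  # request order, not archive order
            if name not in available:
                raise UnknownEntry(name)
            result.append(zf.read(name))  # a repeated name yields a repeated copy
    return result


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from {name: bytes} (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```

Requesting the names c.txt, a.txt, c.txt from an archive stored as a.txt, b.txt, c.txt therefore yields three items: the content of c.txt, then a.txt, then c.txt again.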
Returns the sequence of requested entries from the archive as decoded string data.
archM:extract-text($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as xs:string*
archM:extract-text($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?)), $encoding as xs:string) as xs:string*
archM:extract-text($archive as xs:base64Binary, $entries as xs:string*) as xs:string*
archM:extract-text($archive as xs:base64Binary, $entries as xs:string*, $encoding as xs:string) as xs:string*
Returns as decoded string data each entry in the archive $archive that corresponds to an entry name given in $entries, or in map:keys($entries), in sequence.
When $entries has type xs:string*, the entries must be returned in the order corresponding to that of the entries requested in $entries, not in the order in which they may exist in the archive.
When $entries has type xs:string*, multiple requests for the same entry will be honoured, with copies of the entry appearing in corresponding multiple locations in the output sequence.
If $encoding is specified, or the field 'encoding' appears in the entry in $entries, the strings are decoded according to that encoding, otherwise UTF-8 encoding is assumed.
[arch:unknown-entry] is raised if an entry requested does not exist in this archive.
[arch:unknown-encoding] is raised if an encoding requested is unknown or unsupported.
[arch:decoding-error] is raised if there was an error in decoding an entry.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
The signatures with $entries instance of xs:string* are equivalent to arch:extract-text().
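How the three text-related error conditions differ can be illustrated, non-normatively, with a Python model of the xs:string* signatures. It uses the standard zipfile and codecs modules; extract_text, make_zip and ArchiveError are hypothetical names, and the error codes are carried as plain strings.

```python
import codecs
import io
import zipfile


class ArchiveError(Exception):
    """Carries one of the error codes named in this section as its message."""


def extract_text(data: bytes, names: list, encoding: str = "UTF-8") -> list:
    """Return decoded entry contents in request order; UTF-8 is the default."""
    try:
        codecs.lookup(encoding)  # reject unknown encodings before reading
    except LookupError:
        raise ArchiveError("arch:unknown-encoding")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        available = set(zf.namelist())
        result = []
        for name in names:
            if name not in available:
                raise ArchiveError("arch:unknown-entry")
            try:
                result.append(zf.read(name).decode(encoding))
            except UnicodeDecodeError:
                raise ArchiveError("arch:decoding-error")
    return result


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from {name: bytes} (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```

An entry holding ISO-8859-1 bytes decodes correctly when that encoding is requested, but produces the decoding error under the UTF-8 default if it contains bytes that are not valid UTF-8.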
Returns a new archive with each of the given entries named as a key in $entries set to the corresponding value in $entries($key)('content').
archM:create($entries as map(xs:string,map(xs:string,item()?))*) as xs:base64Binary
archM:create($entries as map(xs:string,map(xs:string,item()?))*, $options as map(xs:string,item())) as xs:base64Binary
Returns an archive of the format specified by $options with each of the given entries named as a key in $entries set to the corresponding value in $entries($key)('content').
The relative order of new entries within the archive follows that of the input.
If $options is specified, the overall archive properties (and defaults for the entries) are set to those specified in the map.
[arch:read-error] is raised if there was an unspecified problem in creating the archive.
Returns an archive with each of the given entries in the keys of $entries updated to the corresponding values in the $entries($key)('content') and with other properties defined by $entries($key)(*). If an entry is not found, a new entry is added to the end of the archive.
archM:update($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?))) as xs:base64Binary
archM:update($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()?)), $default as map(xs:string,item())) as xs:base64Binary
Returns an archive with each of the given entries in the keys of $entries updated to the corresponding values in the $entries($key)('content') and with other properties defined by $entries($key)(*). If an entry is not found, a new entry is added to the end of the archive.
If $default is specified, its values will be used as the default properties for each entry, which may be overridden by the property map for each individual entry.
The relative order of all the existing and replaced entries within the archive is preserved. New entries appear at the end of the archive in the order in which they were specified in the call.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as those for those entries before the update. Compressed sizes may alter.
The compression methods of the updated entries shall be preserved.
[arch:read-error] is raised if there was an unspecified problem in reading or creating the archive.
Using the $default map, a common compression method, last-modification date and similar properties can be set for a set of entries, whose minimal map entries are
               map{"content":=$content}
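The ordering rules and the interplay between $default and the per-entry maps can be illustrated, non-normatively, with the Python model below (standard zipfile module, ZIP data only). The names update, make_zip and _date_time are hypothetical. Two further assumptions of the sketch: last-modified values are ISO 8601 strings as in the earlier examples, and newly added entries are deflated with a 1980-01-01 fallback date.

```python
import io
import zipfile
from datetime import datetime


def _date_time(props: dict, fallback: tuple) -> tuple:
    """Turn an ISO 'last-modified' string into zipfile's 6-tuple."""
    stamp = props.get("last-modified")
    return datetime.fromisoformat(stamp).timetuple()[:6] if stamp else fallback


def update(data: bytes, entries: dict, default: dict = None) -> bytes:
    """Replace existing entries in place; append unknown ones at the end."""
    default = default or {}
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        existing = set()
        for info in src.infolist():  # archive order is preserved
            existing.add(info.filename)
            if info.filename in entries:
                props = {**default, **entries[info.filename]}  # entry overrides default
                new_info = zipfile.ZipInfo(info.filename, _date_time(props, info.date_time))
                new_info.compress_type = info.compress_type  # method preserved
                dst.writestr(new_info, props["content"])
            else:
                dst.writestr(info, src.read(info))  # untouched entry copied as is
        for name, own in entries.items():  # new entries, in call order
            if name not in existing:
                props = {**default, **own}
                new_info = zipfile.ZipInfo(name, _date_time(props, (1980, 1, 1, 0, 0, 0)))
                new_info.compress_type = zipfile.ZIP_DEFLATED  # assumption of this sketch
                dst.writestr(new_info, props["content"])
    return out.getvalue()


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from {name: bytes} (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```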
         
Returns an archive with the given entries deleted.
archM:delete($archive as xs:base64Binary, $entries as xs:string*) as xs:base64Binary
archM:delete($archive as xs:base64Binary, $entries as map(xs:string,map(xs:string,item()))*) as xs:base64Binary
Returns an archive of the same format as $archive with all the entries named in $entries or $entries!map:keys(.) deleted.
The relative order of the remaining entries within the archive is preserved.
The uncompressed content, size and last-modified date of the remaining entries shall be the same as those for those entries before deletion. Compressed sizes may alter.
Duplicate entries in $entries are ignored.
If $entries is the empty sequence, or an empty map, the original archive shall be returned.
[arch:unknown-entry] is raised if an entry requested for deletion does not exist in this archive.
[arch:read-error] is raised if there was an unspecified problem in reading the archive.
Whilst the uncompressed entries remaining after deletion should of course be the same size and content as those before deletion, depending upon the (lossless) compression algorithm used, the compressed sizes and content might not be. In the absence of a special check, in these circumstances $in may not be identical to arch:delete($in,()). This needs discussion.
The signature with $entries as xs:string* is defined as a convenience, to avoid the creation of a simple map. Otherwise it is completely analogous to
               arch:delete(xs:base64Binary,xs:string*).
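The deletion rules (order preserved, duplicates ignored, empty request returns the original archive, unknown names are an error) can be illustrated, non-normatively, with the following Python model using the standard zipfile module. The names delete, make_zip and UnknownEntry are hypothetical. The sketch applies the special check mentioned in the note above by returning the input bytes unchanged when nothing is to be deleted.

```python
import io
import zipfile


class UnknownEntry(Exception):
    """Stands in for the arch:unknown-entry error."""


def delete(data: bytes, names: list) -> bytes:
    """Return a copy of the archive without the named entries."""
    doomed = set(names)  # duplicate names collapse here
    if not doomed:
        return data  # nothing requested: hand back the original archive
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src:
        missing = doomed - set(src.namelist())
        if missing:
            raise UnknownEntry(sorted(missing))
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():  # relative order of survivors is kept
                if info.filename not in doomed:
                    dst.writestr(info, src.read(info))
    return out.getvalue()


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from {name: bytes} (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```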