EXPath

File Module

EXPath Candidate Module

Candidate 17 May 2010

This version:
http://expath.org/spec/file/20100517
Latest version:
http://expath.org/spec/file
Editor:
Matthias Brantner, 28msec GmbH
Contributor:
Gabriel Petrovay, 28msec GmbH

This document is also available in these non-normative formats: XML.


Abstract

This proposal provides a file system API for XPath 2.0. It defines extension functions to perform file system related operations such as listing, reading, or writing files. It has been designed to be compatible with XQuery 1.0 and XSLT 2.0, as well as any other XPath 2.0 usage.

Table of Contents

1 Introduction
    1.1 Namespace Conventions
    1.2 File Paths vs. URIs
    1.3 Error management
2 Functions
    2.1 file:copy
    2.2 file:exists
    2.3 file:files
    2.4 file:is-directory
    2.5 file:is-file
    2.6 file:last-modified
    2.7 file:mkdir
    2.8 file:mkdirs
    2.9 file:path-separator
    2.10 file:path-to-full-path
    2.11 file:path-to-uri
    2.12 file:read
    2.13 file:read-html
    2.14 file:read-text
    2.15 file:read-xml
    2.16 file:remove
    2.17 file:write

Appendices

A References
B Summary of Error Conditions


1 Introduction

1.1 Namespace Conventions

This module defines several functions, all contained in the namespace http://expath.org/ns/file. In this document, the file prefix, when used, is bound to this namespace URI.

Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.

1.2 File Paths vs. URIs

To make the file API more accessible, paths referring to directories or files are specified as strings. The syntax of such strings is implementation-defined. However, we strongly recommend that the following forms be accepted and interpreted as described below.

  • Operating system paths (absolute or relative paths are accepted):

    • C:\Test Dir\my file.xml: An absolute path on Windows platforms.

    • C:/Test Dir\my file.xml: An absolute path on Windows platforms that tolerates slashes instead of backslashes.

    • C:\\\Test Dir//\\my file.xml: An absolute path on Windows platforms that tolerates an arbitrary number of slashes and backslashes.

    • /Test Dir/my file.xml: An absolute path on UNIX-based platforms.

    • //Test Dir////my file.xml: An absolute path on UNIX-based platforms that tolerates an arbitrary number of slashes.

    • my file.xml: A relative path. The file should be searched for starting from the location pointed to by the base URI of the current module.

  • File URIs (only absolute paths are accepted):

    • file:///C:/Test%20Dir/my%20file.xml: An absolute path on Windows platforms.

    • file:///C:/Test Dir/my file.xml: An absolute path on Windows platforms. The URI tolerates spaces.

    • file:///C:/Test%20Dir///my%20file.xml: An absolute path on Windows platforms. The URI tolerates an arbitrary number of slashes.

    • file:///Test%20Dir/my%20file.xml: An absolute path on UNIX-based platforms.

    • file://localhost/Test%20Dir/my%20file.xml: A URI that accepts localhost or 127.0.0.1 as the authority of the URI.

In the remainder of this document, the term "path" refers to any of these forms, whether it identifies a file or a directory.
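
As a rough illustration of the file URI forms listed above, the following Python sketch converts such a URI into a slash-separated path string. The helper name file_uri_to_path is hypothetical and not part of this module, and the handling of drive letters is an assumption about how an implementation might treat Windows-style URIs.

```python
import re
from urllib.parse import unquote


def file_uri_to_path(uri: str) -> str:
    """Convert an absolute file URI into a slash-separated path string."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError("not a file URI: " + uri)
    rest = uri[len(prefix):]
    # Split the authority (possibly empty) from the path part.
    authority, slash, path = rest.partition("/")
    if authority not in ("", "localhost", "127.0.0.1"):
        raise ValueError("unsupported authority: " + authority)
    # Decode percent-escapes; literal spaces pass through unchanged.
    path = unquote(slash + path)
    # Tolerate an arbitrary number of consecutive slashes.
    path = re.sub(r"/+", "/", path)
    # "/C:/dir" denotes a Windows drive path: drop the leading slash.
    if re.match(r"^/[A-Za-z]:/", path):
        path = path[1:]
    return path
```

All five URI forms from the list reduce to either "C:/Test Dir/my file.xml" or "/Test Dir/my file.xml" under this sketch.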

1.3 Error management

Error conditions are identified by an error code (a QName). If such an error condition is reached during the execution of the function, a dynamic error is raised, with the corresponding error code (as if the standard XPath function error had been called).

Error codes are defined throughout this specification. File operations can fail for more reasons than can be enumerated here. If an error condition is not mentioned explicitly in this specification, the implementation must raise an error with an appropriate message [err:FS001].
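
The fallback rule can be pictured with a small Python sketch. The class FileModuleError and the function read_bytes are hypothetical names used only for illustration; the sketch assumes that any operating system failure without a more specific code is reported as err:FS001.

```python
class FileModuleError(Exception):
    """Dynamic error carrying an error code, in the spirit of fn:error."""

    def __init__(self, code: str, message: str):
        super().__init__(f"[{code}] {message}")
        self.code = code


def read_bytes(path: str) -> bytes:
    """Read a file, mapping unspecified failures to err:FS001."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError as exc:
        # No more specific code is defined, so fall back to err:FS001.
        raise FileModuleError("err:FS001", str(exc)) from exc
```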

2 Functions

2.1 file:copy

sequential file:copy($source as xs:string,
                     $destination as xs:string) as xs:boolean
sequential file:copy($source as xs:string,
                     $destination as xs:string,
                     $overwrite as xs:boolean) as xs:boolean

This function copies a file given a source and a destination. The operation fails by returning false if the $source path does not point to a file. If the $overwrite parameter is missing or evaluates to false, the function returns false if the destination already exists. Otherwise, the destination file, if it exists, will be overwritten. The function returns true if the copy operation was successful.
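
The return-value rules above can be sketched in Python as follows. This is only a model of the described behavior, not the module's implementation; the function name copy and the choice to report every operating system failure as false are assumptions.

```python
import os
import shutil


def copy(source: str, destination: str, overwrite: bool = False) -> bool:
    """Copy a file, returning False instead of raising on failure."""
    # The source must point to a regular file.
    if not os.path.isfile(source):
        return False
    # Without overwrite, an existing destination makes the copy fail.
    if os.path.exists(destination) and not overwrite:
        return False
    try:
        shutil.copyfile(source, destination)
    except OSError:
        return False
    return True
```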

2.2 file:exists

nondeterministic file:exists($fileOrDir as xs:string) as xs:boolean

Tests if a path is already used in the file system. The function returns true if the file or directory pointed to by the $fileOrDir parameter exists already. Otherwise, the function returns false.

2.3 file:files

nondeterministic file:files($path as xs:string) as xs:string*
nondeterministic file:files($path as xs:string,
                            $pattern as xs:string) as xs:string*
nondeterministic file:files($path as xs:string,
                            $pattern as xs:string,
                            $recursive as xs:boolean) as xs:string*

Lists all files in a given directory. The order of the files in the result is not defined. The special files "." and ".." are never returned. The returned paths are relative to the provided $path. If the optional $pattern parameter is provided, only the files whose names match the given regular expression pattern are returned. An additional $recursive parameter indicates whether the search should recurse into the subdirectories.
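
A minimal Python sketch of this listing behavior is shown below. It rests on two assumptions that the text does not settle: only non-directory entries are returned, and the pattern is applied as an unanchored search on the file name, as fn:matches would do. The function name files is used for illustration only.

```python
import os
import re


def files(path, pattern=None, recursive=False):
    """List non-directory entries under path, relative to path."""
    regex = re.compile(pattern) if pattern is not None else None
    result = []
    for root, dirs, names in os.walk(path):
        for name in names:
            # The pattern is tested against the file name only.
            if regex is None or regex.search(name):
                full = os.path.join(root, name)
                result.append(os.path.relpath(full, path))
        if not recursive:
            dirs.clear()  # stop os.walk from descending further
    return result
```

Because the result order is not defined, callers that need a stable order should sort the returned sequence themselves.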

2.4 file:is-directory

nondeterministic file:is-directory($dir as xs:string) as xs:boolean

Tests if a path points to a directory. The function returns true if the path points to a directory. Otherwise, it returns false. On UNIX-based systems, the root and the volume roots are considered directories.

2.5 file:is-file

nondeterministic file:is-file($file as xs:string) as xs:boolean

Tests if a path points to a regular file. The function returns true if the path points to a regular file. Otherwise, the function returns false.
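
The relationship between file:exists, file:is-directory, and file:is-file can be illustrated with the following Python sketch. The helper name classify is hypothetical, and the sketch assumes that symbolic links are followed, which this document does not specify.

```python
import os


def classify(path: str) -> dict:
    """Evaluate the three predicates for one path."""
    return {
        "exists": os.path.exists(path),
        "is-directory": os.path.isdir(path),
        "is-file": os.path.isfile(path),
    }
```

For a path that does not exist, all three values are false; for an existing directory or regular file, exactly one of the last two is true.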

2.6 file:last-modified

nondeterministic file:last-modified($fileOrDir as xs:string) as xs:dateTime

Retrieves the timestamp of the last modification of the file system item (e.g. file, directory, or symbolic link) pointed to by the given path ($fileOrDir).
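
As an illustration, the Python sketch below obtains the modification time and renders it in a form compatible with the xs:dateTime lexical representation. The function name last_modified is hypothetical; the use of UTC and the fact that os.path.getmtime follows symbolic links are assumptions of this sketch, not requirements of the module.

```python
import os
from datetime import datetime, timezone


def last_modified(path: str) -> str:
    """Return the modification time as an ISO 8601 string in UTC."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    return datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
```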

2.7 file:mkdir

sequential file:mkdir($dir as xs:string) as xs:boolean
sequential file:mkdir($dir as xs:string,
                      $create as xs:boolean) as xs:boolean

Creates a directory. This function is not recursive. The optional $create parameter indicates that the function should succeed only if the target directory can be created. The function returns true if the operation succeeded. Otherwise, the function returns false.

2.8 file:mkdirs

sequential file:mkdirs($dir as xs:string) as xs:boolean
sequential file:mkdirs($dir as xs:string,
                       $create as xs:boolean) as xs:boolean

Creates directories recursively. The optional $create parameter indicates that the function should succeed only if the target directory can be created. The function returns true if the operation succeeded. Otherwise, the function returns false.
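
The difference between file:mkdir and file:mkdirs can be sketched in Python as follows. The sketch leaves out the optional $create parameter, and it assumes that an already existing target counts as a failure; neither point is settled by the descriptions above.

```python
import os


def mkdir(path: str) -> bool:
    """Create one directory; missing parents make the call fail."""
    try:
        os.mkdir(path)
    except OSError:
        return False
    return True


def mkdirs(path: str) -> bool:
    """Create a directory together with any missing parents."""
    try:
        os.makedirs(path)
    except OSError:
        return False
    return True
```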

2.9 file:path-separator

file:path-separator() as xs:string

This function returns the file path separator used by the operating system. For example, it returns "/" on UNIX-based platforms or "\" on Windows platforms.

2.10 file:path-to-full-path

file:path-to-full-path($path as xs:string) as xs:string

Transforms a path into a full operating system path. The resulting URI must have the file:// scheme. The operation is performed regardless of the existence of a file or directory referred to by the provided path.

2.11 file:path-to-uri

file:path-to-uri($path as xs:string) as xs:anyURI

Transforms a file system path into a URI with the file:// scheme. No check is performed for the existence of a file or directory referred to by the provided file system path.
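
The following Python sketch shows one way such a transformation could work for absolute paths. The function name path_to_uri is used for illustration only. Treating backslashes as separators on every platform and rejecting relative paths are simplifying assumptions; a real implementation would resolve relative paths first.

```python
import re
from urllib.parse import quote


def path_to_uri(path: str) -> str:
    """Build a file URI from an absolute path without touching the disk."""
    # Treat backslashes as separators and collapse repeated separators.
    normalized = re.sub(r"[\\/]+", "/", path)
    if not re.match(r"^([A-Za-z]:)?/", normalized):
        raise ValueError("absolute path expected: " + path)
    if not normalized.startswith("/"):
        normalized = "/" + normalized  # "C:/dir" becomes "/C:/dir"
    # Percent-encode everything except separators and the drive colon.
    return "file://" + quote(normalized, safe="/:")
```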

2.12 file:read

nondeterministic file:read($file as xs:string) as xs:base64Binary

Reads the content of a file pointed to by the $file parameter and returns a Base64 representation of the content.
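
In Python terms, the operation amounts to reading the raw bytes and encoding them, as in this sketch. The function name read is used for illustration only, and returning the lexical Base64 form as a string is an assumption standing in for an xs:base64Binary value.

```python
import base64


def read(path: str) -> str:
    """Return the file content as Base64 text."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("ascii")
```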

2.13 file:read-html

nondeterministic file:read-html($file as xs:string,
                                $tidyOptions as xs:string) as xs:string

Reads the content of the HTML file pointed to by the $file parameter and returns it as a string.

2.14 file:read-text

nondeterministic file:read-text($file as xs:string) as xs:string

Reads the content of a file and returns the string representation of the content.

2.15 file:read-xml

nondeterministic file:read-xml($file as xs:string) as node()
nondeterministic file:read-xml($file as xs:string,
                               $tidy as xs:boolean) as node()

Reads a file and returns the parsed XML document. If the $tidy parameter is present and evaluates to true, the implementation might perform a cleaning step (e.g. as performed by the tidy library) in order to make sure that a well-formed XML document is obtained. If the $tidy parameter is not present or evaluates to false and the file does not contain a well-formed XML document, an error is raised (TODO error code). Cleaning documents before parsing in order to make them well-formed might be useful for HTML documents.
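
The behavior without tidying can be sketched in Python as follows. The function name read_xml is used for illustration only, and the sketch returns the root element rather than a document node. Because the error code is not yet assigned, the sketch raises a plain ValueError as a placeholder.

```python
import xml.etree.ElementTree as ET


def read_xml(path: str):
    """Parse a file, failing if it is not well-formed XML."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError as exc:
        # Placeholder error: no error code is assigned for this case.
        raise ValueError(f"{path} is not well-formed XML: {exc}") from exc
```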

2.16 file:remove

sequential file:remove($fileOrDir as xs:string) as xs:boolean

Deletes a file or a directory from the file system. This operation is not recursive. The function returns false if the operation failed (e.g. if the target is a non-empty directory or the file or directory does not exist). Otherwise, the function returns true.
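
A Python sketch of the non-recursive behavior is given below. The function name remove is used for illustration only; removing a symbolic link itself rather than its target is an assumption of this sketch.

```python
import os


def remove(path: str) -> bool:
    """Delete a file or an empty directory; never recurse."""
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            os.rmdir(path)  # fails on non-empty directories
        else:
            os.remove(path)
    except OSError:
        return False
    return True
```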

2.17 file:write

sequential file:write($file as xs:string,
                      $content as item()*,
                      $serializer-params as node()*) as xs:boolean
sequential file:write($file as xs:string,
                      $content as item()*,
                      $serializer-params as node()*,
                      $append as xs:boolean) as xs:boolean

Writes a sequence of items to a file. This operation creates a new file or appends the serialized content to the file pointed to by the given path. If the $append flag is true and the file does not exist, a new one is created.

The $serializer-params parameter is used to set the corresponding serialization parameters defined in [Serialization], as defined for the XPath 2.1 function fn:serialize().

The function returns true if the file was written successfully, or false otherwise.
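
The effect of the $append flag can be modeled with the Python sketch below. It ignores the serialization parameters entirely and, as a stand-in for real serialization, joins the string form of the items with single spaces; both simplifications, as well as the function name write and the UTF-8 encoding, are assumptions of this sketch.

```python
def write(path: str, items, append: bool = False) -> bool:
    """Write items to a file, either replacing or extending it."""
    mode = "a" if append else "w"  # both modes create a missing file
    try:
        with open(path, mode, encoding="utf-8") as handle:
            # Stand-in for serialization: space-separated string values.
            handle.write(" ".join(str(item) for item in items))
    except OSError:
        return False
    return True
```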

A References

Serialization
XSLT 2.0 and XQuery 1.0 Serialization. Scott Boag, Michael Kay, Joanne Tong, Norman Walsh, and Henry Zongaro, editors. W3C Recommendation. 23 January 2007.
F&O 1.1
XPath and XQuery Functions and Operators 1.1. Michael Kay, editor. W3C Working Draft. 15 January 2009.
XSLT 2.0
XSL Transformations (XSLT) Version 2.0. Michael Kay, editor. W3C Recommendation. 23 January 2007.

B Summary of Error Conditions

err:FS001
A file system error occurred.