MongoDB Module

EXPath Working Draft 5 March 2015

Christian Grün, BaseX GmbH

Dannes Wessels, eXist Solutions GmbH

Copyright © 2015 Christian Grün and Dannes Wessels, published by the EXPath Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.

This specification was published by the EXPath Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

This module provides an API for accessing the document database MongoDB. It defines functions to connect to the DBMS, retrieve documents from databases and collections, update resources, perform map-reduce queries, and execute server-side JavaScript functions and database commands.

The module has been designed to be compatible with XQuery 3.1 and XPath 3.1, and later versions. It has been inspired by existing MongoDB Drivers and the MongoDB Meta Driver recommendations. Its initial version was based on Mongrel, the eXist-db MongoDB extension.

Status of this document

This document is in an interim draft stage. Comments are welcome on the public-expath@w3.org mailing list (archive).

Introduction

Namespace conventions

The module defined by this document defines functions and errors in the namespace http://expath.org/ns/mongo. In this document, the mongo prefix is bound to this namespace URI. Error codes are defined in the same namespace and are displayed with the same prefix.

The err prefix denotes the namespace for XPath and XQuery errors, http://www.w3.org/2005/xqt-errors, as defined in the specification.

Error management

Error conditions are identified by a code (a QName). When such an error condition is reached in the evaluation of an expression, a dynamic error is thrown, with the corresponding error code (as if the standard XPath function error() had been called).
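
Because errors are identified by QNames, they can be caught with the standard XQuery try/catch expression. The following is a minimal sketch rather than part of the module: it simulates a failure with fn:error, and the local name "example" is made up for illustration and is not an error code defined by this specification. The err prefix is declared explicitly, as not every processor predeclares it.

```xquery
xquery version "3.1";

declare namespace mongo = "http://expath.org/ns/mongo";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

try {
  (: Stand-in for a failing module function. The local name "example"
     is hypothetical and not an error code of this module. :)
  fn:error(
    fn:QName("http://expath.org/ns/mongo", "mongo:example"),
    "simulated failure"
  )
} catch mongo:* {
  (: $err:code holds the QName of the error, $err:description its message :)
  "Caught " || fn:local-name-from-QName($err:code) || ": " || $err:description
}
```

The wildcard mongo:* catches every error in the module namespace; a specific QName can be used instead to handle a single error condition.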

The following errors apply to most functions of this specification:

An error is raised if no database connection exists for a given client id.

An error is raised if a supplied XQuery map cannot be converted to a valid JSON object.

An error is raised if an unexpected error (such as a connection failure or a timeout) occurs while interacting with the database.

An error is raised if an invalid database name is specified. Database names must have 1-63 characters, and they must not contain any of the following twelve characters: /\. "$*<>:|?

The same error is also raised if an invalid collection name is specified. Collection names must have 1-117 characters, and they must not contain the dollar sign ($).

Remaining error codes are specified along with the functions. For a list of all errors, see the Summary of Error Conditions section of this document.
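
The naming rules above can be checked in plain XQuery. The following sketch assumes two hypothetical helper functions (local:valid-database-name and local:valid-collection-name); they are not part of the module and only illustrate the length and character constraints, counted in characters as stated above.

```xquery
xquery version "3.1";

(: Hypothetical helper: checks a database name against the rules above. :)
declare function local:valid-database-name($name as xs:string) as xs:boolean {
  (: the twelve forbidden characters, including the space :)
  let $forbidden := '/\. "$*<>:|?'
  let $length := fn:string-length($name)
  return $length ge 1 and $length le 63
    and fn:string-length(fn:translate($name, $forbidden, '')) eq $length
};

(: Hypothetical helper: checks a collection name against the rules above. :)
declare function local:valid-collection-name($name as xs:string) as xs:boolean {
  let $length := fn:string-length($name)
  return $length ge 1 and $length le 117 and fn:not(fn:contains($name, '$'))
};

(
  local:valid-database-name("addressbook"),
  local:valid-database-name("address.book"),
  local:valid-collection-name("orders$2015")
)
```

The query evaluates to true, false and false: the second name contains a dot, and the third contains a dollar sign.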

JSON Data

Since version 3.1, maps and arrays are available in XQuery and XPath. This specification makes heavy use of this feature: all JSON objects and arrays are represented by the equivalent XQuery data types.

If JSON strings are preferred as input and output, the functions fn:parse-json and fn:serialize of the XPath and XQuery Functions and Operators 3.1 specification can be used for conversion. The following query parses a JSON string and returns an XQuery map. The "liberal" option accepts deviations from RFC 7159, such as the omission of quotes on keys:

fn:parse-json('{ info: "Hello Universe" }', map { "liberal": true() })

The next example shows how XQuery maps can be serialized as JSON:

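As a minimal sketch of the serialization direction, assuming the two-argument form of fn:serialize with a map of serialization parameters (as defined for XQuery 3.1), a map can be turned into a JSON string by selecting the json output method:

```xquery
xquery version "3.1";

(: serialize a map as JSON by selecting the "json" output method :)
fn:serialize(
  map { "info": "Hello Universe" },
  map { "method": "json" }
)
```

The result is a string containing a JSON object with the single member info.
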
Query Execution

All functions in this module are nondeterministic: they may return different results when executed more than once. This is illustrated by two examples:

The function mongo:find may return different results when called more than once, as the contents of the MongoDB database instance may have changed between the first and second call.

A call to mongo:insert may succeed the first time but fail the second time, because the document to be added already exists.

A query processor must ensure that nondeterministic functions are not relocated or rewritten in the query, and that their results are not cached at runtime.

Test suite

A MongoDB Test Suite is provided to ensure compatibility across different implementations of the specification. It is based on the QT3 format; see the XML Query Test Suite for more details.

Client Operations

mongo:connect

mongo:connect($uri as xs:string) as xs:string

Establishes a connection to MongoDB and returns a client id as a string that identifies the opened connection.

The $uri string follows the MongoDB URI format. It must at least contain one host name, and it may be prefixed with the mongo scheme and suffixed with a port number. Multiple hosts, e.g. for a replica set, are separated with commas.

The format of the returned client id string is implementation-defined, but all returned ids must be unique during the evaluation of a query.

An error is raised if the connection could not be established, possibly due to a wrong URL or a connection failure.

The following expression creates three connections to local MongoDB instances, using the default port, and returns the client ids as result:

mongo:connect("localhost"), mongo:connect("localhost:27017"), mongo:connect("mongo://localhost:27017")

The following function call connects to a replica set with three members, and distributes reads to the secondary:

mongo:connect("localhost,localhost:27018/?readPreference=secondary")

mongo:list-databases

mongo:list-databases($client-id as xs:string) as xs:string*

Returns the names of all databases on the connected server.

The connection is identified by the supplied $client-id.

The following query lists all databases on localhost:

mongo:list-databases(mongo:connect("localhost"))

mongo:close

mongo:close($client-id as xs:string) as empty-sequence()

Closes an open database connection. The connection to be closed is identified by the supplied $client-id. When a database connection is closed, the associated id is discarded and invalidated. As a consequence, each connection can be closed only once.

A connection must be kept open as long as it has not explicitly been closed by the user, and as long as the query has not been fully evaluated. After query evaluation, an implementation must ensure that all remaining connections are automatically closed.
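
A typical query binds the client id to a variable, reuses it for all calls, and closes the connection at the end. The following sketch assumes a MongoDB instance on localhost and a processor that makes the module available under the namespace shown (some processors may require an import module declaration instead). It also assumes that the members of the sequence are evaluated from left to right.

```xquery
xquery version "3.1";

declare namespace mongo = "http://expath.org/ns/mongo";

(: one connection is opened, used and explicitly closed :)
let $id := mongo:connect("localhost")
return (
  mongo:list-databases($id),
  mongo:close($id)
)
```

As mongo:close returns an empty sequence, the result consists of the database names only. If the explicit call is omitted, the connection is closed by the implementation after query evaluation.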

The following expression closes a connection that has just been opened:

mongo:close(mongo:connect("localhost"))

Database Operations

The module provides no function for creating new databases, because non-existing databases will automatically be created by MongoDB with the first write operation.

mongo:list-collections

mongo:list-collections($client-id as xs:string, $database as xs:string) as xs:string*

Returns the names of all collections contained in a database.

The connection is identified by the supplied $client-id, and the name of the database is supplied via $database.
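
As a brief illustration, the following sketch lists the collections of a database. The host localhost and the database name "db" are assumptions taken over from the other examples in this document.

```xquery
xquery version "3.1";

declare namespace mongo = "http://expath.org/ns/mongo";

(: "db" is an assumed database name :)
let $id := mongo:connect("localhost")
return mongo:list-collections($id, "db")
```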

mongo:command

mongo:command($client-id as xs:string, $database as xs:string, $command as map(*)) as map(*)

Executes a database command, supplied via $command, and returns the result as a map.

The connection is identified by the supplied $client-id, and the name of the database is supplied via $database.

The object returned by MongoDB contains the field ok, which must be parsed by the implementation to decide if command execution was successful (indicated by the integer value 1) or not (0). The field must be removed from the object before the result is returned as a map. If execution failed, the field errmsg can be parsed to return a proper error message.

The error mongo:exec is raised if command execution failed.

The following query clones a database from a remote MongoDB instance to the current host. The result will either be a map, which contains the result of the command execution, or an error message:

let $id := mongo:connect("localhost")
return try {
  mongo:command($id, "db", map { "clone": 1 })
} catch mongo:exec {
  "Command execution failed: " || $err:description
}
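
The handling of the ok field can be sketched in plain XQuery. The helper local:command-result below is hypothetical and only mirrors the rule described above; $raw stands for the object returned by MongoDB, and the error code mongo:exec is the one caught in the example of this section.

```xquery
xquery version "3.1";

declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical helper: $raw stands for the object returned by MongoDB. :)
declare function local:command-result($raw as map(*)) as map(*) {
  if ($raw?ok = 1) then
    (: success: drop the "ok" field before returning the map :)
    map:remove($raw, "ok")
  else
    (: failure: use "errmsg" as description if it is present :)
    fn:error(
      fn:QName("http://expath.org/ns/mongo", "mongo:exec"),
      fn:string(($raw?errmsg, "Command execution failed")[1])
    )
};

local:command-result(map { "ok": 1, "n": 3 })
```

The call returns a map that contains only the entry n. A map without an ok field is treated as a failure in this sketch.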

mongo:eval

mongo:eval($client-id as xs:string, $database as xs:string, $code as xs:string) as item()*

mongo:eval($client-id as xs:string, $database as xs:string, $code as xs:string, $args as item()*) as item()*

Runs server-side JavaScript code, supplied via $code. Function arguments can be supplied via $args. Arguments can be booleans, strings, numbers, arrays or maps. An error will be raised if any other type is supplied. Items of type xs:untypedAtomic are converted to strings.

The connection is identified by the supplied $client-id, and the name of the database is supplied via $database.

Due to the different type systems of XQuery and JavaScript, it is not possible to losslessly convert all values to one language and back. To ensure compatibility, an implementation must obey the following conversion rules:

Conversion of XQuery arguments to JavaScript:

A value of type xs:boolean is converted to a boolean.

A value of type xs:string or xs:untypedAtomic is converted to a string.

A value of type xs:numeric is converted to a number (i.e. a double-precision floating-point format value).

A value of type map(*) is converted to an object. Its entries must be converted recursively according to the given rules.

A value of type array(*) is converted to an array. Its members must be converted recursively according to the given rules.

An error is raised for any other type.

Conversion of JavaScript results to XQuery:

A boolean is converted to xs:boolean.

A string is converted to xs:string.

A number is converted to xs:double.

An object is converted to map(*). Its entries must be converted recursively according to the given rules.

An array is converted to array(*). Its members must be converted recursively according to the given rules.

An error is raised for any other type.

An error is raised if an XQuery argument cannot be converted to JavaScript, or if a JavaScript result cannot be converted to XQuery.

An error is raised if JavaScript execution failed.
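
The argument rules can be sketched as a recursive type check in plain XQuery. The function local:convertible is hypothetical and not part of the module: it only reports whether a value is covered by the rules above, whereas an implementation would raise the conversion error. The rules do not mention the empty sequence, which this sketch accepts.

```xquery
xquery version "3.1";

declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical helper: true if every item can be converted to JavaScript. :)
declare function local:convertible($value as item()*) as xs:boolean {
  every $item in $value satisfies (
    typeswitch ($item)
      case xs:boolean | xs:string | xs:untypedAtomic | xs:numeric
        return fn:true()
      case map(*)
        (: entries are checked recursively :)
        return every $key in map:keys($item)
               satisfies local:convertible($item($key))
      case array(*)
        (: members are checked recursively :)
        return every $member in $item?*
               satisfies local:convertible($member)
      default
        return fn:false()
  )
};

(
  local:convertible((2, 5)),
  local:convertible(map { "tags": [ "a", 1, fn:true() ] }),
  local:convertible(fn:current-date())
)
```

The query evaluates to true, true and false, as xs:date is not among the supported types.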

The following query returns the result of an arithmetic expression:

let $id := mongo:connect("localhost")
return mongo:eval($id, "db", 'function ( x, y ) { return x + y; }', (2, 5))

mongo:drop-database

mongo:drop-database($client-id as xs:string, $database as xs:string) as empty-sequence()

Drops a database. No operation will be performed if the database does not exist.

The connection is identified by the supplied $client-id, and the name of the database is supplied via $database.

The following query drops five databases (provided they exist):

let $id := mongo:connect("localhost")
for $no in 1 to 5
return mongo:drop-database($id, "database-" || $no)

Collections: Read Operations

mongo:find

mongo:find($client-id as xs:string, $database as xs:string, $collection as xs:string) as map(*)*

mongo:find($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*)) as map(*)*

mongo:find($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*), $options as map(*)) as map(*)*

Returns documents of a collection. If a query is supplied via the $query argument, the documents are filtered by that query. The $options argument can have the following entries:

"fields": map(*): Restricts the returned fields. The field _id will always be returned.

"sort": map(*): Sorts the returned documents.

"limit": xs:integer: Limits the number of returned documents to the specified integer.

"skip": xs:integer: Skips the specified number of documents.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following expression queries an addressbook and selects all persons living in Tokyo. It sorts results by the names and returns the first 50 documents:

let $id := mongo:connect("localhost")
return mongo:find($id, 'db', 'addressbook',
  map { "city": "Tokyo" },
  map { "sort": map { "name": 1 }, "limit": 50 }
)
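
The "fields" and "skip" options can be combined with "sort" and "limit" to page through a collection. The following sketch assumes the database and collection of the previous example; the variables $page and $size are illustrative and not part of the module.

```xquery
xquery version "3.1";

declare namespace mongo = "http://expath.org/ns/mongo";

let $id := mongo:connect("localhost")
(: illustrative paging parameters: third page, 50 documents per page :)
let $page := 3, $size := 50
return mongo:find($id, "db", "addressbook",
  map { },
  map {
    "fields": map { "name": 1, "city": 1 },
    "sort": map { "name": 1 },
    "skip": ($page - 1) * $size,
    "limit": $size
  }
)
```

An empty map is supplied as query, so all documents are candidates. The first 100 documents in name order are skipped, and the returned maps contain the fields name, city and _id.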

mongo:find-one

mongo:find-one($client-id as xs:string, $database as xs:string, $collection as xs:string) as map(*)?

mongo:find-one($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*)) as map(*)?

mongo:find-one($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*), $options as map(*)) as map(*)?

Returns the first document of a collection that optionally matches a supplied $query. The $options argument can have the following entries:

"fields": map(*): Restricts the returned fields. The field _id will always be returned.

"sort": map(*): Sorts the documents before returning the first result.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query returns the first document that matches the specified query:

let $id := mongo:connect("localhost")
return mongo:find-one($id, 'db', 'addressbook', map { "name": "John Taylor" })

mongo:count

mongo:count($client-id as xs:string, $database as xs:string, $collection as xs:string) as xs:integer

mongo:count($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*)) as xs:integer

Counts documents in a collection. If a query is supplied via the $query argument, the documents are filtered by that query.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following expression counts the number of documents in the "addressbook" collection:

mongo:count(mongo:connect("localhost"), 'db', 'addressbook')

mongo:aggregate

mongo:aggregate($client-id as xs:string, $database as xs:string, $collection as xs:string, $pipeline as map(*)*) as map(*)*

Calculates aggregate values for the documents in a collection and returns the results. The operations to be performed are supplied via the $pipeline argument.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query selects all documents with "Tokyo" as city and returns their names:

let $id := mongo:connect("localhost")
return mongo:aggregate($id, "db", "addressbook", (
  map { "$match": map { "city": "Tokyo" } },
  map { "$project": map { "name": 1 } }
))

mongo:group

mongo:group($client-id as xs:string, $database as xs:string, $collection as xs:string, $fields as map(*), $reduce as xs:string, $initial as map(*)) as map(*)*

mongo:group($client-id as xs:string, $database as xs:string, $collection as xs:string, $fields as map(*), $reduce as xs:string, $initial as map(*), $options as map(*)) as map(*)*

Groups documents in a collection by the supplied $fields, aggregates the documents via the $reduce function, and returns the results. $initial provides an initial result document, which will be modified by the reduce function. The $options argument can have the following entries:

"cond": map(*): Filters the documents before being processed.

"finalize": xs:string: Follows the reduce function and modifies the output.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query groups documents with age > 60 by the city field and returns the summed-up orders field:

let $id := mongo:connect("localhost")
return mongo:group($id, "db", "addressbook",
  map { "city": 1 },
  "function (curr, result) { result.orders += curr.orders; }",
  map { "orders": 0 },
  map { "cond": map { "age": map { "$gt": 60 } } }
)

mongo:map-reduce

mongo:map-reduce($client-id as xs:string, $database as xs:string, $collection as xs:string, $map as xs:string, $reduce as xs:string) as map(*)*

mongo:map-reduce($client-id as xs:string, $database as xs:string, $collection as xs:string, $map as xs:string, $reduce as xs:string, $options as map(*)) as map(*)*

Runs a map-reduce aggregation operation over the documents of a collection. The map and reduce functions are supplied via the $map and $reduce arguments. The $options argument can have the following entries:

"query": map(*): Filters the documents before being processed.

"output": xs:string: Specifies the output target of the result.

"type": xs:string: Specifies the output type. Allowed values are INLINE (default), REPLACE, MERGE and REDUCE.

"finalize": xs:string: Follows the reduce function and modifies the output.

"sort": map(*): Sorts the returned documents.

"limit": xs:integer: Limits the number of returned documents to the specified integer.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query sums up the number of orders from all documents of the "addressbook" collection:

let $id := mongo:connect("localhost")
return mongo:map-reduce($id, "db", "addressbook",
  'function () { emit(this._id, this.orders) };',
  'function (id, ordersArray) { return Array.sum(ordersArray); };'
)
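
The options map can restrict the input documents and redirect the output. The following sketch assumes the fields city, orders and age used in other examples of this document; the output collection name "orders_by_city" is made up for illustration.

```xquery
xquery version "3.1";

declare namespace mongo = "http://expath.org/ns/mongo";

let $id := mongo:connect("localhost")
return mongo:map-reduce($id, "db", "addressbook",
  (: emit one value per document, keyed by city :)
  'function () { emit(this.city, this.orders); }',
  'function (city, ordersArray) { return Array.sum(ordersArray); }',
  map {
    (: only documents with age > 60 are processed :)
    "query": map { "age": map { "$gt": 60 } },
    (: hypothetical target collection, replaced on each run :)
    "output": "orders_by_city",
    "type": "REPLACE"
  }
)
```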

Collections: Update Operations

mongo:find-and-modify

mongo:find-and-modify($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*), $update as map(*)) as map(*)?

mongo:find-and-modify($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*), $update as map(*), $options as map(*)) as map(*)?

Finds a document that has been selected via the $query argument, modifies it according to the $update argument, and returns the document as it was before the modifications. An empty sequence is returned if no document was found. The $options argument can have the following entries:

"fields": map(*): Restricts the returned fields. The field _id will always be returned.

"sort": map(*): Sorts the documents before choosing the first one as candidate for modification.

"new": xs:boolean: Returns the modified document rather than the original.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query modifies a document with the specified id:

let $id := mongo:connect("localhost")
return mongo:find-and-modify($id, 'db', 'addressbook',
  map { "_id": 123 },
  map { "name": "Naomi Matsuo", "city": "Tokyo", "country": "Japan" }
)
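
The options can be used to control which document is chosen and which version of it is returned. The following sketch assumes the orders field from other examples and uses the MongoDB update operator $inc. With "new" set to true, the document is returned as it is after the increment.

```xquery
xquery version "3.1";

declare namespace mongo = "http://expath.org/ns/mongo";

let $id := mongo:connect("localhost")
return mongo:find-and-modify($id, "db", "addressbook",
  map { "city": "Tokyo" },
  (: increment the assumed "orders" field by one :)
  map { "$inc": map { "orders": 1 } },
  (: pick the first match in name order, return the modified document :)
  map { "sort": map { "name": 1 }, "new": fn:true() }
)
```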

mongo:find-and-remove

mongo:find-and-remove($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*)) as map(*)?

Finds a document that has been selected via the $query argument and returns it after removing it from the database. An empty sequence is returned if no document was found.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query removes a document with the specified id:

let $id := mongo:connect("localhost")
return mongo:find-and-remove($id, 'db', 'addressbook', map { "_id": 123 })

mongo:insert

mongo:insert($client-id as xs:string, $database as xs:string, $collection as xs:string, $documents as map(*)*) as empty-sequence()

Inserts documents into a collection. If the collection does not exist on the server, it will be created. If a new document does not contain an _id field, one will be added.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

An error is raised if a document cannot be inserted. This can happen, for example, if a document with an identical id already exists.

The following expression inserts two new documents into a collection:

let $id := mongo:connect("localhost")
let $docs := (map { 'name': 'John Daniels' }, map { 'name': 'Jack Walker' })
return mongo:insert($id, 'db', 'addressbook', $docs)

mongo:save

mongo:save($client-id as xs:string, $database as xs:string, $collection as xs:string, $document as map(*)) as empty-sequence()

Updates an existing document in a collection or inserts a new document. If the supplied document has no _id field, or if the id does not exist in the collection, the document will be inserted. Otherwise, the existing document will be replaced.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following expression updates or inserts a single document:

let $id := mongo:connect("localhost")
let $doc := map { '_id': 123, 'name': 'Hans Schmid' }
return mongo:save($id, 'db', 'addressbook', $doc)

mongo:update

mongo:update($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*), $update as map(*)) as empty-sequence()

mongo:update($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*), $update as map(*), $options as map(*)) as empty-sequence()

Finds one or more documents that have been selected via the $query argument and modifies them according to the $update argument. The $options argument can have the following entries:

"upsert": xs:boolean: Inserts a new document if no document matches the query criteria.

"multi": xs:boolean: Updates all documents in the collection that match the update query. Otherwise, updates only one document.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

An error is raised if a document cannot be updated. This can happen, for example, if a user tries to change the id of the document.

The following update applies to all documents in the addressed collection. It replaces the value of the info field with null, or adds a new name/value pair if the field does not exist:

let $id := mongo:connect("localhost")
return mongo:update($id, 'db', 'addressbook',
  map { },
  map { '$set': map { 'info': () } },
  map { 'upsert': true(), 'multi': true() }
)

mongo:remove

mongo:remove($client-id as xs:string, $database as xs:string, $collection as xs:string, $query as map(*)) as empty-sequence()

Removes documents from a collection that are selected by the $query argument.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following expression removes all documents from a collection:

mongo:remove(mongo:connect("localhost"), 'db', 'addressbook', map { })

mongo:drop-collection

mongo:drop-collection($client-id as xs:string, $database as xs:string, $collection as xs:string) as empty-sequence()

Drops a collection. No operation will be performed if the collection does not exist.

The connection is identified by the supplied $client-id, and the name of the database and collection are supplied via $database and $collection.

The following query drops five collections in a database:

let $id := mongo:connect("localhost")
for $no in 1 to 5
return mongo:drop-collection($id, "database", "collection-" || $no)

References

MongoDB Database Commands. URL: http://docs.mongodb.org/manual/reference/command/

MongoDB Drivers. URL: http://docs.mongodb.org/ecosystem/drivers

MongoDB Meta Driver. URL: http://docs.mongodb.org/meta-driver/latest

MongoDB Test Suite. URL: http://github.com/expath/expath-cg, directory: tests/qt3/mongo/.

Mongrel, the eXist-db MongoDB extension. URL: https://github.com/dizzzz/Mongrel

XML Query Test Suite. URL: http://dev.w3.org/2011/QT3-test-suite

XQuery 3.1: An XML Query Language. Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium. URL: http://www.w3.org/TR/xquery-31

XPath and XQuery Functions and Operators 3.1. Michael Kay, Editor. World Wide Web Consortium. URL: http://www.w3.org/TR/xpath-functions-31

RFC 7159 – The JavaScript Object Notation (JSON) Data Interchange Format. Internet Engineering Task Force (IETF). URL: http://www.rfc-editor.org/rfc/rfc7159.txt

Summary of Error Conditions

A new database connection could not be established.

Command execution failed.

No open database connection exists for the supplied client id.

An unexpected error occurred while interacting with the database.

A supplied XQuery map could not be converted to a JSON object.

The name of a database or collection is invalid.

An XQuery argument cannot be converted to JavaScript, or a JavaScript result cannot be converted to XQuery.

A write operation failed.