Published by the
This specification was published by the
This proposal defines extension functions and data models that enable faceted navigation and search support in XQuery.
This document is in an initial submission stage. Comments are welcomed at
Faceted search has proven enormously popular in real-world applications. It allows users to navigate and access information through a structured facet classification system. Combined with full-text search, it gives users great power and flexibility to discover information.
This proposal defines a standardized approach to supporting faceted search in XQuery. It is designed to be compatible with XQuery 3.0, and is intended to be used in conjunction with XQuery and XPath Full Text 3.0.
The module defined by this document defines functions and elements in the namespace. In this document, the facet prefix is bound to this namespace URI.
Facet: an attribute of an object (in the generic sense, not to be confused with an XML attribute) whose values are aggregated. For example, "color" is a facet of a "car" object.
Facet-value: a value of a facet. For example, "blue" is a facet-value of the facet "color" for a "car" object.
Facet aggregation: counting the occurrences of each facet-value in the results.
Facet drills:
drill-down: filter the search results by matching a selected facet-value. Once a facet-value has been drilled down, the facet is no longer available for selection by the user; thus only one facet-value of a given facet can be drilled down at a time. Example:
drill-sideway: also known as multi-select facets. Filter the search results by matching against multiple facet-values of the same facet. Example:
Hierarchical facets: multiple facets organized in a hierarchical structure. When a facet is part of a hierarchy, it must be aggregated relative to its parent facet. This concept is sometimes also referred to as pivot facets.
For example:
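The definitions above can also be illustrated in plain XQuery 3.0, without the proposed module. The sketch below uses hypothetical inline "car" data; the element names car and color and the local: functions are illustrative assumptions, not part of this proposal. Aggregation counts the results carrying each facet-value, drill-down filters by one selected value, and drill-sideway filters by any of several selected values of the same facet.

```xquery
xquery version "3.0";

(: Hypothetical sample data: car 3 carries two values of the "color" facet. :)
declare variable $cars := (
  <car id="1"><color>blue</color></car>,
  <car id="2"><color>red</color></car>,
  <car id="3"><color>blue</color><color>white</color></car>
);

(: Facet aggregation: for each distinct facet value, count the matching results. :)
declare function local:aggregate($results as element(car)*) as element(key)* {
  for $value in distinct-values($results/color)
  order by $value
  return <key value="{$value}" count="{count($results[color = $value])}"/>
};

(: Drill-down: keep only the results that match the single selected value. :)
declare function local:drill-down($results as element(car)*, $value as xs:string)
  as element(car)*
{
  $results[color = $value]
};

(: Drill-sideway: keep the results that match any of the selected values;
   the general comparison "=" is true if any pair of items is equal. :)
declare function local:drill-sideway($results as element(car)*, $values as xs:string*)
  as element(car)*
{
  $results[color = $values]
};

(
  local:aggregate($cars),
  <drill-down ids="{local:drill-down($cars, 'blue') ! string(@id)}"/>,
  <drill-sideway ids="{local:drill-sideway($cars, ('blue', 'red')) ! string(@id)}"/>
)
```

Evaluated, this yields key elements for blue (count 2), red (1) and white (1), a drill-down selecting cars "1 3", and a drill-sideway selecting cars "1 2 3". Car 3 is counted towards two facet-values, which a plain group by clause could not express.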
Faceted search support consists of the facet data model definitions and the XQuery functions that manipulate these data models to perform facet aggregation and drills. The following sections specify the data models and the XQuery functions in detail. The
Below is the RELAX NG Compact grammar for the facet data models:
Example:
attribute name : The facet name
element key : Contains all information pertaining to a facet grouping key. See the
AnyElement* : For customization.
attribute value : The facet value
attribute count : Number of occurrences counted for this facet value in the results
attribute type : The data type of the facet value; always xs:anyAtomicType or a type derived from it. Optional attribute: when not specified, the default value is "xs:string".
AnyElement* : Any elements that do not belong to the facet namespace, for customization.
element facet* : Optional nested facets, for supporting hierarchical facets.
element facet* : Zero or more facet elements.
AnyElement* : For customization.
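Read together, the descriptions above imply results of the form facets/facet/key, where a key may contain nested facet elements for hierarchical facets. The sketch below builds such a structure by hand for a two-level org/skill hierarchy, aggregating each nested skill facet only over the results of its parent org key. It is an illustration under assumptions: the facet namespace is omitted, the employee data and element names are hypothetical, the type attribute is left at its default, and local:org-skill-facets is not a function defined by this proposal.

```xquery
xquery version "3.0";

(: Build a facets element for a hypothetical org/skill hierarchy. Each nested
   skill facet is aggregated only over the employees of its parent org key. :)
declare function local:org-skill-facets($employees as element(employee)*)
  as element(facets)
{
  <facets>
    <facet name="org">{
      for $org in distinct-values($employees/org)
      let $in-org := $employees[org = $org]
      order by count($in-org) descending, $org
      return
        <key value="{$org}" count="{count($in-org)}">
          <facet name="skill">{
            for $skill in distinct-values($in-org/skill)
            let $n := count($in-org[skill = $skill])
            order by $n descending, $skill
            return <key value="{$skill}" count="{$n}"/>
          }</facet>
        </key>
    }</facet>
  </facets>
};

local:org-skill-facets((
  <employee><org>Sales</org><skill>Java</skill></employee>,
  <employee><org>Sales</org><skill>Java</skill><skill>XQuery</skill></employee>,
  <employee><org>R&amp;D</org><skill>XQuery</skill></employee>
))
```

With this data, the Sales key has count 2 and contains a skill facet with Java (2) and XQuery (1), while the R&amp;D key has count 1 with XQuery (1).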
attribute name : Defines the facet name.
element group-by : Parameters for obtaining facet values from a sequence of items. A facet value is similar to the concept of grouping key defined in the
element max-values : Optional limit for maximum number of facet values to be returned, after ordering is applied.
element order-by : Optional ordering parameters to specify the order of returned facet values. See this
AnyElement* : For customization.
element facet-definition* : Optional nested facet definitions for supporting hierarchical facets.
attribute direction : One of "ascending" or "descending". See
attribute empty : One of "greatest" or "least". See
content is one of:
"value" : Order by /facet/key/@value
"count" : Order by /facet/key/@count, as xs:integer
When ordering by "value", if the facet values are of type xs:string and the collation attribute is specified on the group-by element, the implementation must order the string values using that collation.
If order-by is not specified, the implementation must order by "count" in the "descending" direction.
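The ordering and truncation rules can be sketched as a plain XQuery function over key elements. This is a minimal sketch under assumptions: the hypothetical local:order-keys function compares @value as a string and ignores the type attribute, the collation attribute and the empty attribute; max-values is applied after ordering, as described above.

```xquery
xquery version "3.0";

(: Sort key for one facet value: the numeric count, or the value as a string. :)
declare function local:sort-key($key as element(key), $by as xs:string)
  as xs:anyAtomicType
{
  if ($by eq "count") then xs:integer($key/@count) else string($key/@value)
};

(: Order keys by "value" or "count", then keep at most $max-values of them. :)
declare function local:order-keys(
  $keys as element(key)*,
  $by as xs:string,
  $direction as xs:string,
  $max-values as xs:integer?
) as element(key)*
{
  let $ordered :=
    if ($direction eq "ascending")
    then for $k in $keys order by local:sort-key($k, $by) ascending return $k
    else for $k in $keys order by local:sort-key($k, $by) descending return $k
  return
    if (exists($max-values)) then subsequence($ordered, 1, $max-values)
    else $ordered
};

(: The default ordering ("count", "descending"), limited to two values. :)
local:order-keys(
  (<key value="red" count="1"/>, <key value="blue" count="3"/>,
   <key value="white" count="2"/>),
  "count", "descending", 2)
```

Because truncation happens after ordering, the call above keeps blue and white, the two most frequent values.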
attribute function : An optional string containing the QName of a function that returns customized facet values
attribute collation : An optional string defined as
attribute type : An optional string containing an xs:QName plus an optional
element sub-path : As defined by
If attribute function is provided, more than one sub-path may be specified; otherwise only one sub-path is allowed.
AnyElement* : For customization.
The facet value grouping rules to obtain and count the facet values are mostly identical to the rules specified by
For every item of the results sequence, the sub-path expressions are evaluated with the item as the context item.
The results are then atomized to a sequence of zero or more atomic values.
Apply the group-by function, if one is specified. If there is no group-by function, the atomized values from step 2 are the facet values. Otherwise, the values from step 2 are passed to the group-by function, and the values it returns are the facet values.
If attribute type is specified, strong type and cardinality checks are performed on the facet values. Error
Facet values are then counted using the same equality rule as defined by
There exists a key difference between the grouping-key defined in
For facets, it is perfectly reasonable for a result item to be counted towards multiple facet values, or towards none at all; thus there may be zero or more facet values per result item.
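The grouping steps above can be sketched in plain XQuery 3.0. Because standard XQuery 3.0 cannot evaluate a sub-path string at run time, this sketch assumes the sub-path is supplied as a function item, and it simplifies the group-by function to a single parameter (the facet-definition argument is omitted). It also assumes that each result item counts at most once towards a given facet value. The names local:facet-values and local:count-values are hypothetical.

```xquery
xquery version "3.0";

(: Steps 1 to 3 for one result item: evaluate the sub-path with the item as
   context, atomize, then apply the optional group-by function. :)
declare function local:facet-values(
  $item as item(),
  $sub-path as function(item()) as item()*,
  $group-by as (function(xs:anyAtomicType*) as xs:anyAtomicType*)?
) as xs:anyAtomicType*
{
  let $atomized := data($sub-path($item))
  return if (exists($group-by)) then $group-by($atomized) else $atomized
};

(: Count facet values over all results; an item may yield zero or more values. :)
declare function local:count-values(
  $results as item()*,
  $sub-path as function(item()) as item()*,
  $group-by as (function(xs:anyAtomicType*) as xs:anyAtomicType*)?
) as element(key)*
{
  let $values :=
    for $item in $results
    return distinct-values(local:facet-values($item, $sub-path, $group-by))
  for $v in distinct-values($values)
  (: index-of uses eq semantics and treats incomparable values as unequal :)
  return <key value="{$v}" count="{count(index-of($values, $v))}"/>
};

let $employees := (
  <employee><skill>Java</skill><skill>XQuery</skill></employee>,
  <employee><skill>Java</skill></employee>,
  <employee/>
)
return local:count-values($employees, function($e) { $e/skill }, ())
```

Here the first employee contributes two facet values, the second one, and the third none, so Java is counted twice and XQuery once.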
For conciseness, the following examples show the equivalent group-by clauses for some group-by elements, assuming there is exactly one facet value (grouping key) per result item.
Example 1:
Example 2:
Example 3:
Example 4:
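As a generic illustration (not a reconstruction of the examples above), a facet with exactly one value per result item corresponds to an XQuery 3.0 group by clause. The local:org-facet function and the employee/org element names are hypothetical. If an employee had two org elements, string($e/org) would raise a type error, which is why this translation only holds for single-valued facets.

```xquery
xquery version "3.0";

(: Single-valued facet expressed with a group by clause: each employee is
   assumed to have exactly one org element. :)
declare function local:org-facet($employees as element(employee)*)
  as element(facet)
{
  <facet name="org">{
    for $e in $employees
    group by $org := string($e/org)
    (: after grouping, $e is bound to all employees in the group :)
    order by count($e) descending, $org
    return <key value="{$org}" count="{count($e)}"/>
  }</facet>
};

local:org-facet((
  <employee><org>Sales</org></employee>,
  <employee><org>R&amp;D</org></employee>,
  <employee><org>Sales</org></employee>
))
```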
This function is: deterministic, context-independent, focus-independent
Given a result sequence and a sequence of facet definitions, counts the facet-values for each facet defined by the facet definition(s).
This function is: deterministic, context-independent, focus-independent
Given a result sequence, a facet definition, and a selected facet value contained in the facet element, returns the results that match the selected facet value. This function can be used for both drill-down and drill-sideway queries.
In the case of hierarchical facets, $selected-facet must have a hierarchical structure compatible with $facet-definition. This is true by default if the application constructed $selected-facet from the facets element returned by the facet:count function using the same $facet-definition. See the use cases for more details.
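The drill behaviour for a non-hierarchical facet can be sketched as a filter over the results. This is a minimal sketch under assumptions: the sub-path is passed as a function item, the optional group-by function takes only the atomized values, and local:drill is a hypothetical name, not the facet:drill signature. Passing one selected value models a drill-down; passing several models a drill-sideway.

```xquery
xquery version "3.0";

(: Keep each result item whose facet values match any selected value. :)
declare function local:drill(
  $results as item()*,
  $sub-path as function(item()) as item()*,
  $group-by as (function(xs:anyAtomicType*) as xs:anyAtomicType*)?,
  $selected as xs:anyAtomicType*
) as item()*
{
  for $item in $results
  let $values := data($sub-path($item))
  let $facet-values :=
    if (exists($group-by)) then $group-by($values) else $values
  (: general comparison: true if any facet value equals any selected value :)
  where $facet-values = $selected
  return $item
};

let $people := (
  <person><age>34</age></person>,
  <person><age>38</age></person>,
  <person><age>41</age></person>
)
let $decade := function($v as xs:anyAtomicType*) as xs:anyAtomicType* {
  $v ! (xs:integer(.) idiv 10 * 10)
}
return (
  (: drill-down on the 30s decade via a group-by function :)
  count(local:drill($people, function($p) { $p/age }, $decade, 30)),
  (: drill-sideway on two raw ages, without a group-by function :)
  count(local:drill($people, function($p) { $p/age }, (), ("34", "41")))
)
```

Both calls select two of the three people: ages 34 and 38 for the decade drill, and ages 34 and 41 for the multi-value drill.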
This function is: deterministic, context-independent, focus-independent
The group-by function is supplied by the application and is called by both facet:count and facet:drill.
The group-by function generates facet values from the original values. Each item in the returned sequence is a facet value that must be counted by facet:count, or compared against by facet:drill. An empty return sequence is also allowed.
Because the group-by element of a facet-definition may contain multiple sub-path child elements, the group-by function may have any arity of 2 or more. The atomized sub-path values are passed to the group-by function in the same order as the sub-path elements appear in the group-by element.
The facet-definition element is also passed to the group-by function, allowing an application to pass customized parameters to it.
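A group-by function for an age-range facet with a single sub-path (hence arity 2) might look like the following sketch. The parameter order (sub-path values first, facet-definition last), the bucket-size customization element, and the name local:age-range are assumptions for illustration; they are not defined by this proposal.

```xquery
xquery version "3.0";

(: Hypothetical group-by function for an age-range facet with one sub-path.
   Assumed parameter order: sub-path values first, facet-definition last.
   <bucket-size> is an assumed application-defined customization element. :)
declare function local:age-range(
  $ages as xs:anyAtomicType*,
  $facet-definition as element()
) as xs:string*
{
  (: read the customized bucket size, defaulting to 10 :)
  let $size := xs:integer(($facet-definition/bucket-size, 10)[1])
  return distinct-values(
    for $age in $ages ! xs:integer(.)
    let $low := ($age idiv $size) * $size
    return concat($low, "-", $low + $size - 1)
  )
};

(: One age yields one facet value; an empty sequence of ages would yield none. :)
local:age-range(34, <facet-definition name="age"><bucket-size>10</bucket-size></facet-definition>)
```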
For hierarchical facet definitions, facet:count and facet:drill must pass the matching facet-definition element from the hierarchical structure to the group-by function. For example:
For the use cases, we use the sample "employee" data in Appendix B. Here is what one employee element looks like:
The XQuery using this facet proposal:
The equivalent XQuery using a group by clause, since each employee belongs to exactly one organization:
Expected result:
The XQuery using this facet proposal:
The equivalent XQuery using a group by clause:
Expected result:
The XQuery using this facet proposal:
There is no equivalent XQuery using a group by clause, because skill is a repeatable element.
The following XQuery will throw
Expected result:
The XQuery using this facet proposal:
The equivalent XQuery using a group by clause:
Expected result:
The XQuery using this facet proposal:
Expected result:
The application displays:
The application then constructs the following facet element, based on the user's selection:
The application then calls facet:drill, which maps the selected facet element to the facet-definition and effectively applies the following XQuery filter expression to the result set:
Continuing from the previous case, if the application allows the user to select multiple skills simultaneously:
The application then constructs the following two facet elements, based on the user's selection:
It then constructs the following XQuery:
Which is effectively the same as:
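The equivalence between combining single-value drills and one multi-value filter can be checked in plain XQuery 3.0: the union of two drill-down filters selects the same nodes as one general-comparison filter over both selected values. The employee data is hypothetical and is wrapped in a single tree so that document order, and therefore the order of the union, is well defined.

```xquery
xquery version "3.0";

(: Hypothetical employees in one tree, so document order is well defined. :)
declare variable $employees :=
  <employees>
    <employee id="1"><skill>Java</skill></employee>
    <employee id="2"><skill>XQuery</skill></employee>
    <employee id="3"><skill>Java</skill><skill>XQuery</skill></employee>
    <employee id="4"><skill>C</skill></employee>
  </employees>/employee;

(: Union of one drill-down per selected skill; "|" removes the duplicate
   (employee 3 matches both) and returns nodes in document order. :)
let $union := $employees[skill = "Java"] | $employees[skill = "XQuery"]
(: One filter against both selected values at once. :)
let $single := $employees[skill = ("Java", "XQuery")]
return (
  string-join($union ! string(@id), " "),
  string-join($single ! string(@id), " "),
  count($union) eq count($single)
    and (every $i in 1 to count($union) satisfies $union[$i] is $single[$i])
)
```

Both expressions select employees "1 2 3", and the final check confirms they are the identical nodes in the same order.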
As an alternative to specifying a function QName in /facet-definition/group-by/@function, an annotation may be used to associate a grouping function with a facet-definition.
Example:
In the example above, the annotation "age-range" is used to associate the facet-definition with the function local:age-range.
An annotation introduces an additional level of indirection to link a facet-definition to a group-by function. This indirection does not appear to improve or simplify the API, so we currently chose the straightforward association of a function QName via an attribute.
When facet:drill encounters a facet-definition that defines a group-by function, it effectively filters the results using the following XQuery expression:
One major drawback of this filtering approach is that it defeats any index-based optimization. To illustrate this, let's use a common numeric range facet as an example.
Suppose we've customized the following age range facet based on the same
The default facet:drill implementation translates to the following equivalent XQuery expression:
Alternatively, an application may perform the drill by directly constructing the following XQuery expression:
The engine can optimize the constraint expression above to use an index, which is typically much faster than the default facet:drill implementation.
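The contrast can be seen in a small plain-XQuery sketch with hypothetical data: both filters select the same employees for the "30-39" range, but only the second is a simple comparison on age that an engine could answer from an index, whereas the first calls an opaque function for every item. Whether an engine actually uses an index is implementation-specific, and the function names here are hypothetical.

```xquery
xquery version "3.0";

declare variable $employees := (
  <employee><age>29</age></employee>,
  <employee><age>34</age></employee>,
  <employee><age>39</age></employee>,
  <employee><age>40</age></employee>
);

(: Hypothetical decade-based range function: opaque to the optimizer. :)
declare function local:age-range($ages as xs:anyAtomicType*) as xs:string* {
  distinct-values(
    for $a in $ages ! xs:integer(.)
    let $low := ($a idiv 10) * 10
    return concat($low, "-", $low + 9))
};

(: Function-based drill: the function must be evaluated for every item. :)
declare function local:drill-by-function($range as xs:string) as element(employee)* {
  $employees[local:age-range(age) = $range]
};

(: Equivalent range constraint that an engine could serve from an index on age;
   general comparisons cast the untyped age to xs:double. :)
declare function local:drill-by-range($low as xs:integer, $high as xs:integer)
  as element(employee)*
{
  $employees[age >= $low and age <= $high]
};

(count(local:drill-by-function("30-39")), count(local:drill-by-range(30, 39)))
```

Ages 34 and 39 fall into the range, so both expressions return two employees. The rewrite is only valid because the range bounds are derived from the same rule the group-by function applies.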
Another example is the 'dynamic-age-range' facet, where the age range is determined by the minimum and maximum age values in the result set. In this case, the result set must be pre-scanned before facet:count to determine the dynamic range, and the range information must be persisted into
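The pre-scan step can be sketched as follows: compute the minimum and maximum ages of the results, then split that span into a fixed number of contiguous, equal-width ranges. The bucket count, the local:dynamic-ranges name, and the range element are assumptions for illustration; how the resulting ranges are persisted for facet:count is not shown.

```xquery
xquery version "3.0";

(: Split the span [min, max] of the pre-scanned ages into at most $buckets
   contiguous, equal-width integer ranges. Assumes $buckets >= 1. :)
declare function local:dynamic-ranges($ages as xs:integer*, $buckets as xs:integer)
  as element(range)*
{
  if (empty($ages)) then ()
  else
    let $min := min($ages)
    let $max := max($ages)
    let $width := max((1, xs:integer(ceiling(($max - $min + 1) div $buckets))))
    for $i in 0 to $buckets - 1
    let $low := $min + $i * $width
    where $low le $max
    return <range low="{$low}" high="{min(($low + $width - 1, $max))}"/>
};

local:dynamic-ranges((23, 34, 41, 58), 3)
```

For ages 23, 34, 41 and 58 with three buckets, the width is 12 and the ranges are 23-34, 35-46 and 47-58.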
Fully supporting every possible real-world facet requirement may be too complex for this proposal. For this reason, we've decided to make
The complete sample data used by the use cases, presented as an XQuery.