When implementing a search interface, developers often use facets to let users narrow down search results. In this post, we’ll look at how MarkLogic’s Search API supports this, using page categories as the example. The Search API makes it very easy to support search constraints: name/value pairs that a user can enter into a search field, which look something like this:
xquery cat:tutorial
The above search text will look for the word “xquery” in all tutorials, which is to say, all documents that are in the “tutorial” category. In this example, I’ve defined “cat” (short for “category”) as a constraint. That’s what a constraint is: a name (“cat”) that can be paired with a value (“tutorial”) to constrain your search in some way. The default search grammar joins the name and the value with a colon, although even that can be overridden.
Constraints are nice in and of themselves, but how are users supposed to know how to use them, or even that they exist? One way to weave them naturally into your search UI is faceted navigation, where links carrying the constraints are generated automatically so that users don’t have to type them in. Here we have just one constraint, with several possible values. Running a search for “xquery” by itself displays the current breakdown, by category, of the documents that mention XQuery.
In that breakdown, 21 tutorials mention XQuery. Clicking the “tutorial” link adds the search constraint to the query and displays the narrowed results page.
In this example, not only is “category” a constraint, but it also functions as a facet. When I first came across these terms, “constraint” and “facet,” I wasn’t sure what the difference was. After learning a bit more, I realized that they’re slightly different: Every facet is a constraint, but not every constraint is a facet.
For a constraint to also function as a facet, you have to be able to retrieve all its values (in the case of the “category” facet: “function”, “xcc”, “tutorial”, etc.). In other words, all of the unique values present in the database for a given constraint must be stored in a lexicon. That’s what allows us to quickly generate the breakdown by category. It also makes it possible to get a quick count of documents for each value (e.g., “21” in the case of the “tutorial” value).
Now that we know what constraints and facets are, how do we actually implement them? The Search API (particularly the search:search() function) makes it convenient to retrieve facet values and counts in the resulting XML. The simplest call to search:search() only requires passing in some query text. First, we need to import the Search API library.
import module namespace search="http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
Then make a simple call:
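As a minimal sketch, assuming the query text “xquery” from the earlier example, a complete main module for that simplest call might look like this (it relies on the default options, so no constraints or facets are defined yet):

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Search all documents for the word "xquery" using the default options.
   The query text is an assumption taken from the example in this post. :)
search:search("xquery")
```

The call returns a <search:response> element describing the matching documents.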
To customize the behavior, we need to pass in an <options> node. And to make this code look more like production code, we’ll make the query text dependent on an HTTP request parameter (“q”). Our final call to search:search() looks something like this:
declare variable $q := xdmp:get-request-field("q", "");
declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    ...
  </options>;

search:search($q, $options)
Now let’s drill down into the <options> node to define our constraint, leaving all the other options at their defaults.
Each constraint is defined by a <constraint> element, and the type of constraint (i.e., how the data behind the constraint is represented) is determined by what you put inside that <constraint> element. There are several choices here. Figure 1 summarizes the different constraint types, what element you use to represent them, and whether or not the constraint can also function as a facet:
| Constraint type element (child of <constraint>) | Type of constraint | Can function as facet? |
|---|---|---|
| <value> | value of a specific element, attribute, or field | No |
| <word> | word in a specific element, attribute, or field | No |
| <collection> | collection URI | Yes |
| <range> | value or range of values in a specific element, attribute, or field | Yes |
| <element-query> | word query restricted to the specified element | No |
| <properties> | word query restricted to the properties document | No |
| <geo-*> elements (e.g., <geo-elem-pair>) | geospatial queries | Yes, if it has a <heatmap> child too |
| <custom> | custom XQuery-defined mapping between constraint value and underlying XML | Yes, if you have an appropriate lexicon |
Figure 1: Constraint type vs. facet. Full details can be found in the documentation for the search:search() function.
How did we choose a constraint type in our example? Well, we knew we needed facets and we knew this wasn’t going to be a geospatial query, so that narrowed the list down to three choices of constraint type:
- <range>,
- <collection>, or
- <custom>
Range constraints are probably the most common basis for faceted navigation. However, they require your documents to have some resemblance to each other. Specifically, every document must share a common element, element/attribute combination, or applicable field definition. Range constraints also require that the constraint’s value(s) appear directly in the document. If we had planned all of the site’s content from the start with faceted navigation in mind, then we probably would have created something like a <category> element in every document, and then created a range index on it so we could provide faceted navigation based on that element.
In this example, we had a number of heterogeneous document types, none of which explicitly lists a category value. At first it seemed we would need a <custom> constraint to define the exact mapping between a constraint query (such as “cat:function”) and the query against the underlying XML representation. That would also have required creating a lexicon so that all the values (“function”, “tutorial”, “blog”, etc.) could be quickly extracted, that is, so that the custom constraint could function as a facet. But to avoid putting those values in the document content, we considered storing them in collection URIs. Before we knew it, we were reinventing collection constraints!
<collection> constraints are so easy in comparison to a custom constraint! Here’s the final $options node:
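A minimal sketch of such an options node, assuming the constraint name “cat” and the “category/” collection prefix described in this post, with every other option left at its default (this is an illustration, not necessarily the exact node used on the site):

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Query text comes from the "q" request parameter, defaulting to "". :)
declare variable $q := xdmp:get-request-field("q", "");

(: One constraint named "cat", backed by collection URIs that start
   with "category/". The name and prefix are taken from this post. :)
declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <collection prefix="category/"/>
    </constraint>
  </options>;

search:search($q, $options)
```

Note that facets on a collection constraint depend on the collection lexicon being enabled for the database.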
Now, the only step left to do was to associate all documents with collection URIs. Here’s the specific XQuery function to do that:
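As a rough sketch of what such a function could look like: the function name, the element names (“Tutorial”, “Event”), the “/blog/” URI prefix, and the example document URI below are all hypothetical stand-ins, chosen only to show the two kinds of mapping (by element name and by document URI).

```xquery
xquery version "1.0-ml";

(: Hypothetical mapping from a document to its category name.
   The element names and URI prefix are illustrative assumptions. :)
declare function local:category-for-doc($doc as document-node()) as xs:string
{
  let $uri  := xdmp:node-uri($doc)
  let $name := fn:local-name($doc/*)
  return
    if ($name eq "Tutorial") then "tutorial"           (: by element name :)
    else if ($name eq "Event") then "event"            (: by element name :)
    else if (fn:starts-with($uri, "/blog/")) then "blog"  (: by document URI :)
    else "other"
};

(: Example invocation for a single, hypothetical document URI.
   The "for" skips the update if the document does not exist. :)
let $uri := "/example/doc.xml"
for $doc in fn:doc($uri)
return
  xdmp:document-add-collections(
    $uri,
    fn:concat("category/", local:category-for-doc($doc)))
```

Because the mapping lives in one function, adding a new category only means adding another branch; the documents themselves stay unchanged.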
As you can see, we had various ways of mapping documents to their category, sometimes based on the document element name, and other times based on the document URI. This can evolve over time and we can use any arbitrary expression to determine the document category. You no longer have to waste time thinking about how to alter document structure.
The invoking code then associates the given document (using xdmp:document-add-collections()) with the appropriate collection URI: “category/xcc”, “category/event”, “category/tutorial”, etc. The Search API has special support for using common prefixes within collection URIs. The prefix (“category/”) identifies the constraint, and everything after the prefix (e.g., “tutorial”) acts as the constraint’s value. Behind the scenes, the Search API calls cts:collection-match("category/*") to efficiently retrieve all the values for the given constraint. Cool stuff!
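To get a feel for what that lexicon lookup provides, here is a small sketch that lists every category value along with an estimated document count. It is only an illustration of the idea, not the Search API’s actual implementation; the <facet-value> element it builds is a made-up, no-namespace element, and the counts cover all documents rather than only those matching a query.

```xquery
xquery version "1.0-ml";

(: Requires the collection lexicon to be enabled on the database. :)
for $collection-uri in cts:collection-match("category/*")
(: Strip the prefix to recover the constraint value, e.g. "tutorial". :)
let $value := fn:substring-after($collection-uri, "category/")
(: Estimate (unfiltered) how many documents are in this collection. :)
let $count :=
  xdmp:estimate(
    cts:search(fn:doc(), cts:collection-query($collection-uri)))
order by $value
return <facet-value name="{$value}" count="{$count}"/>
```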
The only catch (if one exists at all) is that you must maintain these collection URI associations. There are several ways of doing this. One way is to ensure that every time a document is updated, the above function is called to correctly (re-)associate it with the right category. Another option is to have a global script that does a brute-force update of all documents. Finally, and possibly the best approach, we can set up a CPF pipeline so that documents automatically get their category updated whenever they’re updated.
As a finale, here’s the relevant portion of the response that search:search() returns for us, making it easy to generate the faceted navigation menu:
<search:facet name="cat">
  <search:facet-value name="blog" count="53">blog</search:facet-value>
  <search:facet-value name="code" count="35">code</search:facet-value>
  <search:facet-value name="event" count="11">event</search:facet-value>
  <search:facet-value name="function" count="1294">function</search:facet-value>
  <search:facet-value name="guide" count="21">guide</search:facet-value>
  <search:facet-value name="news" count="8">news</search:facet-value>
  <search:facet-value name="other" count="29">other</search:facet-value>
  <search:facet-value name="tutorial" count="21">tutorial</search:facet-value>
  <search:facet-value name="xcc" count="128">xcc</search:facet-value>
  <search:facet-value name="xccn" count="72">xccn</search:facet-value>
</search:facet>
As you can see, each of the relevant category values is returned, along with a count of how many matching documents are in the given category. Now do you see why the Search API is so cool?
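As one possible way to turn that response into a navigation menu, the sketch below builds an HTML list with one link per category. The “?q=” link format and the surrounding markup are assumptions for illustration; the constraint name “cat” and the “category/” prefix follow this post, and appending “cat:value” to the query text assumes the default search grammar and values without spaces.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

declare variable $q := xdmp:get-request-field("q", "");

declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <collection prefix="category/"/>
    </constraint>
  </options>;

let $response := search:search($q, $options)
return
  <ul>{
    for $value in $response/search:facet[@name eq "cat"]/search:facet-value
    (: Append the constraint to the current query text, e.g.
       "xquery" becomes "xquery cat:tutorial". :)
    let $new-q := fn:normalize-space(fn:concat($q, " cat:", $value/@name))
    return
      <li>
        <a href="?q={fn:encode-for-uri($new-q)}">{fn:string($value/@name)}</a>
        {fn:concat(" (", $value/@count, ")")}
      </li>
  }</ul>
```

Each link simply re-runs the search with the constraint added, which is exactly what a user would otherwise have to type by hand.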
Just getting started? Try out the 5-minute Guide to the Search API. It’s how I first got up-to-speed, and I highly recommend it.