Searching with constraints and facets

Posted on November 23, 2011

Often when implementing a search interface, developers use facets to allow a user to narrow down search results. In this post, we will be discussing how MarkLogic’s Search features enable us to do this by looking at how to narrow search results by page categories. MarkLogic’s Search API makes it very easy to support search constraints, name/value pairs that a user can enter into a search field, which look something like this:

[Figure: example search bar using name/value pairs]

The above search text will look for the word “xquery” in all tutorials, which is to say, all documents that are in the “tutorial” category. In this example, I’ve defined “cat” (short for “category”) as a constraint. That’s what a constraint is: a name (“cat”) that can be paired with a value (“tutorial”), to constrain your search in some way. The default search grammar uses a colon, although even that can be overridden.

Constraints in and of themselves are nice, but how are users supposed to know how to use them, or even that they exist? One way to weave them naturally into your search UI is to use faceted navigation, where links containing the constraints are provided automatically so that users don’t have to type them in. In this case, we have just one constraint, with several possible values. Running a search for “xquery” by itself displays a breakdown, by category, of the documents that mention XQuery:

[Figure: breakdown of categories returned by the Search API when searching for "xquery"]

As the example above shows, there are 21 tutorials that mention XQuery. Clicking on that link adds the category constraint to the query, which appears at the top of the results page.

[Figure: query shown at the top of the results page after selecting a category, with the constraint added to the query]

In this example, not only is “category” a constraint, but it also functions as a facet. When I first came across these terms, “constraint” and “facet,” I wasn’t sure what the difference was. After learning a bit more, I realized that they’re slightly different: Every facet is a constraint, but not every constraint is a facet.

For a constraint to also function as a facet, you have to be able to retrieve all its values (in the case of the “category” facet: “function”, “xcc”, “tutorial”, etc.). In other words, all of the unique values present in the database for a given constraint must be stored in a lexicon. That’s what allows us to quickly generate the breakdown by category. It also helps with getting a quick count of documents for each value (e.g., “21” in the case of the “tutorial” value).

Now that we know what constraints and facets are, how do we actually implement them? The Search API (particularly the search:search() function) makes it convenient to retrieve facet values and counts in the resulting XML. The simplest call to search:search() only requires passing in some query text. Before we can make that call, we need to import the Search API library.

import module namespace search="http://marklogic.com/appservices/search"
       at "/MarkLogic/appservices/search/search.xqy";

Then make a simple call:

search:search("xquery")

Running the above query in Query Console will give you a <search:response> element, listing the first ten results for documents in your database containing the word “xquery”. In this case, all of the Search API’s default options are in effect. These options determine how the query text is interpreted, how many results to return, what format to return them in, etc.
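As a rough sketch of what you can do with that response, the query below pulls out the total hit count and the URI of each result on the first page. It assumes only the default options. What comes back depends entirely on what is in your database.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
       at "/MarkLogic/appservices/search/search.xqy";

(: Run the search with all default options in effect. :)
let $response := search:search("xquery")
return (
  (: Estimated total number of matching documents. :)
  string($response/@total),
  (: URI of each result on the first page (ten by default). :)
  $response/search:result/string(@uri)
)
```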

To customize the behavior, we need to pass in an <options> node. And to make this code look more like production code, we’ll make the query text dependent on an HTTP request parameter (“q”). Our final call to search:search() looks something like this:

declare variable $q := xdmp:get-request-field("q","");
declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    ...
  </options>;
 
search:search($q, $options)

Now let’s drill down into the <options> node to define our constraint, leaving all the other options at their defaults.

declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <!-- constraint type element goes here -->
    </constraint>
  </options>;

Each constraint is defined by a <constraint> element, and the type of constraint (i.e., how the data behind the constraint is represented) is determined by what you put inside that <constraint> element. There are several choices here. Figure 1 summarizes the different constraint types, what element you use to represent them, and whether or not the constraint can also function as a facet:

Constraint type element (child of <constraint>) | Type of constraint | Can function as facet?
<value> | value of a specific element, attribute, or field | No
<word> | word in a specific element, attribute, or field | No
<collection> | collection URI | Yes
<range> | value or range of values in a specific element, attribute, or field | Yes
<element-query> | word query restricted to the specified element | No
<properties> | word query restricted to the properties document | No
<geo-elem-pair>, <geo-attr-pair>, <geo-elem> | geospatial queries | Yes, if it has a <heatmap> child too
<custom> | custom XQuery-defined mapping between constraint value and underlying XML | Yes, if you have an appropriate lexicon

Figure 1: Constraint type vs. facet. Full details can be found in the documentation for the search:search() function.

How did we choose a constraint type in our example? Well, we knew we needed facets and we knew this wasn’t going to be a geospatial query, so that narrowed the list down to three choices of constraint type:

  • <range>,
  • <collection>, or
  • <custom>

Range constraints are probably the most common choice used as a basis for faceted navigation. However, they require your documents to have some resemblance to each other. Specifically, each document must contain a common element name, element/attribute, or applicable field definition. Range constraints also require that the constraint’s value(s) appear directly in the document. If we had planned all of the site’s content from the start with faceted navigation in mind, then we probably would have created something like a <category> element in every document, and then created a range index on it so we could provide faceted navigation based on that element.
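For comparison, here is a minimal sketch of what the options might look like with a range constraint. It is hypothetical: it assumes every document carries a <category> element in no namespace, and that a matching element range index (type xs:string, with the collation shown) has already been configured on the database. Neither assumption comes from the actual site.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
       at "/MarkLogic/appservices/search/search.xqy";

declare variable $q := xdmp:get-request-field("q", "");

(: Hypothetical setup: a <category> element in every document, backed by
   an xs:string element range index that uses the collation below. :)
declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <range type="xs:string" collation="http://marklogic.com/collation/">
        <element ns="" name="category"/>
      </range>
    </constraint>
  </options>;

search:search($q, $options)
```

Without the range index in place, a query that uses this constraint would fail, which is part of why this option only fits documents that share a common structure.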

In this example, we had a number of heterogeneous document types, none of which explicitly list a category value. At first, it seemed we would need a <custom> constraint to customize the exact mapping between a constraint query (such as “cat:function”) and the query against the underlying XML representation. That would also have required creating a lexicon so that all the values (“function”, “tutorial”, “blog”, etc.) could be quickly extracted, that is, so that the custom constraint could function as a facet. But to avoid putting those values in the document content, we considered storing them in collection URIs. Before we knew it, we were reinventing collection constraints!

<collection> constraints are so easy in comparison to a custom constraint! Here’s the final $options node:

declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <collection prefix="category/"/>
    </constraint>
  </options>;
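
To make the effect of these options concrete, the sketch below first asks the Search API how it interprets the text “xquery cat:tutorial”, and then runs a hand-written approximation of the same search using cts functions directly. The hand-written query is my own rough equivalent for illustration, not necessarily the exact query the Search API generates.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
       at "/MarkLogic/appservices/search/search.xqy";

declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <collection prefix="category/"/>
    </constraint>
  </options>;

(
  (: How the Search API parses the constrained query text. :)
  search:parse("xquery cat:tutorial", $options),

  (: Approximation: count documents that contain the word "xquery"
     and belong to the "category/tutorial" collection. :)
  xdmp:estimate(
    cts:search(doc(),
      cts:and-query((
        cts:word-query("xquery"),
        cts:collection-query("category/tutorial")
      ))))
)
```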

Now, the only step left was to associate all documents with collection URIs. Here’s the XQuery function that determines which category a document belongs to:

declare function ml:category-for-doc($doc) as xs:string {
       if (contains(base-uri($doc), "/javadoc/")) then "xcc"
  else if (contains(base-uri($doc), "/dotnet/" )) then "xccn"
  else if ($doc/api:function-page               ) then "function"
  else if ($doc/*:guide                         ) then "guide"
  else if ($doc/ml:Announcement                 ) then "news"
  else if ($doc/ml:Event                        ) then "event"
  else if ($doc/ml:Article                      ) then "tutorial"
  else if ($doc/ml:Post                         ) then "blog"
  else if ($doc/ml:Project                      ) then "code"
                                                  else "other"
};

As you can see, we had various ways of mapping documents to their category, sometimes based on the document element name, and other times based on the document URI. This can evolve over time and we can use any arbitrary expression to determine the document category. You no longer have to waste time thinking about how to alter document structure.

The invoking code then associates the given document (using xdmp:document-add-collections()) with the appropriate collection URI: “category/xcc”, “category/event”, “category/tutorial”, etc. The Search API has special support for using common prefixes within collection URIs. The prefix (“category/”) identifies the collections that belong to the constraint, and everything after the prefix (e.g., “tutorial”) acts as the constraint’s value. Behind the scenes, the Search API calls cts:collection-match("category/*") to efficiently retrieve all the values for the given constraint. Cool stuff!
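If you want to see that lexicon lookup for yourself, here is a small sketch you could run in Query Console. It assumes the collection lexicon is enabled on the database (cts:collection-match requires it). The counts cover all documents in each collection, not just those matching a particular search.

```xquery
xquery version "1.0-ml";

(: List every category value with an estimated document count.
   Requires the collection lexicon to be enabled on the database. :)
for $uri in cts:collection-match("category/*")
let $value := substring-after($uri, "category/")
let $count := xdmp:estimate(cts:search(doc(), cts:collection-query($uri)))
order by $value
return concat($value, ": ", $count)
```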

The only catch (if one exists at all) is that you must maintain these collection URI associations. There are several ways of doing this. One way is to ensure that every time a document is updated, the above function is called to correctly (re-)associate it with the right category. Another option is to have a global script that does a brute-force update of all documents. Finally, and possibly the best approach, we can set up a CPF pipeline so that documents automatically get their category updated whenever they’re updated.
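Whichever approach you pick, the core of the (re-)association step could look something like the sketch below. Note that this is an assumption on my part rather than the site’s actual code: it uses xdmp:document-set-collections() instead of xdmp:document-add-collections(), so that a stale “category/…” collection is dropped in the same call. The function name local:set-category and the document URI are hypothetical; in practice the category string would come from a function like ml:category-for-doc().

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: put the document in exactly one "category/..."
   collection while leaving its other collections untouched. :)
declare function local:set-category(
  $uri as xs:string,
  $category as xs:string
) as empty-sequence() {
  (: Keep every existing collection that is not a category collection. :)
  let $kept := xdmp:document-get-collections($uri)
                 [not(starts-with(., "category/"))]
  return
    xdmp:document-set-collections($uri, ($kept, concat("category/", $category)))
};

(: Hypothetical URI, for illustration only. :)
local:set-category("/example/tutorial.xml", "tutorial")
```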

As a finale, here’s the relevant portion of the response that search:search() returns for us, making it easy to generate the faceted navigation menu:

<search:facet name="cat">
  <search:facet-value name="blog" count="53">blog</search:facet-value>
  <search:facet-value name="code" count="35">code</search:facet-value>
  <search:facet-value name="event" count="11">event</search:facet-value>
  <search:facet-value name="function" count="1294">function</search:facet-value>
  <search:facet-value name="guide" count="21">guide</search:facet-value>
  <search:facet-value name="news" count="8">news</search:facet-value>
  <search:facet-value name="other" count="29">other</search:facet-value>
  <search:facet-value name="tutorial" count="21">tutorial</search:facet-value>
  <search:facet-value name="xcc" count="128">xcc</search:facet-value>
  <search:facet-value name="xccn" count="72">xccn</search:facet-value>
</search:facet>

As you can see, each of the relevant category values is returned, along with a count of how many matching documents are in the given category. Now do you see why the Search API is so cool?
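To round things off, here is a minimal sketch of turning that response into a faceted navigation menu. The HTML structure and the link format (“?q=…”) are assumptions for illustration. Each link simply appends the constraint to the current query text, so category values containing spaces would need extra quoting that this sketch does not handle.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
       at "/MarkLogic/appservices/search/search.xqy";

declare variable $q := xdmp:get-request-field("q", "");
declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="cat">
      <collection prefix="category/"/>
    </constraint>
  </options>;

let $response := search:search($q, $options)
return
  <ul>{
    for $value in $response/search:facet[@name eq "cat"]/search:facet-value
    (: Append the constraint to the user's query text, e.g. "xquery cat:tutorial". :)
    let $refined := string-join(($q[. ne ""], concat("cat:", $value/@name)), " ")
    return
      <li>
        <a href="?q={encode-for-uri($refined)}">{string($value/@name)}</a>
        {concat(" (", $value/@count, ")")}
      </li>
  }</ul>
```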

Just getting started? Try out the 5-minute Guide to the Search API. It’s how I first got up-to-speed, and I highly recommend it.

Evan Lenz
