Anchor Dates for Finding Recent Documents

07.19.2017

In a previous post, I wrote a recipe for finding documents containing recent dates and used the following as part of the query (shown first in JavaScript, then in XQuery):

cts.elementRangeQuery(
  fn.QName("", "pubdate"), "<=", fn.currentDateTime(),
  "score-function=reciprocal")

cts:element-range-query(
  xs:QName("pubdate"), "<=", current-dateTime(),
  "score-function=reciprocal")

One reviewer asked, “Is there a disadvantage to specifying pubdate >= xs:dateTime(xs:date('0001-01-01')) with score-function=linear?” It turns out that there is.

When using score-function=reciprocal or score-function=linear, values near the anchor value will be more differentiated (and thus more useful for scoring) than values that are far away.
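To make that claim concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MarkLogic's actual scoring formula: the function `toyReciprocalScore`, the constant `SCALE`, and the rounding step are all assumptions chosen only to show how a decaying, quantized score behaves near and far from an anchor.

```javascript
// Toy model only: NOT MarkLogic's actual scoring formula.
// Assumption: a score decays as 1 / (1 + distance) and is then rounded to an integer.
const SCALE = 256; // hypothetical scaling constant

function toyReciprocalScore(distance) {
  return Math.round(SCALE / (1 + distance));
}

// Score `count` consecutive distances starting at `first`
// and report how many distinct scores come out.
function distinctScores(first, count) {
  const scores = new Set();
  for (let d = first; d < first + count; d++) {
    scores.add(toyReciprocalScore(d));
  }
  return scores.size;
}

// 100 values that sit 1..100 units from the anchor,
// versus 100 values that sit about 24,000 units away.
const nearDistinct = distinctScores(1, 100);
const farDistinct = distinctScores(24000, 100);
console.log({ nearDistinct, farDistinct });
```

In this toy model the values near the anchor produce many distinct scores, while the far-away values all collapse to a single score, which is the effect described above.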

To illustrate this, let’s generate some sample data. Using the code below, we can generate 100 simple documents, each containing a date that falls between 1 and 100 months before the current date.

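(: Insert 100 documents whose pubdate is 1 to 100 months before today. :)
(: Uses xs:date values so that they match a range index of type date. :)
for $i in (1 to 100)
return
  xdmp:document-insert(
    "/content/doc" || $i || ".xml",
    <doc>
      <pubdate>{ fn:current-date() - xs:yearMonthDuration("P" || $i || "M") }</pubdate>
    </doc>
  )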

We’re going to use a range query, so add an element range index of type date on the pubdate element.
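If you manage database settings as JSON properties (for example through the Management REST API or an ml-gradle database file), the index might look like the fragment below. The element name and empty namespace come from the sample documents; the other settings are assumptions to adjust for your own database.

```json
{
  "range-element-index": [
    {
      "scalar-type": "date",
      "namespace-uri": "",
      "localname": "pubdate",
      "collation": "",
      "range-value-positions": false,
      "invalid-values": "reject"
    }
  ]
}
```

The same index can also be added through the Admin UI.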

The first query uses score-function=reciprocal to see how far the dates in the documents are from today:

var jsearch = require('/MarkLogic/jsearch.sjs');
jsearch.documents()
  .where([
    cts.elementRangeQuery(
      fn.QName("", "pubdate"), "<=", fn.currentDate(),
      "score-function=reciprocal")
  ])
  .slice(0, 100)
  .result()

When we run this, the documents come back in the correct order. The results at zero-based indexes 15 and 16 show the first score collision, followed by clumps of gradually increasing size. We get reasonable differentiation based on how far back the documents’ dates go; combined with other relevance factors, this should produce a good ordering.
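A small helper can locate such collisions once the scores have been pulled out of the results into an array. The sketch below is plain JavaScript; `scoreRuns`, `firstCollision`, and the sample scores are hypothetical and are not the values from this experiment.

```javascript
// Group consecutive equal scores into runs and record where each run starts.
function scoreRuns(scores) {
  const runs = [];
  for (let i = 0; i < scores.length; i++) {
    const last = runs[runs.length - 1];
    if (last && last.score === scores[i]) {
      last.length += 1;
    } else {
      runs.push({ start: i, score: scores[i], length: 1 });
    }
  }
  return runs;
}

// Zero-based index of the first result that shares its score with the next one,
// or -1 if every score is unique.
function firstCollision(scores) {
  const run = scoreRuns(scores).find((r) => r.length > 1);
  return run ? run.start : -1;
}

// Made-up scores for illustration only.
const sample = [90, 80, 70, 65, 65, 60, 60, 60];
console.log(firstCollision(sample)); // 3
console.log(scoreRuns(sample).map((r) => r.length)); // [ 1, 1, 1, 2, 3 ]
```

Run lengths that grow toward the end of the list correspond to the clumps of increasing size described above.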

Now let’s take a look at the opposite approach: how far away are the documents from an ancient time?

var jsearch = require('/MarkLogic/jsearch.sjs');
jsearch.documents()
  .where([
    cts.elementRangeQuery(
      fn.QName("", "pubdate"), ">=", xs.date("0001-01-01"),
      "score-function=linear")
  ])
  .slice(0, 100)
  .result()

All of my documents have dates later than the year 0001, and the further a date is from that year, the higher its score should be. That sounds good, but the math behind the scenes emphasizes dates close to the anchor. In this case, the dates are so far from the anchor that every document gets the same score, so this score contribution is not useful for ordering recent results.

I also ran the experiment with dateTimes in place of dates, and the results were even more dramatic. Because dateTimes are finer-grained, the equations treat small differences as significant, so big differences are poorly differentiated.

Conceptually, you might think you can approach distance scoring from either direction. In practice, if there’s an endpoint you care more about, use that as your anchor.

Further Reading

Recipe — Sort results to promote recent documents

Documentation — Relevance Scores: Understanding and Customizing
