We’ve joined forces with Smartlogic to reveal smarter decisions—together.

Avoiding Eval with First-Class Functions

The Evils of Eval

Most dynamic languages allow you to evaluate a string of code, for example eval, in JavaScript or Python. Eval is powerful (and mandatory) if you’re building an IDE. However, the benefits are usually greatly outweighed by the risks.

Evaluated code is much more difficult to write than inline code. In JavaScript, you have to escape things like quotes and line breaks and your editor probably won’t help you with syntax highlighting or type-ahead. Code hidden in strings also makes the code much more difficult to read, not to mention debug. However, these are minor compared to the potential security problems that eval introduces.

Injection attacks are a type of security vulnerability when data supplied by a user is interpreted or executed in a malicious or unexpected way. SQL injection is one of the most common occurrences (“Little Bobby Tables” anyone?), but any code that is evaluated is susceptible.

For example, the following code from a naïve calculator application takes a mathematical expression and returns the answer.

function calculate(expression) {
  return eval(expression);
'The answer is ' + calculate(request['expression']);

This works great for expressions like 1 + 1 or even Math.acos(3 * Math.PI). However, what if the user passed in System.shutdown() or database.clear() or users.findByID(1234).creditAccount(9999999999, '£')? The calculate() function would blindly execute these as well, with potentially dire consequences. Even if a user does not know what specific functionality is available in the target evaluation context, it is not very difficult to guess or get up to no good with just the core language. To implement our calculator safely, we should implement our own expression parser that can sanitize and validate inputs to make sure they are valid math expressions and not arbitrary code.

Evaluating Code in a Different Context in MarkLogic

MarkLogic provides built-in APIs to evaluate code. This is most useful as a means to run code in a context different than the request from which it was called, for example in a different transaction, as another user, or asynchronously on the task server.

This is useful in many ways:

  • Query, update, or insert documents into another database, for example, to write a schema into the Modules database or move documents from a staging to a separate production database.
  • Orchestrate multiple transactions in a single request. By default MarkLogic queues up all database updates and applies them atomically at the end of a request. If you need to store or view intermediate results, you’ll need to execute those in a separate transaction.
  • Run a query at a particular database timestamp. By specifying a an explicit timestamp to a query you can effectively get a consistent snapshot of the database, even across separate transactions.

Take a look at the options to xdmp.eval() for other ways to affect the context of evaluated code.

Like JavaScript’s built-in eval, xdmp.eval() takes a string of JavaScript and configuration options and runs the passed in code in the context of the options. For all of the reasons above, xdmp.eval() is generally to be avoided. A better option is to use xdmp.invoke(). Unlike xdmp.eval(), with xdmp.invoke() you specify a path to an existing module. Like xdmp.eval(), you can use the $vars argument to safely pass in dynamic parameters to the stored module. That’s a much safer way to parametrize evaluated code than building strings to eval. However, unlike xdmp.eval(), there’s no chance that an invoked module will unsafely evaluate an input. xdmp.invoke() uses the same set of context options that xdmp.eval() uses, so you can invoke a module in a separate transaction or as a different user.

Enter xdmp.invokeFunction()

Unfortunately, it’s not always feasible or convenient to isolate your dynamic code into its own main module. xdmp.invokeFunction() allows you to invoke any in-context function, even anonymous ones that you build on the fly. Think of it as a MarkLogic-enhanced version of Function.prototype.apply(). Moreover, xdmp.invokeFunction() allows you to separate the concerns of what the function does from the context in which it’s evaluated. This makes for cleaner code and easier testing.

Take, for example, the following trivial illustration. The xdmp.transaction() function gives the ID of the current transaction. Because the xdmp.invokeFunction() call specifies that the second call to xdmp.transaction() be run in a separate transaction you’ll get a different ID.

  xdmp.invokeFunction(xdmp.transaction, { isolation: 'different-transaction' })

The first call returns the transaction assigned to the current request. The second, using xdmp.invokeFunction() explicitly calls the xdmp.transaction() function in a different transaction. Note the use of xdmp.transaction sans parentheses. xdmp.transaction() calls the xdmp.transaction function. xdmp.transaction, no parens, is a reference to the function itself. The actual identifiers in the output below are not important. The fact that they’re different because of the evaluation context is important.


Beyond xdmp.invokeFunction()

xdmp.invokeFunction() is the best way to run code in a different context with Server-Side JavaScript in MarkLogic. However, it requires that you pass it a zero-arity function, i.e. one that has no inputs, and always returns a ValueIterator, even if the invoked function returns an atomic value. With the magic of first-class functions in JavaScript, we can provide a friendlier version.

 * Return a function proxy to invoke a function in another context.
 * The proxy can be called just like the original function, with the
 * same arguments and return types. Example uses: to run the input 
 * as another user, against another database, or in a separate 
 * transaction. 
 * @param {function} fct     The function to invoke
 * @param {object} [options] The `xdmp.eval` options. 
 *                           Use `options.user` as a shortcut to 
 *                           specify a user name (versus an ID). 
 *                           `options.database` can take a `string` 
 *                           or a `number`.
 * @param {object} [thisArg] The `this` context when calling `fct`
 * @return {function}        A function that accepts the same arguments as
 *                           the originally input function.
function applyAs(fct, options, thisArg) {
  return function() {
    var args = Array.prototype.slice.call(arguments);
    // Curry the function to include the params by closure.
    // `xdmp.invokeFunction` requires that invoked functions have
    // an arity of zero.
    var f = function () {
      // Nested ValueIterators are flattened. Thus if `fct` returns a ValueIterator
      // there’s no way to differentiate it from the ValueIterator that 
      // `xdmp.invokeFunction` (or `xdmp.eval` or `xdmp.invoke` or `xdmp.spawn`)
      // returns. However, by wrapping the returned Sequence in something else—
      // an array here—we can “pop” the stack to get the actual return value.
      return [fct.apply(thisArg, args)]; 
    options = options || {};
    // Allow passing in database name, rather than id
    if('string' === typeof options.database) { options.database = xdmp.database(options.database); }
    // Allow passing in user name, rather than id
    if(options.user) { options.userId = xdmp.user(options.user); delete options.user; }
    // Allow the functions themselves to declare their transaction mode
    if(fct.transactionMode && !(options.transactionMode)) { options.transactionMode = fct.transactionMode; }

    return fn.head(xdmp.invokeFunction(f, options)).pop();

applyAs() takes a function and the same options argument as xdmp.invokeFunction() and returns a new function that behaves just like the input, but will be invoked in the context determined by the options. Thus, downstream consumers don’t need to be aware that the function is being invoked in a different context and can call the function as if it were the original function. For example, the (contrived) insert() function below takes a URI and string message, saves a document to the database, and returns a string.

function insert(uri, message) {
  xdmp.documentInsert(uri, { message: message }, xdmp.defaultPermissions(), xdmp.defaultCollections());
  return message;

var myInsert = applyAs(insert, { database: 'Modules', transactionMode: 'update-auto-commit' });

myInsert('/hello.json', 'Hello, world!');

myInsert() has the same “signature” as the insert function but hides its evaluation context, simplifiying usage, very similar to applying around advice in aspect-oriented programming.

This approach is a lot cleaner and has a clearer separation of the logic and the orchestration than something like the following:

function myInsert(uri, message) {
  return fn.head(
    xdmp.invokeFunction(function() {
      xdmp.documentInsert(uri, { message: message }, xdmp.defaultPermissions(), xdmp.defaultCollections());
    }, { database: '3616783675111452341', transactionMode: 'update-auto-commit' })


To summarize, it’s almost always a bad idea to eval strings of code. This leaves you open to injection attacks and makes code more difficult to read and write. Instead, use xdmp.invokeFunction() in MarkLogic Server-Side JavaScript to run a function in another context, such as in a separate transaction, against another database, or as another user. First-class functions in JavaScript can help you write a better xdmp.invokeFunction() that can be used to wrap existing functions, hiding the change of context from consumers.

Stay safe out there.

Start a discussion

Connect with the community




Most Recent

View All

Unifying Data, Metadata, and Meaning

We're all drowning in data. Keeping up with our data - and our understanding of it - requires using tools in new ways to unify data, metadata, and meaning.
Read Article

How to Achieve Data Agility

Successfully responding to changes in the business landscape requires data agility. Learn what visionary organizations have done, and how you can start your journey.
Read Article

Scaling Memory in MarkLogic Server

This not-too-technical article covers a number of questions about MarkLogic Server and its use of memory. Learn more about how MarkLogic uses memory, why you might need more memory, when you need more memory, and how you can add more memory.
Read Article
This website uses cookies.

By continuing to use this website you are giving consent to cookies being used in accordance with the MarkLogic Privacy Statement.