Avoiding Eval with First-Class Functions

05.24.2016
7 minute read

The Evils of Eval

Most dynamic languages allow you to evaluate a string of code, for example with eval() in JavaScript or Python. Eval is powerful, and unavoidable if you’re building something like an IDE. In most other cases, however, its risks greatly outweigh its benefits.

Evaluated code is much more difficult to write than inline code. In JavaScript, you have to escape things like quotes and line breaks and your editor probably won’t help you with syntax highlighting or type-ahead. Code hidden in strings also makes the code much more difficult to read, not to mention debug. However, these are minor compared to the potential security problems that eval introduces.

Injection attacks exploit a type of security vulnerability in which data supplied by a user is interpreted or executed in a malicious or unexpected way. SQL injection is one of the most common occurrences (“Little Bobby Tables” anyone?), but any code that is evaluated is susceptible.

For example, the following code from a naïve calculator application takes a mathematical expression and returns the answer.

function calculate(expression) {
  return eval(expression);
}
'The answer is ' + calculate(request['expression']);

This works great for expressions like 1 + 1 or even Math.cos(3 * Math.PI). However, what if the user passed in System.shutdown() or database.clear() or users.findByID(1234).creditAccount(9999999999, '£')? The calculate() function would blindly execute these as well, with potentially dire consequences. Even if a user does not know what specific functionality is available in the target evaluation context, it is not difficult to guess, or to get up to no good with just the core language. To implement our calculator safely, we should implement our own expression parser that can sanitize and validate inputs to make sure they are valid math expressions and not arbitrary code.
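As a rough illustration of that last point, here is a minimal sketch of such a parser. It assumes a deliberately tiny grammar (numbers, + - * /, unary minus, and parentheses), and the name safeCalculate() is made up for this example. Anything outside the grammar, including calls such as database.clear(), is rejected with a SyntaxError rather than executed.

```javascript
// Minimal recursive-descent evaluator for arithmetic expressions.
// Grammar (an assumption for this sketch):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')' | '-' factor
function safeCalculate(expression) {
  var src = String(expression);
  var pos = 0;

  function skipSpaces() {
    while (pos < src.length && /\s/.test(src[pos])) { pos++; }
  }

  function parseNumber() {
    var match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) { throw new SyntaxError('Unexpected input at position ' + pos); }
    pos += match[0].length;
    return parseFloat(match[0]);
  }

  function parseFactor() {
    skipSpaces();
    if (src[pos] === '-') { pos++; return -parseFactor(); }
    if (src[pos] === '(') {
      pos++;
      var value = parseExpr();
      skipSpaces();
      if (src[pos] !== ')') { throw new SyntaxError('Expected ")" at position ' + pos); }
      pos++;
      return value;
    }
    return parseNumber();
  }

  function parseTerm() {
    var value = parseFactor();
    skipSpaces();
    while (src[pos] === '*' || src[pos] === '/') {
      var op = src[pos++];
      var rhs = parseFactor();
      value = (op === '*') ? value * rhs : value / rhs;
      skipSpaces();
    }
    return value;
  }

  function parseExpr() {
    var value = parseTerm();
    skipSpaces();
    while (src[pos] === '+' || src[pos] === '-') {
      var op = src[pos++];
      var rhs = parseTerm();
      value = (op === '+') ? value + rhs : value - rhs;
      skipSpaces();
    }
    return value;
  }

  var result = parseExpr();
  skipSpaces();
  // Reject trailing input such as "1 + 1; doSomethingElse()".
  if (pos !== src.length) { throw new SyntaxError('Unexpected input at position ' + pos); }
  return result;
}
```

With this in place, safeCalculate('2 * (3 + 4)') evaluates to 14, while safeCalculate('database.clear()') throws instead of running anything. Supporting functions like Math.cos() would mean adding an explicit whitelist of allowed names to the grammar.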

Evaluating Code in a Different Context in MarkLogic

MarkLogic provides built-in APIs to evaluate code. This is most useful as a means to run code in a context different from that of the calling request, for example in a different transaction, as another user, or asynchronously on the task server.

This is useful in many ways:

  • Query, update, or insert documents into another database, for example, to write a schema into the Modules database or move documents from a staging to a separate production database.
  • Orchestrate multiple transactions in a single request. By default MarkLogic queues up all database updates and applies them atomically at the end of a request. If you need to store or view intermediate results, you’ll need to execute those in a separate transaction.
  • Run a query at a particular database timestamp. By specifying an explicit timestamp for a query, you can effectively get a consistent snapshot of the database, even across separate transactions.

Take a look at the options to xdmp.eval() for other ways to affect the context of evaluated code.

Like JavaScript’s built-in eval, xdmp.eval() takes a string of JavaScript, along with configuration options, and runs the code in the context those options describe. For all of the reasons above, xdmp.eval() is generally to be avoided. A better option is xdmp.invoke(). Unlike xdmp.eval(), xdmp.invoke() takes the path to an existing module rather than a string of code. As with xdmp.eval(), you can use the vars argument to pass dynamic parameters to the stored module, which is a much safer way to parametrize evaluated code than building strings to eval. Because the code itself is fixed in the module, user input travels only as data and is never compiled as code (provided the module does not eval its inputs itself). xdmp.invoke() accepts the same context options as xdmp.eval(), so you can invoke a module in a separate transaction or as a different user.
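To make the difference concrete, here is a small sketch contrasting the two styles. The module path /lib/count-by-author.sjs, the author property, and both function names are hypothetical, invented for this illustration; only xdmp.eval(), xdmp.invoke(), and their vars argument come from MarkLogic itself.

```javascript
// Hypothetical main module, assumed to be stored at /lib/count-by-author.sjs:
//
//   // `author` is supplied by the caller through the vars argument.
//   var author;
//   cts.estimate(cts.jsonPropertyValueQuery('author', author));

// Risky: user input is spliced into the code string itself, so a crafted
// value can close the quote and append arbitrary statements.
function countByAuthorUnsafe(author) {
  return xdmp.eval(
    "cts.estimate(cts.jsonPropertyValueQuery('author', '" + author + "'))"
  );
}

// Safer: the code lives in a stored module and never changes.
// User input is handed over only as a value in the vars object.
function countByAuthor(author) {
  return xdmp.invoke('/lib/count-by-author.sjs', { author: author });
}
```

In the second version there is no string for an attacker to break out of. Whatever the caller passes ends up as the value of author inside the module, nothing more.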

Enter xdmp.invokeFunction()

Unfortunately, it’s not always feasible or convenient to isolate your dynamic code into its own main module. xdmp.invokeFunction() allows you to invoke any in-context function, even anonymous ones that you build on the fly. Think of it as a MarkLogic-enhanced version of Function.prototype.apply(). Moreover, xdmp.invokeFunction() allows you to separate the concerns of what the function does from the context in which it’s evaluated. This makes for cleaner code and easier testing.

Take, for example, the following trivial illustration. The xdmp.transaction() function gives the ID of the current transaction. Because the xdmp.invokeFunction() call specifies that the second call to xdmp.transaction() be run in a separate transaction, you’ll get a different ID.

[
  xdmp.transaction(),
  xdmp.invokeFunction(xdmp.transaction, { isolation: 'different-transaction' })
]

The first call returns the ID of the transaction assigned to the current request. The second, using xdmp.invokeFunction(), explicitly calls the xdmp.transaction() function in a different transaction. Note the use of xdmp.transaction sans parentheses: xdmp.transaction() calls the function, whereas xdmp.transaction, no parens, is a reference to the function itself. The actual identifiers in the output below are not important. What matters is that they differ because of the evaluation context.

[
  "4394203566847635840", 
  "8340410512199485627"
]
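The reference-versus-call distinction matters for any API that takes a function as an argument. The following plain JavaScript sketch uses a made-up callLater() helper as a stand-in for xdmp.invokeFunction(); neither it nor currentId() is part of MarkLogic.

```javascript
// Stand-in for an API that, like xdmp.invokeFunction(), expects a function
// reference and decides for itself when (and in what context) to call it.
function callLater(fct) {
  if (typeof fct !== 'function') {
    throw new TypeError('Expected a function, got ' + typeof fct);
  }
  return fct();
}

// Hypothetical zero-argument function used only for this illustration.
function currentId() { return 'abc123'; }

callLater(currentId);      // passes the function; callLater invokes it
// callLater(currentId()); // passes the string 'abc123' and throws a TypeError
```

Passing currentId() would run the function immediately, in the caller’s context, and hand over only its result, which defeats the purpose of asking for a different context.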

Beyond xdmp.invokeFunction()

xdmp.invokeFunction() is the best way to run code in a different context with Server-Side JavaScript in MarkLogic. However, it requires that you pass it a zero-arity function, i.e. one that takes no inputs, and it always returns a ValueIterator, even if the invoked function returns an atomic value. With the magic of first-class functions in JavaScript, we can provide a friendlier version.

/**
 * Return a function proxy to invoke a function in another context.
 * The proxy can be called just like the original function, with the
 * same arguments and return types. Example uses: to run the input 
 * as another user, against another database, or in a separate 
 * transaction. 
 *
 * @param {function} fct     The function to invoke
 * @param {object} [options] The `xdmp.eval` options. 
 *                           Use `options.user` as a shortcut to 
 *                           specify a user name (versus an ID). 
 *                           `options.database` can take a `string` 
 *                           or a `number`.
 * @param {object} [thisArg] The `this` context when calling `fct`
 * @return {function}        A function that accepts the same arguments as
 *                           the originally input function.
 */
function applyAs(fct, options, thisArg) {
  return function() {
    var args = Array.prototype.slice.call(arguments);
    // Wrap the function so that the arguments are captured by closure.
    // `xdmp.invokeFunction` requires that invoked functions have
    // an arity of zero.
    var f = function () {
      // Nested ValueIterators are flattened. Thus if `fct` returns a ValueIterator
      // there’s no way to differentiate it from the ValueIterator that 
      // `xdmp.invokeFunction` (or `xdmp.eval` or `xdmp.invoke` or `xdmp.spawn`)
      // returns. However, by wrapping the return value in something else,
      // an array here, we can “pop” the stack to get the actual return value.
      return [fct.apply(thisArg, args)]; 
    };
    
    // Copy the options so that the caller’s object is not modified.
    var opts = {};
    for (var key in (options || {})) { opts[key] = options[key]; }
    // Allow passing in database name, rather than id
    if('string' === typeof opts.database) { opts.database = xdmp.database(opts.database); }
    // Allow passing in user name, rather than id
    if(opts.user) { opts.userId = xdmp.user(opts.user); delete opts.user; }
    // Allow the functions themselves to declare their transaction mode
    if(fct.transactionMode && !(opts.transactionMode)) { opts.transactionMode = fct.transactionMode; }

    return fn.head(xdmp.invokeFunction(f, opts)).pop();
  };
}

applyAs() takes a function and the same options argument as xdmp.invokeFunction() and returns a new function that behaves just like the input, but will be invoked in the context determined by the options. Thus, downstream consumers don’t need to be aware that the function is being invoked in a different context and can call the function as if it were the original function. For example, the (contrived) insert() function below takes a URI and string message, saves a document to the database, and returns a string.

function insert(uri, message) {
  xdmp.documentInsert(uri, { message: message }, xdmp.defaultPermissions(), xdmp.defaultCollections());
  return message;
}

var myInsert = applyAs(insert, { database: 'Modules', transactionMode: 'update-auto-commit' });

myInsert('/hello.json', 'Hello, world!');

myInsert() has the same “signature” as the insert function but hides its evaluation context, simplifying usage. This is very similar to applying around advice in aspect-oriented programming.

This approach is a lot cleaner and has a clearer separation of the logic and the orchestration than something like the following:

function myInsert(uri, message) {
  return fn.head(
    xdmp.invokeFunction(function() {
      xdmp.documentInsert(uri, { message: message }, xdmp.defaultPermissions(), xdmp.defaultCollections());
      return message;
    }, { database: '3616783675111452341', transactionMode: 'update-auto-commit' })
  );
}

Summary

To summarize, it’s almost always a bad idea to eval strings of code. This leaves you open to injection attacks and makes code more difficult to read and write. Instead, use xdmp.invokeFunction() in MarkLogic Server-Side JavaScript to run a function in another context, such as in a separate transaction, against another database, or as another user. First-class functions in JavaScript can help you write a better xdmp.invokeFunction() that can be used to wrap existing functions, hiding the change of context from consumers.

Stay safe out there.

Justin Makeig
