BLOG ARTICLE

Scaling Your Database Doesn’t Have to be Hard

10.27.2016
2 minute read

A friend of mine tweeted this out, looking for feedback:

Move complex queries out of the database and into service logic because it’s way easier to scale services horizontally?

A couple of the follow-ups showed a pain many of us have felt:

  • Absolutely. Have proven it myself many times. Databases don’t scale.
  • of course databases can scale, but pushing a button in Marathon is easier than sharding strategies

Is it true that databases don’t scale?

Is it easier to scale services than the database? As with many things, I’ll go with “it depends”.

Relational databases are hard to scale. You either spend a lot to scale up, or you deal with sharding as you scale out. With those choices, I can see the appeal of moving complex queries into a service layer. But there’s another way.

MarkLogic stores data as documents. A database is broken down into forests (the physical data storage); each forest holds the content of its documents as well as any index values related to those documents. One of the benefits of this approach relates to how MarkLogic scales out: you don’t have to figure out a sharding strategy.

When you add a new server to a MarkLogic cluster, MarkLogic automatically and transactionally shifts documents around to ensure an even distribution. MarkLogic Server is designed to scale to hundreds of nodes or beyond. What does that mean for queries?

Suppose you have a cluster of five MarkLogic servers, each with about one-fifth of the content. A MarkLogic server that hosts data is known as a data node (d-node); a MarkLogic server that responds to queries is known as an evaluator node (e-node). Commonly, a server will be both an e-node and a d-node.

Your query goes to one of the servers, which acts as the e-node for that query. The e-node sends the query to each of the d-nodes in the cluster. Each d-node uses its own indexes to determine which of its documents match the query, then returns an identifier and a score for each match. The e-node picks the top scores from among the responses, then asks the owning d-nodes for the content. Note that each server in the cluster can determine which of its documents match the query without input from the others.
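To make that flow concrete, here is a minimal Python sketch of the scatter-gather pattern described above. It is a toy simulation, not MarkLogic's API: the `DNode` class, the `evaluate` function, and the term-count scoring are all assumptions made for illustration, standing in for real index-based relevance scoring.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass
class DNode:
    """Hypothetical data node: owns some documents and scores them alone."""
    name: str
    docs: dict[str, str] = field(default_factory=dict)

    def search(self, term: str) -> list[tuple[float, str, str]]:
        # Toy score: how many times the term occurs in the document.
        # A real d-node would consult its indexes instead of scanning text.
        hits = []
        for doc_id, text in self.docs.items():
            score = text.lower().split().count(term.lower())
            if score > 0:
                hits.append((float(score), doc_id, self.name))
        return hits

    def fetch(self, doc_id: str) -> str:
        return self.docs[doc_id]


def evaluate(term: str, d_nodes: list[DNode], k: int = 2) -> list[tuple[str, float, str]]:
    """Hypothetical e-node logic: scatter the query, gather scores, fetch top k."""
    candidates = []
    for node in d_nodes:
        # Each d-node answers using only its own documents.
        candidates.extend(node.search(term))

    # Keep only the k best (score, id, owner) triples across all responses.
    top = heapq.nlargest(k, candidates)

    # Ask the owning d-node for the content of each winner.
    by_name = {node.name: node for node in d_nodes}
    return [(doc_id, score, by_name[owner].fetch(doc_id)) for score, doc_id, owner in top]


cluster = [
    DNode("d1", {"a": "scale scale scale out", "b": "nothing here"}),
    DNode("d2", {"c": "scale up or scale out"}),
    DNode("d3", {"d": "scale"}),
]
results = evaluate("scale", cluster, k=2)
```

Only identifiers and scores travel in the first phase; document content is fetched just for the winners, which is why adding d-nodes does not change what the caller has to do.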

Let’s come back to the original question: should we move complex queries out of the database and into service logic because it’s way easier to scale services horizontally? With the MarkLogic approach, scaling the database is easy. But is there a benefit to keeping complex queries in the database? I can think of two.

First, by keeping computation as close to the data as possible, we reduce the round-trips between the database and the application’s middle tier. The communication that does happen consists of the request and a final answer, instead of a larger amount of data that still needs processing.
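As a rough illustration of that difference, the following Python sketch compares how many records cross the wire in each style. Everything here is hypothetical: the `ORDERS` data, the two functions, and the "records sent" count are stand-ins for a real database and network, not measurements.

```python
# Hypothetical data set living "inside the database".
ORDERS = [{"id": i, "total": i * 10} for i in range(1, 101)]


def in_database(min_total: int) -> tuple[int, int]:
    """Filter and aggregate next to the data; only the answer is sent back."""
    answer = sum(o["total"] for o in ORDERS if o["total"] >= min_total)
    records_sent = 1  # just the final number
    return answer, records_sent


def in_service(min_total: int) -> tuple[int, int]:
    """Ship every record to the middle tier, then filter and aggregate there."""
    shipped = list(ORDERS)  # the whole data set crosses the network
    answer = sum(o["total"] for o in shipped if o["total"] >= min_total)
    return answer, len(shipped)


db_answer, db_sent = in_database(900)
svc_answer, svc_sent = in_service(900)
```

Both approaches compute the same answer; the difference is how much data has to move before the middle tier has it.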

Second, keeping complex queries in the database allows the middle tier to remain simpler. The middle tier makes simple calls to the database, without regard to how many nodes are in the database cluster.

Bottom line: if your database is hard to scale, maybe it’s worth looking into a different database.
