Founder’s Online: A Lesson in Performance

by Matt Allen

Posted on April 17, 2014

This post is a snapshot of the MarkLogic World talk “Planning for Growth with and without Performance Metering,” given by David Sewell, Editorial and Technical Manager for the University of Virginia Press, with support from Tim Finney, Lead Programmer, who works out of Perth, Australia.

MarkLogic is well suited to large enterprises running large applications, but it is also a good fit for small shops that want to do great things. Founder’s Online launched in summer 2013 and provides public access to almost 150,000 searchable documents from six of the Founding Fathers: George Washington, Benjamin Franklin, John Adams, Thomas Jefferson, Alexander Hamilton, and James Madison. The site, a joint venture of the University of Virginia Press and the National Archives, is powered by MarkLogic and is fast and scalable, delivering sub-second response times to thousands of concurrent users. Yet Founder’s Online was developed by a remarkably small team on a relatively small budget.

Here are some quick facts about the project:

Small Team: 1.5 dedicated FTE to develop the site
Big Data: 150,000 searchable documents with an average size of 2MB
Fast Queries: 15,300 documents in 0.02 seconds
Serious Scale: 120 ms response time with five thousand concurrent users

So, how did the Founder’s Online team achieve such high performance?

According to David Sewell, three key elements helped Founder’s Online achieve these performance results:

1. Leverage the XML Data Model

All of the text from the letters was transcribed and transformed to XML, and each letter was stored as an individual document in MarkLogic, for a collection of 150,000 documents. To query the XML, the team avoided XPath node traversal, which was too slow and led to hard-coded links and expansions, and used MarkLogic’s Search API instead. The production code for search queries is correspondingly simple:

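import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";

(: c:map-search-options is an application-specific helper (not shown);
   judging by its name, it builds the search:options element from $map :)
search:search(
	$q-full,
	c:map-search-options($map),
	$start,
	$length
)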
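To make the contrast with XPath traversal concrete, here is an illustrative sketch (not the Founder’s Online code) that asks the same question two ways. The collection name 'JSMN' is borrowed from the lexicon example in this post; the element name p is an assumption about the document markup.

```xquery
xquery version "1.0-ml";
(: Illustrative contrast only. 'JSMN' is reused from this post's
   lexicon example; the element name "p" is an assumption. :)

(: 1. XPath traversal: fn:contains() cannot be answered from the
      indexes, so each candidate document has to be loaded and
      walked to evaluate the predicate. :)
let $by-traversal :=
  fn:collection('JSMN')[.//p[fn:contains(., 'Hamilton')]]

(: 2. Index-resolved search: cts:word-query() is answered from the
      universal index, and xdmp:estimate() counts the matches from
      the indexes without fetching the documents. :)
let $by-index :=
  xdmp:estimate(
    cts:search(fn:collection('JSMN'), cts:word-query('Hamilton'))
  )

(: The two counts can differ: fn:contains() matches substrings such
   as "Hamiltonian", while the word query matches whole words. :)
return (fn:count($by-traversal), $by-index)
```

The second form is the one that scales with collection size, since the work is done by index lookups rather than by reading documents.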

2. Rethink the Code

The team had to get away from legacy code and strategies and embrace new approaches. To help, they relied heavily on MarkLogic’s Query Performance and Tuning Guide. The team also used the XQMVC framework, which resembles MVC frameworks for languages such as Java, Python, PHP, and Ruby but is designed specifically for building complex applications in XQuery. Other key practices included:

Using maps instead of session fields
Using run-time switches
Ignoring bottlenecks that might derive from search internals
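The first two practices can be sketched together. In this hypothetical module, request settings are gathered into a map:map that is passed explicitly to the function that builds the search options, instead of being written with xdmp:set-session-field(); one entry acts as a run-time switch read from the request. The request field names (q, edition, metrics) and local:map-search-options are assumptions standing in for the application’s own c:map-search-options, which is not shown in this post.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical: collect request settings into a map instead of
   session fields. Field names and defaults are illustrative. :)
declare function local:request-map() as map:map
{
  map:new((
    map:entry("q", xdmp:get-request-field("q", "")),
    map:entry("collection", xdmp:get-request-field("edition", "JSMN")),
    (: run-time switch: ?metrics=1 turns on query metrics for this
       request only, with no code change or redeploy :)
    map:entry("metrics", xdmp:get-request-field("metrics", "0") eq "1")
  ))
};

(: Hypothetical stand-in for an options builder such as
   c:map-search-options: everything it needs arrives in $map. :)
declare function local:map-search-options($map as map:map)
  as element(search:options)
{
  <options xmlns="http://marklogic.com/appservices/search">
    <additional-query>{
      cts:collection-query(map:get($map, "collection"))
    }</additional-query>
    <return-metrics>{ map:get($map, "metrics") }</return-metrics>
  </options>
};

let $map := local:request-map()
return search:search(map:get($map, "q"), local:map-search-options($map), 1, 10)
```

Passing the map explicitly keeps each request self-contained: nothing is written to the session, and the options builder can be exercised with any map, which also makes it easy to test.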

With the new architecture, they were able to query 15,300 documents in 0.02 seconds.

The following application code uses a lexicon function, cts:element-attribute-values, to count how many id values occur more than once in the JSMN collection:

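(: Assumes an element-attribute range index on mapData/@id and a
   declared FGEA namespace prefix (its URI is not shown here) :)
let $publ := 'JSMN'
let $duplicates :=
	cts:element-attribute-values(
		xs:QName('FGEA:mapData'),
		fn:QName('', 'id'),          (: the no-namespace attribute "id" :)
		(),                          (: no start value :)
		(),                          (: default lexicon options :)
		cts:collection-query($publ)  (: restrict to the $publ collection :)
	)[cts:frequency(.) gt 1]         (: keep ids whose frequency exceeds 1 :)
return fn:count($duplicates)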
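To see which ids are duplicated, rather than only how many, the same lexicon call can be iterated. This is a hypothetical companion query, not taken from the project; like the example above, it assumes a range index on mapData/@id and a declared FGEA prefix, and the output format is arbitrary.

```xquery
(: Hypothetical companion query: list each duplicated id with its
   frequency. Assumes a range index on mapData/@id and a declared
   FGEA namespace prefix. :)
for $id in cts:element-attribute-values(
    xs:QName('FGEA:mapData'),
    fn:QName('', 'id'),
    (),
    ("frequency-order", "descending"),  (: most frequent values first :)
    cts:collection-query('JSMN'))
where cts:frequency($id) gt 1
return fn:concat($id, ' (', cts:frequency($id), ' occurrences)')
```

Both queries are answered from the range index alone, so neither needs to open any of the documents in the collection.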

3. Rely Heavily on Caching

The team moved from dynamic to static wherever possible, in both rendering and search results, by relying on caching. They deployed Nginx as a front-end caching proxy; created an HTML cache in MarkLogic to avoid run-time XSLT rendering; and cached the output of searches, facets, and result pages in the database for potential re-use. The documents in the search cache are stored as binaries in MarkLogic to avoid index overhead. Because these binaries are not indexed, a document call simply pulls the cached page back in as XHTML, which is very efficient. An example of the code is below:

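(: Serialize the rendered XHTML to a string, base64-encode it, and
   wrap the resulting bytes in a binary node, which is not indexed :)
binary {
  xs:hexBinary(
    xs:base64Binary(
      xdmp:base64-encode(
        xdmp:quote($HTML-node)
      )
    )
  )
}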
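The binary node is only half of a cache. A minimal sketch of the surrounding store-and-fetch logic follows, assuming a cache URI derived from the source document’s URI with xdmp:hash64; the function names and the /cache/html/ prefix are hypothetical, not taken from the Founder’s Online code base. Reading back uses xdmp:binary-decode and xdmp:unquote to turn the stored bytes into XHTML again.

```xquery
xquery version "1.0-ml";
(: Hypothetical cache helpers; URI scheme and names are illustrative. :)

(: Derive a stable cache URI from the source document's URI. :)
declare function local:cache-uri($doc-uri as xs:string) as xs:string
{
  fn:concat("/cache/html/", xdmp:hash64($doc-uri), ".bin")
};

(: Serialize rendered XHTML and store it as an unindexed binary
   document, using the same conversion chain as the snippet above. :)
declare function local:cache-put($doc-uri as xs:string, $html as node())
  as empty-sequence()
{
  xdmp:document-insert(
    local:cache-uri($doc-uri),
    binary { xs:hexBinary(xs:base64Binary(
      xdmp:base64-encode(xdmp:quote($html)))) }
  )
};

(: Return the cached page as parsed XHTML, or the empty sequence on
   a cache miss so the caller can fall back to XSLT rendering. :)
declare function local:cache-get($doc-uri as xs:string) as document-node()*
{
  for $bin in fn:doc(local:cache-uri($doc-uri))/binary()
  return xdmp:unquote(xdmp:binary-decode($bin, "utf8"))
};

(: Example lookup for a hypothetical document URI. :)
local:cache-get("/founders/example.xml")
```

Because a MarkLogic update statement does not see its own writes, a page would typically be written to the cache in one request and served from it in later ones.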

Using this approach to caching, the site showed substantial improvements in query speed. A 90-page document that took 19 seconds of query time on the old platform could be delivered in as little as 1.86 ms. IBM Global Technology Services also load-tested the application and found that even with 5,000 concurrent users, average response time was only 120 ms.
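For scale, taking the two figures for the 90-page document at face value, the difference is roughly four orders of magnitude:

$$
\frac{19\ \text{s}}{1.86\ \text{ms}} = \frac{19\,000\ \text{ms}}{1.86\ \text{ms}} \approx 1.02 \times 10^{4}
$$

That is, the cached delivery was on the order of 10,000 times faster than the original query.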

[Figure: Founder’s Online performance testing results]

*Load testing by IBM Global Technology Services using SOASTA, Inc.

Using these tactics to optimize performance, the Founder’s Online team built a successful application that will eventually support 90 volumes containing more than 175,000 of the founders’ letters.

Matt Allen

Matt Allen is VP of Product Marketing, responsible for marketing all the features and benefits of MarkLogic across all verticals. In this role, Matt interfaces with the product and engineering team and with sales and marketing to create content and events that educate and inspire adoption of the technology. Matt is based at MarkLogic headquarters in San Carlos, CA, and in his free time he is an artist who specializes in large oil paintings.
