American Psychological Association
The APA has 150,000 members and 54 divisions covering the subfields of psychology. One of the association’s primary objectives has been the exchange of scientific information. To that end, the Office of Publications and Databases, a major division of the APA, publishes an array of scholarly journals, dozens of books annually, and several academic databases, including PsycINFO and PsycARTICLES.
As the premier provider of information in the discipline of psychology, APA draws more than one million hits a day to its website. In 2008, APA moved its content from a legacy system to MarkLogic. Currently, a single operational database stores a complex collection of Big Data, including 160,000 full journal articles, 3 million abstracts, and 54 million cited references dating back to the late 19th century. The most challenging aspect of leveraging this content is the variety and complexity of the documents: articles often have ambiguous connections that must be linked together for researchers to do their work properly.
While APA acknowledges that Lucene is a popular open source search engine and a good technology, using it posed several challenges. First, Lucene was an additional architectural layer that required its own code, infrastructure, and deployment tasks, none of which are necessary with an all-in-one database and search solution. Second, search response times were inconsistent, which created a disappointing user experience. The biggest challenge, however, was the timely delivery of content to APA users: with the Lucene-based implementation, depending on the size and complexity of the content package, it could take up to 48 hours to prepare, deliver, and post content on APA websites.
Because APA is the authoritative source for this content, timely online availability matters to the authors, readers, and business partners who depend on it. The situation was unacceptable to Beverly Jamison, senior director of IT Architecture and Publishing Solutions: “Search has a measurable impact on how our customers use our digital properties. The business benefits of having a product that goes to market faster are quite significant.”
Jamison needed a solution that could deliver results quickly. Improving Lucene’s performance would have required the APA development team to invest significant time and resources to tune, test, and retune it, and even then the improvement would not have been as thorough or sustainable as what APA would gain with MarkLogic.
APA didn’t have to look far. Already familiar with the speed and flexibility of MarkLogic as a database, Jamison knew that several major companies relied on MarkLogic for search and analysis as well. Beyond that, she had come to believe in MarkLogic and its community: “The culture really helped to back me up. I have never met a bad engineer from MarkLogic.”
Early proofs of concept showed that MarkLogic would quickly alleviate APA’s pain points: query times were consistently under one second in testing. Less than four months later, APA had completely moved search from Lucene to MarkLogic with no downtime, and customers are already praising the faster response times. Index maintenance is now a small portion of one person’s job, which has freed two developers to focus on new features and performance improvements instead of maintaining a legacy system. That enabled the team to add spelling suggestions and typeahead features just a few weeks after deploying the new search. APA was also able to reduce its server count from ten to three.
As Jamison puts it, “With MarkLogic, it is taking one hour to stage content, a few minutes to push live, and less than a second to run queries and analysis. This is a tremendous coup for our business as well as our technology.”
Jamison says MarkLogic’s architecture and APA’s new infrastructure have continuing benefits. Foremost, the development team has newfound freedom to build and test new applications and front-end features in as little as a few weeks. APA is now focusing on adding semantic services for customers and building new visualization options. It is also continually simplifying its architecture, retiring legacy systems one at a time to create better book and journal workflows.
By switching from Lucene to MarkLogic for search, APA not only met its initial goal of getting content to customers more quickly but also cut maintenance and hardware costs, and it now consistently delivers subsecond search responses. The end result is an edge in negotiations for new business, as well as a more agile technology infrastructure in tune with the innovative work of the APA.