The MarkLogic Tiered Storage add-on feature lets you store and manage data in different tiers based on cost and performance trade-offs, whether that is flash storage, traditional local or shared disk storage, HDFS, or Amazon cloud storage. With Tiered Storage, you can migrate data between tiers without any ETL, additional software, or expensive infrastructure changes. This lets you balance performance and capacity throughout the lifecycle of your data: meeting performance SLAs, simplifying data governance, and satisfying compliance requirements.
Move data across tiers without taking it offline, performing any ETL, or even re-indexing it, so the data stays available when and where you need it
Easily migrate and resize data partitions. For example, if you have 18 forests across 3 hosts, you can change it to 20 forests across 4 hosts and have Tiered Storage manage the change
Partition data to different storage tiers using a set policy in database administration. For example, a policy can be created to automatically archive data if it is older than a certain date
Storage costs can vary widely, from around $1 to $25 per gigabyte. With Tiered Storage, you can avoid over-provisioning expensive storage for data that can just as easily be stored on a cheaper tier
Use Amazon S3 or HDFS as distributed file systems for cheaply storing large volumes of archival data, without losing the ability to bring that data back into an active, operational storage tier quickly, and without any ETL or re-indexing
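To make the policy-based partitioning described above more concrete, here is a minimal Python sketch of how a date-range policy might map documents to storage tiers. It is illustrative only and is not MarkLogic's API: the `TierRule` class, the `pick_tier` function, the tier names, and the cutoff dates are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TierRule:
    """Hypothetical rule: documents dated in [lower, upper) belong on `tier`."""
    tier: str
    lower: Optional[date]  # inclusive bound; None means unbounded
    upper: Optional[date]  # exclusive bound; None means unbounded

    def matches(self, doc_date: date) -> bool:
        after_lower = self.lower is None or doc_date >= self.lower
        before_upper = self.upper is None or doc_date < self.upper
        return after_lower and before_upper


def pick_tier(doc_date: date, rules: List[TierRule]) -> str:
    """Return the tier of the first rule whose date range contains doc_date."""
    for rule in rules:
        if rule.matches(doc_date):
            return rule.tier
    raise ValueError(f"no tier rule covers {doc_date}")


# Assumed example policy: recent data on fast storage, older data archived.
RULES = [
    TierRule("ssd", date(2024, 1, 1), None),
    TierRule("local-disk", date(2020, 1, 1), date(2024, 1, 1)),
    TierRule("s3-archive", None, date(2020, 1, 1)),
]

if __name__ == "__main__":
    for d in (date(2024, 6, 1), date(2021, 3, 15), date(2015, 12, 31)):
        print(d, "->", pick_tier(d, RULES))
```

Using half-open date ranges means each document matches exactly one rule, so moving a cutoff date is the only change needed to shift older data to a cheaper tier.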
You’ve got plenty of options with MarkLogic Tiered Storage. For each option, we’ve included an example of a configuration designed to store a few hundred terabytes of data. However, we recommend consulting with MarkLogic professionals to discuss the storage options that would work best for your unique use case.
Solid state drives can be used for MarkLogic Fast Data Directories. A configuration might include a few SSDs to handle a few gigabytes of active data. When that capacity limit is reached, slower data directories pick up the workload
Local disk storage can be used for active, operational data. A configuration might include local 10K Serial Attached SCSI (SAS) RAID10 hard drives for a small number of hosts and a few dozen terabytes
Storage Area Networks and Network Attached Storage can be used for active data, but are more commonly used for older historical or archived data that is rarely updated. A configuration might include a few dozen hosts
The Hadoop Distributed File System is well suited as an inexpensive tier for historical or archived data. A configuration might include a large cluster of dozens, or even hundreds, of hosts to handle hundreds of terabytes of data
MarkLogic offers pre-configured AMIs so you can get started quickly on Amazon Web Services with Amazon EBS Storage Volumes or Amazon S3 buckets. Amazon S3 is similar to HDFS, providing a cheap storage mechanism for older, non-transactional data
ABN Amro, a leading bank based in the Netherlands, uses MarkLogic to integrate trade data and get a single source of truth for reporting purposes such as MiFID II, as well as to provide alerts when behavior is outside normal parameters. They use a number of advanced features and options, including Bitemporal, Real-Time Alerting, Geospatial alerting, Tiered Storage, and Flexible Replication.
Deutsche Bank uses MarkLogic to integrate multiple front office trading systems and do trade processing. The trade store handles between 40 and 60 million trade events per day and adds more data sources each year. With Tiered Storage, the bank can achieve improved TCO by using policies to send data to cheaper virtualized storage (VMware).
Dow Jones runs its flagship application, Factiva, on MarkLogic, integrating structured and unstructured financial data, including everything from stock ticker prices to news stories. The system handles over 1.5 billion documents, including content in more than 28 languages and from 600 PR newswires, constantly updated in real time, and returns search results in milliseconds.