MarkLogic 5 is an operational database for Big Data. It provides the agility you need to build and deploy Big Data Applications with structured, semi-structured, and unstructured information. As a key part of your infrastructure, MarkLogic 5 gives your organization the functionality and flexibility required to adapt quickly to changing market conditions and new requirements.
Key Characteristics
Schema-agnostic
Big Data and unstructured information are available in a wide variety of distinct formats. The information may or may not have a schema. Even if it does, the schema may be poorly defined, or well defined but not strictly followed. The schema may also change over time and, in some cases, is not fully knowable.
MarkLogic’s ability to load information “as is” allows it to load unstructured information despite these complex characteristics. Its universal indexes recognize the many components of unstructured information (e.g., words, images, video, numbers, dates, markup, metadata) without having to know or plan for the underlying format. Development teams spend significantly less time on design with MarkLogic than they would fitting unstructured information into relational databases.
Fast and Scalable
Unstructured information, which already makes up an estimated 80% of your organization’s information, is a growing challenge. Traditional tools, built decades ago, were not designed to deal effectively with Big Data or unstructured information.
MarkLogic is built on highly optimized native C++ code and implements high-performance algorithms and indexes to run fast, even under heavy loads. Its shared-nothing architecture allows it to cluster across commodity hardware to handle massive amounts of information. MarkLogic customers regularly run advanced queries across terabytes of information in millions, and even billions, of documents with sub-second response time.
Flexible and Extensible
Unstructured information tends to be updated frequently, at both the content and structure levels. Unstructured information is often repackaged and repurposed for other uses, so picking and choosing specific pieces of unstructured information is critical to any organization that wants to fully leverage its information.
MarkLogic uses XML documents as the data model, which allows easy updates to the structure and format of stored information. Even when information needs to be modified significantly, no schema needs to be reassessed and updated to make the change. With XML, MarkLogic can access information at a granular level, which makes it easy to update individual pieces and to transform information into a wide variety of output formats.
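As a rough illustration of granular updates and transformation on an XML document, the following minimal sketch uses Python's standard library rather than MarkLogic itself. The document, its element names, and the function `update_and_transform` are hypothetical and chosen only for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are illustrative only.
SOURCE = """<article>
  <title>Quarterly report</title>
  <author>J. Smith</author>
  <body><para>Revenue grew.</para><para>Costs fell.</para></body>
</article>"""


def update_and_transform(xml_text):
    """Update one element, add a new one, and emit a JSON summary."""
    root = ET.fromstring(xml_text)

    # Granular update: change a single element and leave the rest untouched.
    root.find("title").text = "Quarterly report (revised)"

    # Structural change: add an element that the document did not have
    # before. No schema has to be altered first.
    reviewed = ET.SubElement(root, "reviewed")
    reviewed.text = "true"

    # Transformation: pick specific pieces and repackage them as JSON.
    summary = {
        "title": root.findtext("title"),
        "reviewed": root.findtext("reviewed"),
        "paragraphs": [p.text for p in root.iter("para")],
    }
    return ET.tostring(root, encoding="unicode"), json.dumps(summary)


updated_xml, summary_json = update_and_transform(SOURCE)
print(summary_json)
```

The same document serves as both the stored form and the source for the JSON output, which is the kind of repackaging the paragraph above describes.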
Feature Highlights
Geospatial
By combining geographic data and textual information, users can more easily analyze, exploit, or assimilate the information. MarkLogic provides the capability to natively index geospatial metadata, allowing users to define geospatial polygons to further refine query results.
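To make the idea of refining text results with a polygon concrete, here is a minimal sketch in plain Python. It is not MarkLogic's API or indexing method: it scans a small in-memory list, treats latitude and longitude as planar coordinates, and uses a standard ray-casting test. The names `point_in_polygon`, `search`, `DOCS`, and `SQUARE` are assumptions made for this example.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices.

    Assumes planar coordinates, which is only reasonable for small regions.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the point's latitude.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross:
                inside = not inside
    return inside


def search(docs, term, polygon):
    """Return ids of documents that contain the term and lie in the polygon."""
    return [
        d["id"]
        for d in docs
        if term in d["text"].lower()
        and point_in_polygon(d["lat"], d["lon"], polygon)
    ]


# Hypothetical documents carrying text plus geospatial metadata.
DOCS = [
    {"id": "a", "text": "Flood warning issued", "lat": 1.0, "lon": 1.0},
    {"id": "b", "text": "Flood damage reported", "lat": 5.0, "lon": 5.0},
    {"id": "c", "text": "Road closed", "lat": 1.5, "lon": 1.5},
]
SQUARE = [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (2.0, 0.0)]

print(search(DOCS, "flood", SQUARE))  # prints ['a']
```

Document "b" matches the text but lies outside the polygon, and "c" lies inside but does not match the text, so only "a" is returned. A database with a native geospatial index would avoid the linear scan used here.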
Alerting
Real-time alerting notifies users instantly when relevant information arrives, avoiding the delays that are inevitable in systems that rely on periodically polling the repository for updates.
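The difference between alerting and polling can be sketched as follows. This is a toy in-memory model, not MarkLogic's alerting API: the class `AlertingStore` and its methods are hypothetical, and matching is reduced to a single keyword per rule. The point is that rules are evaluated at the moment a document is inserted, so subscribers do not need to re-query the store on a timer.

```python
class AlertingStore:
    """Toy document store that evaluates stored rules on every insert."""

    def __init__(self):
        self.documents = []
        self.rules = []  # list of (keyword, callback) pairs

    def subscribe(self, keyword, callback):
        """Register a callback to fire when a new document has the keyword."""
        self.rules.append((keyword.lower(), callback))

    def insert(self, doc_id, text):
        """Store the document, then notify matching subscribers immediately."""
        self.documents.append((doc_id, text))
        words = set(text.lower().split())
        for keyword, callback in self.rules:
            if keyword in words:
                callback(doc_id, text)


received = []
store = AlertingStore()
store.subscribe("recall", lambda doc_id, text: received.append(doc_id))

store.insert("d1", "Product recall announced today")
store.insert("d2", "Quarterly earnings released")

print(received)  # prints ['d1']
```

With polling, the subscriber would learn about "d1" only at the next polling interval. Here the callback runs inside `insert`, so the notification delay is bounded by the insert itself.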
Replication
Replication is a system for distributing documents to remote MarkLogic clusters. It is a critical component of a disaster recovery solution, and it improves responsiveness by pushing information geographically closer to users to reduce network latency. It also enhances information sharing by allowing transformations during distribution, so that only relevant information is shared with remote users.