15.07.2016 Views

MARKLOGIC SERVER

Inside-MarkLogic-Server

Inside-MarkLogic-Server

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

storage media but still query those documents as a single unit. For queries to work, the<br />

super- and sub-databases must have the same configuration. Querying multiple data<br />

sources at once and then aggregating the results is also known as federated search.<br />

For example, let's say that you have a Database A of active documents on fast,<br />

expensive storage and a Database B of archival documents on slower, lower-cost<br />

storage. You can create a super-database and associate it with Database A and Database<br />

B. (The super-database itself can also contain forests of documents, although that is not<br />

recommended since those forests cannot be queried independently of the associated<br />

sub-databases.) With this arrangement, you can query just the high-value documents<br />

in Database A, just the archival documents in Database B, or the entire corpus by<br />

querying the super-database.<br />

Query<br />

“cat OR dog”<br />

SUPER DATABASE<br />

DATABASE A<br />

DATABASE B<br />

F<br />

F<br />

F F F<br />

F<br />

F<br />

F<br />

F<br />

F<br />

Figure 25: A super-database lets you query multiple databases as a single unit.<br />

There are limitations on super-databases. You cannot perform updates on documents<br />

that reside in a sub-database via a super-database, and you must perform the updates<br />

directly on the sub-database where the documents reside. You also cannot have a superdatabase<br />

with more than one level of sub-databases beneath it. This is to avoid circular<br />

references—i.e., two databases referencing one another.<br />

HDFS AND AMAZON S3<br />

A key to tiered storage is the ability to store less-active data on lower-cost storage media.<br />

Two low-cost options are Hadoop Distributed File System (HDFS) and Amazon S3<br />

(Simple Storage Service).<br />

HDFS is the storage part of Hadoop. Because HDFS is open source and runs on<br />

commodity hardware, it can store data less expensively than traditional shared storage<br />

solutions such as SAN (storage area network) and NAS (network-attached storage).<br />

Once you configure MarkLogic to use HDFS, you can specify HDFS as a storage<br />

directory for a database using the "hdfs:" prefix.<br />

Amazon S3 is cloud-based storage service offered through Amazon Web Services. S3<br />

customers can interact with documents stored on the service through various interfaces,<br />

117

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!