01.05.2017 Views


restrictive. MPP is, however, quicker and easier to use than standard MapReduce jobs. That’s because MPP can be queried using Structured Query Language (SQL), whereas native MapReduce jobs are written in the more complicated Java programming language.
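
To make the contrast concrete, here is a minimal sketch that computes the same per-region total in two styles: once as a single SQL statement and once as explicit map, shuffle, and reduce steps. It is an illustration only. Python's built-in `sqlite3` module stands in for an MPP platform (it runs on one machine, not in parallel), plain Python stands in for a Java MapReduce job, and the `CALLS` records and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from functools import reduce
from operator import add

# Hypothetical call records: (region, minutes).
CALLS = [("north", 12), ("south", 7), ("north", 3), ("east", 5), ("south", 1)]


def total_minutes_sql(rows):
    """Declarative style: one SQL statement describes the desired result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE calls (region TEXT, minutes INTEGER)")
        conn.executemany("INSERT INTO calls VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(minutes) FROM calls GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def total_minutes_mapreduce(rows):
    """Procedural style: the programmer spells out each phase."""
    # Map: emit one (key, value) pair per input record.
    mapped = [(region, minutes) for region, minutes in rows]
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: collapse each group of values into a single total.
    return {key: reduce(add, values) for key, values in grouped.items()}
```

Both functions return the same totals for `CALLS`. The SQL version states what is wanted, while the map/reduce version has to say how to compute it, which is the usability gap the paragraph above describes.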

Introducing NoSQL databases

A traditional RDBMS isn’t equipped to handle big data demands. That’s because it is designed to handle only relational datasets: data that’s stored in clean rows and columns and can therefore be queried via SQL. RDBMSs are poorly suited to unstructured and semistructured data. Moreover, RDBMSs simply don’t have the processing and handling capabilities that are needed to meet big data volume and velocity requirements.

This is where NoSQL comes in. NoSQL databases are non-relational, distributed database systems that were designed to rise to the big data challenge. NoSQL databases step out past the traditional relational database architecture and offer a much more scalable, efficient solution. NoSQL systems support non-SQL querying of non-relational or schema-free data, whether semistructured or unstructured. In this way, NoSQL databases are able to handle the structured, semistructured, and unstructured data sources that are common in big data systems.

NoSQL offers four categories of non-relational databases: graph databases, document databases, key-value stores, and column family stores. Because NoSQL offers native functionality for each of these separate types of data structures, it offers very efficient storage and retrieval functionality for most types of non-relational data. This adaptability and efficiency make NoSQL an increasingly popular choice for handling big data and for overcoming the processing challenges that come along with it.
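
As a rough illustration of how the four categories differ, the sketch below models one hypothetical customer in each of them using plain Python structures. No real database is involved, and the field names, keys, and the `neighbors` and `name_from_each_model` helpers are assumptions made up for this example.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ada", "plan": "prepaid"}'}

# Document store: a nested, self-describing record.
document = {
    "_id": 42,
    "name": "Ada",
    "plan": "prepaid",
    "devices": [{"type": "phone", "os": "android"}],
}

# Column family store: row key -> column family -> columns.
column_family = {
    "42": {
        "profile": {"name": "Ada"},
        "billing": {"plan": "prepaid"},
    }
}

# Graph database: nodes plus labeled, directed edges between them.
nodes = {
    42: {"label": "Customer", "name": "Ada"},
    7: {"label": "Device", "type": "phone"},
}
edges = [(42, "OWNS", 7)]


def neighbors(node_id, relation):
    """Follow edges of one relation type outward from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def name_from_each_model():
    """Read the customer's name back out of all four representations."""
    return [
        json.loads(key_value["customer:42"])["name"],  # parse the opaque value
        document["name"],                              # read a document field
        column_family["42"]["profile"]["name"],        # row -> family -> column
        nodes[42]["name"],                             # read a node property
    ]
```

The same fact is reachable in every model, but the access path differs: a key lookup plus parsing, a field read, a row/family/column walk, or a node read with `neighbors` for traversing relationships.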

The NoSQL applications Apache Cassandra and MongoDB are used for data storage and real-time processing. Apache Cassandra is a popular column family store (it is sometimes loosely described as a key-value store), and MongoDB is a document-oriented NoSQL database. MongoDB uses dynamic schemas and stores JSON-esque documents. It is the most popular document store on the NoSQL market.
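
The idea of dynamic schemas and JSON-like documents can be sketched without a database at all. In the example below, a plain Python list stands in for a document collection. This is not MongoDB's API, and the `subscribers` data and the `find` helper are hypothetical. The point is only that documents in one collection need not share the same fields.

```python
import json


def find(collection, criteria):
    """Return documents whose fields equal every key/value pair in criteria."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Three documents in one collection, each with a different set of fields.
subscribers = [
    {"_id": 1, "name": "Ada", "plan": "prepaid"},
    {"_id": 2, "name": "Grace", "plan": "contract", "devices": ["phone", "tablet"]},
    {"_id": 3, "name": "Linus", "plan": "prepaid", "address": {"city": "Helsinki"}},
]

# Query by a field that every document happens to have.
prepaid = find(subscribers, {"plan": "prepaid"})

# Documents serialize naturally to JSON text.
as_json = json.dumps(subscribers[1], sort_keys=True)
```

No table definition had to change to add `devices` or `address`. A relational table would need new columns or extra tables before it could hold those records.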

Some people argue that the term NoSQL stands for Not Only SQL, and others argue that it represents Non-SQL databases. The argument is rather complex, and there is no cut-and-dried answer. To keep things simple, just think of NoSQL as a class of non-relational systems that do not fall within the spectrum of RDBMSs that are queried using SQL.

Data Engineering in Action: A Case Study

A Fortune 100 telecommunications company had large datasets that resided in separate data silos: data repositories that are disconnected and isolated from other data storage systems used across the organization. With the goal of deriving data insights that lead to revenue increases, the company decided to connect all of its data silos and then integrate that shared source with other
