14.12.2023 Views

DM Nov-Dec 2023


Dm MANAGEMENT: DATA ARCHITECTURE

Embracing open data architecture

Matt Peachey, Vice President, International at Dremio, argues that open is the smart way forward for data management

For decades, data has propelled business operations. Whether an organisation offers tangible goods or intangible services, crucial information about partners, workforce, processes, and clients forms the backbone of a company's wellbeing. Data storage sits at the heart of any computing system, so the choice of storage solution significantly affects how efficiently an organisation's network and accompanying infrastructure cater to business requirements.

The primary expectation from a data storage system is that it keeps valuable data safe while allowing users and applications to retrieve it seamlessly and swiftly when required. However, with the volume of data growing exponentially and data never being deleted, businesses have simply kept adding storage capacity.

The issue deepens where data warehouse vendors store data in a proprietary format. Data gets locked into the platform, making it difficult and costly to extract if and when a business wants to. Further, maintaining the platform and troubleshooting issues often requires teams with deep subject matter expertise in the ecosystem, which is an expensive outlay.

Given the multitude of data storage alternatives and system setups, organisations can get dragged down a rabbit hole as they add more data to their systems, which is a very inefficient approach. Organisations must embrace open-source standards, technologies and formats to ensure fast and cost-effective analytics with the best engine for each workload. This provides the agility to innovate with the next wave of technology without draining resources or time.

EVOLUTION OF DATA ARCHITECTURE

Previously, companies depended on conventional databases or warehouses for their Business Intelligence (BI) demands. However, these systems presented certain difficulties. The typical data warehouse setup requires investing in expensive on-premises hardware, maintaining structured data in proprietary formats, and depending on a centralised IT and data department for analysis. Other obstacles included technical interoperability, system orchestration, and, more critically, scalability.

However, things changed in 2006 with the launch of Hadoop, built on the MapReduce paradigm, which can process and produce enormous data sets in parallel over large clusters of commodity hardware. This framework made it possible to handle vast datasets distributed over computer clusters, making it immensely appealing for businesses accumulating more data with each passing day. Still, databases like Teradata and Oracle encapsulated storage, computation, and data within a single, interconnected system, offering no separation of compute and storage components.
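To make the MapReduce paradigm mentioned above a little more concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration, and the word-count task is simply the customary teaching example. In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the grouping step in between.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over a list of input splits."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one key, the work can be spread across many machines, which is what made the approach attractive for very large datasets.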

Between 2015 and 2020, however, the widespread usage of the public cloud altered this approach, enabling the separation of compute and storage. Cloud data vendors like AWS and Snowflake facilitated this separation in cloud warehouses, enhancing scalability and efficiency. Nevertheless, data still had to be ingested, loaded, and duplicated into a single proprietary system, which was attached to a single query engine. Employing multiple databases or data warehouses meant storing multiple copies of the data. Moreover, companies were still charged for transferring their data into and out of the proprietary system, which resulted in excessive costs.

Enter a more contemporary, open data architecture, where data exists as an independent layer. This means a clear division between data and compute. Data is stored in open-source file formats and table formats and accessed by decoupled and elastic compute engines. Consequently, different engines can access the same data in a loosely coupled architecture. In these architectures, data is stored as its own independent tier in open formats within the company's cloud account and made accessible to downstream consumers through various services.
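The idea of one copy of data in an open format, read by several independent engines, can be sketched in a few lines of Python. This is a toy under stated assumptions: CSV in a temporary directory stands in for open file formats in a cloud account so that only the standard library is needed, and the two "engines" (`reporting_engine` and `alerting_engine`) are hypothetical functions, not real query engines. The sample rows are invented for the illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Tuple


def write_orders(storage_dir: Path) -> Path:
    """Storage layer: persist the data once, in an open format (CSV here)."""
    path = storage_dir / "orders.csv"
    rows = [  # invented sample data
        {"region": "EMEA", "amount": "120.0"},
        {"region": "APAC", "amount": "80.0"},
        {"region": "EMEA", "amount": "50.0"},
    ]
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def reporting_engine(path: Path) -> Dict[str, float]:
    """Engine 1: aggregate the order amounts per region."""
    totals: Dict[str, float] = defaultdict(float)
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["amount"])
    return dict(totals)


def alerting_engine(path: Path, threshold: float) -> int:
    """Engine 2: count orders above a threshold, reading the same file."""
    with path.open(newline="") as f:
        return sum(1 for row in csv.DictReader(f) if float(row["amount"]) > threshold)


def demo() -> Tuple[Dict[str, float], int]:
    """Write the data once, then let both engines read it without copying."""
    with tempfile.TemporaryDirectory() as tmp:
        path = write_orders(Path(tmp))
        return reporting_engine(path), alerting_engine(path, threshold=100.0)
```

Neither function ingests or duplicates the data into its own store; each reads the shared file directly, so a third engine could be added later without touching the storage layer.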

This transformation parallels the shift in applications from monolithic architectures to microservices. A comparable transition is presently occurring in data analytics, with companies migrating from proprietary data warehouses and ceaseless ETL (Extract, Transform, Load) processes to open data architectures such as cloud data lakes and lakehouses.

SEPARATING COMPUTE AND STORAGE FOR EFFICIENCY

Over the years, the industry has discussed the separation of compute from storage at length, primarily because it improves efficiency. That separation has brought several advantages.

Firstly, the reduction in raw storage costs was so significant that they practically disappeared from IT budget spreadsheets. Secondly, compute costs were billed separately, so customers paid only for what they used during data processing, which lowered overall expenses.
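The effect of paying separately for storage and for compute actually used can be shown with some simple arithmetic. The sketch below is illustrative only: every rate, size and hour count is a made-up assumption rather than a figure from any vendor, and the function names are hypothetical.

```python
def coupled_monthly_cost(nodes: int, node_hourly_rate: float,
                         hours_in_month: int = 730) -> float:
    """Coupled system: nodes bundle storage and compute and run all month."""
    return nodes * node_hourly_rate * hours_in_month


def decoupled_monthly_cost(storage_tb: float, storage_rate_per_tb: float,
                           compute_hours: float, compute_hourly_rate: float) -> float:
    """Decoupled system: pay for stored data, plus compute only while it runs."""
    storage_cost = storage_tb * storage_rate_per_tb
    compute_cost = compute_hours * compute_hourly_rate
    return storage_cost + compute_cost


# Assumed figures: a 4-node cluster at 2.0 per node-hour, versus 10 TB stored
# at 23.0 per TB-month with 120 hours of compute at 8.0 per hour.
always_on = coupled_monthly_cost(nodes=4, node_hourly_rate=2.0)
pay_per_use = decoupled_monthly_cost(storage_tb=10, storage_rate_per_tb=23.0,
                                     compute_hours=120, compute_hourly_rate=8.0)
```

With these assumed numbers the always-on cluster costs 5,840 per month and the decoupled setup 1,190, of which only 230 is storage. Different assumptions give different results; a workload that keeps compute busy around the clock would narrow or reverse the gap.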

30 November/December 2023 www.document-manager.com

@DMMagAndAwards
