ST Nov-Dec 2023
MANAGEMENT: DATA ARCHITECTURE<br />
EMBRACING OPEN DATA ARCHITECTURE<br />
MATT PEACHEY, VICE PRESIDENT, INTERNATIONAL AT DREMIO,<br />
ARGUES THAT OPEN IS THE SMART WAY FORWARD FOR DATA<br />
MANAGEMENT<br />
For decades, data has been<br />
propelling business operations. Whether<br />
an organisation offers tangible goods or<br />
intangible services, crucial information about<br />
partners, workforce, processes, and clients<br />
forms the backbone of a company's wellbeing.<br />
At the heart of any computing system is data<br />
storage, so the selection of an<br />
appropriate storage solution will significantly<br />
impact how efficiently an organisation's network<br />
and accompanying infrastructure cater to the<br />
business requirements.<br />
The primary expectation from a data storage<br />
system is to safely keep valuable data while<br />
allowing users and applications to retrieve it<br />
seamlessly and swiftly when required. However,<br />
with data volumes growing exponentially<br />
and old data rarely deleted, businesses have<br />
simply kept adding storage capacity.<br />
The issue deepens when data warehouse<br />
vendors store data in a proprietary format.<br />
Data gets locked into the platform, making it<br />
difficult and costly to extract if and when a<br />
business wants to. Further, maintaining and<br />
troubleshooting issues often require teams with<br />
deep subject matter expertise in the ecosystem -<br />
an expensive outlay.<br />
Given the multitude of data storage<br />
alternatives and system setups, organisations<br />
can get dragged down a rabbit hole whilst<br />
adding more data to their systems - a very<br />
inefficient approach. Organisations must<br />
embrace open-source standards, technologies<br />
and formats to ensure fast and cost-effective<br />
analytics with the best engine for each<br />
workload. This provides the agility to innovate<br />
with the next wave of technology without<br />
draining resources or time.<br />
EVOLUTION OF DATA ARCHITECTURE<br />
Previously, companies depended on<br />
conventional databases or warehouses for their<br />
Business Intelligence (BI) demands. However,<br />
these systems presented certain difficulties. The<br />
typical data warehouse setup required investing<br />
in expensive on-premises hardware,<br />
maintaining structured data in proprietary<br />
formats, and depending on a centralised IT<br />
and data department for analysis. Other<br />
obstacles included technical interoperability,<br />
system orchestration, and, more critically,<br />
scalability.<br />
However, things changed in 2006 with the<br />
launch of Hadoop, built on the MapReduce<br />
paradigm, capable of processing enormous<br />
data sets in parallel over large<br />
clusters of commodity hardware. This<br />
framework facilitated handling vast datasets<br />
distributed over computer clusters, making it<br />
immensely appealing for businesses<br />
accumulating more data with each passing<br />
day. Still, databases like Teradata and Oracle<br />
encapsulated storage, computation, and data<br />
within a single, interconnected system, offering<br />
no separation of compute and storage<br />
components.<br />
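The MapReduce paradigm that Hadoop popularised can be sketched in a few lines: a map phase emits key/value pairs from each input record, a shuffle groups the pairs by key, and a reduce phase aggregates each group independently - which is what makes the work parallelisable across a cluster. The function names below are illustrative, not Hadoop's actual API.<br />

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every record.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently (hence parallelisable).
    return {key: sum(values) for key, values in groups.items()}

records = ["open data architecture", "open formats"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # → {'open': 2, 'data': 1, 'architecture': 1, 'formats': 1}
```

Because each reducer only ever sees the values for its own key, the reduce work can be spread over as many machines as there are keys - the property Hadoop exploited over commodity clusters.<br />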
Between 2015 and 2020, however, the<br />
widespread usage of the public cloud altered<br />
this approach, enabling the separation of<br />
compute and storage. Cloud data vendors like<br />
AWS and Snowflake facilitated this separation<br />
in cloud warehouses, enhancing scalability and<br />
efficiency. Nevertheless, data still had to be<br />
ingested, loaded, and duplicated into a single<br />
proprietary system, which was attached to a<br />
solitary query engine. Employing multiple<br />
databases or data warehouses necessitated the<br />
storage of multiple data copies. Moreover,<br />
companies were still charged for transferring<br />
their data into and out of the proprietary<br />
system, which resulted in excessive costs.<br />
Enter more contemporary and open data<br />
architecture, where data exists as an<br />
independent layer. Its hallmark is a<br />
clear division between data and compute. Data<br />
is stored in open-source file formats and table<br />
formats and accessed by decoupled and elastic<br />
compute engines. Consequently, different<br />
engines can access the same data in a loosely<br />
coupled architecture. In these architectures, data<br />
is stored as its own independent tier in<br />
open formats within the company's cloud<br />
account and made accessible to downstream<br />
consumers through various services.<br />
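As a toy illustration of that decoupling - using CSV as a stand-in for real open formats such as Parquet or Apache Iceberg, and an in-memory string in place of object storage - the sketch below writes the data once and lets two independent "engines" query the same bytes, with no ingestion step and no second copy. All names here are invented for the example.<br />

```python
import csv
import io
import statistics

# The "storage tier": data written once, in an open format any engine can parse.
open_format_data = io.StringIO()
writer = csv.writer(open_format_data)
writer.writerows([["region", "revenue"], ["EMEA", "120"], ["APAC", "80"]])

def engine_a_total(raw):
    # One compute engine: sums revenue straight from the shared data.
    rows = list(csv.reader(io.StringIO(raw)))[1:]  # skip header row
    return sum(int(r[1]) for r in rows)

def engine_b_mean(raw):
    # A different engine reads the *same* bytes - no duplicate copy, no ETL.
    rows = list(csv.reader(io.StringIO(raw)))[1:]
    return statistics.mean(int(r[1]) for r in rows)

shared = open_format_data.getvalue()
print(engine_a_total(shared), engine_b_mean(shared))
```

Swapping either engine for a different one requires no change to the storage tier - the property the article attributes to open file and table formats.<br />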
This transformation parallels the shift in<br />
applications from monolithic architectures to<br />
microservices. A comparable transition is<br />
presently occurring in data analytics, with<br />
companies migrating from proprietary data<br />
warehouses and ceaseless ETL (Extract,<br />
Transform, Load) processes to open data<br />
architectures like cloud data lakes and<br />
lakehouses.<br />
SEPARATING COMPUTE AND STORAGE<br />
FOR EFFICIENCY<br />
Over the years, the industry has discussed<br />
the separation of compute from storage at<br />
length, primarily because of the efficiency<br />
it brings. That separation delivered<br />
several advantages.<br />
Firstly, the reduction in raw storage costs was<br />
so significant that they practically disappeared<br />
from IT budget spreadsheets. Secondly,<br />
compute costs became segregated, leading to<br />
customers paying only for what they utilised<br />
during data processing, which lowered overall<br />
expenses. Lastly, the independent scalability of<br />
both storage and compute facilitated on-demand,<br />
elastic resource provisioning, adding<br />
flexibility to architecture designs.<br />
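Those advantages can be made concrete with some back-of-envelope arithmetic; the unit prices below are invented purely for illustration and are not real vendor pricing.<br />

```python
# Illustrative (invented) unit prices: cheap object storage, metered compute.
STORAGE_PER_TB_MONTH = 23.0   # USD per TB-month of object storage (assumed)
COMPUTE_PER_NODE_HOUR = 2.50  # USD per compute node-hour (assumed)

def coupled_monthly_cost(tb_stored, nodes):
    # Coupled systems: compute runs 24/7 because it is welded to the storage.
    return (tb_stored * STORAGE_PER_TB_MONTH
            + nodes * COMPUTE_PER_NODE_HOUR * 24 * 30)

def decoupled_monthly_cost(tb_stored, nodes, busy_hours):
    # Decoupled systems: pay for compute only while queries actually run.
    return (tb_stored * STORAGE_PER_TB_MONTH
            + nodes * COMPUTE_PER_NODE_HOUR * busy_hours)

coupled = coupled_monthly_cost(100, 8)           # 100 TB, 8 always-on nodes
decoupled = decoupled_monthly_cost(100, 8, 120)  # same cluster, busy 4 h/day
print(round(coupled), round(decoupled))  # → 16700 4700
```

With compute billed only for the roughly 120 busy hours in the month, the illustrative bill drops from about $16,700 to $4,700 even though the storage line is unchanged - the "pay only for what you use" effect described above.<br />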
However, these changes took time to<br />
materialise. Expensive Storage Area Networks<br />
(SANs) and less costly but often complex<br />
Network Attached Storage (NAS) systems have<br />
existed for quite a while. Both storage models<br />
were limited due to administrative and<br />
16 | STORAGE MAGAZINE | Nov/Dec 2023 | @STMagAndAwards | www.storagemagazine.co.uk