DM Jul-Aug 2021


TECHNOLOGY: STORAGE

Unstructured data: where to store it next

Aron Brand, CTO of CTERA, explains the key differences between distributed file systems and object storage for organisations looking for the optimum solution for their unstructured data requirements

Unstructured data is growing rapidly: Gartner estimates that over the next four years, enterprises will triple the volume of unstructured data they hold. Many organisations are therefore adopting Distributed File Systems (DFS) and object storage solutions to scale linearly in an economical manner whilst meeting their performance and capacity requirements. This article explores the differences between object storage and DFS, including the two flavours of DFS solutions.

SPOT THE DIFFERENCE

Both DFS and object storage distribute data over multiple nodes in order to provide self-healing and linear scaling in capacity and throughput. This, however, is where the similarities end. There are three main differences between the two unstructured data storage solutions:

1. In a file system, files are arranged in a hierarchy of folders, while in object storage systems, objects are arranged in flat buckets, in a similar way to a key-value store.
2. File systems are designed to allow random writes anywhere in the file. Object storage systems only allow atomic replacement of entire objects.
3. Object storage systems typically provide eventual consistency. Depending on the vendor, DFS can support strong consistency or eventual consistency, which is discussed further below.

THEORETICAL TURNS PRACTICAL

While both object storage and DFS are well-suited for storing substantial volumes of unstructured data, they suit different use cases. As object storage exposes a Representational State Transfer (REST) API, it is only suitable for applications that are specifically intended to interact with this type of storage.
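
To make the first two differences concrete, here is a minimal Python sketch. It is an illustration only: the file side uses an ordinary local file rather than a real DFS, and `ToyObjectStore`, its `put`/`get` methods, and the key names are hypothetical stand-ins, not any vendor's API. It shows an in-place random write on a file next to the read-modify-replace cycle that a flat key-value object store requires.

```python
import os
import tempfile

# File system side: change 5 bytes in the middle of an existing file, in place.
path = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(path, "wb") as f:
    f.write(b"hello world")
with open(path, "r+b") as f:
    f.seek(6)          # jump to byte offset 6
    f.write(b"WORLD")  # random write: only these 5 bytes are rewritten
with open(path, "rb") as f:
    file_contents = f.read()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a bucket: a flat key -> bytes map."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # The whole object is replaced in one step; there is no partial update.
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


bucket = ToyObjectStore()
# The "/" characters are simply part of the key; there are no real folders.
bucket.put("reports/2021/report.txt", b"hello world")
# To change 5 bytes, the client reads, modifies, and re-uploads the whole object.
old = bucket.get("reports/2021/report.txt")
bucket.put("reports/2021/report.txt", old[:6] + b"WORLD")
object_contents = bucket.get("reports/2021/report.txt")
```

Both paths end with the same bytes, but the object store had to transfer the entire object to change part of it, which is why workloads heavy on random writes sit more naturally on a file system.
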
In contrast, DFS expose a traditional filesystem API which is suitable for any application, including legacy applications that work over a hierarchical filesystem. DFS provide a deeper and more general-purpose interface to applications, allowing them to perform certain activities that are not suitable for object storage. Examples include acting as the backend for a database, or handling workloads that are heavy on random reads and writes. Object storage, by comparison, is more suited to serving as a repository or archive for enormous volumes of huge files and is less expensive per gigabyte than DFS.

IT teams considering implementing a DFS for their unstructured data must decide between two different types: clustered or federated.

CAP THEOREM

As mentioned above, DFS might support strong or eventual consistency. This is where computer science theory comes in: according to the CAP theorem, a distributed data store can have no more than two out of three properties. These three properties are:

Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

As a result, there are two types of DFS currently available:

CLUSTERED DFS

Clustered DFS are made up of a closely connected cluster of nodes. They focus strictly on data consistency and are especially suitable for large-scale computing use cases at the enterprise core, such as big data analytics, high-performance computing, or databases. Clustered DFS focus on the consistency and availability properties of the CAP theorem. But strong consistency assurances do not come cheap.
They impose significant constraints on system operation and performance, especially when nodes are separated by high-latency or unreliable connections.

FEDERATED DFS

The goal of federated DFS is to make data available over long distances while maintaining partition tolerance. Federated DFS are well-suited for weakly linked edge-to-cloud use cases, including unstructured data storage and management for remote and branch offices. They focus on the availability and partition tolerance properties of the CAP theorem, rather than the strict consistency guarantee.

In federated DFS, read and write operations on an open file are routed to a locally cached copy. When a modified file is closed, the modified sections are copied back from the edge to a central file server. Update conflicts may arise, and are resolved automatically. It could be claimed that federated DFS combine the semantics of a file system with the eventual consistency model of object storage. In this way federated DFS are optimised for use cases including archiving, backup, media libraries, mobile data access, content distribution to edge locations, content ingestion from edge to cloud, remote and branch office storage, and hybrid cloud storage.

Both clustered and federated DFS have applications in the enterprise. To reap the full benefits of a DFS, IT teams must be familiar with how clustered and federated DFS differ in order to choose the option most suited to their application requirements.

The market is undergoing a significant shift towards DFS and object storage, at the same time as organisations are looking for more efficient methods to not only cope with but thrive on the explosion of unstructured data. The optimal decision, whether object storage, clustered or federated DFS, or a combination, lies in careful consideration of the organisation's requirements and use cases.

More info: www.ctera.com

www.document-manager.com July/August 2021 @DMMagAndAwards
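
As a rough illustration of the federated DFS behaviour described in the article (reads and writes go to a local cached copy, changes are sent back when the file is closed, and conflicts may arise), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `CentralServer`, `EdgeFile`, the version counter, and the "keep both copies" conflict policy are hypothetical and do not describe any particular product. For simplicity it sends the whole file back on close rather than only the modified sections.

```python
class CentralServer:
    """Hypothetical central file server: path -> (version, bytes)."""

    def __init__(self):
        self.files = {}

    def fetch(self, path):
        return self.files.get(path, (0, b""))

    def commit(self, path, base_version, data):
        current_version, _ = self.fetch(path)
        if base_version != current_version:
            # Another edge committed first. Assumed policy: keep both versions
            # by saving the late writer's data under a separate conflict name.
            conflict_path = f"{path}.conflict-v{base_version}"
            self.files[conflict_path] = (1, data)
            return conflict_path
        self.files[path] = (current_version + 1, data)
        return path


class EdgeFile:
    """An open file at an edge site; reads and writes touch only the local cache."""

    def __init__(self, server, path):
        self.server = server
        self.path = path
        self.base_version, data = server.fetch(path)
        self.cache = bytearray(data)
        self.dirty = False

    def write(self, offset, data):
        self.cache[offset:offset + len(data)] = data
        self.dirty = True

    def read(self):
        return bytes(self.cache)

    def close(self):
        # Only on close does the edge talk to the central server again.
        if self.dirty:
            return self.server.commit(self.path, self.base_version, bytes(self.cache))
        return self.path


server = CentralServer()
server.files["plan.txt"] = (1, b"draft one")

edge_a = EdgeFile(server, "plan.txt")  # two sites open the same file
edge_b = EdgeFile(server, "plan.txt")
edge_a.write(6, b"two")
edge_b.write(6, b"six")

result_a = edge_a.close()  # first to close: becomes the new central version
result_b = edge_b.close()  # second to close: detected as a conflict
```

Each edge keeps working against its own cache without waiting for the other, which is the availability and partition tolerance trade-off; the price is that the two sites briefly disagree and the conflict has to be reconciled after the fact.
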