A Principled Technologies report: Hands-on testing. Real-world results.

Modernize your SAS analytics infrastructure to get smart, timely decisions at scale

Handle many concurrent applications, jobs, and users with the

new Dell EMC DSSD D5 rack‐scale flash appliance and Intel Xeon

processor‐powered Dell EMC PowerEdge R730 servers running SAS 9

Organizations looking to accelerate analytics capabilities need high performance at scale. Storage

appliances with excellent bandwidth offer high throughput and concurrency (the ability to support

many streams of analysis simultaneously) to meet this demand. These storage appliances can

complement existing datacenter architectures as data usage and analytics consumption grow.

In our testing, the Dell Technologies and SAS® solution ran hundreds of analytics jobs

simultaneously as we scaled from one to four to eight Dell EMC PowerEdge R730 servers

powered by Intel® Xeon® processors E5-2699 v4. Not only that, the Dell EMC® DSSD D5

appliance achieved impressive bandwidth, moving data at a peak speed of 22.8 GB per second and

achieving a SAS CPU/real-time ratio greater than 1.0 at every step in our scaling, indicating that the

CPU was almost never waiting on the storage (for more information on the CPU/real-time ratio, see

Appendix A). This scalable, powerful storage solution can help your organization take advantage of its

current SAS investment while accelerating data‐driven decisions drawn from speedy data exploration,

an expanded base of analytics users, and the ability to run more simultaneous analytics jobs.

Modernize your SAS analytics infrastructure to get smart, timely decisions at scale September 2016


Fast data flow to and from the DSSD D5 rack-scale flash appliance

According to EMC®, “The Dell EMC DSSD D5 rack-scale flash appliance was specifically designed to bridge the legacy

performance gap between compute and storage. DSSD D5 provides workloads like SAS with unprecedented storage

performance that can increase the speed, scale, and value of critical applications. DSSD D5 can help you run more

analyses, work with transactional data, and leverage larger data sets to generate business insights faster and improve

your bottom line. DSSD D5 allows customers to:

• Accelerate high-performance, analytic-intensive workloads that require large working data sets or predictable

ultra-low response times

• Decrease false positives with higher frequency analysis on more granular data

• Enable more business users to make analytic-driven decisions

• Consolidate SAS marts to deliver a single analytics solution for

all roles”

Accelerate data flow for business growth and goals

Data access has to be as efficient as possible to deliver fast analytics. If storage resources can’t handle high

concurrency and keep data accessible as you add analysis jobs, then gaining valuable insights will take longer. If

your infrastructure can analyze data quickly with fast storage, however, your business can see a host of benefits,

including fast decision-making, thorough analyses, and development of a smarter, more innovative organization.

Finance: Making smarter business decisions

Consider a financial services company that offers loans and credit. In addition to complying with strict financial

regulations from federal and state governments, the company must understand the risk involved in each loan

and the overall health of the business at any given time.

This company takes in an enormous amount of transactional and customer data every day. Effectively mining

that data would help them improve everyday lending practices, understand market trends, and plan for

the future. For that reason, the company could benefit from incorporating the Dell EMC DSSD D5 storage

appliance into their datacenter to support a SAS 9 environment. The appliance’s ability to support scalability,

concurrent analysis, and CPU/real-time ratios greater than 1.0 when processing SAS workloads could enable the

organization to:

• Create more, higher-quality forecasts that can help them predict growth opportunities, minimize losses, and

respond faster to market trends

• Quickly score their transactions to find and stop fraud faster and more effectively, meet compliance

requirements, and avoid fines

• Make timely, smart decisions about their customers’ loans and defaults to help grow the company and

increase profits

With the Dell EMC DSSD D5, the company could gain new insights to help them make better decisions, comply

with regulations, and evaluate risks while growing the business and increasing profit margins.


Retail: Selling to customers more effectively

Consider a modern retail corporation that has jumped into many digital ventures over the past few years and

uses SAS software to analyze its data. Focused on growing its customer base, the sales, marketing, and customer

service departments have been using applications that generate significant volumes of data on customers and

their transactions. As a result, the business’ various data stores are increasing quickly.

Continuing to rely solely on traditional storage could create a bottleneck that would slow the retailer’s decision-making

about its brand, customer experience, and even inventory, in turn affecting growth. Instead, the company

could leverage the Dell EMC DSSD D5 storage appliance to support its SAS 9 infrastructure. Powering SAS

analytics, the appliance could enable the business to take advantage of its quickly growing data stores to:

• Gain a more complete understanding of their customers to personalize communication with them and sell

to them more effectively

• Optimize their product pricing and distribution and react to changes in the market quickly

• Use demand forecasting to optimize marketing and promotions, ultimately creating a more profitable

relationship with their customers

The Dell EMC DSSD D5 appliance can help the corporation fully take advantage of its fast-growing stores of

data. At the same time, with its large datacenter, the company can rely on their other hardware to perform SAS

Enterprise Grid functions such as high availability, disaster recovery, and more.


Public agencies: Utilizing data to help communities

Envision a public agency that must address citizens’ concerns about a wide variety of issues that affect their

quality of life. The agency takes in vast amounts of data from sources throughout the community, including crime

data, tax records, transportation data, economic data, and more. As its region grows, the agency will continue to

amass more data.

It uses SAS for data analysis, but traditional storage is limiting how much it can process—and in turn, how well

the agency can help its citizens. By incorporating the Dell EMC DSSD D5 storage appliance to support SAS 9,

the agency can gain higher throughput, support more concurrent analysis jobs, and scale by adding servers.

Based on these advantages, the appliance could enable the agency to:

• Increase public safety by planning informed, effective ways to address criminal activity

• Quickly detect and avoid fraud in procurement, taxes, health care, unemployment, benefits, and more

• Find new ways to address its community’s needs by analyzing the newest data around public sentiment and

citizen engagement

By utilizing the Dell EMC DSSD D5, the public agency can find new ways to enhance its community’s quality of

life, with the knowledge that they’re using up-to-date information at the base of their solutions.


Get a shorter time to insight with these advantages

Make more analytics-based decisions with near-linear scaling of throughput

We tested the Dell EMC DSSD D5 storage appliance with

a SAS analytic workload running on Dell EMC PowerEdge

R730 servers powered by Intel Xeon processors E5‐2699

v4 and hosting SAS 9 instances. We first created a relative

performance baseline using a weighted average of

throughput (a measure of how much data a system can

move) for the single R730 as reported by the operating

system. We weighted the average based on the number

of SAS processes running at each 15-second measurement

interval. Based on that, the four R730 servers delivered

3.96 times the average throughput; in the final solution

configuration featuring eight R730 servers, we saw 7.82

times the average throughput. For more information on

how we calculated results, see Appendix A.
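As a rough check on the near-linear claim, scaling efficiency can be expressed as the measured throughput multiple divided by the number of servers. The sketch below is illustrative only and is not part of the test scripts; the function name scaling_efficiency is hypothetical, and it assumes awk is available on the system.

```shell
#!/usr/bin/env bash
# Scaling efficiency = measured throughput multiple / ideal (linear) multiple, in percent.
# Arguments: $1 = measured throughput multiple, $2 = number of servers.
# scaling_efficiency is a hypothetical helper name, not part of the original harness.
scaling_efficiency() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 100 * s / n }'
}

scaling_efficiency 3.96 4   # four-server configuration, prints 99.00
scaling_efficiency 7.82 8   # eight-server configuration, prints 97.75
```

By this measure, the reported multiples correspond to roughly 99 percent and 98 percent of ideal linear scaling.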

Analyze more data at the same time with fast throughput

We also looked at peak throughput reported by the

storage as we added servers to the solution. We saw a

peak throughput of 22.8 GB/s with eight R730 servers.

The aggregate throughput of the DSSD D5, and thus the

completed SAS jobs, increased at each step and enabled

the analysis of larger amounts of data in the same period.

Shorten analysis time by reducing bottlenecks

According to SAS, infrastructure relying on traditional

storage appliances struggles to achieve a CPU/real-time

ratio close to 1.0 (for more information on CPU/real-time

ratio, see Appendix A). Achieving a ratio greater than 1.0

is a challenge for most currently available storage systems.

These levels of CPU/real-time ratio usually require multiple

racks of storage, which occupy valuable datacenter space

and increase power consumption.

Backed by the 5U DSSD D5 appliance, the solution

achieved a CPU/real-time ratio greater than 1.0 in all three

of our configurations. The single-server configuration

achieved a ratio of 1.12; the four-server configuration

achieved a ratio of 1.10; and the eight‐server configuration

achieved a ratio of 1.04. These CPU/real-time ratios

demonstrate the appliance’s ability to support multiple

nodes of high-performing SAS instances, each

simultaneously running jobs.

Support concurrent analytics jobs in parallel

Running more SAS 9 instances by adding Intel Xeon

processor-powered PowerEdge R730 servers to

the configuration allowed the Dell EMC DSSD D5

appliance‐based solution to support more concurrent

jobs running in parallel. Each server ran one instance

of SAS 9. Each instance ran the SAS analytic workload,

which executed 102 total jobs. The storage effectively

maintained job runtimes as we scaled to 816 concurrent

jobs. Even in the highest-scaled configuration, the wall

clock time of the entire workload stayed within one

percent of the original baseline test run.
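The job totals above follow from one SAS 9 instance per server and 102 jobs per instance. A minimal sketch of that arithmetic, using a hypothetical helper named total_jobs:

```shell
#!/usr/bin/env bash
# Each server runs one SAS 9 instance, and each instance executes 102 jobs.
JOBS_PER_INSTANCE=102

# total_jobs is a hypothetical helper: $1 = number of servers.
total_jobs() {
    echo $(( $1 * JOBS_PER_INSTANCE ))
}

for servers in 1 4 8; do
    printf '%d server(s): %d jobs\n' "$servers" "$(total_jobs "$servers")"
done
```

For the three tested configurations this gives 102, 408, and 816 jobs.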


Gain insight into your business with SAS

According to SAS, “SAS helps businesses answer important data-driven questions related to customer data or

sales analysis, supply chain monitoring, fraud detection, and more. Specifically, SAS delivers the following to

your business:

• Tested algorithms to analyze past and near real-time data, and to predict future data and trends

• Varied approaches to analyzing your data including analysis of variance, categorical data analysis,

regression, psychometric analysis, survival analysis, cluster analysis, mixed-models analysis, survey data

analysis, and nonparametric analysis

• Customized output including task-specific graphics and analytical-style maps, charts, and graphs

The SAS 9 architecture uses multicore technologies to deliver processing capabilities through in-database

analytics. On-site SAS deployments and private and public cloud environments alike could benefit from the SAS

9 architecture.”

The Dell EMC PowerEdge R730 server

According to Dell, the Dell EMC PowerEdge R730 is a “mainstream 2S/2U rack server that delivers highly

functional flexibility to datacenter customers. The combination of powerful processors, large memory, and I/O

flexibility gives the R730 the versatility to perform well in a number of demanding application environments. [It

was named] server of the year by CRN for 2014.” 1

Testing the storage and servers with the SAS analytic workload

We used this SAS-provided tool to stress the DSSD D5 and R730 solution. It is a mixed-analytics scenario

consisting of 102 jobs per node in a graduated job release. During peak workload, up to 53 jobs ran at once.

SAS designed the workload to simulate real-world SAS batch users and SAS interactive users. The only exception

is during initial ramp up and ramp down periods at the beginning and end of execution.

Some periods were heavier than others during execution, which simulates a typical work environment when

some periods are batch-only jobs and other periods consist of batch and interactive jobs consecutively. This

ultimately simulates a real-world workload environment with peaks and valleys in system resource utilization.

1 http://www.crn.com/slide-shows/components-peripherals/300075002/the-crn-test-centers-2014-products-of-the-year.htm/pgno/0/5


Conclusion

Accelerating data flow in your organization can speed up and effectively scale analytics, thus shortening time to

insight and helping your business grow and achieve its goals. Your business could see the following benefits:

• Becoming a more data-driven company with faster decision-making

• Broader and deeper analytics by analyzing more data, more frequently, and at more granular levels

• An expanded base of concurrent analytics users making up a larger share of your organization

The data your business uses will continue to accumulate from sources such as transactional systems, applications,

and customers. To continue running SAS software for insightful analytics of this data, your datacenter could

benefit from fast storage capable of supporting an increase in servers and multiple SAS jobs running at once.

The Dell EMC DSSD D5 let us scale a SAS workload from one to four to eight Intel Xeon processor-powered Dell

EMC PowerEdge R730 servers with minimal slowdown. This meant more SAS instances running simultaneously,

with each instance running multiple concurrent jobs as well. Eight SAS instances delivered 7.82 times the

average throughput of a single instance. Ultimately, the DSSD D5 solution let data flow quickly, providing up to

22.8 GB/s in throughput, and maintained a CPU/real-time ratio above 1.0 for every configuration we tested.

Adding the Dell EMC DSSD D5 appliance for SAS analytics in

your datacenter can help produce insights quickly to enable

your business to focus on growth.

According to Dell Technologies, “Dell and EMC are

positioned to offer a broad range of pre-configured solutions,

built with Dell servers and the Dell EMC DSSD D5, that

support a variety of applications, including SAS.”


On July 28, 2016, we finalized the hardware and software configurations we tested. Updates for current and

recently released hardware and software appear often, so unavoidably these configurations may not represent

the latest versions available when this report appears. For older systems, we chose configurations representative

of typical purchases of those systems. We concluded hands-on testing on August 2, 2016.

Appendix A – How we calculated results

Scaling throughput

To show how throughput scaled, we applied a weighted average to the throughput the operating

system reported.

We weighted the average by the number of SAS processes reported at each 15-second interval.

We then normalized the throughput weighted average to demonstrate near linear scaling. We normalized the

amplitude of scaling to the single-server configuration.
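The weighting and normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the scripts used in testing: the input format (one line per 15-second sample, holding a throughput value and a SAS process count), the helper names weighted_avg and normalize, and the sample numbers are all hypothetical.

```shell
#!/usr/bin/env bash
# weighted_avg (hypothetical helper): reads lines of "<throughput> <sas_process_count>"
# on stdin, one per 15-second sample, and prints the process-weighted average throughput.
weighted_avg() {
    awk '{ sum += $1 * $2; weight += $2 }
         END { if (weight > 0) printf "%.3f\n", sum / weight; else print "0.000" }'
}

# normalize (hypothetical helper): $1 = weighted average for a configuration,
# $2 = weighted average for the single-server baseline.
normalize() {
    awk -v v="$1" -v b="$2" 'BEGIN { printf "%.2f\n", v / b }'
}

# Made-up samples: intervals with more SAS processes count for more,
# and an interval with zero processes does not affect the result.
printf '2.0 10\n3.0 30\n1.0 0\n' | weighted_avg   # prints 2.750
normalize 7.0 1.75                                # prints 4.00
```

Weighting by process count keeps lightly loaded ramp-up and ramp-down intervals from dragging down the average.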

CPU/real-time ratio

The CPU/real-time ratio divides the total CPU time (user and system) by the total elapsed time to complete every

job in the workload. We report the average CPU/real-time ratio of the servers for each level of scaling.

Some SAS procedures are threaded, so jobs can actually use more CPU cycles than real time. A CPU/real-time

ratio less than 1.0 indicates that the CPU is waiting on resources to finish processes. In most of those cases where

the ratio is less than 1.0, the CPU is waiting on I/O resources from storage. Our solution delivered average CPU/

real-time ratios greater than 1.0 at each level of scaling.
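The ratio definition above reduces to a single division. The sketch below assumes the CPU and elapsed times have already been totaled in seconds; the helper name cpu_real_ratio and the input values are hypothetical and are not measurements from this report.

```shell
#!/usr/bin/env bash
# cpu_real_ratio (hypothetical helper):
#   $1 = total user CPU seconds, $2 = total system CPU seconds,
#   $3 = total elapsed (real) seconds across every job in the workload.
cpu_real_ratio() {
    awk -v u="$1" -v s="$2" -v r="$3" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

cpu_real_ratio 100 12 100   # made-up totals; prints 1.12
cpu_real_ratio 80 10 100    # made-up totals; prints 0.90, which would suggest waiting on resources
```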


Appendix B – What we tested

The table below provides detailed configuration information for the server under test.

Server configuration information                8 x Dell EMC PowerEdge R730
BIOS name and version                           Dell 2.17
Non-default BIOS settings                       Hyperthreading disabled
Operating system name and version/build number  Red Hat® Enterprise Linux® 7.2 x64 3.10.0-327.18.2.el7.x86_64
Date of last OS updates/patches applied         07/06/16
Power management policy                         Performance

Processor
Number of processors                            2
Vendor and model                                Intel Xeon E5-2699 v4
Core count (per processor)                      22
Core frequency (GHz)                            2.20
Stepping                                        1

Memory module(s)
Total memory in system (GB)                     384
Number of memory modules                        8/8
Vendor and model                                Samsung® M393A4K40BB1-CRC/Hynix HMA42GR7MFR4N-TF
Size (GB)                                       32/16
Type                                            PC4-2400T/PC4-2133T
Speed (MHz)                                     2,400/2,133
Speed running in the server (MHz)               2,133

Storage controller
Vendor and model                                Dell PERC H730p Mini
Cache size                                      2GB
Firmware version                                25.4.0.0017
Driver version                                  06.807.10.00-rh1

Local storage
Number of drives                                2
Drive vendor and model                          Seagate® ST300MM0006
Drive size (GB)                                 300
Drive information (speed, interface, type)      10K, 6Gb SAS, HDD

Network adapter
Vendor and model                                Broadcom® Gigabit Ethernet BCM5720
Number and type of ports                        4 x 1GbE
Driver version                                  3.137

Additional card                                 2 x Client PCIe DSSD HSX-2 cards

Cooling fans
Vendor and model                                NMB® 06038DA-12S-E2H
Number of cooling fans                          6

Power supplies
Vendor and model                                Dell L750E-S0
Number of power supplies                        2
Wattage of each (W)                             750

The table below provides detailed configuration information for the storage solution.

Storage configuration information
Controller firmware revision    201602.3.0-7+670dbcd.R
Vendor and model                Dell EMC DSSD D5
Number of storage controllers   2
Number of storage shelves       1
Number of drives per shelf      36
Drive vendor and model number   DSSD D5 Flash Module
Drive size (TB)                 4.6


Appendix C – How we tested

In our testing, we created and ran scripts which executed the SAS-provided workload, dropped caches, and

cleaned up old test results. Next, we initiated performance-monitoring tools on the servers through our scripts.

When we tested multiple servers, we started the SAS workload on every server simultaneously. We waited for the

SAS workload scripts to finish running, and then stopped all performance monitoring. We then transferred those

results and parsed the data.
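The sequence above might be orchestrated along the following lines. This is a sketch under stated assumptions, not the scripts used in testing: the host list, the workload script path, the use of ssh to reach each server, the choice of nmon with a 15-second interval, and the nmon sample count are all hypothetical. It defaults to a dry-run mode that only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: adjust for a real environment.
HOSTS=${HOSTS:-"server1 server2"}
WORKLOAD_CMD=${WORKLOAD_CMD:-"/dssd/sasdata/asuite/run_workload.sh"}  # assumed script path
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

test_cycle() {
    local host
    # Drop caches and start performance monitoring on every server.
    for host in $HOSTS; do
        run ssh "$host" 'sync; echo 3 > /proc/sys/vm/drop_caches'
        run ssh "$host" 'nmon -f -s 15 -c 1000 -m /tmp'
    done
    # Start the workload on every server at about the same time, then wait for all to finish.
    for host in $HOSTS; do
        run ssh "$host" "$WORKLOAD_CMD" &
    done
    wait
    # Ask nmon to stop cleanly on every server.
    for host in $HOSTS; do
        run ssh "$host" 'pkill -USR2 nmon'
    done
}
```

Running DRY_RUN=1 test_cycle prints the planned commands for review before anything is executed against real servers.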

Installing Red Hat Enterprise Linux 7.2

1. Connect the installation media to the server. We used the virtual optical drive available on all servers’ out-of-band

management consoles.

2. Boot to the installation media.

3. At the splash screen, select Install Red Hat Enterprise Linux 7.2, and press Enter.

4. For language, choose your desired language, and click Continue. We chose English (United States).

5. At the Installation Summary screen, configure the Date & Time to match your time zone.

6. Set software-selection to Minimal Install.

7. Set the Installation Destination to Automatic partitioning.

8. Configure the Network & Hostname for your testing network.

9. Click Begin Installation.

10. During the installation process, set the Root Password. We elected not to create another user during installation.

11. Once the installation is completed, disconnect the installation media, and click Reboot.

Configuring the Dell EMC PowerEdge R730

Run the following commands for each process.

Installing updates and additional packages

subscription-manager register

subscription-manager list --available --all

subscription-manager attach --pool=

subscription-manager repos --disable='*'

subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-optional-rpms --enable=rhel-7-server-extras-rpms

yum install -y deltarpm

yum install -y kernel-3.10.0-327.18.2.el7.x86_64 kernel-tools-3.10.0-327.18.2.el7.x86_64 kernel-tools-libs-3.10.0-327.18.2.el7.x86_64

yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

yum install -y crudini

crudini --set /etc/yum.conf main exclude "kernel* redhat-release*"

yum update -y

yum install -y chrony time xfsprogs tuned numactl wget vim nfs-utils openssh-clients man zip unzip numactl ipmitool OpenIPMI sysstat bc pigz lzop

reboot


Installing nmon

wget https://sourceforge.net/projects/nmon/files/nmon16e_x86_rhel72/download -O /usr/local/bin/nmon

chmod 755 /usr/local/bin/nmon

Disabling SELINUX

setenforce 0

sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

Disabling the firewall

systemctl stop firewalld

systemctl disable firewalld

Synching the time

sed -i '/server .*/d' /etc/chrony.conf

echo 'server 10.41.0.5 iburst prefer' >> /etc/chrony.conf

systemctl restart chronyd

systemctl enable chronyd

Creating users

groupadd -g 500 sas

useradd -u 500 -g 500 sasdemo

useradd -u 400 -g 500 sas

echo 'export PATH=$PATH:/usr/local/SASHome/SASFoundation/9.4' >> /home/sasdemo/.bashrc

echo 'export ASUITE=/dssd/sasdata/asuite' >> /home/sasdemo/.bashrc

echo 'export PATH=$PATH:/usr/local/SASHome/SASFoundation/9.4:/opt/dssd/bin' >> /root/.bashrc

echo 'export ASUITE=/dssd/sasdata/asuite' >> /root/.bashrc

echo "sasdemo ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/sasdemo

chmod 0440 /etc/sudoers.d/sasdemo

passwd sasdemo

passwd sas


Installing DSSD client software

unzip client_media.zip

cd client_media/

./install-client.sh

/opt/dssd/bin/fw-update -u

echo "options vpci vpci_sgl_enable=2 vpci_nvme_thread_policy=2" > /etc/modprobe.d/dssd.conf

Enabling PCIe AER

sed -i 's/rhgb quiet/hest_disable=y/' /etc/default/grub

grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

reboot

Enabling DSSD block service

crudini --set --format=sh /etc/sysconfig/dssd-blkdev '' DSSD_BLKDEV_VOLUME_NAME \"sasvol_$(hostname -s)\"

crudini --set --format=sh /etc/sysconfig/dssd-blkdev '' DSSD_BLKDEV_LOGFILE_NAME \"/var/log/dssd-blkdev.log\"

systemctl restart dssd

systemctl restart dssd-blkdev

Creating, formatting, and mounting block devices

DATAROOT=/dssd

FSOPTS="defaults,noatime,nodiratime,discard,inode64,_netdev"

mkdir -p ${DATAROOT}/{sasdata,saswork,utilloc}

systemctl stop dssd-blkdev

flood create -V sasvol_$(hostname -s) -t block -F 4096 -l 2T sasdata

flood create -V sasvol_$(hostname -s) -t block -F 4096 -l 3T saswork

flood create -V sasvol_$(hostname -s) -t block -F 4096 -l 2T utilloc

sleep 3

systemctl start dssd-blkdev


sleep 30

mkfs.xfs -f -l size=2037m,lazy-count=1 -i size=2048,align=1,attr=2,projid32bit=0 -L sasdata /dev/dssd0000

mkfs.xfs -f -l size=2037m,lazy-count=1 -i size=2048,align=1,attr=2,projid32bit=0 -L saswork /dev/dssd0001

mkfs.xfs -f -l size=2037m,lazy-count=1 -i size=2048,align=1,attr=2,projid32bit=0 -L utilloc /dev/dssd0002

echo -e "LABEL=sasdata\t${DATAROOT}/sasdata\txfs\t${FSOPTS}\t0 0" >> /etc/fstab

echo -e "LABEL=saswork\t${DATAROOT}/saswork\txfs\t${FSOPTS},nobarrier\t0 0" >> /etc/fstab

echo -e "LABEL=utilloc\t${DATAROOT}/utilloc\txfs\t${FSOPTS},nobarrier\t0 0" >> /etc/fstab

mount -v ${DATAROOT}/sasdata

mount -v ${DATAROOT}/saswork

mount -v ${DATAROOT}/

chown sasdemo:sas ${DATAROOT}/*
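The three fstab entries above differ only in the filesystem label and the extra nobarrier option. As an optional illustration, a small helper could generate them from one template; the function name fstab_line is hypothetical and was not part of the tested configuration. The sketch prints the lines rather than appending them to /etc/fstab.

```shell
#!/usr/bin/env bash
# fstab_line (hypothetical helper): $1 = filesystem label, $2 = mount root, $3 = mount options.
# Prints one tab-separated fstab entry for an XFS filesystem mounted at <root>/<label>.
fstab_line() {
    printf 'LABEL=%s\t%s/%s\txfs\t%s\t0 0\n' "$1" "$2" "$1" "$3"
}

DATAROOT=/dssd
FSOPTS="defaults,noatime,nodiratime,discard,inode64,_netdev"

fstab_line sasdata "$DATAROOT" "$FSOPTS"
fstab_line saswork "$DATAROOT" "$FSOPTS,nobarrier"
fstab_line utilloc "$DATAROOT" "$FSOPTS,nobarrier"
```

Reviewing the printed lines before redirecting them into /etc/fstab makes it easier to catch a mistyped label or option.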

Creating and applying a tuned profile

cp -rp /usr/lib/tuned/throughput-performance /usr/lib/tuned/sas-performance

cat


[disk]

devices=!dm-*

EOF

tuned-adm profile sas-performance

reboot

This project was commissioned by Dell, EMC, and SAS.

Principled Technologies is a registered trademark of Principled Technologies, Inc.

All other product names are the trademarks of their respective owners.

DISCLAIMER OF WARRANTIES; LIMITATION OF LIABILITY:

Principled Technologies, Inc. has made reasonable efforts to ensure the accuracy and validity of its testing, however, Principled Technologies, Inc.

specifically disclaims any warranty, expressed or implied, relating to the test results and analysis, their accuracy, completeness or quality, including any

implied warranty of fitness for any particular purpose. All persons or entities relying on the results of any testing do so at their own risk, and agree that

Principled Technologies, Inc., its employees and its subcontractors shall have no liability whatsoever from any claim of loss or damage on account of any

alleged error or defect in any testing procedure or result.

In no event shall Principled Technologies, Inc. be liable for indirect, special, incidental, or consequential damages in connection with its testing, even if

advised of the possibility of such damages. In no event shall Principled Technologies, Inc.’s liability, including for direct damages, exceed the amounts paid in

connection with Principled Technologies, Inc.’s testing. Customer’s sole and exclusive remedies are as set forth herein.
