
WHITE PAPER

Performance on Sybase® Devices Using Direct I/O, Dsync and Raw Partition

Dhimant Chokshi
Server Performance Engineering and Development Group
Sybase, Inc.


TABLE OF CONTENTS

1.0 Introduction
2.0 Asynchronous I/O
3.0 Synchronous I/O
4.0 Dsync Writes
    DBCC Tablealloc Test
5.0 Direct I/O
    Reorg Rebuild Test
    Create Database Test
    Summary of IO Operations
6.0 Supported Platforms
7.0 Performance
8.0 Configuration
9.0 Results
10.0 Conclusion


1.0 INTRODUCTION

Sybase Adaptive Server® Enterprise 15.0 substantially enhances a database platform already known for its superior performance, reliability, and low total cost of ownership (TCO). Prior to ASE 15, it was recommended that the dsync setting be turned on for devices initialized on UNIX operating system files. The dsync flag ensures that writes to the device file occur directly on the physical storage media, so Adaptive Server can recover data on the device in the event of a system failure. This functionality, however, yields much slower write performance than raw partitions.

In ASE 15, the directio parameter allows you to configure Adaptive Server to transfer data directly to disk, bypassing the operating system buffer cache. directio performs I/O in the same manner as raw devices and provides the same performance benefit as raw devices (significantly better than dsync), while retaining the ease of use and manageability of file system devices.

Note: The directio and dsync parameters are mutually exclusive. If a device has dsync set to "true," you cannot set directio to "true" for that device. To enable directio for a device, you must first reset dsync to "false."

The following sections explain asynchronous and synchronous system calls, as well as direct I/O and writes with the dsync option.

2.0 ASYNCHRONOUS I/O

Asynchronous I/O allows the process issuing the I/O to continue executing while the OS completes the request. The dataserver process does not block on the request until its completion. The following figure shows how Adaptive Server works with asynchronous I/O.

[Figure: the dataserver issues a write with aiowrite(), which returns immediately; the OS performs the I/O in a different context while the dataserver does other work; the dataserver later polls for completion with aiowait(), which returns the I/O status.]


3.0 SYNCHRONOUS I/O

Synchronous I/O blocks the issuing process from continuing execution while the OS completes the request. The process is blocked until the request completes, which generally results in poor throughput. The following figure shows how Adaptive Server works with synchronous I/O.

[Figure: the dataserver calls write(); the OS performs the I/O while the dataserver blocks waiting for it; write() returns only when the write is complete.]

4.0 DSYNC WRITES

For devices initialized on UNIX operating system files, the dsync setting controls whether writes to those files are buffered. When the dsync setting is on, Adaptive Server opens the database device file using the UNIX dsync flag. The dsync flag ensures that writes to the device file occur directly on the physical storage media, so Adaptive Server can recover data on the device in the event of a system failure.

Using the dsync option with database device files incurs the following performance trade-offs:

• If database device files use the dsync option, the Adaptive Server engine writing to the device file blocks until the write operation completes. This can cause poor performance during update operations.
• Response time for read operations is generally better for devices stored on UNIX operating system files than for devices stored on raw partitions. Data from device files can benefit from the UNIX file system cache as well as the Adaptive Server cache, and more reads may take place without requiring physical disk access.

The following figure shows how Adaptive Server works when the dsync setting is turned on.

[Figure: as with asynchronous I/O, the dataserver issues aiowrite(), does other work while the I/O is in progress, and polls for completion with aiowait(). The only thing dsync changes is what "write complete" means: the write is acknowledged only once the data is on stable storage. The I/O is still asynchronous from the perspective of ASE.]


Note: The dsync option is useful on Intel 32-bit platforms, where the system can support up to 64 GB of memory but ASE can address only 4 GB. In such a scenario, customers can use the file system cache to hold data in the memory ASE cannot use. With directio and extended cache, however, this limitation can be circumvented, making use of the memory otherwise given to the file system. This does not apply to 64-bit platforms, where direct I/O is always the preferred option.

DBCC Tablealloc test

The following chart shows the advantage of using the dsync option on file system devices for read operations. In this test, the dbcc tablealloc command is executed against Sybase file system devices with direct I/O enabled and against devices with the dsync option enabled. dbcc tablealloc reads and checks the pages of the specified table or data partition to verify that all pages are correctly allocated and that no allocated page is unused. With the dsync option enabled, response time is 96 seconds; with the direct I/O option, response time is 386 seconds.

[Chart: dbcc tablealloc response time in seconds — dsync vs. direct I/O.]

5.0 DIRECT I/O

Direct I/O is a file system feature in which writes go directly to the storage device, bypassing the operating system buffer cache. Direct I/O is invoked by opening a file with the O_DIRECT flag; on the Solaris platform, the directio() system call is used after the normal open() call on the file system device. The directio parameter is used with the disk init, disk reinit, and sp_deviceattr commands. Direct I/O performs I/O in the same manner as raw devices and provides the same performance benefit as raw devices.

[Figure: 1. ASE requires that a dirty page in the ASE data cache be written. 2. ASE issues a write on that page; the write goes directly to the file on disk, bypassing the file system cache. 3. The write returns success when the data hits the disk.]

Note: Neither the direct I/O nor the dsync option should be used for tempdb devices, as recovery is not needed for tempdb.


The following charts show the performance improvement of various commands using the direct I/O option over the dsync option. A summary of all results appears in section 9.0.

Reorg Rebuild test

The following chart shows the performance of the reorg rebuild test on devices with the direct I/O option and with the dsync option enabled. Rebuilding an index to compact the index space and improve the clustering ratio is a typical operation in any database environment. Direct I/O provides a 90% performance improvement when executing reorg rebuild on a table of 24 million rows with a clustered index and a nonclustered index. This improvement is attributable to write operations bypassing the operating system buffer cache.

[Chart: reorg rebuild response time in seconds — dsync vs. direct I/O.]

Create Database test

The create database operation initializes 256 pages of writes at a time, which amounts to 512 KB for a 2 KB page size and up to 4 MB for a 16 KB page size. This allows full use of the I/O bandwidth of the underlying devices. With the direct I/O option, the create database command for a 25 GB database takes 7 minutes, compared to 4.5 hours with the dsync option.

[Chart: create database response time in seconds — dsync vs. direct I/O.]


Summary of IO Operations

I/O method         Feature / major benefits                                    Version
Asynchronous I/O   Issuing process continues while the OS completes the I/O;   ASE 12.5, ASE 15.0
                   no database interference
Synchronous I/O    Issuing process is blocked while the OS executes the I/O    ASE 12.5, ASE 15.0
Dsync writes       Writes to the device occur directly on storage media;       ASE 12.5, ASE 15.0 (UNIX only)
                   recovery is ensured in case of failure
Direct I/O         Bypasses the OS buffer cache; provides better write         ASE 15.0
                   performance

6.0 SUPPORTED PLATFORMS

The following chart shows supported platforms for direct I/O.

Platform                      Direct I/O   Dsync   Raw Partition
Solaris SPARC 32-bit/64-bit   Yes          Yes     Yes
IBM® AIX 64-bit               Yes          Yes     Yes
Windows® x86                  Yes          Yes     Yes
Linux® x86                    Yes          Yes     Yes
HP-UX PA-RISC 64-bit          No           Yes     Yes

7.0 PERFORMANCE

To analyze I/O performance on various Sybase devices, the TPC-H benchmark suite with a dbscale of 15 was used to populate the database. Total database size is 24 GB. Most of the commands used in this benchmark test are I/O oriented.

8.0 CONFIGURATION

8.1 System Configuration

Server Machine    Sun-Fire-V490
Total Memory      16 GB
# of CPUs         8 x 1050 MHz
Server Software   ASE 15.0


8.2 ASE 15.0 Configuration

The ASE 15.0 server is built with a 4 KB page size. In all three tests, the ASE 15.0 server configuration is kept identical. Most configuration parameters are kept at their default values, except those shown in the table.

Configuration Parameter Name      Value
max memory                        4 GB
cache size                        3 GB
2k buffer pool                    1.5 GB
16k buffer pool                   1.5 GB
disk i/o structures               2048
max network packet size           4096
default network packet size       4096
max async i/os per engine         2048
max async i/os per server         2048
max online engines                4
procedure cache size              30000
number of pre-allocated extents   31
number of sort buffers            2000

9.0 RESULTS

All results are measured in seconds. All commands used in this benchmark are shown in Appendix A.

9.1 Performance comparison between Direct I/O and devices with dsync on

Results are categorized in three sections: the first shows results of read-only commands, the second shows results of commands that perform both read and write operations, and the third shows results of commands that perform only write operations.

9.1.1 Read operations results

Performance for read operations is better when devices are opened with the dsync option. Reads benefit from the UNIX file system cache as well as the Adaptive Server cache, and more reads may take place without requiring physical disk access.

Commands            Dsync         Direct IO     Percentage Gain   Comment
Dbcc Tablealloc     96 seconds    381 seconds   -296%             24 million rows table
Update Statistics   249 seconds   323 seconds   -29.72%           24 million rows table

Note: Response time for the dbcc tablealloc command was 373 seconds when executed after clearing the file system cache. The file system cache was cleared by rebooting the machine.


9.1.2 Read-write operations results

Commands                     Dsync           Direct IO      Percentage Gain   Comment
Alter Table                  10635 seconds   3969 seconds   167.95%           24 million rows table with cl index and nl index
Create Clustered Index       16233 seconds   3103 seconds   423.14%           24 million rows, 4.3 GB table size
Create Non Clustered Index   4928 seconds    2296 seconds   114.63%           24 million rows table, 4.3 GB of size
Reorg Rebuild                9686 seconds    5102 seconds   89.85%            Same as above

9.1.3 Write operations results

The results clearly demonstrate that commands performing only write operations to the storage device show significant performance gains with direct I/O, as it bypasses the operating system buffer cache.

Commands          Dsync           Direct IO      Percentage Gain   Comment
Create Database   16343 seconds   440 seconds    3614.32%          Database size is 25 GB
BCP In            1073 seconds    334 seconds    221.26%           6 million rows
Insert            88.3 seconds    18.8 seconds   369.68%           100000 rows
Update            11 seconds      10.4 seconds   5.77%             10000 rows

9.2 Performance comparison between Direct I/O and Raw partition

Commands                     Direct I/O     Raw Partitions   % Difference   Comments
Create Database              440 seconds    457 seconds      3.86%          Database size is 25 GB
Update Statistics            323 seconds    343 seconds      6.19%          24 million rows table
Alter Table                  3969 seconds   3699 seconds     7.3%           24 million rows table with cl index and nl index
Reorg Rebuild                5102 seconds   5181 seconds     1.55%          Same as above
Bcp In                       334 seconds    338 seconds      1.2%           6 million rows
Insert                       18.8 seconds   19.2 seconds     2.13%          100000 rows
Update                       10.4 seconds   9.2 seconds      13.04%
Create Clustered Index       3103 seconds   2799 seconds     -10.86%        24 million rows table
Create Non Clustered Index   2296 seconds   2224 seconds     -3.24%         24 million rows table, 4.3 GB of size


10.0 CONCLUSION

Direct I/O is one of the many features in ASE 15.0 that improve performance significantly. The results above show that direct I/O provides the same performance benefit as raw devices, but has the ease of use and manageability of file system devices. On the Solaris platform, direct I/O provides an excellent performance gain over the dsync option. The performance gain provided by direct I/O varies from platform to platform.

APPENDIX A

1) Create database command
   create database tpcd on tpcd_dev1 = 10000, tpcd_dev2 = 10000 log on tpcd_log1 = 5000
2) Update statistics command
   update statistics lineitem
3) Alter table command
   alter table lineitem lock datarows
4) Reorg command
   reorg rebuild lineitem
5) Create clustered index command
   create unique clustered index cl on lineitem(l_orderkey, l_linenumber)
6) Create nonclustered index command
   create nonclustered index nl on lineitem(l_suppkey, l_orderkey, l_extendedprice, l_discount, l_shipdate)
7) Update command
   set rowcount 100000
   update lineitem set l_comment = "xxxxxxxxxxxxxxxxxxxxxxxxxx"
8) BCP in command
   bcp tpcd..line in line.out -Usa -P -b 20000 -c -A4096
9) Checkalloc command
   dbcc checkalloc("tpcd")
10) Tablealloc command
   dbcc tablealloc("lineitem")


www.sybase.com

SYBASE, INC. WORLDWIDE HEADQUARTERS, ONE SYBASE DRIVE, DUBLIN, CA 94568 USA 1 800 8 SYBASE

Copyright © 2006 Sybase, Inc. All rights reserved. Unpublished rights reserved under U.S. copyright laws. Sybase, the Sybase logo, and Adaptive Server are trademarks of Sybase, Inc. or its subsidiaries. All other trademarks are the property of their respective owners. ® indicates registration in the United States. Specifications are subject to change without notice. 08-06
