WHITE PAPER

Performance on Sybase® Devices Using Direct I/O, Dsync and Raw Partition

Dhimant Chokshi
Server Performance Engineering and Development Group
Sybase, Inc.
TABLE OF CONTENTS

1.0 Introduction
2.0 Asynchronous I/O
3.0 Synchronous I/O
4.0 Dsync Writes
    DBCC Tablealloc Test
5.0 Direct I/O
    Reorg Rebuild Test
    Create Database Test
    Summary of I/O Operations
6.0 Supported Platforms
7.0 Performance
8.0 Configuration
9.0 Results
10.0 Conclusion
1.0 INTRODUCTION

Sybase Adaptive Server® Enterprise 15.0 substantially enhances a database platform already known for its superior performance, reliability, and low total cost of ownership (TCO). Prior to ASE 15, the recommendation was to turn the dsync setting on for devices initialized on UNIX operating system files. The dsync flag ensures that writes to the device file occur directly on the physical storage media, so Adaptive Server can recover data on the device in the event of a system failure. This functionality, however, yields much slower write performance than raw partitions.

In ASE 15, the directio parameter lets you configure Adaptive Server to transfer data directly to disk, bypassing the operating system buffer cache. directio performs I/O in the same manner as raw devices and provides the same performance benefit (significantly better than dsync), while retaining the ease of use and manageability of file system devices.

Note: The directio and dsync parameters are mutually exclusive. If a device has dsync set to "true," you cannot set directio to "true" for that device. To enable directio for a device, you must first reset dsync to "false."

The following sections explain asynchronous and synchronous system calls, as well as direct I/O and writes with the dsync option.

2.0 ASYNCHRONOUS I/O

Asynchronous I/O allows the process issuing the I/O to continue executing while the OS completes the request. The dataserver process does not block on the request until its completion.
The following figure shows how Adaptive Server works with asynchronous I/O.

[Figure: The dataserver issues a write by calling aiowrite(), which returns immediately while the OS performs the I/O in a different context. The dataserver does other work while the I/O is in progress, then polls for completion with aiowait(), which returns the I/O status.]
3.0 SYNCHRONOUS I/O

Synchronous I/O blocks the issuing process from continuing to execute while the OS completes the request. The process is blocked until the request completes, which generally results in poor throughput. The following figure shows how Adaptive Server works with synchronous I/O.

[Figure: The dataserver calls write() and blocks waiting for the I/O. The OS performs the write, and write() returns only when the write is complete.]

4.0 DSYNC WRITES

For devices initialized on UNIX operating system files, the dsync setting controls whether writes to those files are buffered. When the dsync setting is on, Adaptive Server opens a database device file using the UNIX dsync flag. The flag ensures that writes to the device file occur directly on the physical storage media, so Adaptive Server can recover data on the device in the event of a system failure.

Using the dsync option with database device files incurs the following performance trade-offs:

• If database device files on these platforms use the dsync option, the Adaptive Server engine writing to the device file blocks until the write operation completes. This can cause poor performance during update operations.
• Response time for read operations is generally better for devices stored on UNIX operating system files than for devices stored on raw partitions.
Data from device files can benefit from the UNIX file system cache as well as the Adaptive Server cache, and more reads may take place without requiring physical disk access.

The following figure shows how Adaptive Server works when the dsync setting is turned on.

[Figure: As in the asynchronous case, the dataserver issues the write via aiowrite(), does other work while the OS performs the I/O in a different context, then polls for completion with aiowait(). The only thing dsync changes is what "write complete" means; the I/O is still asynchronous from the perspective of ASE.]
Note: The dsync option is useful on 32-bit Intel platforms, where the system can support up to 64 GB of memory but ASE can address only 4 GB. In that scenario, customers can use the file system cache to hold data in the memory not used by ASE. With directio and the extended cache, however, ASE can make use of the memory that would otherwise be left to the file system cache, circumventing the problem. This is not applicable to 64-bit platforms, so direct I/O is always the preferred choice there.

DBCC Tablealloc Test

The following chart shows the advantage of using the dsync option on file system devices for read operations. In this test, the dbcc tablealloc command is executed against Sybase file system devices with direct I/O enabled and against devices with the dsync option enabled. Dbcc tablealloc reads and checks the pages of the specified table or data partition to verify that all pages are correctly allocated and that no allocated page is unused. With the dsync option enabled, response time is 96 seconds; with the direct I/O option, response time is 386 seconds.

[Chart: dbcc tablealloc response time in seconds, dsync vs. direct I/O.]

5.0 DIRECT I/O

Direct I/O is a file system feature in which writes go directly to the storage device, bypassing the operating system buffer cache. Direct I/O is invoked by opening a file with the O_DIRECT flag; on the Solaris platform, the directio() system call is used after the normal open() call on the file system device. The directio parameter is used with the disk init, disk reinit, and sp_deviceattr commands.
Direct I/O performs I/O in the same manner as raw devices and provides the same performance benefit as raw devices.

[Figure: Direct I/O write path. 1. ASE requires that a dirty page in the ASE data cache be written. 2. ASE issues a write on that page; the write goes directly to the file on disk, bypassing the file system cache. 3. The write returns success when the data hits the disk.]

Note: Neither the direct I/O nor the dsync option should be used for tempdb devices, as recovery is not important for tempdb.
The following charts show the performance improvement of various commands using direct I/O over the dsync option. A summary of all results appears in Section 9.0.

Reorg Rebuild Test

The following chart shows the performance of the reorg rebuild test on devices with the direct I/O option and with the dsync option enabled. Rebuilding an index to compact the index space and improve the clustering ratio is a typical operation in any database environment. Direct I/O provides a 90% performance improvement when executing reorg rebuild on a table of 24 million rows with a clustered index and a nonclustered index. This improvement is attributable to write operations bypassing the operating system buffer cache.

[Chart: reorg rebuild response time in seconds, dsync vs. direct I/O.]

Create Database Test

The create database operation initializes 256 pages per write, which amounts to 512 KB for a 2 KB page size and up to 4 MB for a 16 KB page size. This allows full use of the I/O bandwidth of the underlying devices. With the direct I/O option, the create database command for a 25 GB database takes 7 minutes, compared to 4.5 hours with the dsync option.

[Chart: create database response time in seconds, dsync vs. direct I/O.]
Summary of I/O Operations

I/O Type           Feature / Major Benefit                                      Version
Asynchronous I/O   The OS completes the I/O; the issuing process continues      ASE 12.5, ASE 15.0
                   without blocking
Synchronous I/O    The issuing process is blocked while the OS executes         ASE 12.5, ASE 15.0
                   the I/O
Dsync writes       Writes to the device occur directly on storage media;        ASE 12.5, ASE 15.0 (UNIX only)
                   recovery is ensured in case of failure
Direct I/O         Bypasses the OS buffer cache; provides better write          ASE 15.0
                   performance

6.0 SUPPORTED PLATFORMS

The following chart shows the supported platforms for direct I/O.

Platform                      Direct I/O   Dsync   Raw Partition
Solaris SPARC 32-bit/64-bit   Yes          Yes     Yes
IBM® AIX 64-bit               Yes          Yes     Yes
Windows® x86                  Yes          Yes     Yes
Linux® x86                    Yes          Yes     Yes
HP-UX PA-RISC 64-bit          No           Yes     Yes

7.0 PERFORMANCE

To analyze I/O performance on various Sybase devices, the TPC-H benchmark suite with a dbscale of 15 was used to populate the database. Total database size is 24 GB. Most of the commands used in this benchmark are I/O oriented.

8.0 CONFIGURATION

8.1 System Configuration

Server Machine    Sun Fire V490
Total Memory      16 GB
Number of CPUs    8 x 1050 MHz
Server Software   ASE 15.0
8.2 ASE 15.0 Configuration

The ASE 15.0 server is built with a 4K page size. In all three tests, the ASE 15.0 server configuration is kept identical. Most configuration parameters are kept at their default values, except those shown in the table.

Configuration Parameter           Value
max memory                        4 GB
cache size                        3 GB
2K buffer pool                    1.5 GB
16K buffer pool                   1.5 GB
disk i/o structures               2048
max network packet size           4096
default network packet size       4096
max async i/os per engine         2048
max async i/os per server         2048
max online engines                4
procedure cache size              30000
number of pre-allocated extents   31
number of sort buffers            2000

9.0 RESULTS

All results are measured in seconds. All commands used in this benchmark are shown in Appendix A.

9.1 Performance Comparison Between Direct I/O and Devices with Dsync On

Results are categorized in three sections. The first section shows results for read-only commands; the second, for commands that perform both read and write operations; and the third, for commands that perform only write operations.

9.1.1 Read Operations Results

Performance for read operations is better when devices are opened with the dsync option.
Reads benefit from the UNIX file system cache as well as the Adaptive Server cache, and more reads may take place without requiring physical disk access.

Command            Dsync        Direct I/O    Percentage Gain   Comment
Dbcc Tablealloc    96 seconds   381 seconds   -296%             24 million row table
Update Statistics  249 seconds  323 seconds   -29.72%           24 million row table

Note: Response time for the dbcc tablealloc command was 373 seconds when executed after clearing the file system cache. The file system cache was cleared by rebooting the machine.
9.1.2 Read-Write Operations Results

Command                    Dsync           Direct I/O     Percentage Gain   Comment
Alter Table                10635 seconds   3969 seconds   167.95%           24 million row table with clustered and nonclustered index
Create Clustered Index     16233 seconds   3103 seconds   423.14%           24 million rows, 4.3 GB table size
Create Nonclustered Index  4928 seconds    2296 seconds   114.63%           24 million row table, 4.3 GB in size
Reorg Rebuild              9686 seconds    5102 seconds   89.85%            Same as above

9.1.3 Write Operations Results

The results clearly demonstrate that commands performing only write operations to the storage device show a significant performance gain with direct I/O, as it bypasses the operating system buffer cache.

Command          Dsync           Direct I/O     Percentage Gain   Comment
Create Database  16343 seconds   440 seconds    3614.32%          Database size is 25 GB
BCP In           1073 seconds    334 seconds    221.26%           6 million rows
Insert           88.3 seconds    18.8 seconds   369.68%           100000 rows
Update           11 seconds      10.4 seconds   5.77%             10000 rows

9.2 Performance Comparison Between Direct I/O and Raw Partition

Command                    Direct I/O      Raw Partitions   % Difference   Comment
Create Database            440 seconds     457 seconds      3.86%          Database size is 25 GB
Update Statistics          323 seconds     343 seconds      6.19%          24 million row table
Alter Table                3969 seconds    3699 seconds     7.3%           24 million row table with clustered and nonclustered index
Reorg Rebuild              5102 seconds    5181 seconds     1.55%          Same as above
Bcp In                     334 seconds     338 seconds      1.2%           6 million rows
Insert                     18.8 seconds    19.2 seconds     2.13%          100000 rows
Update                     10.4 seconds    9.2 seconds      13.04%
Create Clustered Index     3103 seconds    2799 seconds     -10.86%        24 million row table
Create Nonclustered Index  2296 seconds    2224 seconds     -3.24%         24 million row table, 4.3 GB in size
10.0 CONCLUSION

Direct I/O is one of the many features in ASE 15.0 that improve performance significantly. The results above show that direct I/O provides the same performance benefit as raw devices, while offering the ease of use and manageability of file system devices. On the Solaris platform, direct I/O provides an excellent performance gain over the dsync option. The performance gain provided by direct I/O varies from platform to platform.

APPENDIX A

1) Create database command
   create database tpcd on tpcd_dev1 = 10000, tpcd_dev2 = 10000 log on tpcd_log1 = 5000
2) Update statistics command
   update statistics lineitem
3) Alter table command
   alter table lineitem lock datarows
4) Reorg command
   reorg rebuild lineitem
5) Create clustered index command
   create unique clustered index cl on lineitem(l_orderkey, l_linenumber)
6) Create nonclustered index command
   create nonclustered index nl on lineitem(l_suppkey, l_orderkey, l_extendedprice, l_discount, l_shipdate)
7) Update command
   set rowcount 100000
   update lineitem set l_comment = "xxxxxxxxxxxxxxxxxxxxxxxxxx"
8) BCP in command
   bcp tpcd..line in line.out -Usa -P -b 20000 -c -A4096
9) Checkalloc command
   dbcc checkalloc("tpcd")
10) Tablealloc command
   dbcc tablealloc("lineitem")
www.sybase.com

SYBASE, INC. WORLDWIDE HEADQUARTERS, ONE SYBASE DRIVE, DUBLIN, CA 94568 USA  1 800 8 SYBASE

Copyright © 2006 Sybase, Inc. All rights reserved. Unpublished rights reserved under U.S. copyright laws. Sybase, the Sybase logo, and Adaptive Server are trademarks of Sybase, Inc. or its subsidiaries. All other trademarks are the property of their respective owners. ® indicates registration in the United States. Specifications are subject to change without notice. 08-06