One - The Linux Kernel Archives

More documents

Recommendations

Info

220 • Linux Symposium 2004 • Volume One accept strategy used in conjunction with epoll- LT and epoll-ctlv to further improve their performance. We initially ran the one byte and the SPECweb99-like workloads to compare the performance of the select, poll and leveltriggered epoll mechanisms. As shown in Figure 1 and Figure 2, for both of these workloads select and poll perform as well as epoll-LT. It is important to note that because there are no idle connections for these experiments the number of socket descriptors tracked by each mechanism is not very high. As expected, the gap between epoll-LT and select is more pronounced for the one byte workload because it places more stress on the event dispatch mechanism. We tried to improve the performance of the server by exploring different techniques for using epoll as described in Section 3. The effect of these techniques on the one-byte workload is shown in Figure 3. The graphs in this figure show that for this workload the techniques used to reduce the number of epoll_ctl calls do not provide significant benefits when compared with their level-triggered counterpart (epoll- LT). Additionally, the performance of select and poll is equal to or slightly better than each of the epoll techniques. Note that we omit the line for poll from Figures 3 and 4 because it is nearly identical to the select line. Replies/s 25000 20000 15000 10000 5000 select epoll-LT epoll-ET epoll-ctlv epoll2 Replies/s 25000 20000 15000 10000 select poll epoll-LT 0 0 5000 10000 15000 20000 25000 30000 Requests/s Figure 3: µserver performance on one byte workload with no idle connections 5000 0 0 5000 10000 15000 20000 25000 30000 Requests/s Figure 1: µserver performance on one byte workload using select, poll, and epoll-LT Replies/s 25000 20000 15000 10000 5000 0 0 5000 10000 15000 20000 25000 30000 Requests/s select poll epoll-LT Figure 2: µserver performance on SPECweb99-like workload using select, poll, and epoll-LT We further analyze the results from Figure 3 by profiling the µserver using gprof at the request rate of 22,000 requests per second. Table 1 shows the percentage of time spent in system calls (rows) under the various event dispatch methods (columns). The output for system calls and µserver functions which do not contribute significantly to the total run-time is left out of the table for clarity. If we compare the select and poll columns we see that they have a similar breakdown including spending about 13% of their time indicating to the kernel events of interest and obtaining events. In contrast the epoll-LT, epoll-ctlv, and epoll2 approaches spend about 21 – 23% of their time on their equivalent functions (epoll_ctl, epoll_ctlv and epoll_wait). Despite these extra overheads the throughputs obtained using the epoll techniques compare favourably with those obtained
Linux Symposium 2004 • Volume One • 221 select epoll-LT epoll-ctlv epoll2 epoll-ET poll read 21.51 20.95 21.41 20.08 22.19 20.97 close 14.90 14.05 14.90 13.02 14.14 14.79 select 13.33 - - - - - poll - - - - - 13.32 epoll_ctl - 16.34 5.98 10.27 11.06 - epoll_wait - 7.15 6.01 12.56 6.52 - epoll_ctlv - - 9.28 - - - setsockopt 11.17 9.13 9.13 7.57 9.08 10.68 accept 10.08 9.51 9.76 9.05 9.30 10.20 write 5.98 5.06 5.10 4.13 5.31 5.70 fcntl 3.66 3.34 3.37 3.14 3.34 3.61 sendfile 3.43 2.70 2.71 3.00 3.91 3.43 Table 1: gprof profile data for the µserver under the one-byte workload at 22,000 requests/sec using select and poll. We note that when using select and poll the application requires extra manipulation, copying, and event scanning code that is not required in the epoll case (and does not appear in the gprof data). The results in Table 1 also show that the overhead due to epoll_ctl calls is reduced in epoll-ctlv, epoll2 and epoll-ET, when compared with epoll-LT. However, in each case these improvements are offset by increased costs in other portions of the code. The epoll2 technique spends twice as much time in epoll_wait when compared with epoll-LT. With epoll2 the number of calls to epoll_wait is significantly higher, the average number of descriptors returned is lower, and only a very small proportion of the calls (less than 1%) return events that need to be acted upon by the server. On the other hand, when compared with epoll-LT the epoll2 technique spends about 6% less time on epoll_ctl calls so the total amount of time spent dealing with events is comparable with that of epoll-LT. Despite the significant epoll_wait overheads epoll2 performance compares favourably with the other methods on this workload. Using the epoll-ctlv technique, gprof indicates that epoll_ctlv and epoll_ctl combine for a total of 1,949,404 calls compared with 3,947,769 epoll_ctl calls when using epoll-LT. While epoll-ctlv helps to reduce the number of user-kernel boundary crossings, the net result is no better than epoll- LT. The amount of time taken by epoll-ctlv in epoll_ctlv and epoll_ctl system calls is about the same (around 16%) as that spent by level-triggered epoll in invoking epoll_ctl. When comparing the percentage of time epoll- LT and epoll-ET spend in epoll_ctl we see that it has been reduced using epoll-ET from 16% to 11%. Although the epoll_ctl time has been reduced it does not result in an appreciable improvement in throughput. We also note that about 2% of the run-time (which is not shown in the table) is also spent in the epoll-ET case checking, and tracking the state of the request (i.e., whether the server should be reading or writing) and the state of the socket (i.e., whether it is readable or writable). We expect that this can be reduced but that it wouldn’t noticeably impact performance. Results for the SPECweb99-like workload are
Page 1:
Proceedings of the Linux Symposium
Page 4 and 5:
The State of ACPI in the Linux Kern
Page 7 and 8:
Conference Organizers Andrew J. Hut
Page 9 and 10:
TCP Connection Passing Werner Almes
Page 11 and 12:
Linux Symposium 2004 • Volume One
Page 13 and 14:
Page 15 and 16:
Page 17 and 18:
Page 19 and 20:
Page 21 and 22:
Page 23 and 24:
Cooperative Linux Dan Aloni da-x@co
Page 25 and 26:
Page 27 and 28:
Page 29 and 30:
Page 31 and 32:
Page 33 and 34:
Build your own Wireless Access Poin
Page 35 and 36:
Page 37 and 38:
Page 39 and 40:
Page 41 and 42:
Run-time testing of LSB Application
Page 43 and 44:
Page 45 and 46:
Page 47 and 48:
Page 49 and 50:
Page 51 and 52:
Linux Block IO—present and future
Page 53 and 54:
Page 55 and 56:
Page 57 and 58:
Page 59 and 60:
Page 61 and 62:
Page 63 and 64:
Linux AIO Performance and Robustnes
Page 65 and 66:
Page 67 and 68:
Page 69 and 70:
Page 71 and 72:
Page 73 and 74:
Page 75 and 76:
Page 77 and 78:
Page 79 and 80:
Methods to Improve Bootup Time in L
Page 81 and 82:
Page 83 and 84:
Page 85 and 86:
Page 87 and 88:
Page 89 and 90:
Linux on NUMA Systems Martin J. Bli
Page 91 and 92:
Page 93 and 94:
Page 95 and 96:
Page 97 and 98:
Page 99 and 100:
Page 101 and 102:
Page 103 and 104:
Improving Kernel Performance by Unm
Page 105 and 106:
Page 107 and 108:
Page 109 and 110:
Page 111 and 112:
Page 113 and 114:
Linux Virtualization on IBM POWER5
Page 115 and 116:
Page 117 and 118:
Page 119 and 120:
Page 121 and 122:
The State of ACPI in the Linux Kern
Page 123 and 124:
Page 125 and 126:
Page 127 and 128:
Page 129 and 130:
Page 131 and 132:
Page 133 and 134:
Scaling Linux® to the Extreme From
Page 135 and 136:
Page 137 and 138:
Page 139 and 140:
Page 141 and 142:
Page 143 and 144:
Page 145 and 146:
Page 147 and 148:
Page 149 and 150:
Get More Device Drivers out of the
Page 151 and 152:
Page 153 and 154:
Page 155 and 156:
Page 157 and 158:
Page 159 and 160:
Page 161 and 162:
Page 163 and 164:
Big Servers—2.6 compared to 2.4 W
Page 165 and 166:
Page 167 and 168:
Multi-processor and Frequency Scali
Page 169 and 170: Linux Symposium 2004 • Volume One
Page 187 and 188: Dynamic Kernel Module Support: From
Page 203 and 204: e100 Weight Reduction Program Writi
Page 207 and 208: NFSv4 and rpcsec_gss for linux J. B
Page 215 and 216: Comparing and Evaluating epoll, sel
Page 219: Linux Symposium 2004 • Volume One
Page 227 and 228: The (Re)Architecture of the X Windo
Page 239 and 240: IA64-Linux perf tools for IO dorks
Page 255 and 256: Carrier Grade Server Features in th
Page 269 and 270: Demands, Solutions, and Improvement
Page 271 and 272:
Page 273 and 274:
Page 275 and 276:
Page 277 and 278:
Page 279 and 280:
Page 281 and 282:
Page 283 and 284:
Page 285 and 286:
Page 287 and 288:
Hotplug Memory and the Linux VM Dav
Page 289 and 290:
Page 291 and 292:
Page 293 and 294:
show all

One - The Linux Kernel Archives

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?