Linux Symposium 2004 • Volume One • 83
Boot Stage               Non-XIP time   XIP time
Copy kernel to RAM                 85         12
Decompress kernel                 453          0
Kernel initialization             819        882
Total kernel boot time           1357        894

Note: Times are in milliseconds. Results are for a PowerPC 405 LP at 266 MHz.

Table 3: Comparison of Non-XIP vs. XIP bootup times
figuration, the kernel must be compiled and linked so that the code is ready to be executed from a fixed memory location. There are examples of XIP configurations for ARM, MIPS, and SH platforms in the CELinux source tree, available at: http://tree.celinuxforum.org/
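As a concrete illustration, an XIP kernel build on ARM is typically selected through kernel configuration options along the following lines. This is a minimal sketch: the option names reflect the ARM Kconfig symbols for XIP support, and the physical address shown is only an example value that would depend on where the kernel image is placed in flash on a particular board.

```
# Execute the kernel directly from flash rather than copying it to RAM
CONFIG_XIP_KERNEL=y
# Physical address in flash where the linked kernel text resides
# (example value; board-specific)
CONFIG_XIP_PHYS_ADDR=0x00080000
```

With these options set, the build produces an uncompressed image linked to run at the given flash address, rather than a compressed image that is copied to and decompressed in RAM.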
4.1 XIP Design Tradeoffs
There are tradeoffs involved in the use of XIP. First, access times to flash memory are commonly greater than access times to RAM. Thus, a kernel executing from flash usually runs a bit slower than a kernel executing from RAM. Table 4 shows some of the results from running the lmbench benchmark on a system with the kernel executing in a standard non-XIP configuration versus an XIP configuration.
Operation                           Non-XIP     XIP
stat() syscall                         22.4    25.6
fork a process                         4718    7106
context switching for 16 processes      932    1109
  and 64k data size
pipe communication                      248     548

Note: Times are in microseconds. Results are for the lmbench benchmark run on an OMAP 1510 (ARM9 at 168 MHz) processor.

Table 4: Comparison of Non-XIP and XIP performance
Some of the operations in the benchmark took significantly longer with the kernel running in the XIP configuration; most individual operations took about 20% to 30% longer. This performance penalty persists for the entire time the kernel is running, and is thus a serious drawback to using XIP to reduce bootup time.
A second tradeoff with kernel XIP is between the sizes of various types of memory in the system. In the XIP configuration the kernel must be stored uncompressed, so the amount of flash required for the kernel increases, usually about doubling, versus the compressed kernel image used in a non-XIP configuration. However, the amount of RAM required for the kernel decreases, since the kernel code segment is never copied to RAM. Therefore, kernel XIP is also of interest for reducing the runtime RAM footprint of Linux in embedded systems.
There is additional research under way to investigate ways of reducing the performance impact of using XIP. One promising technique appears to be the use of "partial-XIP," where a highly active subset of the kernel is loaded into RAM, but the majority of the kernel is executed in place from flash.
5 Delay Calibration Avoidance
One time-consuming operation inside the kernel is the process of calibrating the value used for delay loops. One of the first routines in the kernel, calibrate_delay(), executes a series of delays in order to determine the correct value for a variable called loops_per_jiffy, which is subsequently used to execute short delays in the kernel.
The cost of performing this calibration is, interestingly, independent of processor speed. Rather, it is dependent on the number of iter-