12.07.2015 Views

dtrace_tips

dtrace_tips

dtrace_tips

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Advanced DTraceTips, Tricks and GotchasBryan Cantrill, Mike Shapiro andAdam LeventhalTeam DTrace


Advanced DTrace●●●●Assumption that the basics of DTraceare understood – or at least familiarYou need not have used DTrace toappreciate this presentation......but the more you have, the moreyou'll appreciate itIn no particular order, we will bedescribing some <strong>tips</strong>, some tricks andsome gotchas


DTrace Tips●●●Tips are pointers to facilities that are(for the most part) fully documentedBut despite (usually) being welldocumented,they might not be wellknown...This presentation will present thesefacilities, but won't serve as a tutorialfor them; see the documentation fordetails


DTrace Tricks●●●There are a few useful DTracetechniques that are not obvious, andare not particularly documentedSome of these “tricks” are actuallyworkarounds to limitations in DTraceSome of these limitations are being (orwill be) addressed, so some tricks willbe obviated by future work


DTrace Gotchas●●●Like any system, DTrace has somepitfalls that novices may run into – anda few that even experts may run intoWe've tried to minimize these, butmany remain as endemic to theinstrumentation problemSeveral of these are documented, butthey aren't collected into a single place


Tip: Stable providers●●Allow meaningful instrumentation ofthe kernel without requiring knowledgeof its implementation, covering:– CPU scheduling (sched)– Process management (proc)– I/O (io)– Some kernel statistics (vminfo, sysinfo,fpuinfo, mib)– More on the way...If nothing else, read the documentationchapters covering them!


Tip: Speculative tracing●●●●DTrace has a well-known predicatemechanism for conditional executionThis works when one knows at probefiringwhether or not one is interestedBut in some cases, one only knowsafter the factSpeculative tracing is a mechanism forspeculatively recording data,committing it or discarding it at a latertime


Tip: Normalizing aggregations●●●Often, one wishes to know not absolutenumbers, but rather per-unit rates (e.g.system calls per second, I/O operationsper transaction, etc.)In DTrace, aggregations can be turnedinto per-unit rates via normalizationFormat is “normalize(@agg, n),”where agg is an aggregation and n isan arbitrary D expression


Tip: clear and tick probes●●clear zeroes an aggregation's valuesWith tick probes, clear can be usedto build custom monitoring tools:io:::start{@[execname] = count();}tick-1sec{printa(“%40s %@d\n”, @);clear(@);}


Trick: Valueless printa●●●●●printa takes a format string and anaggregation identifier“%@” in the format string denotes theaggregation valueThis is not required; you can print onlythe aggregation tupleCan be used as an implicit uniq(1)Can be used to effect a global orderingby specifying max(timestamp) as theaggregating action


Tip: stop●●One may wish to stop a process toallow subsequent investigation with atraditional debugger (e.g. DBX, MDB)Do this with the stop destructiveaction:#pragma D option destructiveio:::start/execname == “java”/{printf(“stopping %d...”, pid);stop();}


Trick: Conditional breakpoints●●●Existing conditional breakpointmechanisms are limited to pretty basicconditionsThe stop action and the pid providerallow for much richer conditionalbreakpointsFor example, breakpoint based on:– Return value– Argument value– Latency– ...


Gotcha: stop gone haywire●●●Be very careful when using stop – it'sa destructive action for a reason!If you somehow manage to stop everyprocess in the system, the system willeffectively be wedgedIf a stop script has gone haywire, try:– Setting <strong>dtrace</strong>_destructive_disallowto 1 via kmdb(1)/OBP– Waiting for deadman to abort DTrace enabling,then remotely logging in (hoping that inetdhasn't been stopped!)


Gotcha: Running into limits●If you try to enable very large D scripts(hundreds of enablings and/orthousands of actions), you may findthat DTrace rejects it:●<strong>dtrace</strong>: failed to enable './biggie.d': DIFprogram exceeds maximum program size●●This can be worked around by tuning<strong>dtrace</strong>_dof_maxsize in/etc/system or via “mdb -kw”Default size is 256K


Gotcha: Too many pid probes●●●pid probes are created on-the-fly asthey are enabledTo avoid denial-of-service, there is alimit on the number of pid probes thatcan be createdThis limit (250,000 by default) is lowenough that it can be hit for largeprocesses:●<strong>dtrace</strong>: invalid probe specifier pid123:::: failedto create probe in process 123: Not enough space


Tip: Allowing more pid probes●●Increase fasttrap-max-probesin /kernel/drv/fasttrap.confAfter updating value, either reboot or:– Make sure DTrace isn't running– Unload all modules (“modunload -i 0”)– Confirm that fasttrap is not loaded(“modinfo | grep fasttrap”)– Run “update_drv fasttrap”– New value will take effect upon subsequentDTrace use


Gotcha: Buffer drops●●●●There is always the possibility ofrunning out of buffer spaceThis is a consequence of instrumentingarbitrary contextsWhen a record is to be recorded andthere isn't sufficient space available,the record will be dropped, e.g.:<strong>dtrace</strong>: 978 drops on CPU 0<strong>dtrace</strong>: 11 aggregation drops on CPU 0


Tip: Tuning away buffer drops●●Every buffer in DTrace can be tuned ona per-consumer basis via -x or#pragma D optionBuffer sizes tuned via bufsize andaggsize● May use size suffixes (e.g. k, m, g)●Drops may also be reduced oreliminated by increasing switchrateand/or aggrate


Trick: Tracking object lifetime●●●Assign timestamp to an associativearray indexed on memory address uponreturn from mallocIn entry to free:– Predicate on non-zero associative array element– Aggregate on stack trace– quantize current time minus stored timeNote: eventually, long-lived objects willconsume all dynamic variable space


Trick: Rates over time●●●For varying workloads, it can be usefulto observe changes in rates over timeThis can be done using printa andclear out of a tick probe, butoutput will be by time – not byaggregated tupleInstead, aggregate with lquantize ofcurrent time minus start time (fromBEGIN enabling) divided by unit time


Tip: Using system●●Use the system action to execute acommand in response to a probeTakes printf-like format string andarguments:●#pragma D option quiet#pragma D option destructiveio:::start/args[2]->fi_pathname != “” &&args[2]->fi_pathname != “”/{system(“file %s”, args[2]->fi_pathname);}


Gotcha: Using system● system is processed at user-level –there will be a delay between probefiring and command execution,bounded by the switchrate●Be careful; it's easy to accidentallycreate a positive feedback loop:●<strong>dtrace</strong> -n 'proc:::exec{system(“/usr/ccs/bin/size %s”, args[0])}'●To avoid this, add a predicate to above:●/!progenyof($pid)/


Trick: system(“<strong>dtrace</strong>”)●●●In DTrace, actions cannot enable probesHowever, using the system action,one D script can launch anotherIf instrumenting processes, steps canbe taken to eliminate lossiness:– stop in parent– Pass the stopped process as an argument to thechild script– Use system to prun(1) in a BEGIN clause inthe child script


Tip: -c option●●●●To observe a program from start tofinish, use “-c cmd”$target is set to target process ID<strong>dtrace</strong> exits when command exits# <strong>dtrace</strong> -q -c date-n 'pid$target::malloc:entry{@ = sum(arg0)}'-n 'END{printa(“allocated %@d bytes\n”, @)}'Fri Feb 11 09:09:30 PST 2005allocated 10700 bytes#


Gotcha: Stripped user stacks●●●When using the ustack action,addresses are translated into symbolsas a postprocessing stepIf the target process has exited, symboltranslation is impossibleResult is a stripped stack:●# <strong>dtrace</strong> -n syscall:::entry'{ustack()}'CPU ID FUNCTION:NAME0 363 resolvepath:entry0xfeff34fc0xfefe4faf0x80474c0


Tip: Avoiding stripped stacks●●●With the “-p pid” option, <strong>dtrace</strong>attaches to the specified process<strong>dtrace</strong> will hold the target process onexit, and perform all postprocessingbefore allowing the target to continueLimitation: you must know a prioriwhich process you're interested in


Trick: Using stop and ustack●●●If you don't know a priori whichprocesses you're interested in, you canuse a stop/system trick:– stop in syscall::rexit:entry– system(“prun %d”, pid);Any user stacks processed beforeprocessing the system action will beprinted symbolicallyThis only works if the application callsexit(2) explicitly!


Gotcha: Slow user stacks●●●If neither -p or -c is specified, processhandles for strack symbol translationare maintained in an LRU grab cacheIf more processes are being ustack'dthan handles are cached, user stackpostprocessing can be slowedDefault size of grab cache is eightprocess handles; can be tuned viapgmax option


Tip: Ring buffering and -c/-p●●●●●Problem: program repeatedly crashes,but for unknown reasonsUse ring buffering by settingbufpolicy to ringRing buffering allows use on longrunningprocessesFor example, to capture all functionscalled up to the point of failure:<strong>dtrace</strong> -n 'pid$target:::entry'-x bufpolicy=ring -c cmd


Gotcha: Deadman●●●DTrace protects against inducing toomuch load with a deadman that abortsenablings if the system becomesunresponsive:<strong>dtrace</strong>: processing aborted: Abort due to systemicunresponsivenessCriteria for responsiveness:– Interrupt can fire once a second– Consumer can run once every thirty secondsOn a heavily loaded system, a deadmantimeout may not be due to DTrace!


Tip: Tuning the deadman●●If the deadman is due to residual load,the deadman may simply be disabledby enabling destructive actionsAlternatively, the parameters for thedeadman can be explicitly tuned:– <strong>dtrace</strong>_deadman_user is user-levelreponsiveness expectation (in nanoseconds)– <strong>dtrace</strong>_deadman_interval is interruptresponsiveness expectation (in nanoseconds)– <strong>dtrace</strong>_deadman_timeout is the permittedlength of unresponsiveness (in nanoseconds)


Trick: Stack filtering●●●Often, one is interested in a probe onlyif a certain function is on the stackDTrace doesn't (yet) have a way to filterbased on stack contentsYou can effect this by using thread-localvariables:– Set the variable to “1” when entering thefunction of interest– Predicate the probe of interest with the thread-– Don't forget to clear the thread-local variable!


Trick: Watchpoints via pid●●●Problem: you know which data is beingcorrupted, but you don't know bywhomPotential solution: instrument everyinstruction, with stop action andpredicate that data is incorrect valueOnce data becomes corrupt, processwill stop; attach a debugger (or usegcore(1)) to progress towards theroot-cause...


Trick: Measuring DTrace●●Can exploit two properties of DTrace:– Clause-local variables retain their values acrossmultiple enablings of the same probe in thesame program– The timestamp variable is cached for theduration of a clause, but not across clausesRequires three clauses:– Assign timestamp to clause-local in 1 st clause– Perform operation to be measured in 2 nd clause– Aggregate on difference between timestampand clause-local in 3 rd clause


Trick: Iterating over structures●●●●To meet safety criteria, DTrace doesn'tallow programmer-specified iterationIf you find yourself wanting iteration,you probably want to use aggregationsIn some cases, this may not suffice...In some of these cases, you may beable to effect iteration by using atick-n probe to increment anindexing variable...


Gotcha: Unsporting libraries●●●Regrettably, on x86 there are compileroptions that cause the compiler to notstore a frame pointerThis is regrettable because theselibraries become undebuggable: stacktraces are impossibleLibrary writers: don't do this!– gcc: Don't use -fomit-frame-pointer!– Sun compilers: avoid -xO4; it does this bydefault!


Gotcha: Unsporting functions●●●●●Some compilers put jump tables in-linein program textThis is a problem because dataintermingled in program text confusestext processing tools like DTraceDTrace always errs on the side ofcaution: if it becomes confused, it willrefuse to instrument a functionMost likely to encounter this on x86Solution to this under development...


Gotcha: Unsporting apps●●●Some applications have strippedsymbol tables and/or static functionsMakes using the pid provider arduousCan still use the pid provider toinstrument instructions in strippedfunctions by using “-” as the probefunction and the address of theinstruction as the name:●# <strong>dtrace</strong> -n pid123::-:80704e3<strong>dtrace</strong>: description 'pid123::-:80704e3' matched 1probe


Trick: sizeof and profiling●●●sizeof historically works with typesand variablesIn DTrace, sizeof(function) yieldsthe number of bytes in the functionWhen used with profile provider,allows function profiling:profile-1234hz/arg0 >= `clock &&arg0


Trick: Using GCC's preprocessor●●●-C option uses /usr/ccs/lib/cpp bydefault, a cpp from Medieval TimesSolaris 10 ships gcc in /usr/sfw/bin soa modern, ANSI cpp is available withsome limitations (#line nesting broken)To use GCC's cpp:# <strong>dtrace</strong> -C -xcpppath=/usr/sfw/bin/cpp -Xs -s a.d● Needed when .h uses ANSI-isms like ##●Also useful for M4 propeller-heads


Gotcha: $target evaluation●●●When using the -c option, the childprocess is created and stopped, the Dprogram is compiled with $target setappropriately, and the child is resumedBy default, the child process is stoppedimmediately before the .init sectionsare executedIf instrumenting the linker or a library,this may be too late – or too early


Tip: Tuning $target evaluation●●Exact “time” of D program evaluationcan be tuned via the evaltime optionevaltime option may be set to one ofthe following:– exec: upon return from exec(2) (first instruction)– preinit: before .init sections run (default)– postinit: after .init sections run– main: before first instruction of main() function


Gotcha: Data model mismatch●●●●By default, D compiler uses the datamodel of the kernel (ILP32 or LP64)This may cause problems if includingheader files in instrumenting 32-bitapplications on a 64-bit kernelAlternate data model can be selectedusing -32 or -64 optionsIf alternate model is specified, kernelinstrumentation won't be allowed


Gotcha: Enabled probe effect●●●●When enabled, DTrace (obviously) has anon-zero probe effectIn general, this effect is sufficientlysmall as to not distort conclusions...However, if the time spent in DTraceoverwhelms time spent in underlyingwork, time data will be distorted!For example, enabling both entry andreturn probes in a short, hot function


Tip: Sample with profile●●●When honing in on CPU time, use theprofile provider to switch to asample-based methodologyRunning with high interrupt ratesand/or for long periods allows for muchmore accurate inference of cycle timeAggregations allow for easy profiling:– Aggregate on sampled PC (arg0 or arg1)– Use “%a” to format kernel addresses– Use “%A” (and -p/-c) for user-level addresses


Trick: Higher-level profiling●●●●In interrupt-driven probes, self->denotes variables in the interruptthread, not in the underlying threadCan't use interrupt-driven probes andpredicate based on thread-localvariables in the underlying threadDo this using an associative array keyedon curlwpsinfo->pr_addrCan use this to profile based on higherlevelunits (e.g. transaction ID)


Gotcha: vtimestamp●●●●vtimestamp represents the numberof nanoseconds that the current threadhas spent on CPU since some arbitrarytime in the pastvtimestamp factors out time spent inDTrace – the explicit probe effectThere is no way to factor out theimplicit probe effect: cache effects, TLBeffects, etc. due to DTraceUse the absolute numbers carefully!


Gotcha: Fixed-length strings●●●D string type behaves like this C type:typedef struct {● char s[n]; /* -xstrsize=n, default=256 */} string;Implications:– You always allocate the maximum size– You always copy by value, not by reference– String assignment silently truncates at size limitUsing strings as an array key or in anaggregation tuple is suboptimal if othertypes of data are available


Tip: Demo DTrace scripts●●●/usr/demo/<strong>dtrace</strong> contains all ofthe example scripts from thedocumentationindex.html in that directory has alink to every script, along with thechapter that contains itDTrace demo directory is installed bydefault on all Solaris 10 systems


Team DTraceBryan CantrillMike ShapiroAdam Leventhal<strong>dtrace</strong>-core@kiowa.eng.sun.com

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!