Fighting Institutional Memory Loss: The Trackle Integrated ... - Usenix

Fighting Institutional Memory Loss: The Trackle Integrated Issue and Solution Tracking System

Daniel S. Crosta and Matthew J. Singleton – Swarthmore College Computer Society
Benjamin A. Kuperman – Swarthmore College

ABSTRACT

For part-time sysadmins, a record of past actions is an invaluable tool that provides guidance in repairing or extending system services. However, requiring sysadmins to keep a detailed log of changes made to a live system can often seem like a low priority task when compared to addressing long and growing to-do lists. This problem is worse if the system administrator is a part-time volunteer and an overworked student. In this paper we present Trackle, an integrated trouble ticket and solution tracking system which takes the legwork out of creating and maintaining this sort of institutional memory. Furthermore, Trackle is designed to allow untrained student sysadmins to bootstrap their knowledge by peeking over the shoulders of their more experienced colleagues – even if those colleagues graduated years earlier. We accomplish this by tracking the exact actions taken by sysadmins, showing what lines were changed and in which configuration files. We allow experienced and inexperienced sysadmins alike to freely annotate and cross-reference these shell session logs through an integrated Wiki web interface.

Introduction

The Swarthmore College Computer Society (SCCS) is a group of student volunteers who provide services to more than 2,000 current students, alumni, faculty, and staff of Swarthmore College. We provide UNIX shell accounts, email, group mailing lists, web space for individuals and student groups, Wikis, database access, and general computer expertise. We also maintain a public lab of eight Debian GNU/Linux and two Apple Mac OS X computers.
The sysadmins meet weekly for an hour to keep up to date on projects, problems, and planned improvements, and communicate via email between meetings.

Unlike many large UNIX administration setups, the SCCS has no full-time trained staff. The student sysadmins are hired once a year, usually in groups of two or three, and most graduate after serving as a sysadmin for two years. At any time we have about eight sysadmins on staff. Since a part of SCCS’s mission is to provide an environment in which interested students may learn the art of system administration, we recruit students with technical backgrounds ranging from “I can email and browse the web,” to “my desktop PC is a Beowulf cluster.” This disparity in background and the high turnover rate present us with the dual problems of how to preserve our collective knowledge and how to train future generations.

For many years, our primary means of training sysadmins has been the SCCS staff email list and archived diaries. The diaries are a cluttered repository; we use the staff email list as often to tease one another as to discuss pertinent policy, administrative issues, or help requests from our users. Though sysadmins are supposed to report changes to our servers and lab machines to the list, we often have to rediscover particular configurations when problems arise. Furthermore, mbox mailboxes are a hard format to browse, particularly for new, inexperienced sysadmins who are eager to learn but more comfortable browsing the web than navigating the sometimes murky waters of UNIX shell prompts, commands, and filesystems.

In the last three years, SCCS staff have used a private Wiki to document changes, configurations, and internal policies as well as for collaborative planning. We have found the unconstrained and highly interlinked environment very valuable to our operations, but relatively little of the content on the Wiki is pertinent system configurations or records of changes made.
As with the email diaries, the extra time and effort required to self-report changes made to our systems presents too high a hurdle for our volunteer sysadmins, each of whom has a range of other academic and extra-curricular commitments.

For training, we needed a system that preserved the ability to associate high level conceptual descriptions of desired configurations or problems with the particular shell actions or configuration file changes to put those ideas into effect. Unlike the Wiki and email list, we wanted a system that did not require sysadmins to take on the extra burden of having to write up complicated records of their actions after long, sometimes frustrating problem-solving sessions. Furthermore, since our sysadmins are hired without any requirement of previous experience, we wanted a tool which could help new sysadmins learn the subtleties of UNIX administration and our particular setup without requiring any former familiarity with UNIX.

20th Large Installation System Administration Conference (LISA ’06)

To respond to these concerns, we created Trackle, a web- and console-based integrated issue and solution tracking system. Each problem reported to the system, either by our users or by SCCS staff, creates a ticket in the issue database. Open issues may be viewed through the web where they can be freely annotated, edited, and cross-referenced with the built-in Wiki; or through a console interface, from which a tracked shell session may be started. All actions taken in the shell session, and all files changed, are recorded in the Trackle database, and create an entry in the ticket history log. These shell session logs are then editable in Wiki fashion.

Existing and Related Work

We first attempted to discover if any existing tools could be used to solve this problem. Since a sysadmin’s job is primarily task-oriented, we began our search with trouble ticket and issue tracking systems. Issue tracking systems seem to have lost appeal in the literature since the late 1990s. Before that, there were one or two issue tracking papers per year in LISA, most of which explored different extensions to the basic trouble ticket idea.

One early tool, Request 2 [8], was designed with a pedagogy of training junior system administrators in mind. With Request 2, senior sysadmins could divide and delegate work to less experienced admins, and oversee their work or offer advice.

PLOD, the Personal LOgging Device [6], enables sysadmins to easily self-report by providing a set of command-line Perl scripts which the sysadmin may use to make notes to him or herself about work just completed.
PLOD simplifies the tasks of keeping records consistently and making them centrally available. PLOD did not include, and was not integrated with, any issue tracking system.

The LOS Task Request System [9], designed several years ago by another SCCS sysadmin, allows users to create ticket-like task requests through a web interface. Each task request is associated with a task description, which includes validation logic and the commands required to execute the task. This alleviates sysadmin workload by automating the tedious tasks of collecting and verifying information for repetitive changes to the system.

A related field of more current interest is configuration management and version control. Configuration management systems attempt to centralize or abstract configuration details for large numbers of similar hosts. Tasks include initial setup, maintenance, troubleshooting, and incorporating local changes back into the global description. For a more careful consideration of the history and future of configuration management, see [1, 3, 7].

The SCCS currently has no plans to use a configuration management system at our site. Our two primary servers do not share enough common configuration to yield much advantage from such a system, while our public lab machines, which are configured similarly, are periodically reinstalled to a known consistent state for security reasons. For these reasons, we do not believe we would gain enough benefit from a configuration management system to justify the setup effort.

Much of the time our sysadmins spend working involves user-owned files, such as spam filter settings, dotfiles, and web access configurations.
Since these files may be created and deleted by users at any time, a configuration management system would not be a good choice for monitoring changes to these files.

Moreover, though configuration management may be considered an industry best practice, we believe no automatic configuration system can adequately replace a sysadmin’s ability to get dirty with manual configuration and hand-tuning of UNIX services and applications. We would rather our sysadmins learn how to discover a problem at its source and implement a solution appropriately than merely learn how to control a single piece of software, no matter how powerful it may be.

One of the chief goals of Trackle is to provide a platform for training inexperienced sysadmins. We believe that it is important to gain an understanding of UNIX cause and effect by observing and learning about configuration files, logs, commands, and scripts. If new sysadmins only learn to rely on abstract configuration management systems, they might find themselves unprepared to deal with problems outside the scope of those systems.

None of the systems we considered deploying met our combined need to track issues, track actions, and train new sysadmins. A ticket tracking system would help organize and abstract the institutional memory knowledgebase, but would still require self-reporting. A configuration management system would keep accurate records for the files that it tracked, but would not provide a good platform for training incoming sysadmins. Having searched and failed to find just what we wanted gave us a chance to reevaluate our problems, and we decided that only by implementing a custom solution could we meet our requirements.

Design of Trackle

From the beginning, the design of Trackle has been motivated by a philosophy of minimalism and simplicity. We believe that the most useful system for our needs is one that imposes as little as possible on our sysadmins in terms of new workflows and interfaces to learn.
We want Trackle to stay out of the way of sysadmins, but at the same time to automate as much of its own data collection and presentation tasks as possible.

It is always difficult to find the right balance of the automatic and the manual, the consistent and the customizable. Given the sort of information Trackle manages, we believe the right place for the cut is at data collection and initial presentation. Creating the complex knowledge structures through annotation and cross referencing is left to the sysadmins as a secondary task, since it is not one that a computer is likely to do well.

Trackle is intended for smaller groups of sysadmins with a relatively low volume of requests. It is designed to be used effectively by both expert and inexperienced sysadmins. Trackle’s information organization methods are best suited to small groups who have time to annotate and cross-reference collected data. The goal of allowing this rehashing and reorganization of information is to provide a platform for self-directed training of inexperienced sysadmins on their own time.

Requirements

Two classes of users: Since the SCCS naturally has two classes of users, sysadmins and SCCS end users, Trackle was designed to accommodate both types. Differentiating the web interface for the two types of users allows us to improve both security and convenience. Unauthenticated end users are denied sensitive information, such as email addresses in tickets or files in logged shell sessions, and are not shown potentially confusing prompts when creating tickets. Sysadmins, on the other hand, need access to all the information about tickets and shell logs, and should know enough to understand the more detailed ticket attributes.

Low barriers to use: One problem with existing ticket tracking systems is the often overwhelming amount of information that is requested to submit a ticket. With Trackle, we wanted to make filing a ticket as easy as possible so that end users and sysadmins alike can move quickly through the web forms.
Our interfaces are designed to be intuitive so that they can be used without having to waste time reading documentation.

For end users, we wanted Trackle to be as easy to use the first time as the tenth. End users do not have to register accounts with Trackle to create or subscribe to tickets. Instead we use an email confirmation system to verify an end user’s identity.

High-level organizational tools: Trackle’s central goal is to provide a framework for flexible representation of the collected data. It is designed to facilitate Wiki-like annotation and cross-referencing rather than imposing a fixed organizational structure. Objects in Trackle support Wiki formatting, enabling them to link to each other. With these tools, the recorded data can be arranged, indexed, and presented in the ways most useful to the sysadmins who need access to it.

Few dependencies: Trackle was designed to be dependent on as few external software packages as possible. It is specifically designed to work with the packages and versions available in the Debian Sarge GNU/Linux distribution (our deployment platform); however, we wanted to make Trackle freely available for any interested groups for use on many platforms. To ensure easy portability, it is not closely tied to any particular versions or packages.

Ticket System

As discussed earlier, we decided to use a ticket tracking system as the basis for Trackle’s higher-level categorization of captured data. We briefly considered RT, but decided that it would be too confusing and cluttered with information to be useful to inexperienced sysadmins. Instead, we decided to use a relative newcomer to the trouble ticket world, Trac, as our foundation.

Trac is a BSD-licensed bug tracking system with integrated version control and Wiki. Trac embodies a minimalistic approach to ticketing which fit well with our design goals: it presents a clean interface, intuitive to both our end users and sysadmins, and has relatively few features and ticket attributes specific to bug tracking.
Many of the features we wanted (easy annotation of tickets and shell histories, high-level organization and cross-referencing through the integrated Wiki, file change visualization) were already implemented, and adding our own functionality was very easy due to the well-organized Trac API.

However, Trac required more than cosmetic changes to fit our needs. Despite its flexible permission system, Trac does not natively support multiple classes of users, nor does it have ticket locking. Additionally, we had to modify the ticket status logic to accommodate unconfirmed tickets.

Tracking File Changes

Tracking file changes was the biggest challenge encountered when designing Trackle. Because we do not use any configuration management or version control, we needed to ensure we could get a before and after view of each file modified during a sysadmin’s shell session. We considered several alternative strategies before eventually settling on a custom interposition library. The possibilities may be roughly split into kernel-space solutions (LVM snapshots, UnionFS, FUSE, C2 audit logs) and user-space solutions (plugins/history interpretation and library interposition).

LVM snapshots: The Linux Logical Volume Manager (LVM) provides a way to create a time-frozen snapshot of a volume. In Trackle, we could use an LVM snapshot to capture the state of the system before the shell tracking session and then compare the two filesystems at the end of the session. LVM works below the filesystem and captures changes by tracking changed disk blocks rather than changed files. To use LVM with Trackle we would have to relate changed disk blocks to changed files, or compute a recursive diff between the two hierarchies; the former would be prohibitively hard and would be tied to particular filesystem implementations, and the latter would be prohibitively slow. It is also unclear how to differentiate which files were changed by the user during the shell session and which files were merely changed by other users or processes during that time. Additionally, it would impose LVM as a runtime dependency for Trackle and limit Trackle to use on Linux.

UnionFS: UnionFS is a meta-filesystem that stacks one or more existing, mounted filesystems into one hierarchy. It supports copy-on-write so that a read-only filesystem can be made to appear as a read-write filesystem. This functionality is used by many Live Linux Distributions, like SLAX and KNOPPIX. When operating on a file in a UnionFS stack, the topmost instance of the named file is used. This means that in order to keep an initial copy of the file for Trackle’s use, the underlying filesystem would have to be mounted read-only. Since it would be impractical to remount the entire filesystem hierarchy read-only, Trackle would have to create a read-only bind mount of the root hierarchy, to use as the bottom of the UnionFS stack. On top of that would be placed a read-write filesystem mounted elsewhere to capture the revised versions of files. Because this third filesystem is hierarchically removed from the active root filesystem, any changes made during the shell tracking session would not be reflected to the root filesystem unless manually copied (for instance, by a special Trackle command) or when synchronized at the end of the session. Additionally, UnionFS requires the insertion of a new kernel module, which some sysadmins may be hesitant to allow.
At this time, UnionFS is only available for Linux.

FUSE: The Filesystem in Userspace (FUSE) technique works similarly to the interposition library approach we finally adopted. By use of a special kernel module, some or all filesystem-related library calls can be delegated to a user space daemon. Trackle could use FUSE to make note of which files the user is accessing and copy the files’ initial state before returning from the user’s library call. This provides about the same level of flexibility as using an interposition library. Using FUSE requires loading a kernel module and mounting and unmounting filesystems. In order to use FUSE with Trackle, we would have to create additional setuid-root binaries to handle these tasks. FUSE is currently only available for Linux and FreeBSD.

C2-like audit logs: Many operating systems include auditing facilities to track file accesses as one of the requirements for a trusted computer system [10, 2]. There are projects that provide this type of auditing, such as SNARE for Linux and SELinux. These systems require kernel modifications, and are typically enabled for the entire system instead of just a process and its subprocesses. Consequently, it is difficult to dynamically enable auditing, which results in the generation of overwhelming amounts of data. Additionally, these audit logs only notify us that a file has been modified after the fact, limiting our ability to determine what changes were made.

Plugins and shell history interpretation: In theory it is possible to capture the majority of file changes made by a sysadmin by writing plugins for the editors Vim and Emacs to record files opened or saved, and by looking at the command history of the shell to infer which files might have been modified. This approach can never be more than approximate as there are many other ways to change a file other than by these two editors.
Even writing plugins for these editors will not capture changes made to underlying files by wrapper programs like vipw that act on temporary files. Additionally, editor plugins do not capture file changes made by users at the command line either by shell redirection (“echo stuff >> outfile”) or file-related commands (mv, cp, rm, chmod, etc.).

While parsing a shell history file might be able to discover these kinds of changes, it would have to happen after the commands changing the files had run. This would prevent us from capturing the state of the files before they changed, and the list of command patterns to check would be quite large. Wrapping all file change tools and implementing a custom shell would impose a large (if not insurmountable) maintenance task.

Library interposition: Most binary executables on UNIX are dynamically linked and access files through standard C library functions (open, unlink, chmod, etc.). The dynamic linker/loader checks the environment variable LD_PRELOAD for a list of shared libraries to be loaded and searched before all others. These libraries are called interposed libraries because any function that is defined in one of these libraries intercepts the call to standard libraries.

We use a custom interposition library to intercept file-related library calls so that we can track changes to files during a shell session. When we intercept the call, we are able to gather information about the current state of the file before passing the request on to the real library call. We are also able to observe the resulting changes and collect additional information as needed.

We decided to use an interposition library for a number of reasons. It can be written so that it only captures the events we are concerned with, including privileged operations (using sudo, su, etc.).
It is a userspace solution, so it is not dependent on any particular kernel and is portable to most other UNIX platforms. Finally and perhaps most importantly, the majority of the functionality was already present in a component of Audlib [4, 5] designed to track user and sysadmin actions for detecting abuse of authority.

Trackle Architecture

Trackle consists of four functional components, each of which communicates with a central database:

1. The web interface, which allows full access to the ticket database, Wiki pages, and shell session logs. It is the primary interface to all Trackle data.
2. The console tools, which allow sysadmins to access the ticket database, and begin shell tracking sessions to capture changes made to resolve a problem.
3. The interposition library, which is responsible for tracking changes to files during a tracked shell session.
4. The email notification system, which keeps end users and sysadmins informed of ticket status changes, and the email confirmation system which enables unauthenticated end users to interact with aspects of the web interface.

The web interface, console tools, and email system are all written in Python and communicate directly with the central PostgreSQL database. The interposition library used during a shell tracking session is written in C, and communicates through the console tools (see Figure 1).

Figure 1: Trackle consists of four components which store their data in a central PostgreSQL database.

Web Interface

The Trackle web interface provides full access to tickets, shell logs, and an integrated Wiki. The web interface has been designed with simplicity and integration in mind. Like most of Trackle, the web interface is written mostly in Python. Trackle uses Clearsilver for its HTML templates. The web interface recognizes two classes of users, anonymous end users and authenticated sysadmins. Because they are not required to log in, end users are required to complete email confirmation when creating tickets. Sysadmins will be able to authenticate with the system, which will allow them to perform privileged tasks.

Figure 2: The ticket creation form for anonymous users is streamlined so that a user can file a ticket quickly. The set of fields displayed and the explanation shown next to each are configurable.

Tickets: Trackle’s integrated issue tracking system is more than just a to-do list. It provides the initial framework for organizing automatically collected data, linking an abstract description of a problem and the concrete steps required to solve it. All users may create tickets. End users are presented with a streamlined form (Figure 2) requesting:

• contact email address
• brief problem description
• problem type
• problem severity

In addition to these fields, authenticated sysadmins are prompted for more information which would not be relevant to end users:

• priority
• sysadmin assignment
• keywords
• subscription list

Ticket browsing is also available to anonymous and authenticated users. Again, the two user groups have very different views. Anonymous users are presented with a minimal amount of information about the ticket, for convenience, security, and privacy. They can also subscribe to existing tickets. Authenticated users have access to the entire ticket history, have the ability to change the ticket fields, and can close, reopen, and assign tickets.

Wiki pages: Trac’s integrated Wiki required very little modification for Trackle. A Wiki enables users to add, remove, and modify content easily and quickly directly through their web browser using a simple formatting language. Wiki pages are visible to both authenticated and anonymous users, but are only modifiable by authenticated users and can be made private. Most long text fields in Trackle support Wiki formatting to allow even more possibilities for cross-referencing and annotation.

Shell session logs: All the information recorded in a shell tracking session is presented in the web interface in a shell session log (Figure 3). They are accessible to authenticated sysadmins through an index where they are sorted by ticket, with the most recently added shell log appearing first.
The logs are also linked individually from the associated ticket. In keeping with the ideal of providing the most functionality while imposing the least structure, Trackle shell logs are editable and support Wiki formatting. The Wiki-like nature of the shell logs allows for easy annotation and cross-referencing.

Figure 3: The shell session log displays all the data that was recorded during the shell session. At the top, the shell history, start and end times, and environment variables are displayed. Below, the contents of files that were changed are shown with deletions highlighted in red and additions in green.

After the Wiki portion of the log (which contains shell history, environment variables, and any other pertinent shell data), the modified files are displayed as colored diffs. All of the modified files are stored in the database, but not all are displayed by default. The shell log edit page provides a list of all files from which individual files can be selected for display.

Console Tools

The Trackle console tools consist of several scripts supporting the main interface, trackle-cli. Like most of the rest of Trackle, the console tools are written in Python for easy extensibility and maintenance.

Trackle-cli: The primary interface to the ticket database from the console is trackle-cli, a screen-oriented program for the curses environment. It is written using DTK, a Pythonic wrapper for the curses Python module. Following our goal of minimalistic interfaces, trackle-cli is designed to give enough information that sysadmins can quickly find a ticket and begin working on it, but not so much that they are bogged down by text-filled screens of details. The two primary screens of trackle-cli are the ticket overview and ticket detail screens.

The ticket overview screen (Figure 4) lists the open tickets, showing for each the values of a configurable set of ticket fields. The default set shows the ticket’s numeric ID, summary, owner, and time of last change. Sysadmins navigate this list using the keyboard and may either select an existing ticket or create a new ticket.

The ticket detail screen (Figure 5) shows the most pertinent details of an individual ticket, with short ticket fields displayed above the fold and the longer ticket description shown below in unformatted Wiki text.
Some ticket information, notably the ticket change log, is not available through the console interface due to space constraints.

Trackle-cli supports editing of all visible information in the ticket detail screen. After selecting a field, the focus is moved to either a text editing field or an enumerated selection field which displays appropriate available values. Enumerated fields can have possible values added to them through the web interface or the trackle-admin program. Ticket editing in trackle-cli is designed to allow a sysadmin to correct minor errors found in a ticket – full-featured editing is available in the web interface.

Sysadmins can edit the ticket description using the editor set in the VISUAL or EDITOR environment variable. When the editor terminates, the description display area is updated with the new value. If the sysadmin wishes to save the changes back to the Trackle database, trackle-cli prompts for a ticket changelog message using the same method.

From the ticket detail screen, the sysadmin can start a tracked shell session. Trackle-cli is suspended and replaced by a shell with a modified environment. Upon completion of the shell session, the sysadmin is shown a list of all the files that were modified during the session (see Figure 6). The sysadmin then selects files to display in the web shell session log. This selection may be changed later through the web interface. After selecting the files and saving the session log, the sysadmin is returned to the ticket detail screen.

Figure 4: Trackle-cli’s ticket overview list, sorted by change date. The “Help Bar” at the top lists currently active keybindings. Trackle-cli is designed to work in an 80 × 24 character window.
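The editor hand-off described above can be sketched in Python, the language the console tools are written in. This is a minimal illustration, not trackle-cli's code: the function names, the precedence of VISUAL over EDITOR, and the fallback to vi are all assumptions, since the paper only says that the editor named in VISUAL or EDITOR is used.

```python
import os
import shlex
import subprocess
import tempfile


def pick_editor(environ=None, fallback="vi"):
    """Return the editor command: VISUAL first, then EDITOR, then a fallback.

    The precedence and the fallback are assumptions made for this sketch.
    """
    environ = os.environ if environ is None else environ
    return environ.get("VISUAL") or environ.get("EDITOR") or fallback


def edit_text(initial, environ=None):
    """Write `initial` to a temporary file, run the editor on it, and
    return whatever the file contains once the editor has exited."""
    command = shlex.split(pick_editor(environ))
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(initial)
        path = handle.name
    try:
        # Blocks until the editor terminates, as the text above describes.
        subprocess.run(command + [path], check=True)
        with open(path) as handle:
            return handle.read()
    finally:
        os.unlink(path)
```

The same helper could serve for the changelog message prompt, since the paper says that prompt uses the same method.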

Trackle-shell-prompt: Our experience at the SCCS with the beta release of Trackle showed that even experienced sysadmins occasionally forgot whether they were in an ordinary shell or in a tracked shell spawned by trackle-cli. To address this, we created trackle-shell-prompt, a helper script which detects whether the user is running inside a tracked shell session by checking for certain environment variables. If so, it prepends the prompt with “Trackle” in green boldface. Trackle-shell-prompt also sets its exit code to 0 when in a shell tracking session, or 1 when not, so that it can be used in shell scripts.

Trackle-admin: This tool is used to configure the run-time settings of Trackle stored in the central database. Trackle-admin is used to add, update, or remove accounts (only sysadmins need an account for Trackle – end users may use the web interface without authentication), and the following enumerated ticket attribute fields: component (a brief description of the area of the system affected by a problem), milestone, priority, user severity, and ticket type. Wiki pages can be imported from or exported to plain text files. Like trackle-shell-prompt, the exit code returned by trackle-admin is set to 0 on success, or another value on error.

Figure 5: Trackle-cli displays the most pertinent ticket details, and allows simple editing to correct any mistakes. From this screen the sysadmin may begin recording shell actions to be associated with this ticket.

Figure 6: Trackle-cli allows the sysadmin to decide which files will be displayed in the web version of the shell session log.
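The behavior of trackle-shell-prompt described above can be sketched in a few lines of Python. The paper does not name the environment variables it checks, so TRACKLE_SESSION below is a hypothetical marker, and the function names are likewise invented for illustration. Only the green boldface “Trackle” tag and the 0/1 exit codes come from the text.

```python
import os
import sys

# Hypothetical marker variable; the paper does not say which variables are checked.
SESSION_VAR = "TRACKLE_SESSION"

GREEN_BOLD = "\033[1;32m"  # ANSI escape: bold, green foreground
RESET = "\033[0m"          # ANSI escape: reset attributes


def in_tracked_shell(environ):
    """True when the (assumed) marker variable is set and non-empty."""
    return bool(environ.get(SESSION_VAR))


def prompt_prefix(environ):
    """Return a green, bold 'Trackle' tag inside a tracked shell, else ''."""
    if in_tracked_shell(environ):
        return f"{GREEN_BOLD}Trackle{RESET} "
    return ""


def main(environ=None):
    """Write the prefix to stdout; return 0 in a tracked shell, 1 otherwise."""
    environ = os.environ if environ is None else environ
    sys.stdout.write(prompt_prefix(environ))
    return 0 if in_tracked_shell(environ) else 1
```

A script entry point would pass the result of `main()` to `sys.exit`, which gives shell scripts the 0/1 status the text describes.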

Shell Session Tracking

The ability to track shell sessions sets Trackle apart from previous solutions. The tracking system removes the burden of self-reporting from the sysadmin. Relying on the sysadmin to report exactly what changes were made can be problematic. After a long session, even the most conscientious sysadmin may be unable to remember exactly what they did, let alone record it accurately. A seemingly inconsequential change made early on may be overlooked after tackling a more frustrating issue.

Recording incorrect or incomplete information defeats the purpose of tracking changes because it reduces the accuracy of the report and degrades the utility of the system as a training tool. The goal of Trackle is to automate as much record collection as possible in order to avoid human error.

To provide an accurate summary of a shell session, Trackle collects information about the session:

• start and end time
• environment variables
• shell history
• copies of all created, removed, or modified files

Most of these items are easy to collect using built-in Python commands. The history of shell commands can be obtained from the bash shell (see footnote 13). This information is stored in a state directory created for each tracked shell session.

Recording file changes is the tricky part. For this we use library interposition. Our interposition library, libtrackle, is based on the work done by Kuperman and Paksoy for Audlib. We can intercept library calls by writing our own version of the call in question with the same signature. Effectively, we wrap the library call with additional functionality to support Trackle. In this function we record any information we need and then pass the call on.
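The record-then-pass-through pattern can be illustrated with a Python analogy. This is not libtrackle, which is a C library loaded through LD_PRELOAD. The sketch only mirrors the idea of a stand-in with the same signature that saves a "before" copy the first time a file is touched and then hands the call to the real function unchanged. The class, method, and file names are hypothetical, and the sketch assumes the wrapped function takes a filesystem path as its first argument.

```python
import functools
import os
import shutil
import tempfile


class ChangeTracker:
    """Keeps one 'before' copy per file, taken the first time it is touched."""

    def __init__(self, state_dir):
        self.state_dir = state_dir
        # absolute path -> saved copy (None if the file did not exist yet)
        self.initial = {}

    def snapshot(self, path):
        path = os.path.abspath(path)
        if path in self.initial:
            return  # already recorded; later calls pass straight through
        if os.path.isfile(path):
            saved = os.path.join(self.state_dir, f"{len(self.initial)}.orig")
            shutil.copy2(path, saved)
            self.initial[path] = saved
        else:
            self.initial[path] = None

    def wrap(self, real_call):
        """Return a stand-in that records first, then calls real_call."""
        @functools.wraps(real_call)
        def wrapper(path, *args, **kwargs):
            self.snapshot(path)  # record what we need ...
            return real_call(path, *args, **kwargs)  # ... then pass the call on
        return wrapper


def new_tracker():
    """Create a tracker backed by a fresh temporary state directory."""
    return ChangeTracker(tempfile.mkdtemp(prefix="trackle-sketch-"))
```

For example, `tracked_open = new_tracker().wrap(open)` behaves like `open` for the caller, because arguments and return values are passed through unmodified. The snapshot is taken before the real call so that opening a file for writing cannot truncate it before the copy is made.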
This whole process is transparent to the running program, and as long as the parameters and return values are passed through to the real version without modification, interposition does not change the program's behavior.

Footnote 13: Trackle currently only supports the bash shell. Support for tcsh is planned for a future release.

In order to catch file modifications we need to intercept file-related calls (fopen, chmod, etc.) before they get to the kernel so that we can make an initial copy of the file in question. Once we have a copy, we allow all further calls on that file to pass through to the kernel without logging. Just before the program terminates, we make another copy of the file to compare against the original version.

In addition to intercepting calls, interposition allows us to define functions that are called when the library is first loaded and when the process terminates. We use the initialization routine to cache local state variables originally set in the environment by the console tools. In the finalization function we iterate over all files that have been accessed during the execution of the program and make a final copy of all files which have been modified.

A file might be modified several times during the course of a tracked shell session. We record the initial copy only once but overwrite the final copy each time a program terminates. Therefore, the difference information generated consists of changes made over the entire session.

Some programs written with security in mind attempt to disable library interposition (for good reasons), and sysadmins are encouraged to use these sorts of tools (e.g., sudo) when changing system configuration. In order to collect these changes, we need to prevent these programs from disabling libtrackle. For example, the environment created for executing vim by "sudo vim /etc/filename.cfg" lacks the LD_PRELOAD environment variable, effectively disabling libtrackle. We could not find a way to override this behavior in sudo without modifying its source code.
We circumvented this issue by overriding the exec library calls with our own versions that reset the LD_PRELOAD variable to the proper value before executing the program (see footnote 14). This solution ensures that libtrackle is loaded for all dynamically linked programs.

Footnote 14: For setuid programs, the dynamic linker only honors values of LD_PRELOAD that are in the default library search path and are also setuid. A user would have needed administrative access to put such a library in place, so this is not a major vulnerability.

Email Notification and Confirmation

Email plays two roles in Trackle: it notifies sysadmins and end users of changes, and it confirms email addresses to prevent abuse of the web interface by unauthenticated users.

Whenever a sysadmin changes a ticket's status, such as closing an open ticket or assigning a new ticket, a configurable list of staff email addresses (in SCCS's case, our staff email list address) is sent a notification indicating the status change and any comments or fields changed at the same time. Subscribed users receive an abbreviated version of this email containing just the status change information and the comment.

Trackle also provides the trackle-notify script, which sends periodic updates to the staff email addresses. The email contains a list of all open tickets, details for each open ticket, and a list of ticket status changes since the previous periodic update was sent. Trackle-notify is designed to be run through cron or another task scheduler, so it produces no output under normal circumstances.

The email confirmation system allows end users to confirm their identity without requiring them to get, remember, and use a separate set of user names and passwords. When an end user creates a ticket or subscribes to an existing ticket, Trackle sends an email to
the address the end user entered. Users click the link in the email to make the action take effect.

Sample Workflows

Trackle is designed with human interaction in mind, and as such any description of its components does not fully capture the feel of the application. In this section we describe two sample workflows to demonstrate the expected usage of Trackle: 1) the lifecycle of a typical ticket created by an end user, and 2) a sysadmin annotating existing resources for future reference.

Ticket Lifecycle

Suppose SCCS user Alice is unable to access the administration page for her group's email list. She points her browser at Trackle's ticket creation page and begins to fill out the form. She is prompted for her email address, a brief summary, a ticket type (software bug, software request, security issue, etc.), a user severity (that is, how severe the issue is to Alice) and a longer description of the problem. Alice notices the link to a reference on Wiki formatting, and helpfully formats her description into several sections including the error output she received, a link to the page with the problem, etc.

When Alice clicks the submit button, she is taken to a screen informing her that she will receive an email containing instructions and a link to confirm her ticket, and that her email address has been automatically subscribed to the ticket for status updates. Until she clicks this link, her ticket will not show up in ticket reports, and the sysadmins will not be notified. Once she clicks the link, Trackle marks the ticket as confirmed and emails the sysadmins to notify them of the newly created ticket.

Now suppose SCCS user Bob is also unable to access the administration page for his group's email list.
Because he read the page on ticket etiquette, he knows to first use Trackle's built-in search feature to see if any open tickets are already filed for this problem. He notices the ticket Alice just created, and confirms that it is for the same issue he is facing, so he does not need to file his own ticket. But Bob is impatient and has important work to do on his list, and wants to know as soon as the problem is resolved. Fortunately, Trackle allows Bob to subscribe to the ticket Alice created, again using email confirmation to verify his identity.

Now suppose SCCS sysadmin Melvin checks either his email or the Trackle website and notices the newly created ticket. Being our resident expert on Mailman list administration, Melvin SSH'es to our primary web and email server, runs trackle-cli, and begins working on the ticket. Behind the scenes, trackle-cli has locked his ticket so that no other sysadmin can begin working on the same issue. His shell history as well as any files he changes are now being recorded, so as he investigates and solves the problem, all his actions are logged. When he types 'exit' to leave the shell tracking session and returns to trackle-cli, he is shown a list of all the files whose contents or permissions he changed, and he can select which will be included in the log by default. When he saves the shell session log to the database, the ticket is automatically unlocked.

Since, by virtue of his Mailman wizardry, Melvin resolved the problem for Alice and Bob, he opens up the ticket's page in Trackle's web interface, makes some remarks to be appended to the ticket's history, and marks the ticket closed. Trackle then emails all the subscribed users to notify them that the issue has been resolved. This notification includes the remarks Melvin appended to the ticket's history.

Annotation and Cross-Referencing

Suppose SCCS sysadmin Wendy, our resident Wiki enthusiast, decides it is time for her to learn more about Mailman.
She begins by searching through the existing Wiki pages, tickets and shell session logs to find anything that might be related to Mailman lists. She creates a new Wiki page, which she protects from anonymous access since it may contain sensitive information that should not be leaked to the public, and begins adding links.

In Trackle, all Wiki pages, tickets, shell session logs, and milestones are linkable objects, and all have built-in Wiki support so that any of them can link to any of the others. Wendy takes advantage of this not only by including forward links from the new Wiki page to the related tickets and shell session logs, but also by editing those objects to include return links to the new page she is creating.

By default, the shell session logs that are captured by Trackle are pretty bare and contain just the collected information. Since Wendy is in no hurry, she is free to spend some time investigating the effects of the commands recorded in the shell history of the session log created by Melvin above. She can integrate this information, along with links to other Trackle objects, online documentation, or any website, by editing the Wiki page at work behind each shell session log. She can remove commands from the history that are not pertinent so that the shell session log contains just the relevant information for future reference. She can also change the set of files which are displayed in the shell session log, overriding the defaults Melvin selected at the end of his shell tracking session.

Future Plans

As with any large software project, the initial public release of Trackle is far from complete. As frequently happens with new software, some of the most interesting features were not suggested until the SCCS started using Trackle, and others were inspired as a byproduct of the development process. Many of these new feature ideas that came to us mid-project made it into the initial public release, but some did not. The
following are planned features for Trackle that have not yet been implemented.

Ticket extensions: Early feedback from SCCS sysadmins has shown that some extensions to the ticket system may be useful, particularly the ability to express relationships among active tickets. A dependency relationship (the completion of ticket B can only happen after ticket A is closed) could hide or deprioritize tickets depending on others, or emphasize tickets which are depended on. A parent-child relationship would be used to group several related tickets (e.g., configuration upgrades) under one parent (e.g., Linux distribution upgrades). Ticket due dates would enable Trackle to automatically escalate a ticket's importance over time. Finally, private tickets (not visible to unauthenticated end users) would be used to track sensitive information such as hiring or policy debates.

Multiple machine support: All of the components of Trackle interact with one central PostgreSQL database. Currently, the Trackle console tools do not keep track of host-specific info, so only one machine can be tracked per instance of Trackle. Because PostgreSQL communicates over TCP, it should not be difficult to add network functionality to the console tools. It would then be possible to install the console tools on any number of properly configured servers and clients, and have them all report back to one central database.

There are, however, some situations where a per-machine instance of Trackle might be useful. Trying to track many disparate issues occurring on unrelated machines in one place would become cumbersome and is unnecessary. An alternative that might provide the best of both worlds would be a hierarchical approach. Rather than storing all the data on one central server, leave the data distributed, but allow communication between the individual instances of Trackle.
For example, deploy one network-wide overview server, one site-wide overview server per site, and install Trackle individually on the machines at each site. This could help mitigate some of the scaling issues that would come with very large databases.

File revision control: We briefly mentioned configuration management tools, some of which include revision control functionality. Trackle is based on the open source program Trac, which includes tight integration with the Subversion revision control system, and could provide revision control for all files touched during shell session tracking. A straightforward approach of storing one revision per session would not work, as the files in question may also change between tracked shell sessions. A better approach is to create a revision at each of the before and after states for each file in the repository.

Further high-level abstractions: One recent innovation of Web 2.0 technologies is the establishment of new interface and organization paradigms that more closely model how people think about information. In particular, the tagging concept, where arbitrary words or phrases are associated with each idea or object in a system, supported by an interface which makes suggestions about tags to apply, would further increase the utility and scalability of Trackle. Some tags for shell session logs could be automatically generated, for instance, a tag for each file and each directory involved in a particular shell tracking session. Tags could also be assigned for software packages involved in a shell tracking session, by integrating with package management tools (dpkg, rpm, etc.).

Conclusions

Our experience with the SCCS staff email list and Wiki has shown that relying on self-reporting leads to missing, incomplete, or inaccurate reports of changes made to our systems. The poor quality of these reports makes it hard to find the source of a particular change.
Further, an inaccurate report might lead a sysadmin who is trying to duplicate past steps to introduce new errors instead of repairing existing ones. Additionally, the presence of inaccuracies degrades confidence in all reports. Trackle alleviates these problems by keeping consistent and accurate records so that sysadmins can focus on solving problems rather than the tedious task of keeping logs.

Trackle's integrated Wiki has allowed us to begin collecting related topics into a sysadmin training manual and how-to guide. This evolving guide allows new sysadmins to ask more sophisticated questions of their experienced colleagues. Trackle allows sysadmins to learn on their own time and at their own pace so that no one gets bored or left behind.

Often, volunteer sysadmins learn the intricacies of a system only when it breaks. Trackle allows us to learn by reflection rather than by struggling to fix a critical error. This leads to more efficient use of office time for those not present when problems occur. We also learn from authentic situations rather than toy problems or contrived examples.

Because our shell tracking tools operate transparently, Trackle can be used to complement existing change/configuration management systems. Though many configuration management systems have already solved the problem of discovering file changes, Trackle goes further to associate file changes with a particular issue. Additionally, Trackle detects changes to any files, not just those that are already being monitored by a configuration management system.

Availability

Trackle is open source software, licensed under the BSD license. You may download stable Trackle releases and documentation from .

Acknowledgements

We would like to thank Benjamin A. Kuperman and Mustafa Paksoy for their work on Audlib [5], on which
libtrackle is based. We would also like to thank Edgewall Software and the numerous contributors to Trac, without which Trackle would not have been possible.

Author Biographies

Daniel S. Crosta graduated from Swarthmore College in June, 2006, with a B.A. in Computer Science. During his tenure at Swarthmore, he served three years as Systems Administrator for the Swarthmore College Computer Society, most recently as Lead Systems Administrator. He has also participated in research in Computer Graphics at Princeton University, and in Computer Vision at Swarthmore College. Since July, 2006, he has been working as a Software Developer at Wireless Generation in New York City. Contact him electronically at .

Matthew J. Singleton is currently a senior at Swarthmore College in Swarthmore, PA, where he is a double-major in Computer Science and Linguistics. He is also the Lead Systems Administrator for the Swarthmore College Computer Society. He has participated in Computational Linguistics research in the Department of Computer Science at Swarthmore College. Reach him electronically at .

Benjamin A. Kuperman received the M.S. and Ph.D. degrees from the Department of Computer Sciences at Purdue University in 1999 and 2004. He is an assistant professor at Oberlin College in Ohio and previously taught at Swarthmore College in Pennsylvania. While at Purdue, he was a researcher in the Center for Education and Research in Information Assurance and Security (CERIAS) for five years and was affiliated with COAST before that. His main areas of research are host-based computer security monitoring systems and OS level audit systems.
Reach him electronically at .

Bibliography

[1] Anderson, Paul and Edmund Smith, "Configuration Tools: Working Together," Proceedings of LISA 2005: 19th Systems Administration Conference, pp. 31-37, December, 2005.
[2] Common Criteria for Information Technology Security Evaluation, .
[3] Evard, Rémy, "An Analysis of UNIX System Configuration," Proceedings of LISA 1997: 11th Systems Administration Conference, pp. 179-193, October, 1997.
[4] Kuperman, Benjamin A., A Categorization of Computer Security Monitoring Systems and the Impact on the Design of Audit Sources, Ph.D. thesis, Purdue University, West Lafayette, IN, August, 2004, CERIAS TR 2004-26.
[5] Paksoy, Mustafa and Benjamin A. Kuperman, "Audlib: Generating computer security audit logs with interposing libraries," Presented at 2005 Swarthmore College Sigma Xi poster session, September, 2005.
[6] Pomeranz, Hal, "PLOD: Keep Track of What You're Doing," Proceedings of LISA 1993: 5th Systems Administration Conference, November, 1993.
[7] Roth, Mark D., "Preventing Wheel Reinvention: The psgconf System Configuration Framework," Proceedings of LISA 2003: 17th Systems Administration Conference, pp. 205-211, October, 2003.
[8] Sharp, James M., "Request: A Tool for Training New Sys Admins and Managing Old Ones," Proceedings of LISA 1992: 4th Systems Administration Conference, October, 1992.
[9] Stepleton, Thomas, "Work-Augmented Laziness with the LOS Task Request System," Proceedings of LISA 2002: 16th Systems Administration Conference, November, 2002.
[10] US Department of Defense, Trusted Computer Systems Evaluation Criteria (also known as the 'Orange Book'), Technical Report DoD 5200.28-STD, DoD Computer Security Center, Fort Meade, MD, December, 1985.
