Linux Journal | August 2012 | Issue 220



CONTENTS — AUGUST 2012, ISSUE 220

LINUX AT WORK

FEATURES

64 A Guide to Using LZO Compression in Hadoop
How to start implementing MapReduce programs that handle LZO-compressed data.
Arun Viswanathan

76 Introducing Vagrant
Manage multiple development environments easily using Vagrant.
Jay Palat

86 Writing Samba VFS Modules
Extend Samba's functionality with VFS modules.
Richard Sharpe

ON THE COVER

• Raspberry Pi/N900 Hacks, p. 46
• Set Up a Local Mirror to Save Time and Bandwidth, p. 52
• Manage Multiple Development Environments with Vagrant, p. 76
• Using LZO Compression in Hadoop, p. 64
• Create VFS Modules to Extend Samba's Functionality, p. 86
• Learn about Shell Aliases and Functions, p. 42
• Tarsnap: High-Security, Command-Line-Friendly Automated Backups, p. 104
• An Excerpt from Doc Searls' New Book The Intention Economy: When Customers Take Charge, p. 117

Cover Image: © Can Stock Photo Inc. / NexusPlexus


COLUMNS

34 Reuven M. Lerner's At the Forge — Database Integrity and Web Applications
42 Dave Taylor's Work the Shell — Scripting Lite: Shell Aliases and Functions
46 Kyle Rankin's Hack and / — N900 with a Slice of Raspberry Pi
52 Shawn Powers' The Open-Source Classroom — Mirror, Mirror, Down the Hall
114 Doc Searls' EOF — Debugging Free Markets That Still Aren't
117 "Your Choice of Captor" — an Excerpt from Doc Searls' The Intention Economy

INDEPTH

104 Tarsnap: On-line Backups for the Truly Paranoid
Tarsnap's Linux-friendly encrypted backup service makes keeping your data in the cloud a snap.
Andrew Fabbro

IN EVERY ISSUE

8 Current_Issue.tar.gz
10 Letters
18 UPFRONT
32 Editors' Choice
60 New Products
123 Advertisers Index

22 LIGHTSPEED
32 EDITORS' CHOICE: RESARA
46 RASPBERRY PI/N900

LINUX JOURNAL (ISSN 1075-3583) is published monthly by Belltown Media, Inc., 2121 Sage Road, Ste. 310, Houston, TX 77056 USA. Subscription rate is $29.50/year. Subscriptions start with the next issue.


Executive Editor: Jill Franklin, jill@linuxjournal.com
Senior Editor: Doc Searls, doc@linuxjournal.com
Associate Editor: Shawn Powers, shawn@linuxjournal.com
Art Director: Garrick Antikajian, garrick@linuxjournal.com
Products Editor: James Gray, newproducts@linuxjournal.com
Editor Emeritus: Don Marti, dmarti@linuxjournal.com
Technical Editor: Michael Baxter, mab@cruzio.com
Senior Columnist: Reuven Lerner, reuven@lerner.co.il
Security Editor: Mick Bauer, mick@visi.com
Hack Editor: Kyle Rankin, lj@greenfly.net
Virtual Editor: Bill Childers, bill.childers@linuxjournal.com

Contributing Editors: Ibrahim Haddad • Robert Love • Zack Brown • Dave Phillips • Marco Fioretti • Ludovic Marcotte • Paul Barry • Paul McKenney • Dave Taylor • Dirk Elmendorf • Justin Ryan

Proofreader: Geri Gale

Publisher: Carlie Fairchild, publisher@linuxjournal.com
Advertising Sales Manager: Rebecca Cassity, rebecca@linuxjournal.com
Associate Publisher: Mark Irgang, mark@linuxjournal.com
Webmistress: Katherine Druckman, webmistress@linuxjournal.com
Accountant: Candy Beauchamp, acct@linuxjournal.com

Linux Journal is published by, and is a registered trade name of, Belltown Media, Inc., PO Box 980985, Houston, TX 77098 USA.

Editorial Advisory Panel: Brad Abram Baillio • Nick Baronian • Hari Boukis • Steve Case • Kalyana Krishna Chadalavada • Brian Conner • Caleb S. Cullen • Keir Davis • Michael Eager • Nick Faltys • Dennis Franklin Frey • Alicia Gibb • Victor Gregorio • Philip Jacob • Jay Kruizenga • David A. Lane • Steve Marquez • Dave McAllister • Carson McDonald • Craig Oda • Jeffrey D. Parent • Charnell Pugsley • Thomas Quinlan • Mike Roberts • Kristin Shoemaker • Chris D. Stark • Patrick Swartz • James Walker

Advertising: e-mail ads@linuxjournal.com; URL www.linuxjournal.com/advertising; phone +1 713-344-1956 ext. 2

Subscriptions: e-mail subs@linuxjournal.com; URL www.linuxjournal.com/subscribe; mail PO Box 980985, Houston, TX 77098 USA

LINUX is a registered trademark of Linus Torvalds.




Current_Issue.tar.gz

Water Coolers, Cubicles, Committee Meetings and a Penguin

SHAWN POWERS

One of these things doesn't belong in the workplace. If you ask most people in the business world, they'd say a penguin is a silly thing to keep at work. Those of us in the server room, however, just snicker at such foolishness. I'll take Linux over a committee meeting any day! This month, we get to see Linux at work.

You might have noticed the past few months we've chosen a product or program as "Editors' Choice" for the issue. It's a great way for us to highlight something we find particularly awesome. This month, I introduce Resara. Samba 4 has been in active development for years, but the folks at Resara package the current build into a very functional Microsoft Active Directory replacement. Their community version is full-featured and free—it's definitely worth checking out.

Reuven M. Lerner talks about a particularly serious issue this month, as he discusses database integrity. Databases often are overlooked as boring, but we rely on them constantly. Having viable, accurate data is something we assume, but Reuven shows us how to make that assumption possible. Dave Taylor follows up with a tutorial on how to use shell aliases and functions. We often take the simple tweaks in our .bashrc file for granted, but Dave not only opens it up, he also shows how to add tweaks of our own.

The Raspberry Pi device is something most people in our circles are familiar with, even if we haven't been able to get our hands on one! Kyle Rankin, as usual, takes something awesome and makes it even more so. Combining his Nokia N900 and the


LETTERS

Digital Ergonomics

I'm considering renewing my subscription. My main problem with the magazine is that the digital format is not set up for digital devices. Digital devices do not have letter page size and do not have 600dpi resolution, and many laptops do not have a portrait aspect ratio. What does this mean? Well, in order to view a page, you have to zoom in. Once zoomed in, you have to scroll up and down to view a single page. That is annoying. Compare it with the Web. You start at the top and scroll down because most Web pages are single-column text. They are digitally ergonomic, whereas your magazine's PDF is not. Please, I beg you, fix it by going to a single-column PDF or, better yet, a downloadable HTML .tgz. HTML would allow readers to set the single-column size to what works best for their devices.
—Mark Ramsell

The main PDF has changed slightly since the digital transition. It "fits" on a tablet-size screen better. Laptops, I agree, don't lend themselves to reading a portrait document very well. In that case, I recommend using an EPUB reader, which allows for flowing text (a couple issues ago I wrote about a great Firefox plugin that does it). Otherwise, the interactive on-line version of the magazine formats as a two-page spread, which looks good on a computer screen. Hopefully, one of those formats will work well for you. The good news is they're all included with your subscription!—Ed.

Digital Library

I have to admit, I am quite messy at keeping track of stuff, and things that have been added to my list of untrackable stuff are e-books. I have an old Android tablet, a new Android phone, a laptop with Windows 7/OpenSUSE 12.1 and three more PCs that I usually work on. That's a lot of devices, and because of this, except for my work PC, I don't spend a lot of time at any given device. I think the PDF and .mobi formats are great for e-books like the magazine; however, it takes precious time to find the last page that I read. I am currently using Dropbox to share the files between all the computers, but I can't help wondering if there is a

recently. I'm a very recent two-year subscriber, and I must say that I echoed Greg's impression when I previewed a few past issues. I made a two-year subscription in spite of my initial impression because I was certain that it was a passing thing and that LJ eventually would cover more general and practical topics. A Bluetooth proximity device that locks and unlocks my computer when I leave and return is pretty incredibly cool, I admit, but not too practical. A wall-mounted, dual-monitor PC is also pretty cool. But practical? Don't get me wrong, there is nothing at all wrong with the esoteric, but let's not exclude the more common—something for the intermediate reader: an article on Udev and the many gotchas involved in manual configuration; Calibre, and what it can do; using notify-send to set up your own notifications; Flexget for automating torrent downloads. There is a trove of topics that would be beneficial to a very broad slice of your readership. I'd even be willing to write a few articles myself, if that would help. Love the magazine as a whole, just hoping for something a notch or two closer to home.
—Slurry

It's an admittedly strange game trying to get just the right balance of articles to make everyone equally happy (or equally frustrated I suppose). We have a group of Linux Journal readers on our advisory board, and we run ideas past them every issue. We are always up for new writers, so if you'd like to send an article idea, we'd love to hear it. See http://www.linuxjournal.com/author for more information.

Substandard Working the Shell

Dave Taylor's Work the Shell column is far less useful than it could be. This is not UNIX Journal. This is Linux Journal. As such, the shell that people use is bash, not the Bourne shell. I just don't understand why every example is based on a 20-year-old shell that requires that every single thing can occur only by running external processes. Let's look at a few examples from the current issue:

if [ ! -z "$2" ] ...

vs.:

if [[ -n "$2" ]] ...

vs.:

[[ -z "$2" ]] || targetLength="$2"

For the record, I'd rather see people getting educated on what the difference is between use of single vs. double square brackets.

Here's another:

for word in $(cat $possibilities)
do

vs.:

for word in $(< $possibilities)
do

A few processes were saved. Who cares? The people who go into a job interview and demonstrate how little they know and how sloppy they are at not bothering to learn properly the language in which they are being paid to write. Besides that, there are lots of bad >10K LOC scripts out there that get buried under the sheer weight of the excess processes.

length=$(echo $word | wc --c)
length="$(( $length - 1 ))" # Why the dbl quotes?

vs.:

length=$(( ${#word} - 1 ))

One last one:

uword=$(echo $word | tr '[[:lower:]]' '[[:upper:]]')

Sure, using a bash4 feature might be pushing my luck because of all those people out there who are still(?) using an antique copy of bash. Maybe they can't use:

uword="${word^^}"

But, at the very least, they could use:

uword=$(tr '[[:lower:]]' '[[:upper:]]' <

for it, and thanks again for taking the time to write.

And, Steven replies to Dave: You have to read the bash man page about 200 times.

There used to be a program in /bin or /usr/bin called test. For a while, that program used to link to a file called [. After that, the test became built in, and you'd have to go out of your way to invoke the external test program. The [[ ]] construct is relatively new, around 20 years old or so. It started with ksh88, and bash has had it for a very long time.

The differences are quite important:

if [ -z "$foo" ] && [ $(xxx; echo $?) -eq 0 ] ||
   [ $(yyy; echo $?) -eq 0 ]

will not do what you expect. It will test to see if $foo is null, and if it is, it will then run the xxx test and not the yyy test. If the foo test is false, it will run the yyy test and not the xxx test.

The correct approach is:

if [[ -z "$foo" ]] && {xxx || yyy}

Note that double square brackets are not a built-in; they are a keyword.

Another example that will demonstrate things:

if [[ t1 && t2 && t3 ]]

is not the same as:

if [ t1 -a t2 -a t3 ]

Multithreaded Simulation and OpenGL Visualization Demos

I am a long-term subscriber to Linux Journal, and I have recently posted the source code of two of my projects on GitHub. One is a lattice simulation and the other an N-body simulation: https://github.com/fhstoica/AbelianHiggs and https://github.com/fhstoica/N_Body_Simulation.

The code gives a good real-world example of highly parallelizable computations making use of barrier synchronization and CPU affinity (Linux-specific) to improve performance. The visualization code implements the rendering of an isosurface using the Mesa and GLUT libraries.

I would have liked to see such examples when I was learning multithreaded and




UPFRONT
NEWS + FUN

diff -u
WHAT'S NEW IN KERNEL DEVELOPMENT

Recently, some folks identified a bad patch that had gotten into the 3.3.1 kernel in the stable tree. It caused a certain set of hardware to fail to wake up again, once it'd been suspended. The fix was simply to revert the bad patch, after which the affected systems would work fine again. But, Greg Kroah-Hartman refused to make the change in his upcoming 3.3.2 release until Linus Torvalds made the change in the upstream git tree.

The result was an interesting debate between Filipe Contreras and a bunch of other developers over whether that was a good idea. Filipe's argument was that the 3.3.x kernels represented the "stable series", and had the goal of eliminating bugs and giving the smoothest ride possible. Therefore, he argued, it made no sense to wait around for bug fixes to be incorporated into the upstream git tree first. The upstream git tree was for regular development, not stabilization, although the bug fix in question would result in real systems being able to do real work, instead of sitting there in a hung state.

There were a number of responses to that argument from various developers, including Linus. One argument made by Willy Tarreau was that if bug fixes didn't get into the upstream tree first, the bugs potentially would reappear in the very next development release. And, because each stable series was derived from a development release, it was only by incorporating stable series fixes upstream that the developers could ensure that fixes to one stable series would make it into the next stable series.

Linus also pointed out that there had been a time when this particular rule didn't exist, and the problem of fixes going into the stable series and then being left out of the development tree happened "all the time".

Adrian Chadd offered up a bit of history from the Squid Project. Apparently, it had a stable and a development branch that it allowed to diverge, with the result that users wanted features and bug fixes for the older tree that were incompatible with the newer one, and mayhem ensued. So, Adrian argued, having a rule requiring changes to the stable tree to be incorporated into the development tree was a clean way to avoid that.

The really interesting thing about the whole discussion is that it illustrates the way kernel development itself develops. Traditionally, there's always been some aspect of the process that would bog down or become in some way unworkable—whether it was the struggle to find a good version control system, or identifying which people were the right ones to send patches directly to Linus, or figuring out how to do development while also producing a stable kernel for distributions. Eventually something breaks down and reveals some set of nuances that no one had noticed before. At that point, Linus and various other folks typically take an algorithmic approach to smoothing things out again. Sometimes the solutions break down later in new ways, and the succession of adjustments is fascinating to watch over time.
—ZACK BROWN

Linux-Native Task Management? Check.

Task management programs are commonplace in our busy lives, but it seems that every system lacks something. Google Tasks is nice, but it lacks the features of a more robust task management system. Remember the Milk is nice, but it charges for some of its features. Many standalone task management programs are great, but they don't sync between devices.

Thankfully, Nitrotasks is a Linux-based task management program that not only syncs between computers, but also has some organization features reminiscent of the "Get Things Done" method. When you add the Web-based version and syncing compatibility with Dropbox or Ubuntu One, Nitrotasks is worth checking out. It's simple, but its Linux-native roots and Web-based access make it compelling. Be sure to put "check out http://www.nitrotasks.com" on your to-do list!
—SHAWN POWERS


Visit LinuxJournal.com and Help Take Over the World

Linux really means business. It's not just an issue focus, but also a reality. Linux Journal has been around since before Linux was cool. We've seen it grow from a system praised by hobbyists and hard-core geeks, through its not-so-awkward adolescence during the tech boom of the late 1990s, and blossom into an operating system so elegant that it has penetrated the mainstream and directly touches so many people in such a way that few even are aware of its presence. As we all know, few people who participate in the digital world in which we currently live are not touched by Linux in some way. That said, if you are reading this issue, you more than likely are one of the informed few who helps power the open-source ecosystem that in turn powers the lives of so many. To that end, we offer you LinuxJournal.com as a resource to aid you in your pursuit of technical greatness. With articles focused on programming, security, desktop, system administration and more, we think you'll find it a handy reference for staying up to date and discovering new skills that will help you make the world a better place.
—KATHERINE DRUCKMAN

They Said It

Most certainly, some planets are not inhabited, but others are, and among these there must exist life under all conditions and phases of development.
—Nikola Tesla

Life is and will ever remain an equation incapable of solution, but it contains certain known factors.
—Nikola Tesla

The opinion of the world does not affect me. I have placed as the real values in my life what follows when I am dead.
—Nikola Tesla

I predict that very shortly the old-fashioned incandescent lamp, having a filament heated to brightness by the passage of electric current through it, will entirely disappear.
—Nikola Tesla (in April, 1930!)

The desire that guides me in all I do is the desire to harness the forces of nature to the service of mankind.
—Nikola Tesla


Ubuntu's New DNS: Unknown Host

If you're the type of person who installs Ubuntu's server edition, you're also likely the sort of person who knows how to configure network settings. For most distributions, especially those based on Debian, the process is a bit strange, but familiar.

To configure the various interfaces, you edit /etc/network/interfaces and add the appropriate IP information, along with the gateway address. That doesn't complete the process, however, because if you manually configure the network interfaces, you need to add the DNS servers manually to the /etc/resolv.conf file. That's the way it's always been, and I never put much thought into it—until Canonical changed the way the resolv.conf file works.

I'll admit, my initial reaction was one of frustration, but once I got over myself, I have to say it makes much more sense to add the DNS configuration right into the /etc/network/interfaces file as well. My only complaint is that there aren't comments in either the /etc/resolv.conf file or the /etc/network/interfaces file on actually how to do it! (Thankfully, there is a note in /etc/resolv.conf warning that any changes will be overwritten, but no hints on how to make changes properly.)

The process, as it turns out, is rather simple. In /etc/network/interfaces, simply add a couple lines to the end of the stanza for a particular interface—for example:

# The primary network interface
auto eth0
iface eth0 inet static
    address 192.168.1.200
    netmask 255.255.255.0
    gateway 192.168.1.1
    dns-nameservers 192.168.1.1
    dns-domain example.com
    dns-search example.com

You'll notice the last three lines contain a few unfamiliar directives. They are fairly self-explanatory, but important to know. Once added to the /etc/network/interfaces file like above, the /etc/resolv.conf file is populated with the proper information when the network is activated. It's fairly simple, and it makes sense to have all the settings in one spot—but frustrating if you don't know about the changes!
—SHAWN POWERS
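If you want to confirm that the new entries really do land in /etc/resolv.conf, one way to check—assuming the stock resolvconf setup on an Ubuntu server and that eth0 is the interface you edited—is simply to bounce the interface and look at the generated file:

# re-read the eth0 stanza from /etc/network/interfaces
# (do this at the console, not over SSH, if eth0 is your only link)
sudo ifdown eth0 && sudo ifup eth0

# the dns-nameservers and dns-search values should now appear here
cat /etc/resolv.conf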


Lightspeed on Your Desktop

One area of physics that is hard to wrap your head around is relativity. Basically, relativity breaks down into general and special relativity. General relativity deals with large masses and high energies, and it describes how space-time is warped by these. Special relativity deals with what happens during high velocities. Many odd and counterintuitive effects happen when speeds get close to the speed of light, or c. The problem is that these types of conditions are quite far outside normal experience, so people don't have any frame of reference as to what these effects would look like—enter Lightspeed.

Lightspeed is an OpenGL program that shows what an object would look like as it travels closer and closer to the speed of light. Lightspeed actually models four different effects that occur when you get close to the speed of light. The first effect is called Lorentz contraction. This effect causes objects to appear to shrink in the direction of travel. This is scaled by a factor called gamma. Gamma is calculated by:

    gamma = 1 / sqrt(1 - v^2/c^2)

So, as you can see, as your velocity (v) gets closer to the speed of light (c), the value gamma increases toward infinity. The length contraction is calculated by:

    l' = l / gamma

This means the length of an object in the direction of travel (l) decreases toward zero as the velocity increases toward the speed of light.

The second effect that Lightspeed models is Doppler shifting. You probably have noticed the sound version of Doppler shift when a fire truck drives by with its siren on. You can hear the shift in sound frequency from high to low as it goes by. The same thing happens with light too. As a light source comes toward you, it shifts in color toward blue. As it goes away, the color shifts toward red.

The third effect is something called the Headlight effect. When you have a light source that is moving, the amount of light being emitted isn't the same in all directions if that light source is moving. If the light is coming toward you, it will appear brighter. If it is moving away from you, it will appear darker.

The last effect that Lightspeed models is something called optical aberration. Because light travels at a finite speed, the light from different parts of the object comes to you at different times. The end effect is that as an object comes toward you, it looks stretched out. And, when it travels away from you, it looks squashed. This is actually different from Lorentz contraction.
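To get a feel for what those numbers mean before firing up the program, you can evaluate gamma for a given speed right at the command line with bc. (This quick check is only an illustration of the formula above; it isn't part of Lightspeed itself.)

# Lorentz factor at 90% of the speed of light: 1/sqrt(1 - (v/c)^2)
echo 'scale=4; 1/sqrt(1 - 0.9^2)' | bc -l

That prints roughly 2.294, so a metre stick flying past you at 0.9c would measure only about 1/2.294, or 0.44 meters, along its direction of travel.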


Figure 1. Initial Scene on Starting Lightspeed

So, how can you see what an object would look like, taking all of these effects into account? This is where Lightspeed comes in. Most distributions should have a package available that you can install using their package management system. If you are interested in building from source, it is hosted at SourceForge. When you first start it up, it opens with a cube set in the middle of a field made up of rods for the edges and balls for the vertices (Figure 1). The beginning velocity is set as 1m/s. Since this is an OpenGL program, you simply can grab the cube with your mouse and spin it around, or up and down, in order to change the view of the object. At the far right, you can set the velocity that the object has relative to you, going from 1m/s all the way up to 299,792,457m/s (the speed of light is 299,792,458m/s).

Figure 2. Setting the Number of Vertices in Your Object

Figure 3. Altering the Smoothness of the Rendering

You can change the dimensions of the default cube object by selecting the menu option File→New lattice (Figure 2). You can set how many balls will be drawn in each of the three dimensions. You also can change the smoothness factor when the object is rendered on the display. In this case, it will look like what is shown in Figure 3.

You also have the option of loading your own 3-D object, either in 3D Studio format (*.3DS or *.PRJ) or in LightWave format (*.LWO). This new object is what will be rendered, and the optical effects will be applied to this. You can save a snapshot of a particular object at a particular speed in either PNG or TIFF format. You also can export to an SRS file format (Special Relativity Scene). This is a format used by the program BACKLIGHT, which is a specialized raytracer used to illustrate relativistic effects. This lets you generate much higher resolution images of the affected object.

Figure 4. Displaying Axes and a Bounding Box around Your Object

In the Objects menu item, there are options for drawing supplementary objects. By default, the floating grid is selected. You also can select coordinate axes, identifying the x, y and z axes. You can select the display of a bounding box for your object as well (Figure 4).

Figure 5. Setting Options for an Animation of Your Object

Additionally, you can animate the scene by selecting Objects→Animation in the menu. This pops up a dialog box where you can select the starting and ending points on the x axis and the loop time in seconds (Figure 5). The scene then will be animated until you click stop.

Figure 6. Looking at Just Lorentz Contraction at .9c

Above, I looked at the four effects Lightspeed can model and apply to your object. By default, all of the effects are selected when you start up. If you want to change which effects are being modeled, you can go to the Warp menu item and select any combination of Lorentz contraction, Doppler red/blue shift, Headlight effect or Optical aberration. So, you can select only the Lorentz contraction and see what it looks like at 90% of the speed of light (Figure 6).

Figure 7. Changing Camera Position for a Better View

You have quite a bit of control over the camera as well in Lightspeed. You can select the focal length of the camera lens, going from 28mm to 200mm. You even can set a custom lens focal length by selecting Camera→Lens→Custom. You can set the position of the camera precisely by clicking Camera→Position. This displays a pop-up dialog, where you can set the exact x, y and z values for its location (Figure 7).

You can select what information is displayed on the screen by selecting the menu item Camera→Info display. You can have the velocity, the time, the gamma factor and/or the frame rate displayed on the screen. The default background color is black, but you can change it to gray, white or very white. This display is an OpenGL display, so you can select the rendering mode. The default is shaded, but you can change it to wireframe rendering.

One of the really cool options is that you can spawn off other cameras by selecting Camera→Spawn camera. This lets you see your object from several different angles at the same time. So, now you can see what it looks like coming and going at the same time (Figure 8).

Figure 8. Setting Up Multiple Cameras for Different Perspectives

Once you have the speed and the relativistic effects set, you probably want to interact with the object a bit to see how it looks from different directions and so on. If you click the left mouse button and drag around, your object will be rotated around. If you click the left mouse button and press Shift at the same time, the camera view will be moved around, pointing in different directions. You can move the camera itself by clicking the middle mouse button and moving it up and down or left and right. You can move the camera in and out from the object by clicking the right mouse button and sliding up and down. If you completely mess up your view, you always can go back to the defaults by clicking the menu item Camera→Reset View.

So, remember this program when you are studying special relativity. Now you finally can see what it would really look like if you were traveling down the tunnel with the protons in the LHC or what the Starship Enterprise would look like as it flew by at sublight speed.
—JOEY BERNARD


Linux in the Workplace Poll

We at Linux Journal HQ have a lot of Linux in our office. It's on desktops, laptops, servers, phones and tablets. We love Linux so much, we even have a Tux Droid (http://lnxjr.nl/tux-droid). We also use a wide variety of non-Linux open-source software for various purposes, and as FOSS has become more and more ubiquitous, we bet you do too. So, we wondered how you, our readers, use Linux and open-source software in your workplaces. Thanks to all who answered our short poll on LinuxJournal.com; we now have a better idea of how much love our favorite operating system gets in your offices.

When asked how much Linux you use in the workplace, we're thrilled to learn that the most-popular response (29%) was "We're allowed to choose the operating system we use, and the cool kids use Linux." Freedom to choose your operating system is something we love to see. And, although we are also excited that a solid 14% of you indicated that everything you touch in your office uses Linux, we appreciate that this is not always possible in every work environment, so we're happy that most of you are in an environment where your preferences are respected, whatever they may be.

We are also glad to report that 66% of you use Linux on your work computers. The most popular flavor was Ubuntu, with Fedora a pretty distant second. And to the 34% using Windows, we hope it isn't giving you too many headaches!

86% of you use other, non-Linux open-source software, and a full 25% of you use mostly open-source software on Linux. And to the 14% who believe you are able to use open-source software just because no one knows Firefox is open source, let's hope the powers that be remain blissfully ignorant.

The most-popular reasons for using Linux at your companies are that Linux saves money and is more secure, and we couldn't agree more. You most commonly use Linux as a Web server, database server, desktop system, file server, for software development and as a mail server. Not too many of you are using Linux for movies and graphics, but we'd love to hear more from the 8% who are. Please drop us a line if you'd like to show off your work.

We are pleased that 32% of you are in your current position because of your Linux expertise. We also understand that our readers are people of many talents, so 44% of you hold your current position regardless of your Linux knowledge. Finally, we're relieved that 73% of your bosses do know what Linux is, and to the rest of you, please let me know if you'd like my help in staging an intervention.

• 14% — Yes, we use some desktop software like LibreOffice.
• 6% — Yes, we use mostly OSS on Windows.
• 1% — Yes, we use mostly OSS on OS X.
• 25% — Yes, we use mostly OSS on Linux.
• 7% — Everything in our office is open source, even the soda we drink.

4. IF YOUR COMPANY USES LINUX, IT IS BECAUSE:
• 56% — Linux saves money.
• 53% — Linux is more secure.
• 28% — Linux runs on older hardware.
• 23% — Linux isn't a Microsoft product.
• 24% — Because I put it there and no one knows.

5. DO YOU HAVE YOUR CURRENT POSITION:
• 32% — Because of Linux expertise.
• 9% — In spite of Linux expertise.
• 44% — It has nothing to do with Linux.
• 6% — I'm not currently employed; thanks for bringing it up.

6. DOES YOUR BOSS KNOW WHAT LINUX IS?
• 73% — Yes.
• 13% — No.
• 7% — Well, my boss thinks Linux is an air-conditioning company.

7. HOW IS LINUX USED AT YOUR COMPANY?
• 60% — Web server
• 55% — Database server
• 53% — Desktop system
• 51% — File server
• 48% — Software development
• 35% — Mail server
• 34% — DNS server
• 34% — Virtual server
• 23% — VPN
• 22% — Spam or virus filtering
• 16% — Directory server
• 13% — Telephony or PBX
• 9% — HPC
• 9% — Movie/graphics development

Note: not all respondents answered each question, and a few questions allowed for multiple answers, so please don't be alarmed when the answers don't add up to 100%.
—KATHERINE DRUCKMAN

Non-Linux FOSS

If you're a Windows user, choosing an instant-messaging client actually can be a little frustrating. Some of the more-popular clients are loaded with advertising or cost a lot to get full functionality. Normally, I recommend Windows users install the Windows version of Pidgin. Recently, however, I stumbled across a nifty Windows-only IM client named Miranda.

Miranda is open source, ad-free and integrates perfectly into Windows. Granted, Pidgin works brilliantly under Windows, but it always feels like what it is—a port of a program designed for something else. Miranda has only a Windows version, and the one thing it does, it does quite well.

If the standard install of Miranda doesn't quite cover your needs, be sure to check out the add-ons designed for it. Whether you want to add Facebook chatting, integrate with Skype or check the weather, there are tons of add-ons from which to choose. Check it out today at http://www.miranda-im.org.
—SHAWN POWERS


EDITORS' CHOICE

Resara Me This

More than any other program, Samba allows Linux desktops to exist in the world of Windows. In fact, Samba historically has allowed Linux to live secretly in the server room as well. It's possible to emulate a Primary Domain Controller from your Linux server, and Windows machines can't tell the difference. The problem is that Microsoft no longer uses PDCs and has turned to Active Directory.

Thankfully, Samba version 4 has just entered beta, and it supports Active Directory emulation! Although it's still in early beta stage, that hasn't stopped the folks at Resara from packaging up the software into a turnkey Active Directory replacement. Offering a fully functional community version along with its only slightly more-functional commercial version, Resara gives you a surprising feature set.

The server itself is built on Ubuntu, and its cross-platform (Linux, Windows, OS X) GUI administration tool runs pretty much anywhere. It offers simple administration of users, groups, computers, DNS and DHCP. If you need further customization of the Active Directory itself, Resara is compatible with Microsoft's own Group Policy Editor. That, unfortunately, runs only on Windows.

The administration GUI is simple and effective.

You'll be hearing more about Resara from me in future issues, but for folks looking to implement an Active Directory system without the desire to pay for Microsoft licensing, check out Resara now! The community version is at http://www.resara.org, and the commercial version is at http://www.resara.com. I'm using the commercial version, both because I want to support the project and because I'm sure I'll take advantage of the technical support!
—SHAWN POWERS


AT THE FORGE

Database Integrity and Web Applications

REUVEN M. LERNER

Want to improve the integrity of your data? Place constraints in the database, as well as in your application.

NoSQL, the catchall phrase for non-relational databases, is all the rage among Web developers. However, it's somewhat unfair and unhelpful to use the term NoSQL to describe them, given the variety of technologies involved. Even so, there are some fundamental differences between traditional relational databases and their NoSQL counterparts. For one, as the name implies, NoSQL databases don't use the standard SQL query language, and use either their own SQL-like language (for example, MongoDB) or an object-oriented API. Another difference is the lack of two-dimensional tables; whereas SQL databases operate solely with such tables, NoSQL databases eschew them in favor of name-value pairs or hash-like objects. And finally, NoSQL databases typically lack the features that led to the development of relational databases, namely transactions and data integrity.

There's no doubt that the flexibility NoSQL databases offer is attractive on a number of levels. Just as I enjoy working with dynamic languages in which I don't have to declare my variables (or their types) before I use them, it's nice to be able to store objects in my database without having to define the object structure in advance. If I want to add a new field to my Person object, I just do so, and the database magically catches up.

At the same time, there are many cases when I want the database to be tough with me and enforce the integrity of my data. That is, I want to be sure that even if I have made a mistake in my application, or if a user enters a value that shouldn't be allowed, the database won't allow that bad data to be stored. And yes, I believe that it's good to have such checks at the database level, rather than just at the application level—not only because it provides an additional guard against corrupt data, but also because the database often is accessed directly, outside the application itself. I often have to do what I refer to as "database surgery" on applications that are running for my clients, and it's always reassuring to know I cannot make a change manually that would corrupt the data.

Readers of this column know I'm a fan of both Ruby on Rails and of PostgreSQL, and I often use them together on projects. However, because Rails originally was developed under MySQL, which lacks many of the data-integrity aspects of PostgreSQL, the standard Active Record package fails to include such items as foreign keys in its base implementation. This means that although Rails will support PostgreSQL out of the box, it doesn't provide support for foreign keys, let alone data-integrity checks.

So, in this article, I look at some Ruby gems that make it possible to include foreign-key support in your Active Record models. I also describe ways you can take advantage of PostgreSQL's CHECK mechanism to ensure that your data is as safe as possible.

Where Do You Check Your Data?

Before I go into the actual code and implementation, I admit there are several philosophies of where and how to check your data. As I mentioned previously, I prefer to have the data checked at both the database and application levels, although sometimes I've been lazy and used the default in Rails that does so only at the application level.

Checking your data only at the application level is attractive in many ways. It makes the implementation easier, and it often will work just fine. It also, in the case of Rails, allows you to work in a single language (that is, Ruby) and put the integrity checks in your model files, where you're going to be using them.

But as I already said, if I have my checks only in the application, it's quite possible that when I try to access the database from outside my application, I'll end up messing things up, if only accidentally. Now, I'll admit this is the way many Web applications are designed, and it's not a fatal flaw. But it is something that I'll try to avoid in this article.

It would be a bigger mistake to try to do things the other way around—that is, to have the checks only in the database, rather than at the application level. Imagine, for example, if you were to have a "NOT NULL" constraint in the database, such that a particular column could not contain a NULL value. If you didn't protect against such things at the application layer, you might end up trying to put that data into your database. And although the database itself would not be corrupted, the application would generate an error, confusing and frustrating users.

So, upsetting and annoying as it might be, the best way to do things probably is to duplicate some work—adding the constraints on the database and then having them on the application as well. You could argue that you should have the constraints in place in a third location—in the HTML form the user often will use to submit information to the server. Fortunately, there is at least one technique for handling this sort of thing automatically. The client_side_validations gem for Rails copies the validations as best as possible, putting them into JavaScript within the user's views.

Foreign Keys

With this background, let's now assume that I want to implement a simple appointment calendar. In order to implement my calendar, I need to keep track of people (each of whom has a first name, a last name, an e-mail address and a telephone number) and appointments (which will indicate the date/time of the appointment, the person with whom I'm meeting and a note about the meeting itself).

At the database level, the tables will be fairly straightforward. This is how I would create the tables if I were doing so manually:

CREATE TABLE People (
    id SERIAL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL,
    phone TEXT NOT NULL,
    PRIMARY KEY(id)
);

CREATE TABLE Appointments (
    id SERIAL,
    person_id INTEGER REFERENCES People NOT NULL,
    notes TEXT NOT NULL,
    PRIMARY KEY(id)
);

The important thing here, for the purposes of this discussion, is the REFERENCES People clause in the definition of the Appointments table. With the REFERENCES clause in place, I can do the following:

INSERT INTO People (first_name, last_name, email, phone)
    VALUES ('Reuven', 'Lerner', 'reuven@lerner.co.il', '847-230-9795');

INSERT INTO Appointments (person_id, notes) VALUES (1, 'Meet with myself');

Now, here's what happens if I try to delete myself from the People table:

atf=# DELETE FROM People WHERE id = 1;
ERROR:  update or delete on table "people" violates foreign key
        constraint "appointments_person_id_fkey" on table "appointments"
DETAIL:  Key (id)=(1) is still referenced from table "appointments".

Because Appointments.person_id is a foreign key to People.id, I cannot remove a row from People if Appointments refers to it. This sort of relational integrity is a great thing for my application. No matter how you slice it, it means I cannot remove people from the system if they are scheduled for an appointment.

Rails Integration

If these tables were part of a Web application that I was developing in Ruby on Rails, I would use database migrations to create them. The migrations would look something like this (if combined into a single file):

class CreatePeopleAndAppointments < ActiveRecord::Migration
  def change
    create_table :people do |t|
      t.string :first_name, :null => false
      t.string :last_name, :null => false
      t.string :email, :null => false
      t.string :phone, :null => false

      t.timestamps
    end

    create_table :appointments do |t|
      t.integer :person_id, :null => false
      t.text :notes, :null => false

      t.timestamps
    end
  end
end

Now, this will give me the table definitions, but it won't give me the foreign keys, meaning that I could corrupt the database by deleting a single "People" record. In order to do that, at least in the default Rails configuration, I need to use an "execute" command in the migration to send explicit SQL to the database. Fortunately, there is an easier way.
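Before looking at that easier way, here is roughly what the raw "execute" route would involve. This is only a sketch: the migration class name and the constraint name are made up for illustration and are not taken from the article.

class AddPersonForeignKeyBySql < ActiveRecord::Migration
  def up
    # send the foreign-key DDL straight to PostgreSQL
    execute "ALTER TABLE appointments
               ADD CONSTRAINT appointments_person_id_fk
               FOREIGN KEY (person_id) REFERENCES people (id)"
  end

  def down
    execute "ALTER TABLE appointments
               DROP CONSTRAINT appointments_person_id_fk"
  end
end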


The Foreigner gem, written by Matthew Higgins, which works with MySQL as well as with PostgreSQL, adds syntax to let you create and remove foreign keys within your migrations. For example, with Foreigner active—putting it in the Gemfile and then running bundle install—I can add a new migration that does the following:

class AddForeignKey < ActiveRecord::Migration
  def up
    add_foreign_key :appointments, :people
  end

  def down
    remove_foreign_key :appointments, :people
  end
end

Sure enough, after running this migration, I have the foreign key that I was hoping for. Again, it's still important for me to have this check in my Rails model; otherwise, I easily could get myself into a situation that the database forbids but the model allows and, thus, generate a bad error for my users.

Note: if you add the foreign key after you already have populated the database with some data, you might run into trouble. That's because PostgreSQL won't let you add a foreign key if one or more rows fail to abide by the declared constraint. The NOT NULL constraint additionally ensures that you point to some person in the system. In other words, every appointment has to be with someone, and it has to be with someone in the People table.

Now, let's say you already have been working on a project, but you have neglected to define foreign keys. You could go through each table, figure out which is pointing to which, and then handle it accordingly, adding the sorts of migrations shown above. But the Immigrant gem (also written and released by Matthew Higgins) looks at Rails models and adds foreign keys for every has_one and has_many column that points to a belongs_to column in another model.

Additional Checks

Foreigner is a great step forward, helping ensure the integrity of your data. But there is another issue, more subtle than that of foreign keys, to which I'm still susceptible. While I have ensured that the person_id will not be NULL and will point to a record in the People table, I haven't ensured that the People table will contain valid and reasonable values. That is, although the first_name, last_name, email and phone columns will not contain NULLs, they might contain empty strings or values (for example, e-mail addresses) that are invalid.

At the database level, I could handle this sort of problem with a CHECK clause. Such a clause ensures that illegal data—for whatever "illegal" values I want to define—cannot be placed in the database. This could be anything from a certain minimum or maximum length of text string, to a pattern in the text, to minimum or maximum numbers. For example, I often like to indicate that the database may not contain a price lower than 0, and that e-mail addresses need to match a very basic regular expression. (Note that matching e-mail addresses is not for the faint of heart, at least if you want to do it correctly.)

So given my People table, I could define a set of CHECK clauses that would ensure that the first_name field is non-empty. In other words, the first_name cannot be NULL, but it also cannot be the empty string:

ALTER TABLE People ADD CONSTRAINT
    people_first_name_non_empty_chks CHECK (first_name <> '');

Note that although I could add a number of checks within this single constraint, I prefer not to do so. That gives me greater flexibility to add and remove constraints, and it also ensures that when a constraint is violated, PostgreSQL will accurately tell me which one it was.

Now, how can I implement this in my Rails migration, and do I want to? My answer, as you can imagine, is that this would indeed be a good thing to include in the database.

Once again, a Ruby gem comes to the rescue. This one, sexy_pg_constraints, was written by Maxim Chernyak, but it has since forked and is maintained by several other people.

I can include it in my Rails application by adding the following to my Gemfile:

gem 'Empact-sexy_pg_constraints', :require => 'sexy_pg_constraints'

and by uncommenting the following line in config/application.rb:

config.active_record.schema_format = :sql

Simply put, sexy_pg_constraints adds a number of additional attributes that you can pass to an add_column or change_column invocation within your migration. For example, let's say I want to make sure, as before, that the first_name column is never blank—neither NULL nor the empty string. I can do this by saying:

class AddConstraints < ActiveRecord::Migration
  def up


COLUMNSWORK THE SHELLScripting Lite:Shell Aliases andFunctionsDAVE TAYLORYou’ve probably never paid much attention to the aliasesin your .bashrc, but Dave has. In this column, he talksabout aliases and command-line functions as an easy wayto expand your command-line options.Let’s take a side trip this month, a journeyaway from the busy thoroughfare of shellscript programming and into the relativelypeaceful eddy of shell aliases. You probablyare thinking “Yeah, that’s two paragraphs.Then what?” But stick with me—there’s a bitmore to aliases than you may have realized.To start, you can see what system andshell aliases you have simply by typingalias on the command line:$ aliasalias adb='/home/taylor/Documents/Android\ SDK/platform-tools/adb'alias grep='grep --color=always'alias ksnap='/home/taylor/Documents/Android\ SDK/tools/ddms'alias ls='ls -F'alias scale='/home/taylor/bin/scale.sh'alias vps='ssh taylor@intuitive.com'For the most part, these arestraightforward substitutions of oneword for another. Type in vps, and theshell expands that to be the commandssh taylor@intuitive.com.Look at the alias for ls though.You can start to see a tiny bit of thesophistication that the shell has byrealizing that it’s not an alias loop.I mean, ls = ls -F, so shouldn’tls -F expand to ls ls -F -F,and so on? Fortunately, it doesn’t,and that’s one reason you easily cantweak and expand your command-linecapabilities with aliases.In fact, my sequence is often: type inthe command until it’s complex enoughfor an alias, which works until it’ssufficiently complex for a shell script.Let me show you a handy trick bydemonstrating how I’d turn a complex42 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


COLUMNSWORK THE SHELLBash actually has a ton of handy little shortcutslike that, and they’re doubly useful if you havecomplex directory names and filenames!find command into an alias:$ find . -name "*core*" -printNow let me use a shell shortcut, !!,for the most recent command I’ve typedin within a more-complicated sequence:There’s another shell shortcut foryou: !$ is the last word of the previouscommand—in this case, it’ll be thefilename you just created. Bash actuallyhas a ton of handy little shortcuts likethat, and they’re doubly useful if you havecomplex directory names and filenames!$ echo "alias findcore=\"!!\"" >> ~/.bashrcThere’s no feedback, so what did itactually do? To find out, examine thelast line of the .bashrc:$ tail -1 ~/.bashrcalias findcore="find . -name *core* -print"Nice. Easy. When your alias hasbecome sufficiently complicated thata shell script would be easier, you cando the same basic thing again:$ alias findcore > findcore.shOr, if you’re really organized like I am,perhaps you could put the script into your~/bin directory, which changes the last line to:$ alias findcore > ~/bin/findcore.sh$ vi !$Complicated AliasesThe most common problem withan alias is that you want to beable to have the alias process thearguments you’ve specified, but notjust automatically at the end of thecommand sequence. For the ls aliasabove, it’s no big deal, but what if youwanted to have something like this:alias ls="ls -F | tee -a ~/ls.log"This would let you have a running logof all output from the ls command.The alias works fine until you invoke itwith something like:$ ls /homein which case it’s expanded to:ls -F | tee -a ~/ls.log /homeWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 43


COLUMNSWORK THE SHELLThat’s going to generate an error.To get this to work, you need to switch froman alias to a shell function. It’s not too commonon the command line, but quite reasonablefor your .bashrc. Here’s how I’d write it:list() {ls -F $* | tee -a ~/ls.log}Experimentation reveals that you can shrinkthings down to a minimum of two lines,but no shorter. So, do something like this:list() { ls -F $* | tee -a ~/ls.log }and the shell will give you the secondaryprompt (usually >) waiting for the closingcurly brace. Because you’re going to dropthis into the .bashrc, it doesn’t reallymatter much.Let’s do some more interesting thingswith these functions, fancier aliases thatdemonstrate a bit of the power of thesecommand-line functions.First off, convert to uppercase:upper() {echo $1 | tr '[a-z]' '[A-Z]'}Oops. It’s easily fixed by substituting$* instead. While I’m at it though, let’suse defined letter groups and sets ratherthan explicit ranges:upper() {}echo $* | tr '[[:lower:]]' '[[:upper:]]'$ upper linux journal rocksLINUX JOURNAL ROCKSThat’s better.A very nice feature of most modern <strong>Linux</strong>systems is a change directory command paircalled pushd and popd. Use pushd insteadof cd to change your current workingdirectory, and the system will rememberwhere you were. Do a quick popd, andyou’re back. That’s helpful when you’rebouncing around the filesystem, but what ifyou don’t have these available?Implementing a similar directory stackin a couple command functions is actuallystraightforward. Here’s my version:push() {current=$(pwd)cd $1 ; echo "Moved to $1"; ls -CF}That looks good, but there’s a flaw in thisfunction. Can you see it? The problem isthat if I use $1, I don’t get multiword input:$ upper linux journal rocksLINUXpop() {echo "Moving back to $current"cd $current}Let’s give it a shot:44 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


$ push $HOME
Moved to /home/taylor
Desktop/    Dropbox/       Movies/    Presentations/
Documents/  Google Drive/  Music/     Public/
Downloads/  Library/       Pictures/  Sites/

What's happened to the "current" shared variable? Let's have a look:

$ echo $current
/home/taylor/Desktop

To switch back to the previous working directory, therefore, all that's needed is:

$ pop
Moving back to /home/taylor/Desktop

That's easy!

To make this more sophisticated, you would need to use an array of directory names and keep track of your array depth. It's not too difficult, but I'll leave that one for an enthused reader!

One big difference between aliases and functions, I should note, is that you can use an alias with the same name as a command, but most shells complain about attempts to define a function where the name collides with an alias:

$ ls() {
-bash: syntax error near unexpected token '('
$ unalias ls
$ ls() {
echo "file listing disabled"
}
$ ls /
file listing disabled

Sneaky, isn't it? You can imagine how you really could make some changes to how the Linux command line works for an unsuspecting colleague, but perhaps you should leave that for when a new person joins your team and you feel the need for some hazing.

Anyway, that's it for aliases and command-line functions this time. I encourage you to explore and experiment with what you can do in your .bashrc to make your command-line interaction more pleasant and efficient.■

Dave Taylor has been hacking shell scripts for more than 30 years. Really. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at http://www.DaveTaylorOnline.com.
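The array version Dave leaves as an exercise could look roughly like this. It's a hedged sketch, not the column's own code, and it belongs in a .bashrc just like the functions above:

dirstack=()

push() {
  dirstack+=("$PWD")                  # remember where we are before moving
  cd "$1" && echo "Moved to $1" && ls -CF
}

pop() {
  local depth=${#dirstack[@]}
  if [ "$depth" -eq 0 ]; then
    echo "Directory stack is empty" >&2
    return 1
  fi
  local top=${dirstack[depth-1]}      # last directory pushed
  unset 'dirstack[depth-1]'           # drop it from the stack
  echo "Moving back to $top"
  cd "$top"
}

Appending to, and trimming from, the end of a Bash array keeps the last-in, first-out behavior of pushd/popd without needing a separate depth counter.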


COLUMNSHACK AND /N900 witha Slice ofRaspberry PiKYLE RANKINNow that I’ve finally got my Raspberry Pi, let the hacks begin.First off: N900 Raspberry Pi hacks.It may not come as a surprise toanyone who regularly reads my columnthat I tried to be first in line to orderthe Raspberry Pi. I mean, what’s notto like in a $35, 700MHz, 256MB ofRAM computer with HDMI out that runs<strong>Linux</strong>? In the end, I didn’t make thefirst batch of 10,000, but I wasn’t toofar behind either. So, now that I’ve hada Raspberry Pi for a week, I’ve alreadyfound a number of interesting uses forit. You can expect more Raspberry Picolumns from me in the future (possiblyincluding an update to my beer fridgearticle), but to start, in this article, Italk about a combination of two of myfavorite pocket-size <strong>Linux</strong> computers: theRaspberry Pi and my Nokia N900.At first you may wonder why combinethe two computers. After all, both arearound the same size and have similarinitial hardware specs. Each computerhas its own strengths, such as cellularnetworking and a touchscreen on theN900 and an Ethernet port and HDMIvideo output on the Raspberry Pi. In thisarticle, I explain how to connect theN900 to the Raspberry Pi in a privateUSB network, share the N900’s cellularconnection, and even use the N900as a pocket-size display. In all of theexamples, I use the default Debian SqueezeRaspberry Pi image linked off the mainhttp://www.raspberrypi.org page.Set Up USB TetheringThe first step to using the N900 with theRaspberry Pi is to set up a private USBnetwork between the two devices. Thereare a number of ways to do this, but I’vefound that the most effective way is viathe Mobile Hotspot application on the46 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


N900. This program started as a way to allow you to tether your computer with your N900 by turning the N900 into a wireless access point; however, because it uses WEP for security, I always favored using Mobile Hotspot's lesser-known USB networking option. That way, I not only get to tether my laptop, but because tethering uses up quite a bit of battery power, by being plugged in over USB, my laptop can keep my N900 charged as well.

By default, the Raspberry Pi is not set up to enable USB networking, but luckily, this is easy to set up. Just log in to your Raspberry Pi and edit the /etc/network/interfaces file as root. Below where it says:

iface eth0 inet dhcp

add:

iface usb0 inet dhcp

Now, launch the Mobile Hotspot program on your N900 and make sure it is configured so that the Interface is set to USB, as shown in Figure 1. Then connect the Raspberry Pi to your N900, which should prompt you to select between Mass Storage mode or PC Suite mode. Choose PC Suite mode, and then click the Start button on the Mobile Hotspot GUI. This automatically should set up the USB network for you, and you should see logs like the following in your Raspberry Pi's /var/log/syslog:

Jan  1 01:04:44 raspberrypi kernel: usb 1-1.3: new high speed USB device number 5 using dwc_otg
Jan  1 01:04:44 raspberrypi kernel: usb 1-1.3: New USB device found, idVendor=0421, idProduct=01c8
Jan  1 01:04:44 raspberrypi kernel: usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Jan  1 01:04:44 raspberrypi kernel: usb 1-1.3: Product: N900 (PC-Suite Mode)
Jan  1 01:04:44 raspberrypi kernel: usb 1-1.3: Manufacturer: Nokia
Jan  1 01:04:47 raspberrypi kernel: cdc_ether 1-1.3:1.8: usb0: register 'cdc_ether' at usb-bcm2708_usb-1.3, CDC Ethernet Device, 66:77:ea:fa:12:8c
Jan  1 01:04:47 raspberrypi kernel: usbcore: registered new interface driver cdc_ether
Jan  1 01:04:47 raspberrypi kernel: cdc_acm 1-1.3:1.6: ttyACM0: USB ACM device
Jan  1 01:04:47 raspberrypi kernel: usbcore: registered new interface driver cdc_acm
Jan  1 01:04:47 raspberrypi kernel: cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters

The point-to-point network that is set up turns your N900 into a gateway with the IP address of 10.8.174.1, and your Raspberry Pi is given the IP 10.8.174.10, which you can see from the output of ifconfig on the Raspberry Pi:

usb0    Link encap:Ethernet  HWaddr 66:77:ea:fa:12:8c
        inet addr:10.8.174.10  Bcast:10.8.174.255  Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:309 errors:0 dropped:3 overruns:0 frame:0
        TX packets:204 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:25703 (25.1 KiB)  TX bytes:30676 (29.9 KiB)
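If you want to double-check what the Pi actually received on the USB link, a couple of quick commands on the Raspberry Pi will confirm it; nothing exotic here, just the usual tools, though the exact output will vary with your setup:

ifconfig usb0 | grep "inet addr"              # should show 10.8.174.10 in this setup
route -n                                      # the default route should point at the N900 gateway, 10.8.174.1
sudo grep usb0 /var/log/syslog | tail -n 5    # recent kernel messages for the link if something looks off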


COLUMNSHACK AND /Figure 1. Mobile Hotspot Configured for USB TetheringBecause the N900 is set up as thegateway, your Raspberry Pi can use thatcell-phone network for any outboundconnections without having to worry aboutplugging in the Ethernet port. In addition, ifyou start the SSH service on the RaspberryPi (sudo service start ssh) or better,if you make sure it’s enabled at boot, youcan ssh into your Raspberry Pi from theN900 with ssh pi@10.8.174.10 froma terminal. If for some reason when youtry to ssh to this IP, you get a “no routeto host” error, investigate your RaspberryPi logs and confirm that you truly aregetting the 10.8.174.10 IP. I found onmy system that occasionally I would get a.11 or .12 IP instead.N900 as a Remote DisplayNow that I have a point-to-pointlocal network between my N900 andRaspberry Pi, I’ve removed the need toconnect a network cable, but I still havethat pesky HDMI cable to get rid of.After all, you may want to hack on yourRaspberry Pi in places where you don’thave an HDMI-enabled display handy.Luckily, with a few tweaks you can useyour N900 touchscreen as a displayfor the Raspberry Pi and still be ableto use a keyboard or mouse you haveconnected to the Raspberry Pi for input.Unfortunately, I can’t just connectthe composite out or HDMI out ofthe Raspberry Pi into the N900, but48 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


what I can do is take advantage of the relatively low-latency local USB network and share the N900 X display over VNC. The first step is to install the x11vnc package on the Raspberry Pi with sudo apt-get install x11vnc.

Once x11vnc is installed, I need to set it up so that it automatically launches when X launches. I suppose this isn't absolutely necessary. After all, you could ssh in every time and start it yourself, but I think having it automatically launch is much more convenient. To do this, create a file called /home/pi/.config/autostart/x11vnc.desktop with the following contents:

[Desktop Entry]
Name=X11VNC Server
Comment=Share this desktop by VNC
Exec=x11vnc -forever
Icon=computer
Terminal=false
Type=Application
StartupNotify=false
#StartupWMClass=x11vnc_port_prompt
Categories=Network;RemoteAccess;

Next, I need to change the settings for the x11-common package so that it allows X sessions to be launched by any user. This is necessary so that I can run startx at boot time automatically. Without this change, X will detect it's not being run from a console session, and it will error out. To do this, run sudo dpkg-reconfigure x11-common, and when prompted to select "Users allowed to start the X server", select Anybody.

The final step is to start X at boot time. There are a number of ways to do this, but one of the easiest ways on the Raspberry Pi is via the /boot/boot.rc file. By default, the file is not there, but if present, it allows you to specify commands to run during the boot process so you can do things like enable SSH or start X. The following /boot/boot.rc file does both:

# sourced from rc.local on Raspberry Pi
#
# Name this file as "boot.rc" and put it on the boot
# partition if you want to run it.

# echo "Checking ssh and enabling if absent"
if ! ls /etc/rc5.d | grep "^S..ssh\$" >/dev/null; then
  insserv ssh
  service ssh start
fi
su pi -c startx

Once you boot the Raspberry Pi and set up the local USB network, if you ssh in and run ps -ef, you should see that x11vnc is running. Now you can launch a VNC client from the N900 (I prefer Presence VNC) and connect to 10.8.174.10, and you should see a copy of your Raspberry Pi X session (Figure 2). If you had the HDMI cable connected when the Raspberry Pi booted, the X session should be at full 1080p resolution, which might show up a bit small on the N900 screen. However, if you boot without HDMI connected


COLUMNSHACK AND /Sure, it’s not as nice as a giant 1080p display, butthen it’s hard to fit one of those in your pocket.Figure 2. Presence VNC on N900 Connected to the Raspberry Pi(which is the general use case for thishack), the X session will be configuredfor composite output and be at a moremanageable 640x480. At this resolution,once you tell Presence VNC to go fullscreen, it uses up the full N900 display,and because you are sharing a real Xsession on the Raspberry Pi, you can useany keyboard or mouse you have pluggedin to it. Sure, it’s not as nice as a giant1080p display, but then it’s hard to fitone of those in your pocket.Although I’ve talked about the N900 alot for this hack, the same principles shouldwork to turn just about any device thatcan run a VNC client into a display for theRaspberry Pi, provided the two devices canconnect over the network. In fact, if youare one of the many people who carry acolor tablet around anyway, that would bea quite ideal display for the Raspberry Pi.■Kyle Rankin is a Sr. Systems Administrator in the San FranciscoBay Area and the author of a number of books, including TheOfficial Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. Heis currently the president of the North Bay <strong>Linux</strong> Users’ Group.50 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM




COLUMNSTHE OPEN-SOURCE CLASSROOMMirror, Mirror,Down the HallSHAWN POWERSYou can’t host the whole Internet yourself, but you can hostentire distributions!If you’re the sort of person who installs<strong>Linux</strong> a lot, or if you have a large numberof servers/desktops to keep updated,having a local mirror can be a real timeand bandwidth saver. You may have ahuge Internet connection and want toset up a public mirror, but for the sakeof this article, let’s assume you just wanta personal mirror. I have a local mirrorboth at home and at work, and I’ve neverregretted setting it up.Before setting up your local mirror,Figure 1. My local mirror is simple, but very useful.52 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


COLUMNSTHE OPEN-SOURCE CLASSROOMIn order to be a proper netizen, you should locatea mirror that is close to you.Hosting OptionsFor the sake of simplicity, I recommendusing HTTP to serve your mirroredfiles. You can add other protocols ifyou like (rsync, FTP or NFS), but at thevery least, I recommend HTTP. Becausemost mirrors use a subdirectory for filestorage, you generally can just add afolder or symlink to your current Webserver and never touch the Web server’sconfig files. For this article, I assumeyou’re adding the mirror to your/var/www/default folder. If you’re usinga distro that is not Debian-based, adjustyour folder structure as necessary.The Initial SyncEventually, I’ll explain how toautomate the update process, but forthe initial sync, I recommend watchingto make sure it works right. Themethod for mirroring Ubuntu is similarto mirroring CentOS; it’s basically asingle rsync command.In order to be a proper netizen,you should locate a mirror that isclose to you. A list of Ubuntu mirrorsis located at https://launchpad.net/ubuntu/+archivemirrors, and alist of CentOS mirrors is athttp://www.centos.org/modules/tinycontent/index.php?id=13.Once you’ve determined the mirror youwant to use (make sure to choose onethat supports rsync), create your localmirror directories:sudo mkdir /var/www/default/ubuntusudo mkdir /var/www/default/centosAgain, you can adjust the locationof your mirrors however you like,the above just creates folders inthe default Apache server locationin Ubuntu. Using your local mirroraddress, type the following to mirrorthe Ubuntu repository:sudo rsync -a --progress \rsync://your.ubuntu.mirror.com/ubuntu \/var/www/default/ubuntuThen make a pot of coffee—or 20.You should get screens full of textas your local mirror is created. Oncethat is finished, a similar one-liner willcreate your CentOS mirror:sudo rsync -a --progress \rsync://your.centos.mirror.org/CentOS/* \/var/www/default/centosThis will take a long time too. Youmay want to switch to decaf for thissecond batch of coffee.54 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


COLUMNSTHE OPEN-SOURCE CLASSROOMLook at Yourself in the MirrorAssuming everything was done correctly,you should have a working mirror! Testit out by going to http://your.server.ip/ubuntu and http://your.server.ip/centos,and see if you get a listing that lookslike a mirror. If you get a message about“forbidden” from your Web server,you’ll need to add indexing optionsin order to see directory listings. Evenwithout the indexing options added,however, the mirror itself should work.Now What?Depending on the mirror you chose forCentOS, you may or may not have amirror that includes ISO files. If you don’twant the ISO files, or a particular releaseversion, you can, of course, modifythe rsync command appropriately. Forexample, adding --exclude isos willcompletely ignore the entire isos folder,saving you lots of room and lots ofbandwidth. I encourage you to tweak thersync commands to create a mirror thatserves your needs.Once you have your mirrors tweakedhow you like them, I recommendgiving them a good test. Installa system from your mirror. WithUbuntu, the best way to do that is bygetting the minimal install CD fromhttps://help.ubuntu.com/community/Installation/MinimalCD and bootingit via CD or USB. When the installerasks you to choose a mirror, scroll allthe way to the top of the list and enteryour server address manually. If all goeswell, the system should find your serverand install Ubuntu completely over thenetwork. It’s really pretty cool to see.For CentOS, you need to find a mirrorcontaining ISO files and download thenetinstall ISO. Look in the iso folderinside the version number folder for theiso file ending with netinstall.iso.Booting from the netinstall ISO issimple, but knowing the mirror URL isa little tricky. You’ll want to use yourlocal URL for something like this:http://server.ip/centos/6.2/os/i386.Obviously, you’ll need to replace the“6.2” with whatever version of thenetinstall ISO file you downloaded. Theinstaller should download install.img, andfrom there the installer should look justlike an install using CDs or DVD.Keeping It All Up to DateHopefully, you now have a fully workingmirror on your local server. Of course, inthe time it took you to read this far intothe article, it’s probably already outdated.That means you need some way to keepthings updated, preferably without anywork on your part. Thank you cron.At the very least, the commandsshown above to do the initial rsync willdo the trick if you put them into cron. Irecommend removing the --progressflag and adding --delete-after sothat you don’t get so much feedbackand so obsolete files are deleted fromyour mirror. If you get tired of the nightlyWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 55


COLUMNSTHE OPEN-SOURCE CLASSROOMListing 1. Mirror Script#!/bin/bash## Mirror Script /usr/local/bin/mirror-sync.sh## Originally created by Peter Noble, modified by many nerds## Point our log file to somewhere and set up our admin emaillog=/var/log/mirrorsync.logadminmail=shawn@brainofshawn.com# Set to 0 if you do not want to receive emailsendemail=1# Subject is the subject of our emailsubject="Your Mirror Did Its Thing"# Set up the server to mirrorremote=rsync://archive.ubuntu.com/ubuntu# Set up the local directory for your mirrorlocal=/var/www/default/ubuntu## Initialize some other variablescomplete="false"failures=0status=1pid=$$echo "`date +%x-%R` - $pid - Started Mirror Sync" >> $logwhile [[ "$complete" != "true" ]]; doif [[ $failures -gt 0 ]]; then## Sleep for 5 minutes for sanity's sake## The most common reason for a failure at this## point is that the rsync server is handling## too many concurrent connections.sleep 5mfiif [[ $1 == "debug" ]]; thenecho "Working on attempt number $failures"rsync -a --delete-after --progress $remote $localstatus=$?elsersync -a --delete-after $remote $local >> $logstatus=$?fiif [[ $status -ne "0" ]]; thencomplete="false"(( failures += 1 ))elseecho "`date +%x-%R` - $pid - Finished" >> $log# Send the emailif [[ -x /usr/bin/mail && "$sendemail" -eq "1" ]]; thenmail -s "$subject" "$adminmail"


e-mail messages from the output, you might want to add > /dev/null to the end of the cron line, so you're notified only if there are errors.

Unfortunately, things often go wrong. Rather than putting the rsync commands directly into your crontab, I recommend creating a couple scripts with some error management. The script I use (Listing 1) is pieced together from several people, cobbled a bit on my own, and has been shared all over. Feel free to take from it and modify it as you like. After reading this article, it should be self-explanatory, and it can be adapted for Ubuntu or CentOS.

But I Want More!
Once you have the basic mirrors on your server, you'll quickly notice a few things are missing. With Ubuntu, you won't have any ISO images. If you want to have a local copy of ISOs, you can get that with another simple rsync command:

rsync -a --progress \
  rsync://rsync.releases.ubuntu.com/releases \
  /var/www/default/releases/

Just like with the main archive, on subsequent rsyncs, I recommend removing the --progress flag and adding --delete-after. That will keep your mirror clean. It's possible to create a script for this process as well, but honestly, I just put the rsync one-liner in my crontab. There aren't too many changes to the ISO repository; it really changes only when there is a release.
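As one hedged example of what those crontab entries might look like, using the script path from Listing 1 and the ISO one-liner above (the times here are arbitrary):

# m h dom mon dow   command
30 1 * * *   /usr/local/bin/mirror-sync.sh > /dev/null
15 3 * * 0   rsync -a --delete-after rsync://rsync.releases.ubuntu.com/releases /var/www/default/releases/ > /dev/null

With stdout thrown away, cron still will mail you anything the commands print on stderr, which lines up nicely with only hearing about the sync when something goes wrong.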


COLUMNSTHE OPEN-SOURCE CLASSROOMBecause there is no rsync server for the Partnerrepo, the best way to create and maintain a mirroris by using debmirror.The Partner repository is a little trickierto mirror. There is no rsync server setup for mirroring the Partner repository,and I suspect it is because the Partnerrepo contains commercial software andmirroring it violates distribution rights orsome such thing. I’m not a lawyer, butas long as you’re mirroring on your localnetwork for your own personal use, I don’tthink you run any risk of breaking the law.Because there is no rsync serverfor the Partner repo, the best way tocreate and maintain a mirror is by usingdebmirror. debmirror is a neat tool thatuses the HTTP protocol to downloadpackages and organizes them to matchthe remote repository. It can be used formirroring the entire Ubuntu (or Debian)repository, but I usually use rsync for themain archive, because it’s so simple.Setting up debmirror requires somework with GPG keys and modifyinga script to fit your needs. The folksat Ubuntu have outlined the process(although they don’t mention it canbe used to mirror the Partner repo!)here: http://help.ubuntu.com/community/Debmirror.I’ve also created a quick little videooutlining how I keep my mirrors insync on the LJ Web site. Here is avideo showing my setup:http://www.linuxjournal.com/video/mirror-partner-repo-canonical.Waste Not, Want Not?Hosting your own mirrors takes a lotof disk space. The initial syncs take aton of bandwidth. If you run a networkthat would benefit from a local mirror,however, it really can be worth the effort.Once you have a local mirror setup, if you can afford the bandwidth, Iencourage you to become a public mirror.Canonical and CentOS have methods andrequirements for becoming a public andofficial mirror. Although not everyonecan do it, it’s folks like you and me whomake the Open Source community sostrong. Even if you can’t contributecode, perhaps you’re in a situation tocontribute bandwidth. I’m not, as neitherof my locations has enough bandwidthto support a public mirror, but perhapssomeone reading this article will!■Shawn Powers is the Associate Editor for <strong>Linux</strong> <strong>Journal</strong>. He’s alsothe Gadget Guy for <strong>Linux</strong><strong>Journal</strong>.com, and he has an interestingcollection of vintage Garfield coffee mugs. Don’t let his silly hairdofool you, he’s a pretty ordinary guy and can be reached via e-mailat shawn@linuxjournal.com. Or, swing by the #linuxjournal IRCchannel on Freenode.net.58 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


NEW PRODUCTSNoMachine’s NoMachineEven if your friends and colleagues aren’tenlightened enough to run <strong>Linux</strong>, you stillcan rule their PCs remotely. NoMachine’snew NoMachine v.4 is a free, remotedesktop control application that nowruns on Mac OS and Windows in additionto the previously supported <strong>Linux</strong>. Thismultiplatform support allows any PCrunning <strong>Linux</strong>, Mac OS or Windows tocontrol any remote computer running thesesame OSes, including full access to files and programs, real-time audio and videostreaming, secure file sharing even through firewalls, screen capture and printing toany printer, all at a high level of performance.http://www.nomachine.comSonatype Insight for Continuous IntegrationSonatype Insight forContinuous Integration(CI) solves a major problemthat anyone developingsoftware today should careabout—the discovery of security and licensing issues with third-party components.Insight for CI, Sonatype’s latest intelligent tool for component-based softwaredevelopment, enables developers to find quality, security and licensing problemsand enforce open-source policy at build time, before fixes become costly and timeconsuming. Insight for CI supports agile development processes with analysis ofevery component in every build, alerting developers immediately of any changes orpolicy violations that put their project at risk. Support for Hudson and Jenkins alsois included. Sonatype hosts the Central Repository, the software industry’s primarysource for open-source components, housing more than 300,000 components andserving more than five billion requests per year.http://www.sonatype.com60 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


NEW PRODUCTSUserful MultiSeat 5Are you a solution provider or OEM lookingfor new revenue opportunities? If so, UserfulCorporation wants you to take a peek atthe new Userful MultiSeat 5, a product it calls “the first <strong>Linux</strong>-based software to delivervirtual desktops on Ethernet true zero client hardware”. Userful MultiSeat 5 is reportedto deliver PC-like performance at half the cost of thin clients, as well as support flawlessplayback of 20+ simultaneous HD video streams from a single, standard mid-range PC.The solution is targeted at applications like digital signage, kiosks and interactive touchdisplays, VDI, cloud computing, in-flight and in-room entertainment systems, casinogaming systems, government connectivity projects, schools, universities and others.Support is included for zero clients from OEM device-makers, such as HP, Viewsonic, Acer,Atrust, ThinGlobal and LG. Userful says that users worldwide now can afford blazing-fastcomputing performance with dozens of Ethernet true zero clients, powered only by asingle PC or server located anywhere within the LAN.http://www.userful.comRickford Grant and Phil Bull’sUbuntu for Your Mom (No Starch)The Ubuntu <strong>Linux</strong> distribution makes <strong>Linux</strong> easy, andUbuntu for Your Mom makes it even easier. Full of tips,tricks and helpful pointers, this pain-free guide is perfectfor those interested in, but nervous about, switching tothe <strong>Linux</strong> operating system. In the book, authors RickfordGrant and Phil Bull walk readers through common tasks likeinstalling and playing games, accessing social networks,troubleshooting common hardware and software problems,connecting with the Ubuntu community, interacting with Windows and more.Other topics covered in a step-by-step manner include system installation andadministration, enjoying various media, syncing mobile devices, editing and sharingdigital photos and videos, creating documents, customizing the system’s look and feeland working with the command line (or avoiding it altogether).http://www.nostarch.comWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 61


NEW PRODUCTSCorsair Force Series 3 SSDNotebook Upgrade KitsGive your beloved laptop a new lease on life with Corsair’s Force Series 3 SSD NotebookUpgrade Kits, available in 120GB and 240GB sizes. Corsair says that these solid-statedrives enable users to increase their laptop’s performance, battery life and reliability inan affordable manner. SSDs provide significant advantages over standard hard drives,such as faster data read and write speeds, which can reduce software load and boottimes. And because SSDs have no moving mechanical parts, they help laptops runcooler, reduce battery drain and resist data loss from bumps. The upgrade kit, whichincludes a USB-to-SATA cable and migration software, makes switching simple. Thedrives come in a slim 7mm-high case designed to fit in most space-constrained laptops.http://www.corsair.comAltova’s MetaTeamThe beauty of the MetaTeam team management andproject collaboration cloud service, says developerAltova, lies in “breaking down the mysteries of teamorganization into simple steps that anyone—not justproject managers—can use effectively”. Specificallydesigned to foster agile collaboration, MetaTeam makesteams more productive by creating an environment oftransparency and information accessibility. At the sametime, for larger teams and more elaborate projects, itraises accountability and commitment by spelling out individual roles and responsibilities. YetMetaTeam is not just for developers. Business teams in any capacity can access its powerful toolsto enhance collaboration and efficiency in managing projects of any kind. This cloud serviceincludes integrated apps for managing to-dos, team member roles and responsibilities, projectspecificterms and documentation, structured decision making, team member information andmore. Threaded discussions and customizable e-mail alerts promote communication and clarity.http://www.altova.comPlease send information about releases of <strong>Linux</strong>-related products to newproducts@linuxjournal.com orNew Products c/o <strong>Linux</strong> <strong>Journal</strong>, PO Box 980985, Houston, TX 77098. Submissions are edited for length and content.WWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 63


A Guide to Using LZO Compression in Hadoop

Deploy and implement MapReduce programs that take advantage of the LZO compression techniques supported by Hadoop.

ARUN VISWANATHAN


Compression is the process ofreducing the size of actual databy using an algorithm to encode theinformation. Compression providesthe following benefits:n Reduces the hard disk spaceoccupied by the data.n Uses lower transmission bandwidth.n Reduces the time taken to copy ortransfer the data from one locationto another.However, it also comes with akind of data, and it would be beneficialfor enterprises to reduce the space usedto store the data. The Hadoop frameworksupports a number of mechanisms, suchas gzip, bzip and lzo to compress the datathat is stored in HDFS.LZO CompressionLempel-Ziv-Oberhumer (or LZO) is alossless algorithm that compresses datato ensure high decompression speed. Ithas the following characteristics:n Data compression is similar to otherpopular compression techniques,such as gzip and bzip.In Hadoop, using LZO helps reduce data sizeand causes shorter disk read times.problem. Before putting the compresseddata to some use, it first must bedecompressed. After processing, thedata must be compressed again. Thisincreases the time it takes an applicationto process it before it can use this data.As Hadoop adoption grows in corporatecommunities, you see data in terms ofTeraBytes and PetaBytes that is storedin HDFS by large enterprises. BecauseHadoop works well on commodityhardware, high-end servers are notrequired for storing and processing thisn It enables very fast decompression.n It supports overlapping compressionand in-place decompression.n Compression and decompressionhappen on a block of data.n It requires no additional memory fordecompression except for sourcebuffers and destination buffers.In Hadoop, using LZO helps reduceWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 65


data size and causes shorter disk read times. Furthermore, the block structure of LZO allows it to be split for parallel processing in MapReduce programs. These characteristics make LZO suitable for use in Hadoop.

In this article, I look at the procedure for enabling LZO in Hadoop-based frameworks and look at a few examples of LZO's usage.

Prerequisites
The following describes the software that was set up in CentOS 5.5-based machines.

Set up and configure the Cloudera Distribution of Hadoop (CDH3) or Apache Hadoop 0.20.x in a cluster of two or more machines. Refer to the Cloudera or Apache Hadoop Web sites for more information on setting up Hadoop. Alternatively, you also could use the Cloudera demo VM as a single-node cluster for testing.

Next, install the LZO package in the system. Download and install the package from its Linux distribution repository. For this article, I installed


this RPM: lzo-2.04-1.el5.rf.i386.rpm.

There are two ways to install the LZO-specific jars that can be used by Hadoop:

• Download and build the hadoop-lzo project from Twitter that will provide the necessary jars (see Resources).

• Download the prebuilt jars in RPM or Debian packages from the hadoop-gpl-packing project. For this article, I used this RPM: hadoop-gpl-packaging-0.2.0-1.i386.rpm.

The following binaries will be installed on the machine:

$HADOOP_GPL_HOME/lib/*.jar
$HADOOP_GPL_HOME/native

HADOOP_GPL_HOME is the directory where the hadoop-lzo project will store the built binaries. Using the prebuilt RPMs, the binaries will be installed in the /opt/hadoopgpl folder.

Note: if you are using a cluster of more than one machine, the above


three steps need to be done for all the machines in the cluster.

Deploy LZO in the Hadoop Ecosystem
First, install LZO for Hadoop. Then, add the Hadoop GPL-related jars to the Hadoop path:

$ cp $HADOOP_GPL_HOME/lib/*.jar $HADOOP_HOME/lib/

Then, update the Hadoop configuration files to register external codecs in the codec factory. Refer to Listing 1 to add the lines to the $HADOOP_HOME/conf/core-site.xml file. Refer to Listing 2 to add the lines to the $HADOOP_HOME/conf/mapred-site.xml file. The LZO files also need to be added in the Hadoop classpath. In the beginning of the $HADOOP_HOME/conf/hadoop-env.sh file, add the entries as shown in Listing 3. Next, run the following command, depending on the platform you're using:

$ tar -cBf - -C $HADOOP_GPL_HOME/native/ * | tar -xBvf - -C $HADOOP_HOME/lib/native/

Install LZO for Pig
Add the Hadoop GPL-related jars to the Pig path:

$ cp $HADOOP_GPL_HOME/lib/*.jar $PIG_HOME/lib/

Next, run the following command,

Listing 1. Adding LZO Codecs to Hadoop core-site.xml

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>

<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
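For the hadoop-env.sh side of this, entries along the following lines are commonly used with hadoop-lzo. The jar name and native-library path here are assumptions based on the /opt/hadoopgpl layout mentioned earlier (and a 32-bit install), so adjust them to match your own system:

# Added near the top of $HADOOP_HOME/conf/hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/hadoopgpl/lib/hadoop-lzo.jar
export JAVA_LIBRARY_PATH=/opt/hadoopgpl/native/Linux-i386-32/lib:$HADOOP_HOME/lib/native/Linux-i386-32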


Install LZO for HBaseCopy the Hadoop GPL jars to the HBaselib directory:$ cp $HADOOP_GPL_HOME/lib/*.jar $HBASE_HOME/lib/Run either of the following commands,depending on the platform you’re using:$ cp $HADOOP_GPL_HOME/native/<strong>Linux</strong>-i386-32/lib/*➥$HBASE_HOME/lib/native/<strong>Linux</strong>-i386-32/or:$ cp $HADOOP_GPL_HOME/native/<strong>Linux</strong>-amd64-64/lib/*➥$HBASE_HOME/lib/native/<strong>Linux</strong>- amd64-64/Using LZO CompressionLet’s look at a sample program fortesting LZO in Hadoop. The code inListing 6 shows a sample MapReduceprogram that reads an input file inLZO-compressed format. To generatecompressed data for use with this wordcounter, run the lzop program on aregular data file. Similar sample code isprovided with the Elephant-Bird Project.Sample Program for Testing LZO in PigThe PigLzoTest program shown inListing 7 achieves the same result as theMapReduce program described previouslywith the only difference being it iswritten using Pig.The last line in Listing 7 calls auser-defined function (UDF) to writethe output in LZO format. The codesnippet in Listing 8 shows the contentsof this class. The LZOTextStorer classshown in Listing 8 extends thecom.twitter.elephantbird.pig.store.LzoBaseStoreFunc classprovided by the Elephant-Bird Project forwriting the output in the LZO format.Sample Program for Testing LZOin HBaseTo use LZO in HBase, specify aper-column family compression flagwhile creating the table:create 'test', {NAME=>'colfam:', COMPRESSION=>'lzo'}Any data that is inserted into this tablenow will be stored in LZO format.ConclusionIn this article, I looked at the processfor building and setting up LZOin Hadoop. I also looked at thesample implementation processesacross MapReduce, Pig and HBaseframeworks. LZO compression helps inreducing the space used by data thatis stored in the HDFS. It also providesan added performance benefit dueto the splittable block architecturethat it follows. Faster read times ofLZO compressed data with reduceddecompression time makes it ideal asWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 71


FEATURE A Guide to Using LZO Compression in HadoopListing 6. Sample MapReduce Program to Test LZO in a Hadoop Cluster/*** MapReduce Word count Sample* Input File ~V LZO compressed file* Run com.hadoop.compression.lzo.LZOIndexer /* com.hadoop.compression.lzo.* DistributedLZOIndexer to create .lzo.index file to further* improve the read speed of LZO compressed files.* If the input lzo files are indexed, the input format will take* advantage of it. The input file/directory is taken as the first* argument. The output directory is taken as the second argument.* Uses NullWritable for efficiency.** Usage: hadoop jar path/to/this.jar */public class SimpleLZOWC extends Configured implements Tool {private SimpleLZOWC () {}public static class LzoWordCountMapper extends Mapper {private final LongWritable one = new LongWritable(1L);private final Text word = new Text();@Overrideprotected void map(LongWritable key, Text value, Context context)➥throws IOException, InterruptedException {String line = value.toString();StringTokenizer tokenizer = new StringTokenizer(line);while (tokenizer.hasMoreTokens()) {word.set(tokenizer.nextToken());context.write(word, one);}}}public int run(String[] args) throws Exception {Job job = new Job(getConf());job.setJobName("Simple LZO Word Count");job.setOutputKeyClass(Text.class);job.setOutputValueClass(LongWritable.class);job.setJarByClass(getClass());job.setMapperClass(LzoWordCountMapper.class);job.setCombinerClass(LongSumReducer.class);job.setReducerClass(LongSumReducer.class);// Use the custom LzoTextInputFormat class.job.setInputFormatClass(LzoTextInputFormat.class);job.setOutputFormatClass(TextOutputFormat.class);FileInputFormat.setInputPaths(job, new Path(args[0]));FileOutputFormat.setOutputPath(job, new Path(args[1]));}return job.waitForCompletion(true) ? 0 : 1;}public static void main(String[] args) throws Exception {int exitCode = ToolRunner.run(new LzoWordCount(), args);System.exit(exitCode);}72 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM
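To try the word counter, a run could look like the following. The jar names and HDFS paths are placeholders, and the indexer class in current hadoop-lzo builds is spelled LzoIndexer (with DistributedLzoIndexer as the MapReduce variant). Note also that main() in Listing 6 should instantiate SimpleLZOWC; the new LzoWordCount() it currently references looks like a leftover from an earlier class name:

# Compress a plain-text file and push it into HDFS (paths are placeholders).
lzop -v big-text-file.txt
hadoop fs -mkdir /user/demo/lzo-in
hadoop fs -put big-text-file.txt.lzo /user/demo/lzo-in/

# Optionally index the file so LzoTextInputFormat can split it across mappers.
hadoop jar /opt/hadoopgpl/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer /user/demo/lzo-in/big-text-file.txt.lzo

# Run the sample job, assuming SimpleLZOWC was compiled into lzo-samples.jar.
hadoop jar lzo-samples.jar SimpleLZOWC /user/demo/lzo-in /user/demo/lzo-out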


Listing 7. Sample Program to Test LZO in Pig/*** Pig Word count Sample* Input File ~V LZO compressed file* Run com.hadoop.compression.lzo.LZOIndexer /* com.hadoop.compression.lzo.* DistributedLZOIndexer to create .lzo.index file to further* improve the read speed of LZO compressed files.* Output - output directory is taken as the second argument.** To generate data for use with this word counter, run lzop* on a data file* Usage: PigLzoTest */public class PigLzoTest {/*** @param args*/public static void main(String[] args) {try {PigServer pigServer = new PigServer("mapreduce");pigServer.registerJar("lib/elephant-bird-2.0.jar");pigServer.registerJar("lzotest.jar");}runWordCountQuery(pigServer, args[0], args[1]);} catch (IOException e) {e.printStackTrace();}/*** Pig Script for Word Count* @param pigServer* @throws IOException*/public static void runWordCountQuery(PigServer pigServer,➥String inputFile, String outputFile) throws IOException {pigServer.registerQuery("A = load '" + inputFile + "';");pigServer.registerQuery("B = foreach A generate➥flatten(TOKENIZE((chararray)$0)) as word;");pigServer.registerQuery("C = filter B by word matches '\\w+';");pigServer.registerQuery("D = group C by word;");pigServer.registerQuery("E = foreach D generate group as word,➥COUNT(C) as count;");pigServer.registerQuery("F = order E by count desc;");pigServer.registerQuery("store F into '" + outputFile + "'➥using com.hadoop.compression.lzo.LzoTextStorer();");}}WWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 73


FEATURE A Guide to Using LZO Compression in HadoopListing 8. Sample Pig UDF to Write the Output in LZO Format/*** Write the LZO file line by line, passing each* line as a single-field Tuple to Pig.*/public class LzoTextStorer extends LzoBaseStoreFunc {private static final TupleFactory tupleFactory_ =TupleFactory.getInstance();protected enum LzoTextLoaderCounters { LinesRead }public LzoTextStorer() {}@Overridepublic OutputFormat getOutputFormat() throws IOException {return new TextOutputFormat();}@Overridepublic void putNext(Tuple tuple) throws IOException {if (writer == null)System.out.println("Writer is null");int numElts = tuple.size();StringBuilder sb = new StringBuilder();for (int i = 0; i < numElts; i++) {String field;try {field = String.valueOf(tuple.get(i));} catch (ExecException ee) {throw ee;}sb.append(field);}if (i == numElts - 1) {// Last field in tuple.sb.append('\n');} else {sb.append('\t');}}}Text text = new Text(sb.toString());try {writer.write(NullWritable.get(), text);} catch (InterruptedException e) {throw new IOException(e);}74 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


a compression algorithm for storingdata in the HDFS. It is already a populartechnique that is used by a number ofsocial Web companies, such as Twitter,Facebook and so on, internally to storedata. Twitter also has provided theopen-source Elephant-Bird Project thatprovides the basic classes for using LZO.■Arun Viswanathan is a Technology Architect with the Cloud Labs atInfosys Ltd, a global leader in IT and business consulting services.Arun has about nine years of experience with expertise in Java;Java EE application development; and defining, architecting andimplementing large-scale, mission-critical, IT solutions across arange of industries. He is currently involved in design, developmentand consulting for big-data solutions using Apache Hadoop.ResourcesLempel-Ziv-Oberhumer Algorithm:http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93OberhumerUsing LZO Compression in Hadoop: http://wiki.apache.org/hadoop/UsingLzoCompressionCloudera Demo VM: https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VMLZO Package for Red Hat-Based Distributions: http://rpmforge.sw.be/redhat/el5/en/i386/rpmforge/RPMS/lzo-2.04-1.el5.rf.i386.rpmTwitter Hadoop-lzo Project: http://github.com/kevinweil/hadoop-lzoRPM for LZO for <strong>Linux</strong>: http://code.google.com/p/hadoop-gpl-packingHadoop GPL Compression FAQ: http://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQLZOP binaries for Windows and <strong>Linux</strong>: http://www.lzop.orgHadoop at Twitter: http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compressionTwitter Elephant-Bird Project: https://github.com/kevinweil/elephant-birdWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 75


Introducing Vagrant

Have you ever heard the following? "Welcome to the team! Here's a list of 15 applications to install, the instructions are in the team room, somewhere. See you in a week!" Or: "What do you mean it broke production, it runs fine on my machine?" Or: "Why is this working on her machine and his machine, but not my machine?"

JAY PALAT


FEATURE Introducing Vagrantavailable in packages called RubyGemsor Gems. Ruby comes with a packagemanagement tool called gem. To installVagrant, run the gem command:> gem install vagrantVagrant is a command-line tool.Calling vagrant without additionalarguments will provide the list ofavailable arguments. I’ll visit most ofthese commands within this article,but here’s a quick overview:n init — create the baseconfiguration file.n up — start a new instance of thevirtual machine.n suspend — suspend the running guest.n halt — stop the running guest,similar to hitting the power button ona real machine.n resume — restart the suspended guest.n reload — reboot the guest.n status — determine the status ofvagrant for the current Vagrantfile.n provision — run theprovisioning commands.n destroy — remove the currentinstance of the guest, delete thevirtual disk and associated files.n box — the set of commands used toadd, list, remove or repackage box files.n package — used for the creationof new box files.n ssh — ssh to a running guest.The last thing you need to do inyour installation is set up a baseimage. A Box, or base image, is theprepackaged virtual machine thatVagrant will manage. Use the boxcommand to add the Box to yourenvironment. The vagrant box addcommand takes two arguments, thename you use to refer to the Box andthe location of the Box:> vagrant box add lucid32 http://files.vagrantup.com/lucid32.boxThis command adds a new Box tothe system called “lucid32” froma remotely hosted site over HTTP.Vagrant also will allow you to installa Box from the local filesystem:> vagrant box add rhel5.7 rhel5.7-<strong>2012</strong>0120-1223.box78 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


[vagrant] Downloading with Vagrant::Downloaders::File...[vagrant] Copying box to temporary location...[vagrant] Extracting box...[vagrant] Verifying box...[vagrant] Cleaning up downloaded box...edit files from their preferred Editorsor IDEs without updating the Guest.Changes made in the Host or Guestare immediately visible to the userfrom either perspective:>Now there are two Boxes installed:> mkdir ProjectX> cd ProjectX>vagrant box listlucid32rhel5.7It is important to note that youcan reuse these base images. A Boxcan be the base for multiple projectswithout contamination. Changes inany one project will not change theother projects that share a Box. Asyou’ll see, changes in the base can beshared to projects easily. One of thepowerful concepts about using Vagrantis that the development environmentis now totally disposable. You persistyour critical work on the Host, whilethe Guest can be reloaded quickly andprovisioned from scratch.Starting VagrantCreate a directory on the Host as yourstarting point. This directory is yourworking directory. Vagrant will sharethis directory between the Guest andthe Host automatically. Developers canTo get started, you need a Vagrantfile.The Vagrantfile is to similar to a Makefile,a set of instructions that tells Vagranthow to build the Guest. Vagrant usesRuby syntax for configuration. Thesimplest possible Vagrantfile would besomething like this:Vagrant::Config.run do |config|config.vm.box = "lucid32"endThis configuration tells Vagrantto use all the defaults and the Boxcalled lucid32. There actually are fourVagrantfiles that are read: the localversion in the current directory, a userversion in ~/.vagrant.d/, a Box version inthe Box file and an initial config installedwith the Gem. Vagrant reads thesefiles starting with the Gem version andfinishing with the current directory. In thecase of conflicts, the most recent versionwins, so the current directory overridesthe ~/.vagrant.d, which overrides theWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 79


FEATURE Introducing VagrantBox version and so on. Users can create anew Vagrantfile simply by running:> vagrant initThis creates a Vagrantfile in the currentdirectory. The generated Vagrantfilehas many of the common configurationparameters with comments on their use.The file tries to be self-documenting,but additional information is availablefrom the Vagrant Web site. To run mostVagrant commands, you need to be inthe same directory as the Vagrantfile.Let’s give it a try:[default] VM booted and ready for use![default] Mounting shared folders...[default] -- v-root: /vagrantLet’s break this down. I’m running withVirtualBox 4.1.8 and a Guest at 4.1.0.In this case it works smoothly, but thewarning is there to help troubleshootissues should they appear. Next, it sets upthe networking for the Guest:[default] Matching MAC address for NAT networking...[default] Clearing any previously set forwarded ports...[default] Forwarding ports...[default] -- 22 => 2222 (adapter 1)$ vagrant up[default] Importing base box 'lucid32'...[default] The guest additions on this VM do not match the installversion of VirtualBox! This may cause things such as forwardedports, shared folders, and more to not work properly. If any ofthose things fail on this machine, please update the guestadditions and repackage the box.Guest Additions Version: 4.1.0VirtualBox Version: 4.1.8[default] Matching MAC address for NAT networking...[default] Clearing any previously set forwarded ports...[default] Forwarding ports...[default] -- 22 => 2222 (adapter 1)[default] Creating shared folders metadata...[default] Clearing any previously set network interfaces...[default] Booting VM...[default] Waiting for VM to boot. This can take a few minutes.I used the default network setting ofNetwork Address Translation with PortForwarding. I am forwarding port 2222on the Host to port 22 on the Guest.Vagrant starts the Guest in headlessmode, meaning there is no GUI interfacethat pops up. For users who want to usethe GUI version of the Guest, the optionis available in the Vagrantfile to runwithin a window. Once the network isset up, Vagrant boots the virtual machine:[default] VM booted and ready for use![default] Mounting shared folders...[default] -- v-root: /vagrantAfter the Guest has started, Vagrantwill add the shared folder. Vagrant uses80 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


the VirtualBox extensions to mount thecurrent folder (ProjectX in this case) as/vagrant. Users can copy and manipulatefiles from the Host OS in the ProjectXdirectory, and all the files and changeswill be visible in the Guest. If the sharedfolder isn’t performing well because youhave a large number of files, Vagrantdoes support using NFS. However, it doesrequire that NFS is supported by both theGuest and Host systems. At this time, NFSis not supported on Windows Hosts.To access the Guest, VagrantBoxes have a default user withusername:vagrant, password:vagrantand a default shared insecure ssh-key.Vagrant users usually have sudo rights ifthey need to make changes to the GuestOS. You can ssh to the Guest by usingssh vagrant@localhost 2222 onthe Host or with the vagrant shortcut:>vagrant sshAlthough you can automaticallyreach your Guest using vagrant ssh,sometimes it is useful to use scp orgit, which refers to your SSH setup.To make life easier, you can updatethe entry in your ~/.ssh/config withinformation specific to the Guest(Windows users can achieve similarresults by making similar updates totheir PuTTY configuration):>vagrant ssh-configHost defaultHostName 127.0.0.1User vagrantPort 2222UserKnownHostsFile /dev/nullStrictHostKeyChecking noPasswordAuthentication noIdentityFile /Users/jay/.vagrant.d/insecure_private_keyIdentitiesOnly yesBy plugging this information in to~/.ssh/config, you can access your Guestwith scp to move a file my_file from/vagrant to the Host desktop:> scp default:/vagrant/my_file.txt ~/Desktop/my_file.txtWhat if you want to run a Web serveron your Guest that’s accessible by theHost? Let’s add one. Edit your Vagrantfile, adding the following line:config.vm.forward_port 80, 8080The forward_port parameter allowsthe Host to forward ports from the Host tothe Guest. In this case, you’re forwardingport 8080 on the Host to 80 on the Guest.By default, Vagrant uses Network AddressTranslation networking, meaning theGuest can access the network throughthe Host. Machines on the Host networkcannot reach the Guest directly unless theWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 81


FEATURE Introducing Vagrantports are forwarded. Vagrant also providesHost-only networking, where a Guest canaccess only the host or other Guests onthe Host, and bridge networking wherethe Guest will be on the same network asthe Host. Once you update the Vagrantfile,restart the system:>vagrant reloadMake sure that when you openport forwarding in Vagrant you alsoare opening the appropriate firewallpermissions on the Guest! If you try tomake a connection to a Guest via portforwarding with a firewall turned on, theinitial handshake will work with the hostport (8080 in the example here) but fail ifport 80 is blocked by the Guest firewall.Finishing VagrantOnce you’re done with your work,you have a few choices. If you’re justdone for the day, or trying to get a fewresources back for the Host, you cansimply suspend Vagrant:>vagrant suspendand resume:>vagrant resumewhen you want to restart. All state ismaintained, and nothing is lost. Once anenvironment is completely finished, or ifyou want to start over from a clean slate,you can run vagrant destroy. Thiswill remove the Guest and any changes tocontent outside of the shared folders(/vagrant). Shared folders reside on the Hostand will survive the destruction of the Guest.Provisioning a GuestCreating and maintaining virtual imagescan be a delicate balance in terms ofconfiguration. Over-configure the image,and it gets stale faster as applications arereplaced with patches. Under-configurethe image with a simple base, and eachtime a new instance is created, the usermust spend the time configuring theBox to the exacting specifications of theproject. With Vagrant, there is a goodmiddle ground. A base image can becreated and minimally updated to manageOS-level patches and libraries, whileapplications can be provisioned usingtools like Chef, Puppet or shell scripts.Many systems administrators arealready familiar with Chef and Puppet,two configuration management toolsthat allow for consistent creation andmaintenance of a system. By pluggingChef and Puppet into Vagrant, users cantake immediate advantage of the scriptsthat already have been created by systemadministrators to create up-to-date82 / AUGUST <strong>2012</strong> / WWW.LINUXJOURNAL.COM


images for development.Chef and Puppet are both supportedto work in solo/standalone mode andserver mode. In Chef Solo or Puppetstandalone, the configuration for theimages are all self-contained. In servermode, Chef or Puppet Server users canleverage the existing Chef or Puppetinfrastructure already established intheir companies.This article doesn’t go into greatdetail about Chef, but let’s look at asimple example of how using Chef canbe helpful. Vagrant can run withoutadditional servers in Chef-solo mode. Toset up Chef, you first need a Cookbook.Cookbooks are a collection of Recipes. ARecipe is a description of how you wouldlike the system configured. Recipes alsoare written in Ruby. For this example, I’vedownloaded Cookbooks available fromthe Opscode community site.In this example, let’s install Apache2and MySQL on our Guest. Here’sthe setup. Create a directory called“cookbooks” under the ProjectXdirectory. Download the Cookbooksfor Apache2 and MySQL fromhttp://community.opscode.com (seeResources). In addition, you’ll needtwo dependencies: the apt recipe toget the latest updates and openssl,which is required for the MySQL recipe.Extract these files and save them tothe cookbooks directory:> ls -l cookbooks/total 0drwxr-xr-x@ 10 jay staff 340 Feb 16 18:49 apache2drwxr-xr-x@ 9 jay staff 306 Feb 14 12:00 aptdrwxr-xr-x@ 9 jay staff 306 Feb 16 18:23 mysqldrwxr-xr-x@ 7 jay staff 238 Jun 3 2011 opensslIn my Vagrantfile, I have added thefollowing lines:config.vm.provision :chef_solo do |chef|endchef.cookbooks_path = "cookbooks"chef.add_recipe("apt")chef.add_recipe("openssl")chef.add_recipe("mysql::server")chef.add_recipe("apache2")# You may also specify custom JSON attributes:chef.json = {:mysql => {}}:server_root_password => "MYSQL PASSWORD"When Vagrant runs the provisioningstep (included in vagrant up butalso available on a running Guest byusing vagrant provision), it willinstall apt, openssl, mysql and apache2.Recipes can take arguments as well. Inthis example, I’ve included a parameterWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 83


The parameters available for a given recipe are included in the documentation. The parameters are passed as JSON attributes to Vagrant, which will pass them along to Chef.

Managing this with Vagrant allows you to enforce change controls around your environments. Vagrantfiles and Cookbooks can be checked into source control and managed just like code resources. Branching and tagging environment configurations can be used to track changes as projects develop or requirements and packages are updated.

Creating a New Image

The traditional method is to use VirtualBox to create a new virtual machine image using the standard operating system installer. Because operating system steps vary, I'm not going to go into the details here, but here are some key steps:

• When choosing the Virtual Disk storage details, choose "Dynamically Allocated"—you want your disk to start small but be able to increase as needed.

• If you want to use Chef or Puppet with your application, you'll need to install the Chef or Puppet applications when creating the base image.

• Install the VirtualBox Guest Additions. This is necessary to support shared folders!

Let's set up the vagrant user. Create a user with the user name vagrant. Add this user to an admin group. Update the sudoers file to make sure that users in the admin group do not need a password to access sudo. For a group called "admin", it should look like this:

%admin ALL=NOPASSWD:ALL

Also make sure that the Defaults requiretty is commented out. You also want to make sure you can ssh into the Vagrant account. By default, Vagrant will try to use the insecure SSH key with vagrant ssh. To add the key to your Vagrant account, do the following as the vagrant user:

> mkdir .ssh
> chmod 755 .ssh
> curl -L http://github.com/mitchellh/vagrant/raw/master/keys/vagrant.pub >> .ssh/authorized_keys
> chmod 644 .ssh/authorized_keys

Once the image is complete, you'll need to package the image into a Box.


To create a set of default configurations for the Box, you can create a new Vagrantfile. Use vagrant package, which takes two arguments, --base (the name of the image in VirtualBox that you created) and --include (which takes the files you wish to include in your Box):

> vagrant package --base RHEL-5.7-64 --include Vagrantfile

In this example, you created a package with the base image called RHEL-5.7-64 and included your custom Vagrantfile.

Summary

Vagrant is a powerful tool for creating and managing flexible development environments. In this article, I covered the basic use and creation of Vagrant images.■

Jay Palat is the Lead for Mobility Applications at the University of Pittsburgh Technology Development Center (TDC). His experience ranges from fast startups like ModCloth to developing the IBM Support Site. He has an active interest in mobile, cloud and devops technology and techniques. He has a Masters of Information Technology from Carnegie Mellon University and a BS in Computer Science from the University of Pittsburgh.

Resources

Vagrant's Home on the Web: http://vagrantup.com

VirtualBox: https://www.virtualbox.org/wiki/Downloads

Ruby: http://www.ruby-lang.org/en/downloads

RubyInstaller (for Windows): http://rubyinstaller.org

Opscode Community Site: http://community.opscode.com

Chef Cookbooks:
• mysql: http://community.opscode.com/cookbooks/mysql
• apache2: http://community.opscode.com/cookbooks/apache2
• apt: http://community.opscode.com/cookbooks/apt
• openssl: http://community.opscode.com/cookbooks/openssl

Vagrantbox.es (a Web Site That Lists Boxes): http://vagrantbox.es
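As a footnote to the packaging example above, here is a hedged sketch of how the resulting package.box file typically would be put to use on another machine with Vagrant 1.0-era commands; the project directory name is illustrative:

> vagrant box add RHEL-5.7-64 package.box
> mkdir new_project && cd new_project
> vagrant init RHEL-5.7-64
> vagrant up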


Writing Samba VFS Modules

Although Samba does a very good job of serving files for Windows (and Linux) clients, sometimes you want it to do more. This article introduces the Samba VFS Layer and shows how to create a simple VFS module to extend Samba's functionality.

RICHARD SHARPE

I am sure that many of you are aware of the Linux Virtual File System (VFS) layer, which allows Linux to support many different filesystems, such as EXT3, EXT4, XFS, btrfs and so on. However, I suspect you are less aware that Samba, the Windows file and print server for UNIX, also has a VFS. Just like the Linux VFS, the Samba VFS allows you to extend Samba's functionality to encompass different filesystems and perform some neat tricks. On the other hand, the Samba VFS lives in userspace, which makes it easier to develop and debug Samba VFS modules.

In this article, I introduce you to writing a Samba VFS module through the use of a simple example. The example is that of auditing access to a specific file. Although this example is a little simplistic for the sake of this article, it could be extended in many ways to be more powerful.
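For reference, an smb.conf fragment of the kind referred to below—consistent with the share name and module parameter used later in the article (the path is illustrative)—looks like this:

[data]
    path = /path/to/data
    vfs objects = vfs_demo
    vfs_demo:audit path = aaa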


VFS modules can be provided in two ways:

1. As built-in and statically linked modules.

2. As shared libraries.

You tell Samba about VFS modules for each share, and you can pass module-specific parameters to a module. An example of specifying a VFS module is shown in the smb.conf fragment above. The module for the [data] share is called vfs_demo, and it has one parameter called audit path.

If you do not specify any modules, Samba does not load any (except for the default, discussed below).

The vfs_demo Module

In this article, I explain how to develop a simple Samba VFS module that allows you to specify auditing of accesses to files. The auditing that is performed is very simple and is not the same as security auditing that Windows performs using Security ACLs (SACLs), but it serves to illustrate some of the things you need to do in developing a Samba VFS. In particular, this module: 1) will allow you to specify the path prefix for files to be audited, and 2) will log all accesses of such files to /var/log/messages using syslog.

It has some deficiencies as well, which you might consider dealing with if you want to improve the demo module.

Various aspects of the code for this module, as well as how to build it and how to debug it, are discussed below after a more thorough introduction to the Samba VFS Layer.

More Detail on the Samba VFS Layer

Samba has a very flexible VFS system that allows you to combine functionality from several modules to customize the behavior of any share. Figure 2 shows how VFS modules relate to the main Samba code.

As Figure 2 suggests, a number of components are involved in the processing of requests from Windows clients:

1. The main Samba code, which receives all incoming requests and sends all outgoing responses, interprets the smb.conf file and so forth.

2. The VFS Layer, which provides infrastructure for the VFS modules themselves.

3. A default VFS module, vfs_default.c, which provides default actions in case no modules have been specified for a share.

4. The system layer, which the default VFS module calls to provide standard Samba actions for all of the VFS routines. They mostly translate into UNIX/Linux system calls.


Figure 2. The Relationship between VFS Modules and the Samba Code

One of the most important aspects of the Samba VFS Layer is that VFS modules are stackable and, depending on the way they are coded, can:

1. Completely replace existing Samba functionality.

2. Augment existing Samba functionality.

3. Do both, perhaps based on filename or VFS function.
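To make the stacking idea concrete, here is a hedged sketch of what a simple pass-through VFS function can look like; it is not part of the demo module, and the exact prototypes vary between Samba releases, but the shape is always the same: do your own work, then hand the call to the next module in the stack:

static int demo_rename(vfs_handle_struct *handle,
                       const struct smb_filename *smb_fname_src,
                       const struct smb_filename *smb_fname_dst)
{
        /* The module's own processing (logging, rewriting names, ...)
         * would go here. */
        DEBUG(10, ("demo_rename called\n"));

        /* Let the rest of the stack (ultimately vfs_default.c and the
         * system layer) perform the actual rename. */
        return SMB_VFS_NEXT_RENAME(handle, smb_fname_src, smb_fname_dst);
}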


Because they are stackable, you can write a VFS module to augment or replace a small subset of the 100 or so VFS functions that Samba uses, which makes your life as a VFS module writer much easier.

First, however, when a Windows client connects to a share, Samba finds the required VFS module, and if it is a shared library, loads that shared library. It then calls the specified module's connect routine to allow the module to perform whatever initialization it has to.

Then, during normal operation, the following sequence of steps will be performed for requests from the Windows client:

1. An SMB request is received from the client by Samba and undergoes initial processing within Samba itself. At some point, Samba then calls a VFS function to complete the processing of this request.

2. The VFS Layer then calls the requested VFS function in the first VFS module, vfs_mod_1, in the stack that implements that VFS function. The VFS function in this module could perform some initialization at that point.

3. That VFS function then calls the next module in the stack that implements the requested VFS function, which in this case is vfs_default.c.

4. After performing some processing, if needed, vfs_default.c calls the relevant System Layer function in Samba.

5. Again, after performing relevant setup or other processing, the called System Layer function issues a system call.

6. After performing the system call, the kernel returns to the System Layer function that called it. Here, further processing can be performed.

7. The System Layer function first called


disconnect routine. The remaining ones relate to opening files and directories, creating files and directories, and removing directories.

Note: for reasons pointed out in the Samba VFS documentation (see Resources), you should call your registration routine <module name>_init. This allows your module to be built in the Samba source tree or outside it and as a shared module or a static module. In this case, I called it vfs_demo_init, and the Makefile that is generated below will take care of ensuring that the correct name is used in a shared module (as long as the patch mentioned below is applied).

Building Your VFS Module

Now that you have some code, it is time to think about building your module. Although you can add your module to the Samba source code, doing that is more complicated than the approach suggested here, which builds your VFS module outside the Samba source tree. However, you will need to have the Samba source code handy.

First, download the latest Samba source code (Samba 3.6.4 at the time of this writing) using:

wget http://ftp.samba.org/pub/samba/old-versions/samba-3.6.4.tar.gz

Next, unpack the source and build it (for those who have not done this before, here are the steps):

tar xvf samba-3.6.4.tar.gz
cd samba-3.6.4/source3
./configure.developer
make

Then, as root:

make install

Note: if you do not want to replace the Samba version on your development system, you will have to download the source packages for your Linux distro (source RPMs for those distros based on RPMs) and build that RPM. Here I assume you are happy to build Samba from source and install it on your system. This might mean that you have to erase the distro-supplied Samba packages to avoid confusion with them.

In addition, you need to tell the system how to find the Samba shared libraries that are needed. You can do this as root with:

echo /usr/local/samba/lib > /etc/ld.so.conf.d/samba-local-build.conf
ldconfig

You can use ldconfig -v to verify that it found your shared libraries.
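Tying the naming note above to actual code, the registration routine at the bottom of a Samba 3.6-era module looks roughly like the following sketch; demo_fns stands for the module's table of VFS function pointers (its connect, disconnect and opendir entries and so on), assumed to be defined earlier in the source file:

NTSTATUS vfs_demo_init(void)
{
        /* Register the "vfs_demo" module and its function table with Samba.
         * The Makefile change described below renames this symbol to
         * init_samba_module when the module is built as a shared library. */
        return smb_register_vfs(SMB_VFS_INTERFACE_VERSION, "vfs_demo",
                                &demo_fns);
}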


Now that you have the Samba source and have built Samba, you can create a directory and add the code for the demo module to it:

mkdir demo_vfs
cd demo_vfs
# Use vim to add the code

Next, you need to copy several files from the examples/VFS directory of the Samba source:

cp ../samba-3.6.4/examples/VFS/config* .
cp ../samba-3.6.4/examples/VFS/autogen.sh .
cp ../samba-3.6.4/examples/VFS/install-sh .
cp ../samba-3.6.4/examples/VFS/configure.in .
cp ../samba-3.6.4/examples/VFS/Makefile.in .

These files allow you to build your module; however, you must perform a few actions similar to those performed above when building the Samba source:

./autogen.sh
./configure --with-samba-source=../samba-3.6.4/source3

After that, you should apply the following patch to Makefile.in to allow your code to be built as a shared module:

--- Makefile.in	2012-04-22 17:30:45.631698392 -0700
+++ Makefile.in.new	2012-04-22 17:30:20.619140189 -0700
@@ -36,7 +36,7 @@
 %.$(OBJEXT): %.c
 	@echo "Compiling $<"
-	@$(CC) $(FLAGS) -c $<
+	@$(CC) $(FLAGS) -c $< -D$*_init=init_samba_module

(This patch has been applied to the Samba master branch, but it has not yet made it into the 3.6.x branch. The patch makes it easier for your module to be moved into the Samba source tree if it proves useful enough and makes documenting modules easier. You simply could edit Makefile.in and append -D$*_init=init_samba_module to the line shown rather than creating a patch file.)


Listing 2. demo_connect Function

static int demo_connect(vfs_handle_struct *handle,
                        const char *service,
                        const char *user)
{
        int res = 0;
        struct demo_struct *ctx = NULL;

        /*
         * Allow the next module to handle connections first
         * If we get an error, don't do any of our initialization.
         */
        res = SMB_VFS_NEXT_CONNECT(handle, service, user);
        if (res) {
                return res;
        }

        /*
         * Get some memory for the dir we are interested in
         * watching and our other context info.
         */
        ctx = talloc_zero(handle, struct demo_struct);
        if (!ctx) {
                DEBUG(0, ("Unable to allocate memory for our context, can't proceed!\n"));
                errno = ENOMEM;
                return -1;
        }

        ctx->audit_path = lp_parm_const_string(SNUM(handle->conn),
                                               DEMO_MODULE_NAME,
                                               "audit path",
                                               NULL);
        DEBUG(10, ("audit path is \"%s\"", ctx->audit_path));

        res = sem_init(&ctx->send_sem, 0, LOG_QUEUE_SIZE);
        if (res) {
                DEBUG(1, ("Unable to initialize send sem: %s\n",
                          strerror(errno)));
                goto error_no_thread;
        }

        res = sem_init(&ctx->recv_sem, 0, 0);
        if (res) {
                DEBUG(1, ("Unable to initialize recv sem: %s\n",
                          strerror(errno)));
                goto error_no_thread;
        }

        res = pthread_create(&ctx->log_thread,
                             NULL,
                             logging_thread,
                             ctx);
        if (res) {
                DEBUG(1, ("Unable to create our background thread: %s\n",
                          strerror(errno)));
                goto error_no_thread;
        }

        SMB_VFS_HANDLE_SET_DATA(handle, ctx, NULL,
                                struct demo_struct, goto error);

error:
        return res;

error_no_thread:
        talloc_free(ctx);
        return res;
}


functions all rely on a background thread to send auditing messages to syslog so they can be added to /var/log/messages.

Listing 2 is the demo_connect function. The key points to note here are:

1. You should use talloc for allocating memory for your module, and you should choose an appropriate context. When allocating memory that should last for the duration of the connection to the share, use the VFS handle for a context.

2. You should use lp_parm_const_string to parse module parameters.

3. You should use the SMB_VFS_HANDLE_SET_DATA macro to set up the context that all your other routines need when they are called.

4. You should call the next VFS routine in the stack, in this case SMB_VFS_NEXT_CONNECT, to give it a chance to do its module initialization work as well. (However, see the Samba VFS document mentioned below for cases where you do not want to call the Next module.)

While in general you can call the next VFS routine before, after or during your processing, in this case, it is called first to make cleanup easier in the connect routine, because you are creating a thread to perform deferred work.

The next function to look at is the demo_disconnect function (Listing 3). In this case, you:

1. Call the next VFS function, SMB_VFS_NEXT_DISCONNECT.

2. Use a function called create_cmd to tell your background thread to exit.

3. Use pthread_join to wait for the background thread to exit.

4. Free up the context you created.

At this point, it is worth mentioning that all of the VFS functions have names like those seen above (such as SMB_VFS_CONNECT and SMB_VFS_NEXT_CONNECT), and their meanings are:

1. Functions like SMB_VFS_CONNECT, SMB_VFS_GET_NT_ACL and so on call that VFS function from the top of the stack of VFS modules. Within a specific VFS function, you would not call that function recursively; however, you might call other VFS functions to perform useful actions. For example, the vfs_acl_xattr.c module (in samba-3.6.4/source3/modules) calls SMB_VFS_GETXATTR to get extended attributes on files.


Listing 3. demo_disconnect Function

static void demo_disconnect(vfs_handle_struct *handle)
{
        int res = 0, *thread_res = NULL;
        struct demo_struct *ctx;
        struct cmd_struct *cmd;

        /* Let the next module do any cleanup it needs to */
        SMB_VFS_NEXT_DISCONNECT(handle);

        SMB_VFS_HANDLE_GET_DATA(handle,
                                ctx,
                                struct demo_struct,
                                goto ctx_error);

        /*
         * Tell the background thread to exit
         */
        cmd = create_cmd(ctx, LOG_EXIT, NULL);
        if (!cmd || !send_cmd(cmd)) {
                return; /* Not much more to do here ... kill the thread? */
        }

        res = pthread_join(ctx->log_thread, (void **)&thread_res);
        if (res || *thread_res) {
                DEBUG(10, ("Error cleaning up thread: res: %s, "
                           "thread_res: %s\n",
                           strerror(errno), strerror(*thread_res)));
                return;
        }

        /*
         * This is not absolutely needed since that structure used
         * the handle as a talloc context ...
         */
        talloc_free(ctx);
        return;

ctx_error:
        DEBUG(10, ("Error getting context for connection!\n"));
        return;
}


2. Functions like SMB_VFS_NEXT_CONNECT, SMB_VFS_NEXT_GET_NT_ACL and so on call the next function of the same type in the stack of modules. You usually would call the next function for your VFS function somewhere in your VFS function unless you have completely handled everything or an error has occurred and there is no longer any need to call the next function.

Listing 4. demo_opendir Function

static DIR *demo_opendir(vfs_handle_struct *handle,
                         const char *fname,
                         const char *mask,
                         uint32 attr)
{
        DIR *res;
        struct demo_struct *ctx;

        SMB_VFS_HANDLE_GET_DATA(handle,
                                ctx,
                                struct demo_struct,
                                return NULL);

        if (ctx->audit_path && strstr(fname, ctx->audit_path)) {
                struct cmd_struct *cmd;

                DEBUG(10, ("Found %s in the path %s\n",
                           ctx->audit_path, fname));

                cmd = create_cmd(ctx, LOG_LOG, fname);
                if (!cmd || !send_cmd(cmd)) {
                        DEBUG(1, ("Error logging. Continuing!\n"));
                }
        }

        /* Allow the next module to handle the OPENDIR as we are done */
        res = SMB_VFS_NEXT_OPENDIR(handle, fname, mask, attr);

        return res;
}


Finally, Listing 4 shows one of the actual functions for performing auditing when a file has been accessed.

Testing and Debugging

Once you have all the functions implemented (either by entering them from scratch or downloading them from GitHub or by taking an existing module and modifying it), you can build the VFS module and install it.

Building the module is usually as simple as running make. If any errors are reported, fix the errors and re-run make until the build completes successfully.

The output from the build process you are using here is a shared module called vfs_demo.so. This file needs to be moved to the directory where Samba looks for shared modules. This is usually /usr/local/samba/lib/vfs. You will need to copy vfs_demo.so to that location as root:

sudo cp vfs_demo.so /usr/local/samba/lib/vfs

Once you have done that, you can test whether it works. First, you need to ensure that you have a share defined that uses the newly created VFS module. The following smb.conf file defines such a share and includes some additional parameters to help with debugging:

[global]
    workgroup = workgroup
    server string = %h server
    security = user
    log level = 10
    log file = /var/log/samba/%m.log
    panic action = sleep 99999

[data]
    path = /path/to/data
    read only = no
    vfs objects = vfs_demo
    vfs_demo:audit path = aaa

The interesting things to note here are:

1. The log level has been set to 10, giving you lots of information about what is going on and going wrong with your new VFS module.

2. The log file has been set to /var/log/samba/%m.log, which will create a separate log file for each client that connects. This will make it easier for you to see errors during testing.

3. Finally, you have specified a panic action in case your module causes Samba to crash (for example, by seg faulting). The panic action causes the smbd that crashed to sleep for a long time, giving you time to attach with gdb to see why you crashed (although oftentimes you can tell why you crashed by looking at the traceback in the log file).
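If the panic action does fire, attaching to the sleeping smbd is straightforward; a hedged sketch (the PID shown is illustrative):

ps -ax | grep smbd                      # find the smbd stuck in "sleep 99999"
gdb /usr/local/samba/sbin/smbd 12345    # attach using that smbd's PID
(gdb) bt                                # print the backtrace of the crash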


Once you have created that smb.conf in /usr/local/samba/etc/smb.conf and started Samba as root with:

/usr/local/samba/sbin/smbd

you can check to see if it is running with:

ps -ax | grep smbd

There should be two instances of smbd running. If they are not running, check for problems in /var/log/samba/smbd.log (although if the problems occur early enough, you will have to start Samba with /usr/local/samba/sbin/smbd -i -F to determine why it failed before it could start logging information).

Once Samba is running happily, you can test your VFS module with smbclient:

smbclient //localhost/data -Usome-user%some-password

Of course, you will have to ensure that the user you are testing with exists and that the user's password is set. You can do this with:

smbpasswd -a some-user

Then you can set the user's password with:

smbpasswd some-user

where you will be prompted to enter the password. Once you have set the password for your user, you can try smbclient as shown above. If all is working correctly, you should expect to see something like the following:

Domain=[WORKGROUP] OS=[Unix] Server=[Samba 3.6.4...]
smb: \>

Then you can issue commands to test your VFS module's functions. The sort of smbclient commands to try include things like put /path/to/some-file aaa (where the target file matches the audit path specified above). If your module is working correctly, you would see the following in /var/log/messages:

Apr 29 13:02:27 localhost Samba Audit: File aaa accessed
Apr 29 13:02:27 localhost Samba Audit: File aaa accessed

As you can see, one deficiency of the module currently is that there are multiple messages logged for some accesses.

Obtaining the Full Source Code

You can obtain the full source code for this demo module from GitHub via:

git clone git@github.com:RichardSharpe/demo-samba-vfs.git


In the directory retrieved, you will find source code for the Samba master branch as well as the Samba 3.6 branch. Most of what has been discussed here relates to the Samba 3.6 branch. You also will find some README files.

The code in that repository builds against the Samba master branch and against Samba 3.6.x.

Final Thoughts

One of the issues you should be aware of is that the Samba VFS has changed over time; however, it is allowed to change only when a new major version is released. For example, there are differences between the Samba 3.5.x VFS, the Samba 3.6.x VFS and the VFS in the Samba master branch.

Indeed, the Samba master branch's VFS interface has been changed to make the VFS routine names more consistent as well as introducing at least two new VFS functions:

1. An FSCTL function so that VFS writers now can write FSCTL handling functions.

2. A FILE AUDIT function so that VFS writers can handle file access auditing using the Windows approach.

However, you now should be able to understand the Samba VFS Layer well enough to understand what existing modules are doing and even write your own VFS modules when the need arises.■

Richard Sharpe is a software engineer in Silicon Valley where he works on storage systems. He has been an open-source contributor for a long time, and he has contributed code to Ethereal/Wireshark, Samba and SCST. Richard recently has gotten back into contributing to Samba and has made a number of improvements to the Samba VFS Layer, some to ease building VFS modules out of the Samba source tree and some to improve functionality.

Resources

A Detailed Introduction to the Samba VFS: http://samba.org/~sharpe/The-Samba-VFS.pdf

"Introducing Samba" by John Blair: http://www.linuxjournal.com/article/2716

Samba-3 by Example: Practical Exercises to Deployment by John H. Terpstra: http://www.informit.com/title/0131472216

The Official Samba-3 HOWTO and Reference Guide, edited by John H. Terpstra and Jelmer R. Vernooij: http://www.informit.com/title/0131882228




Tarsnap: On-line Backups for the Truly Paranoid

If you've ever worried about your data in the cloud or wished backups were easier to automate, the high-security, command-line-friendly approach of Tarsnap backups might make you rest a lot easier at night. ANDREW FABBRO

Storing backups in the cloud requires a level of trust that not everyone is willing to give. While the convenience and low cost of automated, off-site backups is very compelling, the reality of putting personal data in the hands of complete strangers will never sit quite right with some people.

Enter Tarsnap—"on-line backups for the truly paranoid". Tarsnap is the brainchild of Dr Colin Percival, a former FreeBSD Security Officer. In 2006, he began research and development on a new solution for "encrypted, snapshotted remote backups", culminating in the release of Tarsnap in 2008.

Unlike other on-line backup solutions, Tarsnap uses an open, documented cryptographic design that securely encrypts your files. Rather than trusting a vendor's cryptographic claims, you have full access to the source code, which uses open-source libraries and industry-vetted protocols, such as RSA, AES and SHA.

Tarsnap provides a command-line client that operates very much like the traditional UNIX tar command. Familiar syntax, such as tarsnap cf and tarsnap xvf, works as users would expect, except that instead of manipulating local tarballs, the client is working with cloud-based archives. These archives are stored on Amazon S3 with EC2 servers to handle client connections. To minimize bandwidth costs, Tarsnap uses smart rsync-like block-oriented snapshot operations that upload only data set changes and minimize transmission costs.


Although you could roll your own backup system by encrypting your data, running an rsync server off-site and so on, Tarsnap's scriptability, professional cryptographic design and utility pricing make it an attractive service for clueful users.

Open-Source Strong Crypto

Tarsnap compresses, encrypts and cryptographically signs every byte you send to it. No knowledge of cryptographic protocols is required, but if you are an interested enthusiast or feel more comfortable if you can look under the hood, you'll find the source code and full design thoroughly documented on http://tarsnap.com.

During installation, the user creates a keyfile, which optionally can be passphrase-protected. The file contains two 2048-bit RSA keys for signing and encryption. AES-256 session keys are created at runtime and encrypted with the RSA keys. The HMAC-SHA-256 hash function is used to verify data blocks and prevent tampering, and the same hash function is used elsewhere to verify communication between the client and server.

If those cryptographic terms make your eyes glaze over, the simple description is that Tarsnap encrypts and verifies your data in motion and at rest using industry-standard protocols. From a user perspective, the only thing that needs to be safeguarded is the keyfile. As you might expect, if it's lost, the keys to decrypt data stored with Tarsnap are unavailable and your data is no longer usable. After generation, this keyfile ideally should be stored securely off-site.

Efficient Design

Tarsnap backs up variable-length, deduplicated blocks of data. This means that if you change one file in a directory, only that file is backed up. Indeed, only the changes to that file are backed up if the file previously existed. If you move files around, Tarsnap is smart enough to recognize and skip previously backed-up blocks. Anytime you wish, you can take a backup of a directory, which creates a new archive. Behind the scenes, Tarsnap will retain whatever files are necessary to make that archive whole.


Suppose you have a directory with 100MB of files and a 10% daily change rate. If you create a Tarsnap archive every day of that directory, you will see the following consumption pattern:

• Monday (initial): 100MB archive created (100MB uploaded, 100MB stored).

• Tuesday: 10MB archive created (10MB new data uploaded, 110MB total data stored).

• Wednesday: 10MB archive created (10MB new data uploaded, 120MB total data stored).

If on Thursday you delete the Monday archive, the Tuesday archive will be "made whole" by retaining whatever files from Monday are needed. This gives you tremendous flexibility to pick restore points. Of course, you can keep multiple archives with however many restore points you wish.

Your Current Account Balance Is $4.992238237884881224

Tarsnap works on a prepaid utility-metered model. Subscribers deposit a minimum of $5.00 and are charged only for the storage and bandwidth they consume. Although the cost is higher than plain Amazon S3 service, it reflects both the cryptographic, compression and deduplication value-add of Tarsnap. At the time of this writing, Tarsnap costs 30 cents per gigabyte-month for storage and 30 cents per gigabyte transmitted.

This cost may make Tarsnap infeasible for large, whole-server terabyte-size backups. However, it is ideal for critical, sensitive files that must be durable, available and safe in the event an attacker succeeded in compromising them. With no minimum charge or monthly fees, Tarsnap is very economical for small data sets or for data that compresses well. Some examples:

• Backing up 100MB of files with 10% daily change rate for a month would cost only 30 cents.

• A gigabyte that is backed up weekly with a 20% change rate would cost $1.40 a month.

Tarsnap bills based on attodollars (quintillionths of a dollar) to avoid profiting through rounding.


This means your account balance is tracked to 18 decimal places. This is not just "pay by the drink" cloud pricing—it's practically "pay by the atom". Some users find that a small deposit lasts them months or years.

Important Flexibility

One of Tarsnap's best features is how easy it is to script. The ability to put a tarsnap cf command into a shell script makes use in cron jobs very straightforward, which encourages unattended, automated backups—the best kind.

Crucially, Tarsnap also supports a division of responsibilities. You can use the tarsnap-keymgmt tool to create keyfiles with limited authority. You may have one keyfile that lives on your server with permission to create archives, but not the authority to delete them. A master key with full privileges could be kept off-site, so that if attackers were to compromise your server, they would be unable to destroy your backups.

Using Tarsnap

To get started with Tarsnap, register at tarsnap.com, deposit some funds into your account, and download the client. The client is available only as source, but the straightforward ./configure; make install process is very easy. The client is supported on all major Linux distributions (as well as BSD-based systems). Take a quick peek at the download page to make sure you have the required operating system packages, as some of the development packages are not installed in typical Linux configurations.

If you are using a firewall, be aware that Tarsnap communicates via TCP on port 9279.

There are only two critical configuration items: the location of your keyfile and the location of your Tarsnap cache. Both are set in /usr/local/etc/tarsnap.conf. A tarsnap.conf.example is provided, and you probably can just copy the example as is. It defines your Tarsnap key as /root/tarsnap.key and your cache directory as /usr/local/tarsnap-cache, which will be created if it doesn't exist. The cachedir is a small state-tracking directory that lets Tarsnap keep track of backups.

Next, register your machine as follows. In this case, I'm setting up Tarsnap service for a machine called helicarrier. The e-mail address and password are the ones I used when I signed up for service with Tarsnap:

# tarsnap-keygen --keyfile /root/tarsnap.key --user andrew@fabbro.org --machine helicarrier
Enter tarsnap account password:
#
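For reference, a minimal /usr/local/etc/tarsnap.conf matching the defaults described above would look something like this (print-stats is optional and is discussed below):

cachedir /usr/local/tarsnap-cache
keyfile /root/tarsnap.key
print-stats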


I have a directory I'd like to back up with Tarsnap:

# ls -l /docs
total 2092
-rw-rw---- 1 andrew 1833222 Jun 14 16:38 2011 Tax Return.pdf
-rw------- 1 andrew   48568 Jun 14 16:41 andrew_passwords.psafe3
-rw------- 1 tina     14271 Jun 14 16:42 tina_passwords.psafe3
-rw-rw-r-- 1 andrew   48128 Jun 14 16:41 vacation_hotels.doc
-rw-rw-r-- 1 andrew   46014 Jun 14 16:35 vacation_notes.doc
-rw-rw-r-- 1 andrew  134959 Jun 14 16:44 vacation_reservation.pdf

To back up, I just tell Tarsnap what name I want to call my archive ("docs.20120701" in this case) and which directory to back up. There's no requirement to use a date string in the archive name, but it makes versioning straightforward, as you'll see:

# tarsnap cf docs.20120701 /docs
tarsnap: Removing leading '/' from member names
                       Total size  Compressed size
All archives              2132325          1815898
  (unique data)           2132325          1815898
This archive              2132325          1815898
New data                  2132325          1815898

In my tarsnap.conf, I enabled the print-stats directive, which gives the account report shown. Note the compression, which reduces storage costs and improves cryptographic security. The "compressed size" of the "unique data" shows how much data is actually stored at Tarsnap, and you pay only for the compressed size.

The next day, I back up docs again to "docs.20120702". If I haven't made many changes, the backup will proceed very quickly and use little additional space:

# tarsnap cf docs.20120702 /docs
tarsnap: Removing leading '/' from member names
                       Total size  Compressed size
All archives              4264650          3631796
  (unique data)           2132770          1816935
This archive              2132325          1815898
New data                      445             1037

As you can see, although the amount of data for "all archives" has grown, the actual amount of "unique data" has barely increased. Tarsnap is smart enough to avoid backing up data that has not changed.

Now let's list the archives Tarsnap has stored:

# tarsnap --list-archives
docs.20120701
docs.20120702

To demonstrate Tarsnap's smart approach to storage further, I will delete the oldest backup:

# tarsnap df docs.20120701
                       Total size  Compressed size
All archives              2132325          1815898
  (unique data)           2132325          1815898
This archive              2132325          1815898
Deleted data                  445             1037


The "all archives" number has dropped because now I have only one archive, but the "unique data" has not changed much because it is still retaining all files necessary to satisfy my "docs.20120702" archive. If I list it, I can see my data is still there:

# tarsnap tvf docs.20120702
drwxrwxr-x  0 andrew       0 Jun 14 20:52 docs/
-rw-------  0 andrew   48568 Jun 14 16:41 docs/andrew_passwords.psafe3
-rw-rw-r--  0 andrew   46014 Jun 14 16:35 docs/vacation_notes.doc
-rw-rw-r--  0 andrew  134959 Jun 14 16:44 docs/vacation_reservation.pdf
-rw-rw-r--  0 andrew   48128 Jun 14 16:41 docs/vacation_hotels.doc
-rw-------  0 tina     14271 Jun 14 16:42 docs/tina_passwords.psafe3
-rw-rw----  0 andrew 1833222 Jun 14 16:38 docs/2011 Tax Return.pdf

I use a date string for convenient versioning, but I could just as easily use any naming convention for the archive, such as "docs.1", "docs.2" and so on. For my personal backups, I have a cron job that invokes Tarsnap nightly with a date-string-named archive:

tarsnap cf docs.`date '+%Y%m%d'` /docs

If I have a local calamity and want to restore that data, it is just another simple Tarsnap command to get my files back. Note that like traditional tar, Tarsnap removes the leading slash so all files are restored relative to the current working directory:

# cd /
# rm -rf docs
# tarsnap xvf docs.20120702
x docs/
x docs/andrew_passwords.psafe3
x docs/vacation_notes.doc
x docs/vacation_reservation.pdf
x docs/vacation_hotels.doc
x docs/tina_passwords.psafe3
x docs/2011 Tax Return.pdf

Tips

If you want to run Tarsnap as a nonroot user, create a .tarsnaprc file in your home directory. The syntax is identical to the tarsnap.conf discussed above. For example:

$ cat ~/.tarsnaprc
cachedir /home/andrew/tarsnap-cache
keyfile /home/andrew/tarsnap.key
print-stats

If you have other services or users contending for your Internet connection, use --maxbw-rate to specify a maximum bytes per second that Tarsnap will be allowed to use.
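Putting the cron-job idea and these tips together, here is a hedged sketch of a nightly backup script with a simple rotation policy; the archive prefix, bandwidth cap and 30-archive retention are illustrative choices, not part of Tarsnap itself:

#!/bin/sh
# Nightly Tarsnap backup of /docs, keeping only the 30 newest archives.
set -e

/usr/local/bin/tarsnap --no-print-stats --maxbw-rate 250000 \
    cf "docs.$(date +%Y%m%d)" /docs

# Date-stamped names sort chronologically, so everything before the
# last 30 entries is old enough to delete.
/usr/local/bin/tarsnap --list-archives | grep '^docs\.' | sort | head -n -30 |
while read -r old; do
    /usr/local/bin/tarsnap df "$old"
done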


The print-stats command gives you account status information when used interactively, but for batch operations (such as running Tarsnap out of cron), you can suppress the output by removing that directive from your tarsnap.conf or by invoking Tarsnap with --no-print-stats.

Finally, you can play with the --dry-run and -v flags to simulate Tarsnap backup operations without actually burning network and disk. Once you've got your command constructed exactly as you want it, remove --dry-run.

License

Tarsnap is not distributed under an open-source license, although all client source is provided (and compiled by the user during install). However, the company regularly contributes back to projects whose code it utilizes, such as libarchive. Tarsnap also has open-sourced some of its own projects, including the scrypt package, the spiped secure pipe dæmon and the kivaloo NoSQL data store.

Further Information

The Tarsnap home page (http://tarsnap.com) has a wealth of documentation and information, as well as links to the Tarsnap IRC channel, mailing list and FAQ. The "Technical Details" section is absorbing reading for those interested in the deep details of Tarsnap's cryptographic approach and history.

Tarsnap also pays significant cash bounties for bugs found in the product, ranging from a few dollars for small cosmetic bugs up to a couple thousand dollars if someone finds a serious security flaw. This transparent approach is further comfort for the truly paranoid.

Tarsnap's current version is 1.0.32, released on February 22, 2012, for Linux, BSD, OS X, Solaris, Minix and Cygwin.■

Andrew Fabbro is a senior technologist living in the Portland, Oregon, area. He's used Linux since Slackware came on floppies and presently works for Con-way, a Fortune 500 transportation company.


INDEPTHQ&A with Dr Colin PercivalQ. In May <strong>2012</strong>, youretired as FreeBSDSecurity Officerto focus more onTarsnap. Soundslike Tarsnap is doingvery well. Can youshare any stats onyour growth to date?A: I’ve heard “startupcompany” defined as“time to double revenueor the number of usersis measured in months”,and I’ve heard “highlysuccessful startupcompany” defined as“time to double revenueor the number of usersis measured in weeks”.By those definitions,Tarsnap is a startupcompany, but not ahighly successful one.And yes, that isdodging the question, but Tarsnap ismy primary source of income, and Icome from a culture that considerssomeone’s income to be a privatematter, so I don’t want to publishprecise numbers.Q. Looking at your BugBounties page, you’ve paidDr Colin Percivalout more than $2,000 tousers who’ve submitted bugreports. Why a “bug bounty”system as opposed to thetraditional bug reporting inopen-source projects?A: I gave a talk about this atBSDCan’12 and AusCERT’12 (inconsecutive weeks, no less—a wordWWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 111


In short, bug bounties help get more people looking at code, and help encourage people to report anything they see that seems wrong.

Probably a better question is why I offered bounties for all bugs rather than just for security bugs—aside from Knuth's famous prizes, I don't know of any other case where bounties extend beyond security vulnerabilities. The answer here is roughly the same though—there's a lot of people who won't look for security bugs because they don't have a security background, but offering cash for all bugs gets them interested...and my experience with FreeBSD is that many security vulnerabilities aren't found by security people, but rather by other developers just saying "something here looks weird".

Q. What do you think are the weakest points of the Tarsnap design, from a security perspective? Is there anything that should keep the truly paranoid up at night?

A: The weakest link in Tarsnap's security, without question, is me. I wrote the Tarsnap client code to encrypt and sign everything on the client side, but how do you know that it does what I claim?

If you're paranoid, you should look at the code yourself and make sure it does what I claim it does. And when I release a new version, you should look at that too—compare against the previous version and make sure that the changes are sensible.

If the CIA kidnaps me and threatens to torture me until I decrypt someone's data, I won't be able to do anything to help them. But if the CIA kidnaps me and threatens to torture me unless I insert a Trojan horse into the next version of Tarsnap...well, I'm not optimistic about my ability to withstand torture.

Q. Tarsnap's client is very UNIX-nerd-friendly, with its familiar tar syntax. Do you have any interest or plans for a graphical interface for less-sophisticated users?

A: This is absolutely something I plan on doing...in the future. I have very little experience with anything GUI, so I'll probably end up paying someone to produce a GUI—ideally an open-source GUI for tar, since Tarsnap uses almost exactly the same command-line interface.


line interface, and I think a goodGUI front end to tar would beuseful for its own sake too.LINUX JOURNALon youre-ReaderQ. Why attodollars andnot femtodollars orzeptodollars?A: With attodollars, I canexpress everything as a pair of64-bit integers.Q. Do you have anynew features in theworks for upcomingTarsnap releases?A: I’m mostly working onperformance improvementsthese days. There’s a fewfrequently requested featuresthat I might add, but in general,there are good reasons whenfeatures don’t exist—eitherthey’re impossible (for example,server-side expiration of oldarchives—the server has no wayto know which blocks shouldbe freed when an old archiveis deleted) or would requirea substantial redesign (forexample, renaming archives—the client-server protocol haswrite transactions and deletetransactions, but renamingwould need to atomically writesome files and delete others).e-ReadereditionsFREEfor SubscribersCustomizedKindle and Nookeditionsnow availableLEARN MORE


EOF

DOC SEARLS

Debugging Free Markets That Still Aren't

Free markets should value free customers. Today's mass markets still value captive ones. Some of us are starting to fix that.

I've been writing for Linux Journal since 1996, and putting my learnings to work along the way.

For example, in 1998, when a bunch of geeks got together and decided to expand the free software conversation to include a broader category they called open source (http://www.catb.org/~esr/open-source.html), I watched—and learned—in amazement as their strategy succeeded, in spades. Before February 1998, "open source" was hardly uttered outside a few specialized circles, such as the military. Thanks to Eric S. Raymond and other hacker evangelists, the term became common parlance throughout civilization.

So, a year later, when I got together with three other guys and put The Cluetrain Manifesto up on the Web (http://cluetrain.com), our first thesis was "markets are conversations", in part because we saw it proven already. That summer, when we expanded the Web site into a whole book, we said this in Chapter Four:

What's more, these new conversations needn't just happen at random. They can be created on purpose. "We hackers were actively aiming to create new kinds of conversations outside of traditional institutions", Raymond says. "This wasn't an accidental byproduct of doing neat techie stuff; it was an explicit goal for many of us as far back as the 1970s. We intended this revolution."

Nice job.


Above Cluetrain's 95 theses was an alpha clue, written by Chris Locke and addressed to marketers who were busy failing to understand how profoundly this new thing, the Internet, changed markets by making its human occupants—and not just marketers—more powerful. It said this:

we are not seats or eyeballs or end users or consumers. we are human beings and our reach exceeds your grasp. deal with it.

More than any other single line, that adrenalized us. It also placed Cluetrain clearly on the side of people. As Jakob Nielsen put it to me at the time (in words close to exactly these), "You guys defected from marketing, and sided instead with people in their fight against marketing." (We were all techies, but three of us were also marketing wretches, fed up with the whole thing.)

When Google Books made possible search across countless books, I was able to observe with it the number of books in which the word "cluetrain" appeared. For years the number increased by an average of one per day, until it reached five thousand, where it stands. Google's meter is pegged there.

Yet for all the successes of free software, Linux, open source and Cluetrain (in that order), a problem persists. As human beings, our reach does not yet exceed the grasp of marketers. And, in many ways, the problem we saw at the height of the dot-com bubble is worse today, as we are lifted by marketing gas to the height of a new bubble variously called "Facebook", "social media", "big data" and "personalized advertising".

So, in 2006, when I was given a fellowship with the Berkman Center for Internet and Society at Harvard University, I decided to launch a development project there. The purpose of the project was to give human beings a reach that exceeded marketers' grasp. We called it ProjectVRM (http://blogs.law.harvard.edu/vrm), in which the last three letters stood for Vendor Relationship Management. The term was meant as the customer-side counterpart of Customer Relationship Management, the $18 billion enterprise software and services business that gives the world call centers, junk mail and other ways of creating distance between customers and the companies that purport to serve them.


The idea wasn't to fight CRM, but to engage it—and organizations of all kinds—from the customer side, by providing individuals with means for both independence and engagement.

As of today, there are dozens of VRM development efforts, operating in dozens of countries. All are still at the narrow end of the adoption curve, but it's clear where they're all headed. Once VRM succeeds, I believe, we will enjoy an Intention Economy, based on what each of us intends as sovereign human beings participating in a free and open marketplace. In it I claim that free customers will prove more valuable than captive ones, because our reach will outperform vendors' grasp—and render that grasping obsolete.

So, starting a couple years ago, I began work on a book titled The Intention Economy: When Customers Take Charge, for Harvard Business Review Press, based on progress in VRM development. It came out in May of this year, and it's available in all the usual places and forms, including digital and audio ones. With permission of the publisher, I would like to share with you one chapter from the book. There are other chapters that actually go into free software, open source, Linux and other topics more typical of Linux Journal, but mostly what I do in those chapters is leverage the stuff I learned here, along with the rest of you. This chapter is one that visits a larger problem rooted in the ethos of Business as Usual, especially on the Web.■

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.


YOUR CHOICE OF CAPTOR

Excerpt from The Intention Economy

"Find out just what any people will quietly submit to and you have the exact measure of the injustice and wrong which will be imposed on them."—Frederick Douglass

"The term 'client-server' was invented because we didn't want to call it 'master-slave'."—Craig Burton

DOC SEARLS

The Argument

The World Wide Web has become a World Wide Ranch, where we serve as calves to Web sites' cows, which feed us milk and cookies.

Wholly cow

The Internet and the Web are not the same things. The Internet is a collection of disparate networks whose differences are transcended by protocols that put every end at zero functional distance from every other end. The World Wide Web is one application that runs on the Internet. Others include e-mail, messaging, file transfer, chat and newsgroups, to name just a few. But the World Wide Web is the big one—so big that we tend to assume it's the whole thing, especially since the Web is where we spend most of our time online and when it has absorbed so many other formerly separate activities. E-mailing, for example.

Sir Tim Berners-Lee, who invented the Web in 1989, later said it aspired to be "a universal linked information system, in which generality and portability are more important than fancy graphics techniques and complex extra facilities. The aim would be to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards." He did not intend to create a vast online shopping mall, an industrial park of advertising mills, or home to a billion-member "social network" owned by one company. But that's what the Web is today. True, somewhere in the midst of all that other stuff, you can still find the simple, linked collection of documents that Sir Tim meant for the Web to be in the first place. But that collection is getting harder and harder to find, mostly because there's no money in it.


Thus, the Web we know today is largely the commercial one that appeared in 1995, when Netscape and Microsoft created the first retailing Web servers along with the first widely used Web browsers. The commercial Web's early retailing successes—notably Amazon and eBay—remain sturdy exemplars of selling online done well. A key to their continued success is personalization, made possible by something called the cookie: a small text file, placed in your browser by the Web site, containing information to help you and the site remember where you were the last time you visited.

Those ur-cookies have since evolved (among other things) into breeds of Trojan marketing files. Variously called "Flash cookies" (based on Adobe's proprietary Flash technology), "tracking bugs", "beacons", and other names, they track your activities as you go about your business on the Web, reporting back what they've found to one or more among thousands of advertising companies, most of which you've never heard of (but some of which we visited in the last chapter).

The online advertising business says the purpose of these tracking methods is to raise the quality of the advertising you get. But most of us care less about that than we do about being followed by companies and processes we don't know and wouldn't like if we did.

While much complaining is correctly addressed to the creepy excesses of the online advertising business, little attention has been paid to the underlying problem, which is the design of the commercial Web itself. That design is client-server, which might also be called calf-cow. Clients are the calves. Servers are the cows. Today, there are billions of commercial cows, each mixing invisible cookies into the milk they feed to visiting calves.

Client-server by design is submissive-dominant, meaning it puts servers in the position of full responsibility for defining relationships and maintaining details about those relationships. And, since this is all we've known since 1995, most of the "solutions" we've come up with are more sites and site-based services. In other words, better cows.

What are your names?

Back before computing got personal, and all the computing that mattered was done in enterprises, one of the biggest problems was a proliferation of namespaces. (In technical terms, a namespace is an identifier or a directory for them.) Different software systems


an extension of the Facebook Platform called Facebook Connect [http://www.facebook.com/help/?page=229348490415842].

Facebook Connect makes it easier for you to take your online identity with you all over the Web, share what you do online with your friends and stay updated on what they're doing. You won't have to create separate accounts for every website, just use your Facebook login wherever Connect is available.

To call your Facebook identity your "online identity" is delusional as well as presumptuous, because none of us interact only with Facebook on the Web. But that by itself isn't the only problem. The larger problem with Facebook Connect is unintended data spillage. For example, what if you don't want to share "what you do online with friends", or if you think you're just taking Facebook's shortcut when you log in for the first time at some other site? In other words, what if you only want Facebook Connect to be what it presents itself as: a simple login option for you at another site—with no "social". Or, if you do want to be "social", how about being selective about what Facebook data you're willing to share—exposing some data to some sites and different data (or no data at all) to other sites? In other words, what if your "privacy preferences" aren't just the one-set-fits-all selections you've made at Facebook?

The site I Shared What?!? [http://ISharedWhat.com], created by VRM developer Joe Andrieu, does an excellent job of simulating what you reveal about yourself and others when you use Facebook Connect. Here's what it just told me:

You've just shared: your basic details, your news feed, your friends on Facebook, the activities listed on your profile, the interests listed on your profile, music listed on your profile, books listed on your profile, movies listed on your profile, television shows listed on your profile, all the pages you have "liked", your shared links.

That's after I jiggered my privacy settings in Facebook to reveal as little as possible. The results were the same both times, before and after. I'm no dummy, but—as of this writing—I have no idea how to keep from spilling rivers of personal data when using Facebook Connect. So I just avoid it.

But I'm the exception. As of December 2010, a quarter-billion people were logging into 2 million Web sites using Facebook Connect, with 10,000 more Web sites being added every day.


Privacy is the marquee issue here, but the deeper problem is control. Right now you don't have much control over your identity on the Web. Kim Cameron, father of the cross-platform Identity Metasystem when he was an Architect at Microsoft, says [http://www.identityblog.com/?p=838] your "natural identity"—who you are and how you wish to be known to others—gets no respect on the commercial Web. Every cow thinks you are its calf, "branded" almost literally. In the World Wide Dairy, all of us walk around with a brand for every Web site that gives us milk and cookies. Our virtual hides are crowded with more logos than a NASCAR racer.

Who you are to the other cows is not of much interest to any one cow, unless they "federate" your identity with other cows. Think of federation as "large companies having safe sex with customer data". That's what I called it in a keynote I gave at Digital ID World in the early 2000s. It still applies.


That's what I called it in a keynote I gave at Digital ID World in the early 2000s. It still applies.

Tough teats

A similar problem comes up when you have multiple accounts with one site or service, and therefore multiple namespaces, each with its own login and password. For example, I use four different Flickr accounts, each with its own photo directory:

• Doc Searls—http://www.flickr.com/photos/docsearls

• Linux Journal—http://www.flickr.com/photos/linuxjournal

• Berkman Center—http://www.flickr.com/photos/berkmancenter

• Infrastructure—http://www.flickr.com/photos/infrastructure

The first is mine alone. The second I share with other people at Linux Journal. The third I share with other people at the Berkman Center. The fourth I share with other people who also write for the same blog.

Flickr in each case calls me by the second person singular "you" and does not federate the four. To them, I am four different individuals: one cow, four calves. (Never mind that three of those sites have many people uploading pictures, each pretending to be the same calf.) My only choice for dealing with this absurdity is deciding which kind of four-headed calf I wish to be. Either I use one browser with four different logins and passwords, or I use four different browsers, each with its own jar of cookies. Both choices are awful, but I have to choose one. So I take the second option and use one browser per account—on just one laptop. When I use other laptops, or my iPhone, my Android, or the family Nokia N900, an iPod Touch, or an iPad, I'm usually the first kind of calf, using one browser to log in and log out every time I post pictures to a different account. Which I mostly don't do at all, because it's one big pain in my many asses.

Crazy as all this is, it hardly matters from the cow side of the cow-calf relationship. In fact, the condition is generally considered desirable. After all, the World Wide Ranch is a real marketplace: one where cows compete for calves. Hey, what could be more natural?

The media, even online, are no help either. Business publications, both online and off, love to cover what I call "vendor sports", in which customers are prizes for corporate trophy cases.
Advertiser IndexThat’s why business writers often regard“owning” customers as a desirablething. For example, in “Rim, CarriersFight Over Digital Wallet”, from theMarch 18, 2011, Wall Street <strong>Journal</strong>,Phred Dvorak and Stuart Weinbergwrite, “RIM and carriers like RogersCommunications Inc. in Canada, andAT&T Inc. and T-Mobile in the U.S.,disagree over exactly where on the phonethe credentials should reside—and thuswho will control the customers, revenueand applications that grow out of mobilepayments.” The italics are mine.In the dairy farm model of themarketplace, captive customers are morevaluable than free ones. Since this hasbeen an ethos of mass marketing for acentury and a half, the milk-and-cookiessystem of customer control has provenappealing to businesses outside the onlineworld as well as within it. As of today,both worlds are worse off for it as well.Thank you as always for supporting ouradvertisers by buying their products!ADVERTISER URL PAGE #1&1 http://www.1and1.com 31CloudOps <strong>2012</strong>: Cloud http://www.frallc.com 51Computing for Asset ManagersDrupalCon Munich http://munich<strong>2012</strong>.drupal.org 33Emac, Inc. http://www.emacinc.com 57HPC on Wall Street http://www.flaggmgmt.com/hpc 101iXsystems, Inc. http://www.ixsystems.com 7Kiwi PyCon http://nz.pycon.org 59<strong>Linux</strong>Con North America https://events.linuxfoundation.org/ 2events/linuxconMicroway, Inc. http://www.microway.com 66, 67Silicon Mechanics http://www.siliconmechanics.com 3Texas <strong>Linux</strong> Fest http://texaslinuxfest.org 41Usenix OSDI Conference https://www.usenix.org/conference/osdi12 17So, thenFor free markets to mean more than“your choice of captor”, we need newsystems that operate on the principle thatfree customers are more valuable—toboth sellers and themselves—than captiveones. Improving slavery does not makepeople free. We need full emancipation.That’s the only way we’ll get free marketsworthy of the name.■ATTENTION ADVERTISERSThe <strong>Linux</strong> <strong>Journal</strong> brand’s following hasgrown to a monthly readership nearlyone million strong. Encompassing themagazine, Web site, newsletters andmuch more, <strong>Linux</strong> <strong>Journal</strong> offers theideal content environment to help youreach your marketing objectives. Formore information, please visithttp://www.linuxjournal.com/advertising.WWW.LINUXJOURNAL.COM / AUGUST <strong>2012</strong> / 123
