13.07.2015 Views

ebook on regular expressions for Google Analytics - LunaMetrics


Google Analytics is one of the most widely used tools to measure and evaluate websites. The GA team has worked hard to make it easier and more intuitive than ever before. However, you may still feel that you are limited by the Google Analytics out-of-the-box functionality.

If so, it’s time for you to learn about Regular Expressions and how GA uses them.

Just a story about Regular People and Regular Expressions . . .

When I first started working with Google Analytics, I was an analyst. A marketing person. Not a techie.

Back then, the Google Analytics documentation kept referencing something called Regular Expressions. I could see that my goals and filters weren’t doing what they needed to do, but, not being a techie, I didn’t know how to implement RegEx and fix them.

(In fact, I knew so little about this space that when a friend referred to Regular Expressions as “RegEx,” I wondered what he was talking about.)

Slowly I taught them to myself, with the help of Wikipedia and a friend in Australia. Then I began to blog about them, using non-techie language. I got a letter from a trainer on the other side of the pond who told me that when he trained people in Regular Expressions, he turned them loose on the LunaMetrics blog. Eventually, Google invited my company to become a Google Analytics Certified Partner. Our company helped rewrite the Google Analytics Help Center section on Regular Expressions.

And to this day, I get random emails from random people, asking me to troubleshoot their RegEx.


Why use RegEx?

In Google Analytics, you can use Regular Expressions to ...

create filters. Many filters require Regular Expressions. If you don’t know what filters are, you can start learning about them here.

create one goal that matches multiple goal pages. Perhaps your “thank you” page has many names, but to you, all leads are the same goal. So you can use Regular Expressions to “roll them up.”

fine-tune your funnel steps so that you can get exactly what you need. Remember, Regular Expressions can be specific.

What are Regular Expressions, anyway? Regular Expressions are about “power matching.” If you need to create a goal that matches multiple thank-you pages – that is power matching. If you need to write a filter that matches multiple URLs, but only know what a piece of each URL looks like – again, that is power matching.

But what about Advanced Segments? Can’t I skip this whole RegEx thing now that Google Analytics has Advanced Segments?

Well, no. Advanced Segments are lovely, and they often make filters unnecessary. But they don’t work the same way as filters do. And you will still need Regular Expressions to create interesting and complicated goals, and to accommodate your website designer who doesn’t do things the Google Analytics-friendly way. And sometimes, you will want to use Regular Expressions in your Advanced Segments.

A word about language.

What would a how-to guide be without some dictionary-type advice? Here are some of the conventions we may use:

GA: the abbreviation for Google Analytics
RegEx: the abbreviation for Regular Expressions (singular and plural)
Plain text: not Regular Expression text
String: any assembly of characters and/or spaces. A word could be a string, a sentence could be a string, a URL could be a string.
Target String: the string you are attempting to match with your RegEx.

Example: when I use robb?(y|i)n to match my name, Robbin (the one with all the funny characters), robb?(y|i)n is the RegEx, and my name, Robbin, is a target string.

A word about format.

I just hate when I read a post or book and they write that the keyword is “sodapop.” Or “vanilla.” Or anything that has quotation marks around it. Because you never know if the quotation marks are part of the stuff you are working on, or are just used to separate that word from the rest of the sentence. Consequently, I put all target strings and all RegEx in boldface – no quotation marks needed.


Index

Regular Expressions
Learn About RegEx
\ The Backslash
| The Pipe
? The Question Mark
( ) Parentheses
[ - ] Square Brackets & Dashes
{ } Braces
. The Dot
+ The Plus Sign
* The Star
.* The Dot Star
^ The Caret
$ The Dollar Sign
Let’s Practice
NOT The End


Regular Expressions match as much as possible.

Most of this ebook will be about all the characters that make up the Regular Expressions “toolset.” But first – you’ll be lost if you don’t understand the concept of matching as much as possible. Sometimes you will be very surprised at how much it matches, just as I was when I was new at RegEx.

Let’s start with an example: Our company often wants to see all the keywords that include our company name (branded search) and, more often, we want to see all the keywords that don’t include our company name (unbranded search). Below are two of the Regular Expressions we use when we are including or excluding our company name:

LunaMetric

There are two interesting questions here, and you might feel some righteous indignation as you ask them. I know that I felt this way when I was first learning RegEx:

1. Why would they match? Not a single one of those phrases includes the entire name.
2. How can those possibly be Regular Expressions? There are no RegEx characters!

Answers

1. They match because Regular Expressions in GA will match and match until they aren’t allowed to any more. That’s why Metric matches the target string, LunaMetrics – if it matches any part of the word, it will match the whole word.
2. And the characters? You don’t need to have those characters just to have a Regular Expression, and having the characters doesn’t necessarily make it a RegEx. All you need to do is put the expression into a field that is sensitive to RegEx. For example, when you write a Google Analytics goal, you get to choose “head match,” “exact match” or “Regular Expression.” As soon as you choose “Regular Expression,” the field becomes sensitive to RegEx, and all the rules of RegEx apply. You often need to use little RegEx characters – but not always.

A friend wrote a question on a forum. He wanted to exclude all traffic that was coming from Google referring links, all around the world. He tried to create a really fancy RegEx that would include all countries, something like www\.google\.(com|co\.(au|uk|il))/(docs|analytics|reader) etc.

I wrote back: “Why don’t you just use the word google as your RegEx?”
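
The matching behavior described above can be tried outside GA. The following is a minimal sketch in Python, assuming that `re.search` (which looks for the pattern anywhere in the target string) is a fair stand-in for a RegEx-sensitive field; the keyword list and the `found_in` helper are invented for illustration.

```python
import re

# Hypothetical search keywords, invented for this illustration.
keywords = ["lunametrics blog", "LunaMetrics regex", "google analytics help"]


def found_in(pattern, target):
    """Return True if the pattern is found anywhere in the target string."""
    # Ignoring case is an assumption of this sketch, not a documented GA rule.
    return re.search(pattern, target, re.IGNORECASE) is not None


# "Metric" has no special RegEx characters, yet it works as a pattern:
# it only has to appear somewhere inside the target string.
branded = [k for k in keywords if found_in("Metric", k)]
unbranded = [k for k in keywords if not found_in("Metric", k)]

# The forum question: the plain word is enough to catch a Google domain.
google_hit = found_in("google", "www.google.co.uk/analytics")

print(branded)
print(unbranded)
print(google_hit)
```

The point of the sketch is that a short pattern without any special characters already behaves as a Regular Expression once it is placed in a field (here, a function) that treats it as one.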


How do I learn about RegEx?

Of course, reading this eBook will help you, but you can only get so far by reading. Ultimately, understanding and writing Regular Expressions (RegEx) is a little bit like getting your first job. You can’t get hired without experience, and you can’t get experience without getting hired.

With RegEx, you don’t really understand them until you use them, and you can’t really use them until you understand them. So you have to learn a little bit, and then use a little bit and get them wrong, and then go back to the book and learn a little bit more.

The other problem you will have with RegEx is that each character is easy. Put them all together and you get this:

/\?cid=[0-9]{3,3}.

And that one wasn’t very hard. The more you work with them, the easier they’ll get. So master each step, put a couple together, make some mistakes and get going. Soon you’ll be a RegEx pro.

There are a lot of great resources to check your Regular Expressions. If you use the PC, I strongly recommend the RegEx Coach: http://www.weitz.de/regex-coach/

Unfortunately, it is unavailable for Mac, so an alternative would be the RegEx Match Maker: http://sourceforge.net/projects/quregexmm/

There are many others.


Get started
The Backslash \

I always encourage people to start their “RegEx career” by learning the characters, and the best one to start with is the backslash. A backslash is different from all the other characters, as you will see. It provides a bridge between Regular Expressions and plain text.

A backslash “escapes” a character. What does “escape” mean? It means that it turns a Regular Expression character into plain text. If that doesn’t make sense to you yet, hold on – I have a few examples coming.

Perhaps /folder?pid=123 is your goal page. The problem we have is that the question mark already has another use in Regular Expressions – but we need for it to be an ordinary question mark. (We need it to be plain text.) We can do it like this:

/folder\?pid=123

Notice how there is a backslash in front of the question mark – it turns it into a plain question mark.

Example Alert: Now, when we are creating a Google Analytics search and replace filter for the page above, we can use that backslash and know that GA understands we are looking to match to a real question mark, like so:

[Screenshot: Google Analytics search and replace filter]

Why are backslashes about getting started?

1. You will use them more than any other RegEx character.
2. They turn special RegEx characters into everyday, plain characters.
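
To see what the backslash changes, here is a small Python sketch using the goal page from the text. Python's `re.search` is used as a stand-in for a RegEx-sensitive field; that substitution is an assumption of the example.

```python
import re

goal_page = "/folder?pid=123"

# Unescaped, the ? makes the "r" before it optional, so this pattern looks
# for "/folde" or "/folder" followed immediately by "pid=123".
unescaped_hit = re.search(r"/folder?pid=123", goal_page) is not None

# Escaped, \? is an ordinary question mark.
escaped_hit = re.search(r"/folder\?pid=123", goal_page) is not None

# The unescaped pattern matches a different string instead.
accidental_hit = re.search(r"/folder?pid=123", "/folderpid=123") is not None

print(unescaped_hit, escaped_hit, accidental_hit)
```

Only the escaped pattern finds the real goal page; the unescaped one quietly matches a URL that has no question mark in it at all.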


Not-very-wild cards
The Pipe |

The pipe is the simplest of Regular Expressions, and it is the one that all Regular People (that’s you and me) should learn. It means or.

Here’s a simple example: Coke|Pepsi. A soft-drink blog might use that expression with their Google Analytics keyword report to find all examples of searches that came to their blog using either the keyword Coke or the keyword Pepsi.

Here’s another example: Let’s say you have two thank-you pages, and you need to roll them up into one goal. The first one is named thanks, and the second is named confirmation. You could create your goal like this:

confirmation|thanks

Here you can see a screen shot of a Google Analytics goal setup that includes the two pages suggested above, with a pipe:

[Screenshot: Google Analytics goal setup using confirmation|thanks]

That says, match either of those pages. Notice that the pipe matches everything on either side of it.

And where is the Pipe, anyway? Keyboards differ, but you’ll most likely find it above your Enter key.
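
The goal from this page can be checked with a short Python sketch. The page list and the `/store/` and `/blog/` paths are made up for the example, and `re` stands in for GA's matching.

```python
import re

# Hypothetical page paths.
pages = ["/thanks", "/confirmation", "/contact"]

goal = re.compile(r"confirmation|thanks")
goal_pages = [p for p in pages if goal.search(p)]

# The pipe splits the whole expression in two: this reads as
# "/store/confirmation" OR "thanks", not "/store/" plus one of two names.
wide = re.compile(r"/store/confirmation|thanks")
wide_hit = wide.search("/blog/thanks") is not None

print(goal_pages)
print(wide_hit)
```

The second pattern shows why "the pipe matches everything on either side of it" matters: the `/store/` prefix only belongs to the left-hand alternative.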


Not-very-wild cards
The Question Mark ?

A question mark means, “The last item (which, for now, we’ll assume is the last character) is optional.” So remember how I wrote that people misspelled my name, often spelling it with one b instead of two? With a RegEx of Robb?in, I can capture both Robbin and Robin. That’s because the question mark means that the expression will match even if the second “b” isn’t in the target string – because it is optional.

In case you’re wondering – I often use this RegEx to filter keywords on my company’s website, so that I can quickly pull out keywords that were clearly looking for me instead of our services.

That way, I capture both the people who spell my name correctly and those who misspell. In fact, when I do a report that excludes branded keywords, I usually try to exclude both our company name and my name. Like this:

[Screenshot: keyword filter excluding the company name and the author’s name]
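
A quick Python sketch of the optional "b". The combined branded pattern below is a guess at what such an exclusion might look like (the original screenshot is not part of this text), and the keywords are invented.

```python
import re

name = re.compile(r"Robb?in")
spellings = ["Robbin", "Robin", "Robbbin"]
name_hits = [s for s in spellings if name.search(s)]

# Hypothetical branded-keyword pattern combining a company name and the
# author's name; the pipe means "or". Ignoring case is an assumption.
branded = re.compile(r"lunametrics|robb?in", re.IGNORECASE)
keywords = ["robin steif blog", "LunaMetrics seo", "regex tutorial"]
unbranded = [k for k in keywords if not branded.search(k)]

print(name_hits)
print(unbranded)
```

Note that the question mark allows zero or one "b" at that position, so a spelling with three is not a match.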


Grouping
Parentheses ( )

Parentheses in Regular Expressions work the same way that they do in mathematics. This falls into the category of “Things I should have learned had I been paying attention in grade school.”

Now, before I get into this long explanation, I want to provide an example. This is because I user-tested this whole ebook, and the testers complained, “Robbin, sometimes we just need to see the example up front.” So here is an example of parentheses:

/folder(one|two)/thanks

This matches two URLs, folderone/thanks and foldertwo/thanks.

OK, on with the explanation. Remember, we were talking about things we should have learned had we been paying attention in school.

Remember how your math teacher said that if you had an equation, the division and multiplication got done before the subtraction and addition? Well, since I wasn’t paying attention in Mrs. Petrowski’s 4th-grade class, I pulled out my old notes, and here is what I found:

2 + 3 x 5 = 17

(Right? 3 times 5 equals 15, plus 2 equals 17.)

If you wanted it to execute differently, you had to force the equation with parentheses – like this:

(2 + 3) x 5 = 25

Above, I’ve changed the same numbers to become 2 plus 3 equals 5, times 5 equals 25. And that’s the value of parentheses in math. I see from my very old notes that Mrs. Petrowski called it the Order of Operations.

So what about Regular Expressions? Why would we need parentheses there? In order to understand our need, we have to look at other expressions (just like we had to understand the math operations symbols in order to understand why parentheses are needed.)
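
The parallel between arithmetic and RegEx grouping can be sketched in Python. The ungrouped pattern and the URL list are illustrative additions, not from the ebook.

```python
import re

# Order of operations in arithmetic.
without_parens = 2 + 3 * 5      # multiplication first
with_parens = (2 + 3) * 5       # parentheses first

# The same idea with a pipe: parentheses limit what the "or" applies to.
grouped = re.compile(r"/folder(one|two)/thanks")
ungrouped = re.compile(r"/folderone|two/thanks")   # "/folderone" OR "two/thanks"

# Hypothetical request URLs.
urls = ["/folderone/thanks", "/foldertwo/thanks", "/folderone/contact", "/two/thanks"]
grouped_hits = [u for u in urls if grouped.search(u)]
ungrouped_hits = [u for u in urls if ungrouped.search(u)]

print(without_parens, with_parens)
print(grouped_hits)
print(ungrouped_hits)
```

Without the parentheses, the pipe splits the whole expression, so pages that were never meant to be part of the goal also match.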


Grouping
Square Brackets & Dashes [ - ], pt. 1

I usually like to introduce just one character at a time. But these two are a little like salt and pepper. True, you can use just salt or just pepper, but they get served together so often that they are a set. And so we have both square brackets [ ] and dashes –

With square brackets, you can make a simple list, like this: [aiu]. This is a list of items and includes three vowels only. Note: Unless we use other expressions to make this more complicated, only one of them will work at a single time.

So p[aiu]n will match pan, pin and pun. But it will not match pain, because that would require us to use two items from the [aiu] list, and that is not allowed in this simple example.

You can also use a dash to create a list of items, like this:

[a-z] – all lower-case letters in the English alphabet
[A-Z] – all upper-case letters in the English alphabet
[a-zA-Z0-9] – all lower-case and upper-case letters, and digits. (Notice they are not separated by commas.)

Dashes are one way of creating a list of items quickly, as you can see above.

Advanced tip

Characters that are usually special, like $ and ?, no longer are special inside of square brackets. The exceptions are the dash, the caret (more on this one later) and the backslash, which still works like all backslashes do inside the square brackets.
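
The list and range behavior can be tried in a few lines of Python. The word "pen" and the test strings are additions for illustration.

```python
import re

words = ["pan", "pin", "pun", "pain", "pen"]
vowel_hits = [w for w in words if re.search(r"p[aiu]n", w)]

# A dash builds ranges; this class accepts any single letter or digit.
alnum_hit = re.search(r"[a-zA-Z0-9]", "?!7") is not None

# Inside square brackets, $ and ? are ordinary characters.
literal = re.search(r"[$?]", "price$").group()

print(vowel_hits)
print(alnum_hit)
print(literal)
```

Each bracketed list stands for exactly one character, which is why "pain" (two vowels between p and n) falls through.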


Square Brackets & Dashes, pt. 2

Here is an example of how you might use square brackets by themselves. Let’s say you have a product group, sneakers, and each product name has a number appended to it in the URL (we see this a lot with industrial products where they don’t have zippy names). So you might have sneakers450, sneakers101, etc.

The product manager for sneakers450 through sneakers458 wants a special profile of visits that only includes his product pages. So you might include a filter like the one below, which makes it easy to include all the product names, using square brackets:

[Screenshot: Google Analytics include filter for the sneakers product pages]
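
The filter screenshot is not part of this text, so the pattern below is an assumption about what such a filter could contain: one bracketed range for the last digit. A minimal Python sketch:

```python
import re

# Assumed filter pattern; the original screenshot is not available here.
pattern = re.compile(r"sneakers45[0-8]")

# Hypothetical product names as they might appear in URLs.
products = ["sneakers450", "sneakers454", "sneakers458", "sneakers459", "sneakers101"]
included = [p for p in products if pattern.search(p)]

print(included)
```

The range [0-8] covers sneakers450 through sneakers458 and leaves out sneakers459 and every other product number.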


Grouping
Braces { }

Braces are not covered in the Google Analytics documentation, but they are supported by GA. Braces are curly brackets, {like this}.

Braces repeat the last “piece” of information a specific number of times. They can be used with two numbers, like this: {1,3}, or with one number, like this: {3}. When there are two numbers in the braces, such as {x,y}, it means, repeat the last “item” at least x times and no more than y times. When there is only one number in the braces, such as {z}, it means, repeat the last item exactly z times.

Here is my two-number-in-braces example:

Lots of companies want to take all visits from their IP address out of their analytics. Many of those same companies have more than one IP address – they often have a block of numbers. So let’s say that their IP addresses go from 123.145.167.0 through 123.145.167.99 – how would we capture that range with braces?

Our regular expression would be:

123\.145\.167\.[0-9]{1,2}

Notice that we actually used four different RegEx characters: We used a backslash to turn the magic dot into an everyday dot, we used brackets as well as dashes to define the set of allowable choices, i.e. the last “item”, and we used braces to determine how many digits could be in the final part of the IP address.

On the other hand, if there is only one number in the braces, the match will only work if you have exactly the right number of characters.

So here’s an example: Let’s say you are the area manager for Allegheny County (where I live), and you want to see all the visits from people who touched a page on the site that had 152XX in the URL – that’s the Allegheny County ZIP code. You could create an Advanced Segment using a Regular Expression with square brackets and a brace, like this:

[Screenshot: Advanced Segment matching 152XX in the URL]

This means, if the visit included a trip to a page that included 152 and any other two digits (i.e. 152XX) in the URL, include the visit in this segment.
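
Both brace forms can be tried in Python. The ZIP pattern `152[0-9]{2}` is an assumption (the segment screenshot is not part of this text), the page paths are invented, and the anchored variant is an extra illustration of how a search-anywhere match treats a three-digit final octet; how GA's own IP filter applies the pattern is not something this sketch establishes.

```python
import re

ip_range = r"123\.145\.167\.[0-9]{1,2}"
in_range = re.search(ip_range, "123.145.167.42") is not None

# A search-anywhere match is satisfied by the first two digits of "100".
loose_hit = re.search(ip_range, "123.145.167.100") is not None

# Anchoring both ends (covered later in this ebook) rules that out.
anchored = r"^123\.145\.167\.[0-9]{1,2}$"
strict_hit = re.search(anchored, "123.145.167.100") is not None

# Assumed segment pattern: 152 followed by exactly two digits.
zip_pattern = r"152[0-9]{2}"
zip_hit = re.search(zip_pattern, "/offices/15213/hours") is not None
zip_miss = re.search(zip_pattern, "/offices/1529/hours") is not None

print(in_range, loose_hit, strict_hit, zip_hit, zip_miss)
```

The {1,2} form accepts one or two digits, while {2} insists on exactly two, so a path containing only 1529 does not qualify.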


Wilder cards
The Dot .

A dot matches any one character.

When I was new at Regular Expressions, dots were perhaps the strangest thing for me to deal with. I couldn’t figure out what “one character” meant. And I couldn’t understand why they mattered much. They didn’t seem very wild, as wild cards go.

Ultimately, I learned that the list of characters included all the characters I could find on my keyboard or any other – the alpha, the numeric, the special characters too. A dot even matches a whitespace. I also learned there aren’t that many uses for dots by themselves, but they are very powerful when combined with other RegEx characters.

Let me start with some examples of how the dot can be used alone. Take this Regular Expression:

.ite

It would match site, lite, bite, kite. It would also match %ite and #ite (because % and # are characters, too.) However, it wouldn’t match ite. Why not? A dot matches one character, and ite includes zero characters for the dot to match (i.e., it didn’t match any).

So let’s go to a GA example. Let’s say your company owned a block of IP addresses:

123.45.67.250 through 123.45.67.255

You want to create a Regular Expression that will match the entire block, so that you can take your company data out of your Google Analytics. Since only the last character changes, you could do it with this expression:

123\.45\.67\.25.

This would match all the required IP addresses. Note that it would also match 123.45.67.256 and 123.45.67.25% (because a dot matches any one character). However, each grouping (each octet) in the IP address only goes up to 255, and we never see percent signs or other non-numbers in IP addresses, so you would be safe with this.

Have you ever noticed that you can type index.php into a search box that is sensitive to Regular Expressions and it works – even though you didn’t escape the dot, the way you were supposed to – i.e., index\.php? That’s because the dot matches any one character, and one of the characters the dot matches is … a dot. And since it is unlikely that a non-dot will get in there, it usually works just fine.
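
The dot examples above can be reproduced with Python's `re` module, used here as an assumed stand-in for GA's matching. The string `indexZphp` is invented to show what an unescaped dot lets through.

```python
import re

words = ["site", "%ite", "ite"]
dot_hits = [w for w in words if re.search(r".ite", w)]

ip_block = r"123\.45\.67\.25."
ips = ["123.45.67.250", "123.45.67.255", "123.45.67.25%", "123.45.67.25"]
ip_hits = [ip for ip in ips if re.search(ip_block, ip)]

# An unescaped dot accepts any character in that position, including a dot.
loose = re.search(r"index.php", "indexZphp") is not None
strict = re.search(r"index\.php", "indexZphp") is not None

print(dot_hits)
print(ip_hits)
print(loose, strict)
```

The address ending in 25 with nothing after it is not a match, because the final dot still needs one character to consume.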


Wilder cards
The Plus Sign +

A plus sign matches one or more of the former items, which, as usual, we’ll assume is the previous character. (It can be more complicated, but let’s start easy.) So the list of possible matches is clear: the former character. And the number of matches is clear: one or more.

Here’s an example from the world of literature: When a character trips and falls on his face, he often says Aaargh! Or maybe it’s Aaaargh! or just Aargh! In any case, you could use a plus sign to match the target string, like this: aa+rgh. That will match aargh and aaargh and aaaaaaaaargh … well, you understand. Notice, however, that it won’t match argh. Remember, it is one or more of the former items.

Does anybody really use plus signs anymore? Sure they do – you may find that you use this expression rarely, but the one time you need it, it will be very valuable.
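
A short Python check of the plus sign, using lower-case versions of the cries from the text (case handling is left out of this sketch on purpose):

```python
import re

cries = ["argh", "aargh", "aaargh", "aaaaaaaaargh"]

# "a" followed by one or more further "a" characters, then "rgh".
plus_hits = [c for c in cries if re.search(r"aa+rgh", c)]

print(plus_hits)
```

The pattern needs at least two "a" characters in a row, one for the plain "a" and one or more for "a+", so argh is left out.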


Wilder cards
The Star *

People really misuse stars. They have specific meanings in other applications. They don’t mean the same thing in RegEx as they do in some of those other applications, so be careful here.

Stars will match zero or more of the previous items. They function just like plus signs, except they allow you to match ZERO (or more) of the previous items, whereas plus signs require at least one match. For the time being, let’s just define “previous item” as “previous character.”

Since stars are so much like plus signs, I’ll start with the same example and point out the differences.

So once again, when a character trips and falls on his face, he often says Aaargh! Or maybe it is Aaaargh! or (unlike last time) just Argh! In any case, you could use a star to match the target string, like this: aa*rgh. That will match aargh and aaargh and aaaaaaaaargh – and the big difference from the plus sign is that it will also match argh (i.e. no extra “a’s” added).

You’ll notice that I don’t include any screen shots for stars (or plus signs, for that matter.) While they are both very important, they are used very extensively with dots as the wildest cards.
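
The difference between the star and the plus sign, and between a RegEx star and the star used by other tools, can be sketched in Python. The comparison with shell-style wildcards via `fnmatch` is an illustration chosen for this example; the ebook does not name a specific application.

```python
import fnmatch
import re

cries = ["argh", "aargh", "aaargh", "aaaaaaaaargh"]
star_hits = [c for c in cries if re.search(r"aa*rgh", c)]

# In shell-style wildcards, * alone stands for any run of characters.
glob_hit = fnmatch.fnmatchcase("site-map", "site*")

# In RegEx, * only repeats the item before it: "sit" plus zero or more "e".
regex_whole = re.fullmatch(r"site*", "site-map") is not None
regex_repeat = re.fullmatch(r"site*", "siteee") is not None

print(star_hits)
print(glob_hit, regex_whole, regex_repeat)
```

`re.fullmatch` is used for the comparison so that the whole string has to fit the pattern, which makes the two meanings of the star easy to tell apart.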


Wildest card
The Dot Star .*

There are two Regular Expressions that, when put together, mean “get everything.” They are a dot followed by a star, like this:

/folderone/.*index\.php

In this example, our Regular Expression will match to everything that starts with folderone/ and ends with index.php. This means if you have pages in the /folderone directory that end with .html (I see that a lot), they won’t be a match to the above RegEx.

Now that you have an example, you might be interested in why this works. A dot, you may remember, means get any character. A star means repeat the last character zero or more times.

This means that the dot could match any letter in the alphabet, any digit, any number on your keyboard. And the star right after it matches the ability of the dot to match any single character, and keep on going (because it is zero or MORE) – so it ends up matching everything.

Hard to wrap your head around this one? Trust me, it works.

Advanced tip

Custom Advanced filters in Google Analytics often require you to “get all” and make it a variable. For example, when you want to add the hostname to the Request URL (a very common filter), do it like this:

[Screenshot: Advanced filter adding the hostname to the Request URL]

Notice how we put parentheses around the .*, like this: (.*) – you can see this in the boxes on the right side of the filter. To GA, in the advanced filter section, this means get all, and put it in a variable. So we get the entire hostname (in a variable), the entire request URL in a variable, and then in the bottom field, we are able to specify get the first variable in Field A, and to it, add the first variable in Field B.
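
Here is a Python sketch of both ideas: the dot star as "get everything", and (.*) as "get all and keep it in a variable". The URLs, the hostname, and the names `field_a` and `field_b` are invented for the example; joining the two captured values only imitates what the advanced filter described above does.

```python
import re

pattern = re.compile(r"/folderone/.*index\.php")

# Hypothetical request URLs.
urls = [
    "/folderone/index.php",
    "/folderone/sub/dir/index.php",
    "/folderone/index.html",
    "/foldertwo/index.php",
]
hits = [u for u in urls if pattern.search(u)]

# Parentheses around .* capture everything it matched.
hostname = "www.mysite.com"
request_uri = "/folderone/index.php"
field_a = re.search(r"(.*)", hostname).group(1)
field_b = re.search(r"(.*)", request_uri).group(1)
combined = field_a + field_b

print(hits)
print(combined)
```

The first URL matches even though nothing sits between the folder and index.php, because the star allows zero characters as well as many.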


Anchors
The Caret ^

When you use a caret in your Regular Expression, you force the Expression to match only strings that start exactly the same way your RegEx does.

I see carets misused all the time on various Google Analytics help groups. It goes something like this:

“I want to include only subfolder2 in my profile. My URLs look like: http://www.mysite.com/folder1/subfolder2/index.html. I created the include filter and the Regular Expression, which looks like this: ^/subfolder2/index\.html. Why isn’t it working?”

The reason it doesn’t work is that Google Analytics starts reading your URL right after the .com (or .edu, or .net, as it were). GA sees the above as /folder1/subfolder2/index.html. So when the person who wrote the question above starts his RegEx with ^/subfolder2, he shoots himself in the foot. In the eyes of GA, /folder1 comes before /subfolder2, yet the caret mandates that /subfolder2 must be at the beginning of the target string. (And it’s not! So it will never match.)

Advanced tip

When you put a caret inside square brackets at the very beginning, it means match only characters that are not listed after the caret. So [^0-9] matches any one character that is not a digit; a digit in that position is not a match.
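
The caret's two roles can be tried in Python. The final pattern, `^[^0-9]*$`, is an addition for this sketch that combines a negated list with both anchors to reject any target string containing a digit.

```python
import re

target = "/folder1/subfolder2/index.html"

anchored_wrong = re.search(r"^/subfolder2/index\.html", target) is not None
unanchored = re.search(r"/subfolder2/index\.html", target) is not None
anchored_right = re.search(r"^/folder1/subfolder2/", target) is not None

# Inside square brackets, a leading caret negates the list:
# [^0-9] stands for one character that is not a digit.
no_non_digit = re.search(r"[^0-9]", "12345") is None
first_non_digit = re.search(r"[^0-9]", "123a5").group()

# To reject any string that contains a digit, the negated class has to
# cover the whole string, from start anchor to end anchor.
digit_free = re.search(r"^[^0-9]*$", "abc") is not None
has_digit = re.search(r"^[^0-9]*$", "ab1c") is not None

print(anchored_wrong, unanchored, anchored_right)
print(no_non_digit, first_non_digit, digit_free, has_digit)
```

Dropping the caret, or starting the pattern with /folder1, are the two ways to fix the filter from the help-group question.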


Anchors
The Dollar Sign $

A dollar sign means don’t match if the target string has any characters beyond where I have placed the dollar sign in my Regular Expression.

So let’s say you want to include your homepage in your funnel. (Not such a great idea, if you ask me, but this book is about RegEx and not best analysis practices.) Google Analytics, by default, calls your homepage this:

/

i.e., just a slash. The problem with including just a slash in your funnel is that it will match everything. Every single URL on your site has a slash in it, so it automatically matches them all, since RegEx are greedy. In order to match just your homepage, you should follow it with a dollar-sign anchor, like so:

/$

That means the page has to end with a slash.

But wait … aren’t there pages that end with a slash that aren’t your home page? Like so: /mysite.com/folder/

So the very best way you can be sure to get just the slash page (homepage) is with a beginning and an ending anchor:

^/$

Use this cautiously, because you might have query parameters that are “shut out” of the match. For example, let’s say your affiliate marketers append code only to the end of the URL, so they can get credit for the sale:

http://www.mysite.com/?affiliate=storename

In the eyes of Google Analytics, that looks like:

/?affiliate=storename

… and it doesn’t match the ^/$ RegEx in the example on this page, because the target string doesn’t start and end with the same slash.
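
The three homepage patterns from this page can be compared side by side in Python. The page list is invented, and the last pattern, `^/(\?.*)?$`, is a hypothetical variation (not from the ebook) that lets an optional query string through.

```python
import re

# Hypothetical request URLs as GA might record them.
pages = ["/", "/folder/", "/index.html", "/?affiliate=storename"]


def hits(pattern):
    """Return the pages in which the pattern is found."""
    return [p for p in pages if re.search(pattern, p)]


any_slash = hits(r"/")          # every page contains a slash
ends_in_slash = hits(r"/$")     # the page must end with a slash
home_only = hits(r"^/$")        # the page is exactly one slash

# Hypothetical variation: the homepage with an optional query string.
home_with_query = hits(r"^/(\?.*)?$")

print(any_slash)
print(ends_in_slash)
print(home_only)
print(home_with_query)
```

Each added anchor narrows the match, and the last pattern shows one way the "shut out" affiliate URL could be let back in if that is what the report needs.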


Let’s Practice

When I was a newbie to RegEx, I used to see this on the RegEx Wikipedia listing (which I pored over constantly, trying to understand what these little characters were):

((great )*grand)?((fa|mo)ther)

Like most things in life, once you understand them, they are trivial … but before you understand them, they are daunting. So I will take this apart to make it easier to understand.

The parentheses create groups, separated by a question mark. So we effectively have:

(First Expression in the first set of parentheses)?(Second Expression in the second set of parentheses)

Since a question mark means include zero or one of the former item, we know that the target string will be a match to this practice RegEx whether it only matches the expression in the second set of parentheses or it matches both the expressions in both sets of parentheses. (Right? That’s what question marks do, allowing the target string to match the item before them OR NOT.) So let’s start by looking at the second half only, which we now know should be able to stand by itself:

((fa|mo)ther)

The pipe symbol | means or. So this resolves to father or mother.

Now let’s go back and look at the first half, the part that came before the question mark:

((great )*grand)

The star tells us to match zero, one or more than one instance of the expression before it. So it can match a string that doesn’t include great, in which case we just have grand; and, of course, we always have the end of the expression, which will be either mother or father. So we are allowed to match to grandmother or grandfather. It can match a string which includes great just once, in which case we have great grandmother or great grandfather. And it can match a string which includes great more than once, so we might end up with great great great great grandmother or great great grandfather.

So there you have it – all your ancestors with just one Regular Expression.
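
The practice expression can be run against a few candidates in Python. `re.fullmatch` is used so that each candidate has to fit the pattern from start to finish; the two non-matching candidates are additions for this sketch.

```python
import re

ancestor = re.compile(r"((great )*grand)?((fa|mo)ther)")

candidates = [
    "father",
    "grandmother",
    "great grandfather",
    "great great great great grandmother",
    "greatgrandmother",   # no space after "great"
    "grand",              # missing mother/father
]
matched = [c for c in candidates if ancestor.fullmatch(c)]

print(matched)
```

The optional first group contributes nothing for father, "grand" for grandmother, and one or more "great " pieces plus "grand" for the older generations.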


(Not) The End

So instead of concluding, let's get started … with a pop quiz. I'll blog about the answers soon after this is published. (That way, you'll all have a chance to say, "No, Robbin, I found a better way to write them!") The title of the blog post will be RegEx eBook Answers, so you can search for it easily.

Write a Regular Expression that matches both dialog and dialogue.

Write a RegEx that matches two request URLs: /secondfolder/?pid=123 and /secondfolder/?pid=567 (and cannot match other URLs).

Write a single Regular Expression that matches all your subdomains (and doesn't match anything else). Make it as short as possible, since Google Analytics sometimes limits the number of characters you can use in a filter. Your subdomains are subdomain1.mysite.com, subdomain2.mysite.com, subdomain3.mysite.com, subdomain4.mysite.com, www.mysite.com, store.mysite.com and blog.mysite.com.

Write a funnel and goal that includes three steps for the funnel and the final goal step (four steps in all), using Regular Expressions. Notice that there are two ways to achieve Step 2. Here are the pages:

➊ /store/
➋ /store/creditcard or /store/moneyorder
➌ /store/shipping
➍ /store/thankyou
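If you want to check your quiz answers before the blog post appears, a small test harness helps. The sketch below is only one way to do it: it assumes Python's re module as a stand-in for the Google Analytics engine, it assumes partial matching (the pattern may match anywhere in the string unless you anchor it), and the helper name check is hypothetical. To avoid spoiling the quiz, it is demonstrated on the practice expression ((great )*grand)?((fa|mo)ther) rather than on a quiz answer.

```python
import re


def check(pattern, should_match, should_not_match):
    """Return a list of (problem, text) pairs where the pattern misbehaves."""
    regex = re.compile(pattern)
    failures = []
    for text in should_match:
        # search() looks for the pattern anywhere in the string.
        if regex.search(text) is None:
            failures.append(("missed", text))
    for text in should_not_match:
        if regex.search(text) is not None:
            failures.append(("wrongly matched", text))
    return failures


# Demonstration strings are illustrative assumptions.
wanted = ["mother", "great grandfather"]
unwanted = ["grand", "great mother"]

# Without anchors, "mother" is found inside "great mother".
print(check(r"((great )*grand)?((fa|mo)ther)", wanted, unwanted))

# With ^ and $ the whole string must fit, so nothing is reported.
print(check(r"^((great )*grand)?((fa|mo)ther)$", wanted, unwanted))
```

For a quiz question, put the strings the question says must match in the first list, a few strings that must not match in the second, and keep editing your pattern until the returned list is empty.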


About LunaMetrics

LunaMetrics is a Google Analytics Certified Partner in Pittsburgh, PA. The company consults in Traffic, Analysis and Action – that translates to SEO/PPC, Google Analytics, and User Testing/Website Optimizer. Good contact information for LunaMetrics is info@lunametrics.com or directly at +1-412-381-5500. If you downloaded this ebook and aren't sure where to go back to see it online, the site address is www.lunametrics.com

©2010 by Lunametrics. Copyright holder is licensing this under Creative Commons Attribution-Noncommercial-Share Alike 3.0 license. Please feel free to post this on your blog or email it to whomever you believe would benefit from reading it. Thank you.

ebook design: Fireman Creative

About Robbin Steif

The author, Robbin Steif, is the owner of LunaMetrics. She drove the team at Google Analytics a little crazy in her quest to get better GA documentation, and they lovingly obliged by including many of her thoughts in their Regular Expressions help sections. Steif is a frequent speaker on analytics in general and Google Analytics in particular. She has been chair of the Web Analytics Association's (WAA) Marketing committee, and has served a two year WAA Board of Directors term. Steif is a graduate of Harvard College and the Harvard Business School.

Robbin loves to hear from users, and she welcomes your questions and comments. You can reach her like this:

Skype: robbinsteif
Twitter: robbinsteif

… or the old-fashioned way:

Phone: +1-412-381-5500.

Some thank-yous: To Nick M, who stood in a Palo Alto bowling alley with me and discussed Regular Expressions for Regular People (that was the first name for this ebook). To Steve in Australia and Justin, who taught me RegEx. To David Meerman Scott and Jonathan Kranz – I have never met you, but you taught me (respectively) why and how to write an ebook. A big thank you to my team at Fireman Creative for the care and feeding of this ebook.
