Anchors: A way to trim emptiness

    http://example.com/events/
    http://example.com/people/
    http://example.com/places/

And we need a reference to a common filename:

    http://example.com/events/index.html
    http://example.com/people/index.html
    http://example.com/places/index.html

Answer

    Find $
    Replace index.html

Stripping end-of-line characters

Comma-delimited files (also known as CSVs, for comma-separated values) are a common text format for data. To arrange CSV data into a spreadsheet, a program like Excel uses the commas to determine where the "columns" are.

For example:

    Item,Quantity,In Season?
    Apples,50,Yes
    Oranges,109,No
    Pears,12,Yes
    Kiwis,72,Yes

In Excel, the data looks like this:

[Figure: CSV in Excel]
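To make the role of the commas concrete, here is a minimal sketch in Python. It uses the standard csv module as a stand-in for a spreadsheet program; the names SAMPLE and split_into_columns are made up for this illustration and do not come from any tool mentioned in the text.

```python
import csv
import io

# Sample data copied from the example above.
SAMPLE = """Item,Quantity,In Season?
Apples,50,Yes
Oranges,109,No
Pears,12,Yes
Kiwis,72,Yes
"""

def split_into_columns(text):
    """Return each line of CSV text as a list of column values.

    Hypothetical helper: csv.reader splits every line on its commas,
    which is roughly what a spreadsheet does when it opens the file.
    """
    return list(csv.reader(io.StringIO(text)))

rows = split_into_columns(SAMPLE)
for row in rows:
    print(row)
```

Each printed list corresponds to one spreadsheet row, with one list element per column.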
CSV is a popular format, and you'll run into it if you get into the habit of requesting data from governmental agencies. Unfortunately, it won't always be perfect.

A common problem, albeit a trivial one, occurs when exporting data from Excel to CSV. If the last column is meant to be empty but isn't (not all databases are well maintained), every line in the CSV file will have a trailing comma:

    Item,Quantity,In Season?,
    Apples,50,Yes,
    Oranges,109,No,
    Pears,12,Yes,
    Kiwis,72,Yes,

This usually won't cause problems, especially in modern spreadsheets such as Excel and Google Docs. But older database import programs might protest. And if you're even a little OCD, those superfluous commas will bother you. So let's wipe them out with a single regex.

Inefficient method: remove and replace newlines

If you fully grok newline characters, you realize you can fix the problem by targeting this pattern: a comma followed by a newline character.

    Find ,\n
    Replace \n

This almost works, except that it won't catch the very last line:

    Item,Quantity,In Season?
    Apples,50,Yes
    Oranges,109,No
    Pears,12,Yes
    Kiwis,72,Yes,

Why wasn't the trailing comma on the final line deleted? The pattern we used looks only for commas followed by a newline character. The last line of the file has no newline character after it, so its comma never matches, and we have to delete that comma manually.

Efficient method: leave the newlines alone

Besides that nagging manual deletion step (which quickly becomes annoying if you're cleaning dozens of files), our Find-and-Replace just seems a little wasteful. All we want is to delete the trailing comma, but we end up also deleting (and reinserting) a newline character.
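The difference between the two approaches can be sketched in Python's re module. This is only an illustration under stated assumptions: the function names are invented here, and the second pattern, `,$`, is one plausible way to "leave the newlines alone" using the end-of-line anchor shown in the index.html answer earlier in this chapter. In Python, `$` matches at the end of every line only when the re.MULTILINE flag is set; text editors vary in how they treat it.

```python
import re

# The trailing-comma data from above; note that the final line
# has no newline character after it.
DIRTY = (
    "Item,Quantity,In Season?,\n"
    "Apples,50,Yes,\n"
    "Oranges,109,No,\n"
    "Pears,12,Yes,\n"
    "Kiwis,72,Yes,"
)

def strip_via_newlines(text):
    """Inefficient method: replace each comma-plus-newline with a newline."""
    return re.sub(r",\n", "\n", text)

def strip_via_anchor(text):
    """Assumed alternative: delete a comma that sits at the end of a line.

    The $ anchor matches a position, not a character, so the newline
    itself is never removed or reinserted.
    """
    return re.sub(r",$", "", text, flags=re.MULTILINE)

print(strip_via_newlines(DIRTY))
print("---")
print(strip_via_anchor(DIRTY))
```

The first function leaves the comma on the Kiwis line in place, matching the result shown above, while the anchored version cleans every line, including the last.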