13.07.2015 Views

Download - The Bastards Book of Regular Expressions



Mixed commas and other delimiters

Again, just to hammer home the point: data is just text, with structure. Why does that structure have to be defined with commas? It doesn’t, so good for you for realizing that.

We can use basically any symbol to structure our data. Tab-separated values, a.k.a. TSV, is another popular format. In fact, when you copy and paste from an HTML table, such as this Wikipedia HTML chart, you’ll get:

TKTK

And most modern spreadsheet programs will automatically parse pasted TSV text into columns. Copy-and-pasting from the above text will get you this in Google Docs:

TKTK

Heck, you can just copy-and-paste directly from the webpage into the spreadsheet:

TKTK

Collisions

The reason why most data-providers don’t use just “any” symbol to delimit data, though, is a practical one. What happens if you use the letter a as a delimiter, but your data includes lots of a characters naturally?

You can do it, but it’s not pretty.

But we don’t have to dream of that scenario; we already have that problem with comma delimiters. Consider this example list:

6,300 Apples from New York, NY $15,230
4,200 Oranges from Miami, FL $20,112

There are commas in the actual data, because they’re used as a grammatical convention: 6300, for example, is written as 6,300.

In this case, we don’t want to use commas as a delimiter. The pipe character, |, is a good candidate because it doesn’t typically appear in this kind of list.

We can delimit this list by using this pattern:

Find ^([\d,]+) (\w+) from ([\w ]+), ([A-Z]{2}) (\$[\d,]+)
Replace \1|\2|\3|\4|\5

And we end up with:
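For readers who would rather try this in code than in a text editor, here is a minimal Python sketch of the same find-and-replace. It assumes Python's built-in re module and the two-line sample list from above; the names PATTERN, to_pipe_delimited, and sample are made up for illustration and are not from the book. It also shows the collision first: splitting the first line on commas cuts the numbers apart.

```python
import re

# The Find pattern from the text, using a plain ASCII caret as the line anchor.
# re.MULTILINE makes ^ match at the start of every line, not just the first.
PATTERN = re.compile(
    r"^([\d,]+) (\w+) from ([\w ]+), ([A-Z]{2}) (\$[\d,]+)",
    re.MULTILINE,
)


def to_pipe_delimited(text: str) -> str:
    """Apply the Replace pattern \\1|\\2|\\3|\\4|\\5 to each matching line."""
    return PATTERN.sub(r"\1|\2|\3|\4|\5", text)


sample = (
    "6,300 Apples from New York, NY $15,230\n"
    "4,200 Oranges from Miami, FL $20,112"
)

# The collision: a naive comma split breaks the quantity and the price in two.
naive_fields = sample.splitlines()[0].split(",")
print(naive_fields)

# The regex version keeps commas inside the numbers and uses | between fields.
piped = to_pipe_delimited(sample)
print(piped)

# Once pipe-delimited, each line splits cleanly into five fields.
for line in piped.splitlines():
    print(line.split("|"))
```

Note that the literal word "from" is outside every capture group, so it does not survive the replacement; only the five captured fields do.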
