13.07.2015

The Bastards Book of Regular Expressions

Lookarounds

100,J.D. Salinger Ave.,City,ST,99999
42,J.F.K. Blvd. New,York,NY,10555

But notice the improper delimitation in the second line. The street name, J.F.K. Blvd. New, includes part of the city name: the "New" from "New York".

This happens whenever the city name consists of more than one word:

50 Fifth Ave. New York, NY 10012

Becomes:

50,Fifth Ave. New,York,NY,10012

Instead of what we had before:

50,Fifth Ave.,New York,NY,10012

Why did this happen? The subpattern [\w .]+ is greedy: it matches as many characters as it can and gives back only as many as the rest of the pattern needs in order to succeed. Because [\w .] includes the space, it runs through "Fifth Ave. New York" all the way to the comma, then backtracks only to the last space before that comma, which leaves just "York" for the city field. We need to make it lazier so that the street name field doesn't unintentionally swallow part of the city name.

TODOTK: (move to laziness chapter?)

Answer

The pattern for the street name is now:

([\w .]+?)

The rest of the pattern is unchanged:

Find
^(\d+) ([\w .]+?) ([\w ]+), ([A-Z]{2}) (\d{5})

How did one question mark make all the difference?
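To see the difference concretely, here is a minimal sketch using Python's standard `re` module that runs the greedy and the lazy version of the pattern over the example addresses. It assumes that the unformatted input lines looked like `42 J.F.K. Blvd. New York, NY 10555` (only the Fifth Ave. line is given verbatim in the text; the other two are reconstructed from their comma-delimited output) and that the replacement simply joins the five captured groups with commas. The names `to_csv`, `GREEDY`, `LAZY`, and `REPLACEMENT` are hypothetical, chosen for illustration.

```python
import re

# Example addresses. Only the Fifth Ave. line appears verbatim in the text;
# the other two are assumed inputs reconstructed from their CSV output.
ADDRESSES = [
    "100 J.D. Salinger Ave. City, ST 99999",
    "42 J.F.K. Blvd. New York, NY 10555",
    "50 Fifth Ave. New York, NY 10012",
]

# Greedy street-name subpattern: [\w .]+
GREEDY = r"^(\d+) ([\w .]+) ([\w ]+), ([A-Z]{2}) (\d{5})"
# Lazy street-name subpattern: [\w .]+?  (the only change is the "?")
LAZY = r"^(\d+) ([\w .]+?) ([\w ]+), ([A-Z]{2}) (\d{5})"

# Assumed replacement: the five captured fields joined by commas.
REPLACEMENT = r"\1,\2,\3,\4,\5"


def to_csv(pattern: str, address: str) -> str:
    """Rewrite one address line as comma-delimited fields using `pattern`."""
    return re.sub(pattern, REPLACEMENT, address)


if __name__ == "__main__":
    for address in ADDRESSES:
        print("greedy:", to_csv(GREEDY, address))
        print("lazy:  ", to_csv(LAZY, address))
```

For the Salinger Ave. line both versions agree, because the city is a single word. For the two New York lines only the lazy version keeps "New York" together in the city field: the lazy street-name group stops at the shortest match ("Fifth Ave.", "J.F.K. Blvd.") that lets the rest of the pattern succeed, and [\w ]+ cannot cross the period in "Ave." or "Blvd.", so the street name cannot end any earlier.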