- Page 1 and 2: Mastering Regular Expressions - Tab
- Page 3 and 4: Mastering Regular Expressions by Je
- Page 5 and 6: Optional Items 16 Other Quantifiers
- Page 7 and 8: A Casual Stroll Across the Regex La
- Page 9 and 10: Page vii Alternation 84 Guide to th
- Page 11 and 12: POSIX and the Longest-Leftmost Rule
- Page 13 and 14: More Work for a POSIX NFA 147 Work
- Page 15 and 16: Page ix The Real "Unrolling the Loo
- Page 17 and 18: A Chapter, a Chicken, and The Perl
- Page 19 and 20: The Match Operator 246 Match-Operan
- Page 21 and 22: Page xi The Efficiency Penalty of t
- Page 23 and 24: 6-5 GNU Emacs's Search-Related Prim
- Page 25 and 26: Preface This book is about a powerf
- Page 27 and 28: Anyone who uses GNU Emacs or vi, or
- Page 29 and 30: This book tells a story, but one wi
- Page 31 and 32: • Chapter 2, Extended Introductor
- Page 33 and 34: • Chapter 7, Perl Regular Express
- Page 35 and 36: To make this useful, we can wrap Su
- Page 37: Knowing I would have to write about
- Page 41 and 42: I'd like to thank my company, Omron
- Page 43 and 44: 1 Introduction to Regular Expressio
- Page 45 and 46: Page 2 remove, isolate, and general
- Page 47 and 48: an editor would have been truly ard
- Page 49 and 50: Regular Expressions as a Language P
- Page 51 and 52: The Language Analogy Page 5 Full re
- Page 53 and 54: The goal of this book Page 6 The ch
- Page 55 and 56: Searching Text Files: Egrep Page 7
- Page 57 and 58: bytes and lines in a file, but it ge
- Page 59 and 60: Character Classes Matching any one
- Page 61 and 62: Answers to the questions on page 8.
- Page 63 and 64: With 07.04.76 , the dots are metach
- Page 65 and 66: *Once, in fourth grade, I was leadi
- Page 67 and 68: As you can see, the alternation hap
- Page 69 and 70: Figure 1-2: Start and end of "word"
- Page 71 and 72: Optional Items Page 16 Let's look a
- Page 73 and 74: Other Quantifiers: Repetition Page
- Page 75 and 76: Page 18 This leaves us with , whic
- Page 77 and 78: easier to tell egrep to ignore case
- Page 79 and 80: Answer to the question on page 18.
- Page 81 and 82: Page 21 The metasequence to match a
- Page 83 and 84: Page 22 Also, while egrep doesn't c
- Page 85 and 86: Dollar amount (with optional cents)
- Page 87 and 88: Answer to the question on page 23.
- Page 89 and 90:
"Subexpression" The term subexpress
- Page 91 and 92:
When it comes down to it, regular e
- Page 93 and 94:
• Regular expression features—S
- Page 95 and 96:
• A negated character class is st
- Page 97 and 98:
Remember, though, that a backslash
- Page 99 and 100:
2 Extended Introductory Examples In
- Page 101 and 102:
About the Examples Page 32 Perl all
- Page 103 and 104:
A Short Introduction to Perl Page 3
- Page 105 and 106:
Here's how it looks: % perl -w temp
- Page 107 and 108:
Page 35 Note that a test such as $r
- Page 109 and 110:
Toward a More Real-World Example Pa
- Page 111 and 112:
Page 37 To keep the example unclutt
- Page 113 and 114:
Figure 2-2: Temperature-conversion
- Page 115 and 116:
Negated matches Our program logic h
- Page 117 and 118:
Page 40 ostensibly used only to gro
- Page 119 and 120:
A short aside—metacharacters galo
- Page 121 and 122:
How do [ ] * and ( *| *) compare? A
- Page 123 and 124:
Page 43 Oops! Did you notice that i
- Page 125 and 126:
Answer to the question on page 43.
- Page 127 and 128:
Modifying Text with Regular Express
- Page 129 and 130:
Just what does $var = s/\bJeff\b/Je
- Page 131 and 132:
Page 47 I boiled down my needs to '
- Page 133 and 134:
Page 48 Although I applied this to
- Page 135 and 136:
Page 49 Let's analyze this. To prin
- Page 137 and 138:
the end of the string. * We can't u
- Page 139 and 140:
A Warning about .* The expression .
- Page 141 and 142:
Page 52 header, we'll overwrite $re
- Page 143 and 144:
Page 53 The substitute searches for
- Page 145 and 146:
The "real" real world Page 54 Email
- Page 147 and 148:
Double-word example in modern Perl
- Page 149 and 150:
Page 56 The next unless before the
- Page 151 and 152:
There's nothing special about Perl
- Page 153 and 154:
Page 58 Another interesting thing i
- Page 155 and 156:
It didn't make sense to talk about
- Page 157 and 158:
* "A logical calculus of the ideas
- Page 159 and 160:
Along the way, AT&T Bell Labs added
- Page 161 and 162:
Multiply this by the passage of tim
- Page 163 and 164:
Many issues must be kept in mind, e
- Page 165 and 166:
Page 65 implementation, befor
- Page 167 and 168:
Page 66 standard C library fu
- Page 169 and 170:
Doing Something with the Matched Te
- Page 171 and 172:
Page 68 The early versions of awk d
- Page 173 and 174:
Page 69 expression for finding doub
- Page 175 and 176:
You can guess that the string with
- Page 177 and 178:
Page 71 in every decision the drive
- Page 179 and 180:
Page 72 Some tools add a lot of new
- Page 181 and 182:
Table 3-3: A Few Utilities and Some
- Page 183 and 184:
Octal escape \ num Page 74 Some imp
- Page 185 and 186:
Are these escapes literal? Page 75
- Page 187 and 188:
Page 76 metacharacters take precede
- Page 189 and 190:
Page 77 within a class, [^\"] must
- Page 191 and 192:
Emacs syntax classes Page 78 As an
- Page 193 and 194:
In many tools, the only metacharact
- Page 195 and 196:
POSIX bracket-expression "character
- Page 197 and 198:
POSIX bracket-expression "collating
- Page 199 and 200:
Table 3-4. String/Line Anchors, and
- Page 201 and 202:
Page 83 Each tool has its own idea
- Page 203 and 204:
Intervals {min,max} or \{min,max\}
- Page 205 and 206:
Page 85 happens is different for mo
- Page 207 and 208:
Page 86 an expert to solve Fermat's
- Page 209 and 210:
Well, what if you had an electric c
- Page 211 and 212:
Both engine types have been around
- Page 213 and 214:
On the other hand, egrep, awk, and
- Page 215 and 216:
About the Examples Page 91 This cha
- Page 217 and 218:
attempted at each position all the
- Page 219 and 220:
Engine Pieces and Parts Page 93 An
- Page 221 and 222:
Page 94 they are irrelevant to an e
- Page 223 and 224:
Page 95 If it turns out that the on
- Page 225 and 226:
So, with a variable $line holding a
- Page 227 and 228:
You might even imagine something li
- Page 229 and 230:
Page 98 first try to take as much a
- Page 231 and 232:
Regex-Directed vs. Text-Directed Th
- Page 233 and 234:
DFA Engine: Text-Directed Contrast
- Page 235 and 236:
Page 101 lady sings." in this parag
- Page 237 and 238:
Page 102 The regex-directed nature
- Page 239 and 240:
Page 103 regex from the second posi
- Page 241 and 242:
NFA: Theory vs. Reality The true ma
- Page 243 and 244:
that had been saved as the untried
- Page 245 and 246:
a num a num a num a num These repre
- Page 247 and 248:
A few observations: first, the back
- Page 249 and 250:
Page 109 Actually, since we underst
- Page 251 and 252:
Laziness? Page 110 This whole probl
- Page 253 and 254:
I then noted: Anything matched so f
- Page 255 and 256:
Is Alternation Greedy? Page 112 The
- Page 257 and 258:
Is Alternation Greedy? Page 112 The
- Page 259 and 260:
Uses for Non-Greedy Alternation Pag
- Page 261 and 262:
Answer to the question on page 112.
- Page 263 and 264:
Let me repeat: when the transmissio
- Page 265 and 266:
To match continuation lines, you mi
- Page 267 and 268:
(to|top)(o|polo)?(gical|o?logical)
- Page 269 and 270:
Differences in match speed For simp
- Page 271 and 272:
* I have seen two tools employ slig
- Page 273 and 274:
Simple, however, does not necessari
- Page 275 and 276:
Page 122 Even in a script, efficien
- Page 277 and 278:
Page 123 The problem with (\\\n|[^\
- Page 279 and 280:
Page 124 simple expression he used
- Page 281 and 282:
Page 125 worth the trouble to be mo
- Page 283 and 284:
with the … example from page 109.
- Page 285 and 286:
Indeed, this matches such examples
- Page 287 and 288:
The solution is to add an alternati
- Page 289 and 290:
3. match the ending delimiter As I
- Page 291 and 292:
This is not a problem when using a
- Page 293 and 294:
One practical problem is that our r
- Page 295 and 296:
Stating problems in a way that make
- Page 297 and 298:
The transmission kicks in and retri
- Page 299 and 300:
This is a good time to point out th
- Page 301 and 302:
if [regexp -indices .*/ $WholePath
- Page 303 and 304:
One overriding rule regardless of e
- Page 305 and 306:
Constructing a regex for a specific
- Page 307 and 308:
Toward arming you well, this chapte
- Page 309 and 310:
Let's start by looking at an exampl
- Page 311 and 312:
More Advanced—Localizing the Gree
- Page 313 and 314:
Figure 5-2: Effects of an added plu
- Page 315 and 316:
Well, you get the idea—there are
- Page 317 and 318:
On a local level, backtracking is r
- Page 319 and 320:
The remembered states accumulated w
- Page 321 and 322:
As a comparison, let's replace the
- Page 323 and 324:
Figure 5-4: Failing attempt to matc
- Page 325 and 326:
As we see with alternation (among o
- Page 327 and 328:
The differences lie in what is matc
- Page 329 and 330:
Example #3 involves much less overh
- Page 331 and 332:
The explanation lies in another rea
- Page 333 and 334:
One moral to learn from all this is
- Page 335 and 336:
GNU Emacs performs very few of the
- Page 337 and 338:
Somewhat related to a fixed-string
- Page 339 and 340:
Finally, how do we explain the resu
- Page 341 and 342:
If a regex is used in a situation w
- Page 343 and 344:
The compiled form can be used multi
- Page 345 and 346:
Page 160 keeps a cache of the inter
- Page 347 and 348:
Page 161 The doublequotes are for t
- Page 349 and 350:
Page 162 If the engine had the matc
- Page 351 and 352:
possible, but we can still identify
- Page 353 and 354:
First, the special and normal subex
- Page 355 and 356:
+ \{[^}]*\} Plugging this into the
- Page 357 and 358:
• ( *\$[0-9]+)* , to match space-
- Page 359 and 360:
If a subdomain is [a-z]+ and we wan
- Page 361 and 362:
• more speed The regex "flows" we
- Page 363 and 364:
Our pseudo comments, with /x and x/
- Page 365 and 366:
Let's try to fix these regexes. Wit
- Page 367 and 368:
For efficiency's sake, let's look a
- Page 369 and 370:
Other Quantifiers: Repetition Page
- Page 371 and 372:
The Freeflowing Regex Page 173 We j
- Page 373 and 374:
Consider: set COMMENT {/\*[^*]*\*+(
- Page 375 and 376:
Page 175 So rather than letting the
- Page 377 and 378:
These changes yield "($OTHER+|$DOUB
- Page 379 and 380:
Think! Figure 5-9: Well-guided opti
- Page 381 and 382:
Page 178 On the other hand, if the
- Page 383 and 384:
Page 179 the string where [cdsuw] c
- Page 385 and 386:
Page 180 When it comes down to it,
- Page 387 and 388:
Yes, even something as simple as gr
- Page 389 and 390:
The -i command-line option seems to
- Page 391 and 392:
Page 184 string when a regex matche
- Page 393 and 394:
With versions of awk that support h
- Page 395 and 396:
Some implementations have limitatio
- Page 397 and 398:
string = "awk" gsub(/(nothing)*/, "
- Page 399 and 400:
Tcl* uses Henry Spencer's NFA regex
- Page 401 and 402:
How Tcl parses a script is one of t
- Page 403 and 404:
Here, the -nocase is an option that
- Page 405 and 406:
Despite the documentation to the co
- Page 407 and 408:
Emacs has long used a Traditional N
- Page 409 and 410:
Table 6-6: GNU Emacs's String Metac
- Page 411 and 412:
Page 195 The syntax influences the
- Page 413 and 414:
Page 196 For example, during a POSI
- Page 415 and 416:
Page 197 The match-string and repla
- Page 417 and 418:
Code for Emacs function for Chapter
- Page 419 and 420:
7 Perl Regular Expressions In this
- Page 421 and 422:
Page 200 Perhaps more important tha
- Page 423 and 424:
The Perl Way Page 201 Table 7-1 sum
- Page 425 and 426:
The richness of variety and options
- Page 427 and 428:
In the same article, Larry also wro
- Page 429 and 430:
Perl's full toolbox offers many sol
- Page 431 and 432:
Page 206 These are easy enough to u
- Page 433 and 434:
Page 207 Well, almost all the field
- Page 435 and 436:
Page 208 Of course, being able to s
- Page 437 and 438:
Page 209 Those comments are actuall
- Page 439 and 440:
Regex-Related Perlisms Page 210 The
- Page 441 and 442:
Page 211 Sometimes, the type of an
- Page 443 and 444:
Dynamically scoped values Page 212
- Page 445 and 446:
Page 213 An automatic save and rest
- Page 447 and 448:
Page 214 You'd get the same effect
- Page 449 and 450:
Dynamic Scope Example # Process "th
- Page 451 and 452:
Page 216 The benefits of dynamic sc
- Page 453 and 454:
Page 217 Perlmaster.) This code cou
- Page 455 and 456:
$1, $2, $3, etc. Page 218 The text
- Page 457 and 458:
Using $1 within a regex? Page 219 T
- Page 459 and 460:
The doublequotes are really operato
- Page 461 and 462:
Page 221 Thus, you might consider ^
- Page 463 and 464:
Page 222 once when the script is fi
- Page 465 and 466:
Figure 7-1: Perl parsing, from prog
- Page 467 and 468:
the operand this is different from
- Page 469 and 470:
Perl's Regex Flavor Page 225 Now th
- Page 471 and 472:
Let's say you want to specially pro
- Page 473 and 474:
• The possibility of greater inte
- Page 475 and 476:
Matches a number if not followed by
- Page 477 and 478:
Using capturing parentheses within
- Page 479 and 480:
Page 231 (?#…) appeared in versio
- Page 481 and 482:
String Anchors Anchors are indispen
- Page 483 and 484:
• Whether you can work with data
- Page 485 and 486:
To allow greater flexibility, Perl5
- Page 487 and 488:
Combining both /s and /m has /m tak
- Page 489 and 490:
Keeping the match in synch with exp
- Page 491 and 492:
Page 238 bump-along-and-retry happe
- Page 493 and 494:
Page 239 These methods work, but fr
- Page 495 and 496:
Page 240 This example is simple, bu
- Page 497 and 498:
a \W character). For example, an $i
- Page 499 and 500:
If compiled with appropriate librar
- Page 501 and 502:
Perl's class sublanguage is unique
- Page 503 and 504:
Sorting is a practical use of this
- Page 505 and 506:
Second-class metacharacters As far
- Page 507 and 508:
Match-Operand Delimiters Page 247 I
- Page 509 and 510:
Page 248 This can be useful if you
- Page 511 and 512:
Match Modifiers Page 249 The match
- Page 513 and 514:
Page 250 Except when used with the
- Page 515 and 516:
Page 251 true or false value that i
- Page 517 and 518:
Match Operator Return Value Page 25
- Page 519 and 520:
List context, with the /g modifier
- Page 521 and 522:
Outside Influences on the Match Ope
- Page 523 and 524:
The Substitution Operator Page 255
- Page 525 and 526:
Answer to the question on page 254.
- Page 527 and 528:
The /e Modifier Page 257 Only the s
- Page 529 and 530:
eieio Page 258 Perhaps useful only
- Page 531 and 532:
conditional of an if), the return v
- Page 533 and 534:
Page 260 rather than m/…/ is used
- Page 535 and 536:
Advanced Split Page 261 Because spl
- Page 537 and 538:
Page 262 list are stripped; others
- Page 539 and 540:
Page 263 nothingness at the end of
- Page 541 and 542:
Any general scalar expression as th
- Page 543 and 544:
lems as the full regex becomes some
- Page 545 and 546:
choices. Page 266 • Interpolation
- Page 547 and 548:
and not very "interesting" in and o
- Page 549 and 550:
Regex Compilation, the /o Modifier,
- Page 551 and 552:
Now, let's consider the following s
- Page 553 and 554:
Page 270 Unfortunately, it's usuall
- Page 555 and 556:
Page 271 This example takes advanta
- Page 557 and 558:
# Create a function to check a bunc
- Page 559 and 560:
Page 273 It's a nice idea, but ther
- Page 561 and 562:
Answer to the question on page 273.
- Page 563 and 564:
Particularly with the /e modifier,
- Page 565 and 566:
Recall that the match operator retu
- Page 567 and 568:
Basic Split split is an operator th
- Page 569 and 570:
Perl must make a copy to support an
- Page 571 and 572:
On the other hand, in true worst-ca
- Page 573 and 574:
If you can be sure these variables
- Page 575 and 576:
The result of these two steps is a
- Page 577 and 578:
Page 280 By the way, I forgot to me
- Page 579 and 580:
Final words about the /i penalty Pa
- Page 581 and 582:
is then computed ('100C' in this ca
- Page 583 and 584:
Figure 7-2: Applying s//Tom/ to 'De
- Page 585 and 586:
Page 284 Any substitution using the
- Page 587 and 588:
Page 285 The biggest change is that
- Page 589 and 590:
Literal text cognizance Page 286 Ma
- Page 591 and 592:
stclass ':kind' plus Page 287 Indic
- Page 593 and 594:
Page 288 When you study a string, P
- Page 595 and 596:
When study can help Page 289 study
- Page 597 and 598:
Along with the CSV question, many o
- Page 599 and 600:
Adding Commas to a Number People of
- Page 601 and 602:
It's challenging to see how crisply
- Page 603 and 604:
With the same tests as Chapter 5's
- Page 605 and 606:
But it's not so easy. While evaluati
- Page 607 and 608:
$username = '\w+'; $hostname = '\w+
- Page 609 and 610:
That last item, $atom, might need s
- Page 611 and 612:
You'll not find comment, item 22, e
- Page 613 and 614:
aperiod >; # Item 2: addr-spec is l
- Page 615 and 616:
Unlike all the other constructs so
- Page 617 and 618:
$esc . (?: com | edu | gov | … |
- Page 619 and 620:
$sep = qq< (?: [$space$tab + # for
- Page 621 and 622:
Back in $route, after the first $do
- Page 623 and 624:
^Subject|From|Date: (.*) is very di
- Page 625 and 626:
Another feature I've found an occas
- Page 627 and 628:
Page 217 The special variables $&,
- Page 629 and 630:
Perl4 Note #10 Page 247 With Perl4,
- Page 631 and 632:
Perl4 Note #10 Page 247 With Perl4,
- Page 633 and 634:
Page 255 Although Perl4's match ope
- Page 635 and 636:
Page 264 With Perl4, the default op
- Page 637 and 638:
O'Reilly & Associates Page 310 O'Re
- Page 639 and 640:
http://www.cs.umd.edu/users/dfs/jav
- Page 641 and 642:
http://www.python.org/ The Python L
- Page 643 and 644:
; (?: $quoted_pair | $Cnested ) # s
- Page 645 and 646:
# Item 9. sub-domain is a domain-re
- Page 647 and 648:
; )* $phrase_char * # more "normal"
- Page 649 and 650:
Enjoy!
- Page 651 and 652:
xx, 40, 42, 187, 197 (see also \t (
- Page 653 and 654:
introduction 42 equivalence to char
- Page 655 and 656:
i (see Perl, modifier: case insensi
- Page 657 and 658:
. (see dot) ? (see question mark) [
- Page 659 and 660:
analogy backtracking bread crumbs 1
- Page 661 and 662:
anchor (see line anchor; word ancho
- Page 663 and 664:
line anchors 82 mawk 90, 161, 185 o
- Page 665 and 666:
POSIX NFA example 147 pseudo-backtr
- Page 667 and 668:
ignoring differences (see case-inse
- Page 669 and 670:
character, encoding (cont'd) ISO-20
- Page 671 and 672:
CheckNaughtiness 279 checkpoint (se
- Page 673 and 674:
non-greedy vs. negated character cl
- Page 675 and 676:
delimiter (see regex delimiter) Del
- Page 677 and 678:
doublequoted string allowing escape
- Page 679 and 680:
efficiency (cont'd) egrep 7 substit
- Page 681 and 682:
email for MacOs 312 match-data 196-
- Page 683 and 684:
errata (for this book) 309 escape 2
- Page 685 and 686:
a*((ab)*|b*) 113 [a-zA-Z_][a-zA-Z_0
- Page 687 and 688:
'M.I.T.' 83 [.\n] 131 'NE14AD8' 83
- Page 689 and 690:
ExtUtils::Liblist module 278 ExtUti
- Page 691 and 692:
patterns (globs) 4 prepending to li
- Page 693 and 694:
freeflowing regex 173-174 free-form
- Page 695 and 696:
GNU (cont'd) egrep 120 and backrefe
- Page 697 and 698:
H in awk 68, 187 Haertel, Mike 90 H
- Page 699 and 700:
shortest branch first 39 vs. while
- Page 701 and 702:
Japanese character encoding 26 font
- Page 703 and 704:
ignoring case (see case-insensitive
- Page 705 and 706:
list context 210 literal text 5 mat
- Page 707 and 708:
Masuda, Keith xxiii match (see Perl
- Page 709 and 710:
neverending 144, 160, 162, 175, 296
- Page 711 and 712:
doublequoted string in Emacs 76 ema
- Page 713 and 714:
Perl, line mode) modifier (see Perl
- Page 715 and 716:
newgetopt.pl package 278 newline an
- Page 717 and 718:
nonregular sets 104 normal (see unr
- Page 719 and 720:
first-character discrimination 151,
- Page 721 and 722:
O'Reilly & Associates lex & yacc 8
- Page 723 and 724:
Pascal 32, 54, 120 format 35 matchi
- Page 725 and 726:
people (cont'd) Hietaniemi, Jarkko
- Page 727 and 728:
Spencer, Henry xxii-xxiii, 62, 120-
- Page 729 and 730:
s/…/…/ (see Perl, substitution)
- Page 731 and 732:
$0 218 $1 216-219, 257, 282-283 ?
- Page 733 and 734:
processing 221 variables appearing
- Page 735 and 736:
Perl (cont'd) line mode and /m 231-
- Page 737 and 738:
modifier: case insensitive (/i) 42,
- Page 739 and 740:
Text:: ParseWords module 207, 278 m
- Page 741 and 742:
-n 32 -p 47 -w 34, 213, 285 parenth
- Page 743 and 744:
s/…/…/ (see Perl, substitution)
- Page 745 and 746:
Perl, substitution (cont'd) other m
- Page 747 and 748:
bracket expressions 79 C library supp
- Page 749 and 750:
Communications of the ACM 60 Compil
- Page 751 and 752:
Q group(1) 58, 70, 83, 96 gsub 58 h
- Page 753 and 754:
x{0,0} 84 minimal matching 83 multi
- Page 755 and 756:
questions (cont'd) slicing a 24-hou
- Page 757 and 758:
listing of programs and types 90 te
- Page 759 and 760:
Python in Tcl 75, 189, 193 named su
- Page 761 and 762:
efficiency 118 essence 102 and gree
- Page 763 and 764:
scalar context (see Perl, context)
- Page 765 and 766:
single-line mode (see Perl, modifie
- Page 767 and 768:
string (cont'd) in Emacs 69, 75 in
- Page 769 and 770:
chart of shorthands 73 flavor chart
- Page 771 and 772:
Test::Harness module 278 testlib mo
- Page 773 and 774:
unmatching (see backtracking) unrol
- Page 775 and 776:
while vs. foreach vs. if 256 whites
- Page 777 and 778:
About the Author Page 343 Jeffrey F
- Page 779 and 780:
Page 344 their prey in dim or dark