- Page 1: php|architect’s Guide to Web Scra
- Page 7 and 8: vi ” CONTENTS Referring URLs . .
- Page 9 and 10: viii ” CONTENTS HTTP Authenticati
- Page 11: x ” CONTENTS Chapter 14 — PCRE
- Page 15 and 16: xiv ” CONTENTS pleted. Each had a
- Page 18 and 19: Foreword Web scraping is the fut
- Page 21 and 22: Chapter 1 Introduction If you are l
- Page 23 and 24: Introduction ” 3 in some instance
- Page 25: Introduction ” 5 • Chapters 3-7
- Page 28 and 29: 8 ” HTTP Requests The HTTP proto
- Page 30 and 31: 10 ” HTTP http://en.wikipedia.org
- Page 32 and 33: 12 ” HTTP i Query String Limits M
- Page 34 and 35: 14 ” HTTP Server: Apache X-Powere
- Page 36 and 37: 16 ” HTTP set, it will persist fo
- Page 38 and 39: 18 ” HTTP Content Caching Two met
- Page 40 and 41: 20 ” HTTP as 0-499. To specify fr
- Page 42 and 43: 22 ” HTTP • Initialize a reques
- Page 44: 24 ” HTTP Wrap-Up At this point
- Page 49 and 50: HTTP Streams Wrapper ” 29 Let
- Page 51 and 52: HTTP Streams Wrapper ” 31 Error
- Page 53: HTTP Streams Wrapper ” 33 ); ?>
- Page 56 and 57: 36 ” cURL Extension Simple Reque
- Page 58 and 59: 38 ” cURL Extension Setting Mult
- Page 60 and 61: 40 ” cURL Extension • CURLOPT_R
- Page 62 and 63: 42 ” cURL Extension containing th
- Page 64 and 65: 44 ” cURL Extension operate unpre
- Page 66: 46 ” cURL Extension • The sessi
- Page 70 and 71: 50 ” pecl_http PECL Extension bal
- Page 72 and 73: 52 ” pecl_http PECL Extension •
- Page 74 and 75: 54 ” pecl_http PECL Extension Deb
- Page 76 and 77: 56 ” pecl_http PECL Extension ass
- Page 78 and 79: 58 ” pecl_http PECL Extension );
- Page 81 and 82: Chapter 6 PEAR::HTTP_Client The PH
- Page 83 and 84: PEAR::HTTP_Client ” 63 • sendRe
- Page 85 and 86: PEAR::HTTP_Client ” 65 • By def
- Page 87 and 88: PEAR::HTTP_Client ” 67 } ?> $url
- Page 89: PEAR::HTTP_Client ” 69 • http:/
- Page 92 and 93: 72 ” Zend_Http_Client // Another
- Page 94 and 95: 74 ” Zend_Http_Client Configurat
- Page 96 and 97: 76 ” Zend_Http_Client getLastResp
- Page 98: 78 ” Zend_Http_Client HTTP Authe
- Page 102 and 103: 82 ” Rolling Your Own $stream
- Page 104 and 105: 84 ” Rolling Your Own Logic to
- Page 106: 86 ” Rolling Your Own See RFC
- Page 110 and 111: 90 ” Tidy Extension direct inp
- Page 112 and 113: 92 ” Tidy Extension public fun
- Page 114 and 115: 94 ” Tidy Extension There are
- Page 116: 96 ” Tidy Extension Output Obt
- Page 120 and 121: 100 ” DOM Extension T y p of P e
- Page 122 and 123: 102 ” DOM Extension ties include
- Page 124 and 125: 104 ” DOM Extension // A slightly
- Page 126 and 127: 106 ” DOM Extension // Also retur
- Page 128 and 129: 108 ” DOM Extension • //@id add
- Page 130: 110 ” DOM Extension • DOM Level
- Page 134 and 135: 114 ” SimpleXML Extension The co
- Page 136 and 137: 116 ” SimpleXML Extension foreach
- Page 138: 118 ” SimpleXML Extension Wra
- Page 142 and 143: 122 ” XMLReader Extension Loading
- Page 144 and 145: 124 ” XMLReader Extension false o
- Page 146 and 147: 126 ” XMLReader Extension cate to
- Page 149 and 150: Chapter 13 CSS Selector Libraries T
- Page 151 and 152: CSS Selector Libraries ” 131 Abou
- Page 153 and 154: CSS Selector Libraries ” 133 •
- Page 155 and 156: CSS Selector Libraries ” 135 •
- Page 157 and 158: CSS Selector Libraries ” 137 •
- Page 159 and 160: CSS Selector Libraries ” 139 It
- Page 163 and 164: Chapter 14 PCRE Extension There are
- Page 165 and 166: PCRE Extension ” 145 Anchors Yo
- Page 167 and 168: PCRE Extension ” 147 // Matches
- Page 169 and 170: PCRE Extension ” 149 if (preg_mat
- Page 171 and 172: PCRE Extension ” 151 The first wa
- Page 173 and 174: PCRE Extension ” 153 • To use a
- Page 177 and 178: Tips and Tricks Chapter 15 C
- Page 179 and 180: Tips and Tricks ” 159 not
- Page 181 and 182: Tips and Tricks ” 161 We
- Page 185 and 186: Appendix A Legality of Web S
- Page 187: Legality of Web Scraping ” 167
- Page 190 and 191: 170 ” Multiprocessin