apache-solr-ref-guide-4.6.pdf

More documents

Recommendations

Info

Documents, Fields, and Schema Design This section discusses how Solr organizes its data into documents and fields, as well as how to work with the Solr schema file, schema.xml. It includes the following topics: Overview of Documents, Fields, and Schema Design: An introduction to the concepts covered in this section. Solr Field Types: Detailed information about field types in Solr, including the field types in the default Solr schema. Defining Fields: Describes how to define fields in Solr. Copying Fields: Describes how to populate fields with data copied from another field. Dynamic Fields: Information about using dynamic fields in order to catch and index fields that do not exactly conform to other field definitions in your schema. Schema API: Use curl commands to read various parts of a schema or create new fields and copyField rules. Other Schema Elements: Describes other important elements in the Solr schema: Unique Key, Default Search Field, and the Query Parser Operator. Putting the Pieces Together: A higher-level view of the Solr schema and how its elements work together. DocValues: Describes how to create a docValues index for faster lookups. Schemaless Mode: Automatically add previously unknown schema fields using value-based field type guessing. Overview of Documents, Fields, and Schema Design The fundamental premise of Solr is simple. You give it a lot of information, then later you can ask it questions and find the piece of information you want. The part where you feed in all the information is called indexing or updating. When you ask a question, it's called a query. One way to understand how Solr works is to think of a loose-leaf book of recipes. Every time you add a recipe to the book, you update the index at the back. You list each ingredient and the page number of the recipe you just added. Suppose you add one hundred recipes. Using the index, you can very quickly find all the recipes that use garbanzo beans, or artichokes, or coffee, as an ingredient. Using the index is much faster than looking through each recipe one by one. Imagine a book of one thousand recipes, or one million. Solr allows you to build an index with many different fields, or types of entries. The example above shows how to build an index with just one field, ingredients. You could have other fields in the index for the recipe's cooking style, like Asian, Cajun, or vegan, and you could have an index field for preparation times. Solr can answer questions like "What Cajun-style recipes that have blood oranges as an ingredient can be prepared in fewer than 30 minutes?" The schema is the place where you tell Solr how it should build indexes from input documents. How Solr Sees the World Solr's basic unit of information is a document, which is a set of data that describes something. A recipe document would contain the ingredients, the instructions, the preparation time, the cooking time, the tools needed, and so on. A document about a person, for example, might contain the person's name, biography, favorite color, and shoe size. A document about a book could contain the title, author, year of publication, number of pages, and so on. In the Solr universe, documents are composed of fields, which are more specific pieces of information. Shoe size could be a field. First name and last name could be fields. Fields can contain different kinds of data. A name field, for example, is text (character data). A shoe size field might be a floating point number so that it could contain values like 6 and 9.5. Obviously, the definition of fields is flexible (you could define a shoe size field as a text field rather than a floating point number, for example), but if you define your fields correctly, Solr will be able to interpret them correctly and your users will get better results when they perform a query. You can tell Solr about the kind of data a field contains by specifying its field type. The field type tells Solr how to interpret the field and how it can be queried. When you add a document, Solr takes the information in the document's fields and adds that information to an index. When you perform a query, Solr can quickly consult the index and return the matching documents. Field Analysis Field analysis tells Solr what to do with incoming data when building an index. A more accurate name for this process would be processing or even digestion, but the official name is analysis. Apache Solr Reference Guide 4.6 28
Consider, for example, a biography field in a person document. Every word of the biography must be indexed so that you can quickly find people whose lives have had anything to do with ketchup, or dragonflies, or cryptography. However, a biography will likely contains lots of words you don't care about and don't want clogging up your index—words like "the", "a", "to", and so forth. Furthermore, suppose the biography contains the word "Ketchup", capitalized at the beginning of a sentence. If a user makes a query for "ketchup", you want Solr to tell you about the person even though the biography contains the capitalized word. The solution to both these problems is field analysis. For the biography field, you can tell Solr how to break apart the biography into words. You can tell Solr that you want to make all the words lower case, and you can tell Solr to remove accents marks. Field analysis is an important part of a field type. Understanding Analyzers, Tokenizers, and Filters is a detailed description of field analysis. Solr Field Types The field type defines how Solr should interpret data in a field and how the field can be queried. There are many field types included with Solr by default, and they can also be defined locally. Topics covered in this section: Field Type Definitions and Properties Field Types Included with Solr Working with Currencies and Exchange Rates Working with Dates Working with Enum Fields Working with External Files and Processes Field Properties by Use Case Related Topics SchemaXML-DataTypes FieldType Javadoc Field Type Definitions and Properties A field type includes four types of information: The name of the field type An implementation class name If the field type is TextField, a description of the field analysis for the field type Field attributes Field Type Definitions in schema.xml Field types are defined in schema.xml, with the types element. Each field type is defined between fieldType elements. Here is an example of a field type definition for a type called text_general: Apache Solr Reference Guide 4.6 29
Page 1 and 2: Apache Solr Reference Guide Coverin
Page 3 and 4: About This Guide This guide describ
Page 5 and 6: 3. 4. 5. it must be named solr.war.
Page 7 and 8: is exactly the same as building any
Page 9 and 10: a request further constraining the
Page 11 and 12: Best of all, this talk about high-v
Page 13 and 14: Using the Solr Administration User
Page 15 and 16: The Main Logging Screen While this
Page 17 and 18: The final option is "Dump", which a
Page 19 and 20: Inspecting a thread You can also ch
Page 21 and 22: Dataimport Screen The Dataimport sc
Page 23 and 24: solrpingquery all The Ping
Page 25 and 26: hl Click this button to enable high
Page 27: The screen provides a great deal of
Page 31 and 32: Lucene index back-compatibility is
Page 33 and 34: Point queries Range queries Functio
Page 35 and 36: NOW The NOW parameter is used inter
Page 37 and 38: The keyField attribute defines the
Page 39 and 40: termVectors termPositions termOffse
Page 41 and 42: P L H2 7 Beginning with Solr 4, si
Page 43 and 44: Output The samples below have been
Page 45 and 46: Example output in XML: 0 5 exa
Page 47 and 48: { ... } "fields": [ { "indexed": tr
Page 49 and 50: { ... } "dynamicFields": [ { "index
Page 51 and 52: Examples Input Get a list of all fi
Page 53 and 54: List Copy Fields GET / collection/s
Page 55 and 56: Output { "responseHeader":{ "status
Page 57 and 58: Input Get the similarity implementa
Page 59 and 60: curl http://localhost:8983/solr/col
Page 61 and 62: collection The collection (or core)
Page 63 and 64: Finally, for faceting, use the prim
Page 65 and 66: curl "http://localhost:8983/solr/up
Page 67 and 68: Understanding Analyzers, Tokenizers
Page 69 and 70: which the token occurs in the field
Page 71 and 72: The "@" character is among the set
Page 73 and 74: With an n-gram size range of 4 to 5
Page 75 and 76: Arguments: pattern: (Required) The
Page 77 and 78: The following sections describ
Page 79 and 80:
Common Grams Filter This filter cre
Page 81 and 82:
Example: In: "jump jumping jump
Page 83 and 84:
only lowercase words. The default i
Page 85 and 86:
Arguments: minGramSize: (integer, d
Page 87 and 88:
In: "cat foo1234 9987 blah1234fo
Page 89 and 90:
Factory class: solr.RemoveDuplicate
Page 91 and 92:
language: (default "English") The n
Page 93 and 94:
For the following examples, assume
Page 95 and 96:
Word Delimiter Filter This filter s
Page 97 and 98:
This filter creates org.apache.luce
Page 99 and 100:
A sample stemdict.txt with comments
Page 101 and 102:
Another example: ... ... ... Th
Page 103 and 104:
... ... ASCII Folding Filter This
Page 105 and 106:
Catalan Solr can stem Catalan using
Page 107 and 108:
Factory class: solr.SnowballPorterF
Page 109 and 110:
Out: "le", "chat", "le", "chat" Gal
Page 111 and 112:
Factory class: solr.ItalianStemFilt
Page 113 and 114:
Tokenizer to Filter: "tirgiem", "ti
Page 115 and 116:
In: ""studenta studenci" Tokenizer
Page 117 and 118:
Factory class: solr.SwedishStemFilt
Page 119 and 120:
Empty Analysis screen We want to te
Page 121 and 122:
Apache Solr Reference Guide 4.6 121
Page 123 and 124:
Many of the instructions and exampl
Page 125 and 126:
Topics covered in this section: Upd
Page 127 and 128:
The rollback command rolls back all
Page 129 and 130:
cd example/exampledocs curl 'http:/
Page 131 and 132:
In addition the default configurati
Page 133 and 134:
Start the Solr example server: cd e
Page 135 and 136:
last_modified ignored_ /my/path/
Page 137 and 138:
curl "http://localhost:8983/solr/up
Page 139 and 140:
You can have multiple DIH configura
Page 141 and 142:
Datasources can still be specified
Page 143 and 144:
The FieldReaderDataSource can take
Page 145 and 146:
Example:
Page 147 and 148:
The example below shows the combina
Page 149 and 150:
Attribute clob sourceColName Descri
Page 151 and 152:
... /> In this example, rege
Page 153 and 154:
The first is atomic updates. This a
Page 155 and 156:
true id false name,features,cat so
Page 157 and 158:
langid.map Boolean false no Enables
Page 159 and 160:
org/apache/uima/desc/OverridingPara
Page 161 and 162:
Searching This section describes ho
Page 163 and 164:
Velocity Search UI Solr includes a
Page 165 and 166:
Through faceting, query filters, an
Page 167 and 168:
The default value is 10. That is, b
Page 169 and 170:
from that query and used to filter
Page 171 and 172:
Solr supports a variety of term mod
Page 173 and 174:
OR || Requires that either term (or
Page 175 and 176:
Local Parameters in Queries Other P
Page 177 and 178:
Once the list of matching documents
Page 179 and 180:
A Boolean parameter indicating if l
Page 181 and 182:
If the 'magic field' name _val_ is
Page 183 and 184:
1 Solr adds block join support par
Page 185 and 186:
phrase query if appropriate. The pa
Page 187 and 188:
{!prefix f=myfield}foo This would b
Page 189 and 190:
, for the field. Example: {!term f=
Page 191 and 192:
hl.regex.maxAnalyzedChars 10000 Ins
Page 193 and 194:
10 .,!?\t\n Related Content High
Page 195 and 196:
final approach is to use it as a re
Page 197 and 198:
To use facet queries in a syntax ot
Page 199 and 200:
The facet.enum.cache.minDf Paramete
Page 201 and 202:
the smallest possible upper bound g
Page 203 and 204:
when faceting on the same field mul
Page 205 and 206:
group.offset integer Specifies an i
Page 207 and 208:
{ "responseHeader":{ "status":0, "Q
Page 209 and 210:
Result Clustering The clustering (o
Page 211 and 212:
TWINX2048-3200PRO VS1GB400C3 VDBDB1
Page 213 and 214:
true true id doctitle content 10
Page 215 and 216:
Providing Defaults The default attr
Page 217 and 218:
meaning that the terms are always u
Page 219 and 220:
spellcheck Turns on or off SpellChe
Page 221 and 222:
had been specified. The spellcheck.
Page 223 and 224:
lookupImpl Lookup implementation. C
Page 225 and 226:
Copying Fields Function Queries Fun
Page 227 and 228:
if Enables conditional function que
Page 229 and 230:
scale sqedist sqrt strdist Scales v
Page 231 and 232:
With Solr 4, there are two field ty
Page 233 and 234:
RPT incorporates the basic features
Page 235 and 236:
terms.regex No null Restricts match
Page 237 and 238:
{ } "terms": { "name": [ "ata", 1,
Page 239 and 240:
The Stats Component accepts the fol
Page 241 and 242:
0.0 2199.0 5251.2699999999995 15
Page 243 and 244:
In this example, the query "AAA" wo
Page 245 and 246:
JSON Response Writer A very commonl
Page 247 and 248:
id,cat,name,popularity,price,score
Page 249 and 250:
curl http://localhost:8983/solr/upd
Page 251 and 252:
The Well-Configured Solr Instance T
Page 253 and 254:
${solr.core.name} All implicit p
Page 255 and 256:
Once Solr is restarted, the existin
Page 257 and 258:
When using Solr in for Near Real Ti
Page 259 and 260:
openSearcher Whether to open a new
Page 261 and 262:
Both LRUCache and FastLRUCache use
Page 263 and 264:
false maxWarmingSearchers This para
Page 265 and 266:
etagSeed This value of this attribu
Page 267 and 268:
There are other request handlers de
Page 269 and 270:
may not be defined under an existin
Page 271 and 272:
${socketTimeout:0} ${connTimeout:0
Page 273 and 274:
distribUpdateConnTimeout distribUpd
Page 275 and 276:
In Solr 4.4 and on, the presence of
Page 277 and 278:
http://localhost:8983/solr/admin/co
Page 279 and 280:
split.key The key to be used for sp
Page 281 and 282:
Managing Solr This section describe
Page 283 and 284:
Change the Solr Listening Port Port
Page 285 and 286:
Begin by creating a file jetty/logg
Page 287 and 288:
http://docs.oracle.com/javase/1.5.0
Page 289 and 290:
SolrCloud Apache Solr includes the
Page 291 and 292:
Because this node isn't the oversee
Page 293 and 294:
In this way, the data is available
Page 295 and 296:
In this section, we'll discuss gene
Page 297 and 298:
Node C will automatically become a
Page 299 and 300:
proceed. If the a replica is too fa
Page 301 and 302:
Pointing Solr at the ZooKeeper inst
Page 303 and 304:
It's important to keep these files
Page 305 and 306:
0 3764 0 3450 newCollection_
Page 307 and 308:
Output Output Content The output wi
Page 309 and 310:
Create a Shard Shards can only cre
Page 311 and 312:
Input Create an alias named "testal
Page 313 and 314:
eplica string Yes Replica Examples
Page 315 and 316:
java -classpath example/solr-webapp
Page 317 and 318:
many collections in the same set of
Page 319 and 320:
If single queries are currently fas
Page 321 and 322:
1333 1.0 686 342 1.0 602 The f
Page 323 and 324:
A Solr index can be replicated acro
Page 325 and 326:
The code below shows how to configu
Page 327 and 328:
To replicate configuration files, l
Page 329 and 330:
syncd daemon Efficiently synchroniz
Page 331 and 332:
the shards in your configuration. T
Page 333 and 334:
Client APIs This section discusses
Page 335 and 336:
Flare Rails http://wiki.apache.org/
Page 337 and 338:
Using the ConcurrentUpdateSolrServe
Page 339 and 340:
Further Assistance There is a very
Page 341 and 342:
F Facet The arrangement of search r
Page 343 and 344:
Synonyms generally are terms which
Page 345 and 346:
your indices. In a master/slave con
Page 347:
Errata Errata For This Documentatio
show all

apache-solr-ref-guide-4.6.pdf

Create successful ePaper yourself

Delete template?

Save as template?