11.05.2016 Views

Apache Solr Reference Guide Covering Apache Solr 6.0


The entity attributes unique to the MailEntityProcessor are shown below.

Attribute           Use
processor           Required. Must be set to "MailEntityProcessor".
user                Required. Username for authenticating to the IMAP server;
                    this is typically the email address of the mailbox owner.
password            Required. Password for authenticating to the IMAP server.
host                Required. The IMAP server to connect to.
protocol            Required. The IMAP protocol to use; valid values are
                    imap, imaps, gimap, and gimaps.
fetchMailsSince     Optional. Date/time used to set a filter to import messages
                    that occur after the specified date; expected format is
                    yyyy-MM-dd HH:mm:ss.
folders             Required. Comma-delimited list of folder names to pull
                    messages from, such as "inbox".
recurse             Optional (default is true). Flag to indicate if the
                    processor should recurse all child folders when looking
                    for messages to import.
include             Optional. Comma-delimited list of folder patterns to
                    include when processing folders (can be a literal value
                    or regular expression).
exclude             Optional. Comma-delimited list of folder patterns to
                    exclude when processing folders (can be a literal value
                    or regular expression); excluded folder patterns take
                    precedence over include folder patterns.
processAttachement  Optional (default is true). Use Tika to process message
or                  attachments.
processAttachments
includeContent      Optional (default is true). Include the message body when
                    constructing Solr documents for indexing.
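
As an illustration of how these attributes fit together, the following is a minimal sketch of a data configuration with a single mail entity. All values are placeholders: the entity name, user, password, host, and the excluded folder names are assumptions chosen for the example and are not taken from this guide.

```xml
<dataConfig>
  <document>
    <!-- Placeholder values throughout; replace them with your own mailbox details. -->
    <entity name="mail_entity"
            processor="MailEntityProcessor"
            user="someone@example.com"
            password="changeme"
            host="imap.example.com"
            protocol="imaps"
            folders="inbox"
            recurse="true"
            exclude="Spam,Trash"
            processAttachement="false"
            includeContent="true"/>
  </document>
</dataConfig>
```

In this sketch, child folders of inbox are searched because recurse is true, but any folder matching an exclude pattern is skipped, since exclude patterns take precedence over include patterns.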

Importing New Emails Only

After running a full import, the MailEntityProcessor keeps track of the timestamp of the previous import so that subsequent imports can use the fetchMailsSince filter to only pull new messages from the mail server. This occurs automatically using the Data Import Handler dataimport.properties file (stored in conf). For instance, if you set fetchMailsSince=2014-08-22 00:00:00 in your mail-data-config.xml, then all mail messages that occur after this date will be imported on the first run of the importer. Subsequent imports will use the date of the previous import as the fetchMailsSince filter, so that only new emails since the last import are indexed each time.
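
The starting date described above would be set on the entity roughly as follows. This is a sketch only: the date is the one used in the paragraph above, while the entity name and connection values are hypothetical placeholders.

```xml
<!-- fetchMailsSince uses the yyyy-MM-dd HH:mm:ss format; other values are placeholders. -->
<entity name="mail_entity"
        processor="MailEntityProcessor"
        user="someone@example.com"
        password="changeme"
        host="imap.example.com"
        protocol="imaps"
        folders="inbox"
        fetchMailsSince="2014-08-22 00:00:00"/>
```

The configured value only sets the cutoff for the first run; later runs rely on the timestamp of the previous import.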

GMail Extensions

When connecting to a GMail account, you can improve the efficiency of the MailEntityProcessor by setting the protocol to gimap or gimaps. This allows the processor to send the fetchMailsSince filter to the GMail server so that the date filter is applied on the server, which means the processor only receives new messages from the server. However, GMail only supports date granularity, so the server-side filter may return previously seen messages if the import runs more than once a day.
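
A GMail-specific entity might look like the sketch below. The only change that matters here is the protocol value; the entity name, account, password, and date are placeholders assumed for the example, and imap.gmail.com is GMail's standard IMAP host rather than a value given in this section.

```xml
<!-- protocol="gimaps" lets the fetchMailsSince filter be applied on the GMail server. -->
<entity name="gmail_entity"
        processor="MailEntityProcessor"
        user="someone@gmail.com"
        password="changeme"
        host="imap.gmail.com"
        protocol="gimaps"
        folders="inbox"
        fetchMailsSince="2014-08-22 00:00:00"/>
```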

The TikaEntityProcessor
