How do I enable debug logging?

Set the following in piler.conf, then restart all piler daemons

verbosity=5

After upgrading from openssl 1.1 to 3.x reindex crashes

https://bitbucket.org/jsuto/piler/issues/1284/reindexer-has-problems-in-digest_string

What timezone should I use?

I suggest to use UTC.

Archive size mismatch

Piler stores emails and attachments as distinct files in the filesystem. The filesystem type (eg. ext4, xfs, …) and its settings affects the used disk space. There's a difference between the apparent size and the size occupied on disk. The latter might be significantly larger especially with bigger block sizes when you have mostly small messages.

If you know that's the case for you in advance, then try using xfs with blocksize of 1024.

See more at https://bitbucket.org/jsuto/piler/issues/961/archive-size-mismatch

What performance can piler achieve?

The available performance depends on the HW capacities you have, eg. number of CPU cores, memory, disk IO, network speed, as well as application settings (especially mysql innodb parameters, see below) etc.

I've done the following performance testing on various configurations:

I was running the following command to test:

smtp-source -c -s 20 -m 10000 -l 100000 127.0.0.1:25

It sent 10,000 of 100 kB messages to piler to the localhost using 20 concurrent connections. The aim of this was to eliminate any network issue. The messages were generated by smtp-source (part of the postfix package). Note that they were artificial emails without any attachment to prevent any external helper utility to introduce a performance penalty.

I've used Scaleway to host the various instances and run Ubuntu Xenial 64-bit.

A) VC1S instance (2 X86 64bit Cores, 2GB memory), 3 piler child processes: #1: 4:11 = 2340 msg/s #2: 4:05 = 2400 msg/s #3: 4:03 = 2460 msg/s

B) VC1L instance (6 X86 64bit Cores, 8GB memory), 7 piler child processes: #1: 2:19 = 4260 msg/s #2: 2:08 = 4680 msg/s #3: 2:01 = 4920 msg/s

TODO: Add some explanation, etc.

How to change the retention for all emails?

Connect to the piler mysql database, then run the following SQL statement to set the retained column to 180 days:

update metadata set retained = sent + 86400*180;

pilerimport: IMAP username cannot contain spaces

Use double quotes inside of single quotes, eg.

pilerimport -u '“User Name”' ...

See https://bitbucket.org/jsuto/piler/issues/952/pilerimport-imap-username-cannot-contain for more.

What are the recommended mysql settings?

Several key information, data are stored in mysql, so you have to make sure mysql has all the resources it needs to perform properly. Quite often the default settings are not enough beyond a certain archive size, because you need bigger buffers, etc.

As a guideline, consider my settings for a not that large archive. Your mileage may wary, so be sure to check out the mysql documentation for innodb!

innodb_buffer_pool_size = 256M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=64M
innodb_log_file_size=64M
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_log_files_in_group=2

innodb_file_per_table

How to backup and restore the archive?

You have two options:

#1: every day export yesterday emails, eg.

cd /opt/backup
pilerexport -a 2017.09.18 -b 2017.09.18
tar cfz /path/to/backup-2017-09-18.tar.gz *
rm *

The restore procedure in this case is to deploy piler from scratch, then import all the saved emails.

#2: backup the piler, sphinx data, and the configs

Backup regularly the following:

  • /usr/local/etc/piler directory (piler and sphinx config files, including piler.key file)
  • /var/piler/store/00 directory contents (note that only the last directory in '00' changes)
  • /var/piler/sphinx/main[1-4]* files
  • piler mysql database
  • nginx/apache configs

The restore procedure is to deploy piler, then restore the previously saved files and directories mentioned above.

mysql_stmt_execute error: *Incorrect string value: '\xF0\x9D\x91\x83\xF0\x9D…' for column 'body' at row 1* (errno: 1366)

Mysql supports utf-8 character set. However, mysql does NOT support the complete utf-8 code table, only a subset of it. If the email has a character or symbol in the unsupported region (think of the 4 byte values), then it can't store such a value if you are using mysql 5.7 or newer (or mariadb 10.2 or newer).

The solution is either to use an older mysql version (eg. 5.5 or mariadb 10.1) which is more forgiving with 'invalid' (=read valid, but not supported by mysql version's of utf-8) utf-8 character sequences. Or switch to utf8mb4 encoding which supports the whole range of the utf-8 characters.

See https://mathiasbynens.be/notes/mysql-utf8mb4 for a detailed guide on the problem, and the solution. The generic steps are (before executing any of these, be sure to read the mentioned article!):

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Note that piler 1.3.x uses utf8mb4 by default, so this issue is fixed.

Also, you may check out https://bitbucket.org/jsuto/piler/issues/1088/provide-instructions-on-how-to-migrate for a user contributed procedure.

I can see mysql_stmt_execute error: *Out of range value for column 'retained' at row 1 errors in the mail log

See #813 fix on Bitbucket.

My older emails are gone!

Follow these instructions to figure out if you really lost your emails. Chances are that you haven't!

  • check the metadata table, and verify that your emails are there
  • find some of your “lost” emails under the /var/piler/store/00 directory. The files are named based on the their metadata.piler_id value. If you can find them, then your emails are in the archive.
  • double check with the pilerget utility, ie. if eg. pilerget 4000000057950fd127ea028c005a30f2f951 returns the email (don't forget to use your *actual* piler_id!), then the email is in the archive
  • check the sphinx database (not a mysql database, sphinx has its own database which can be accessed with mysql -h 127.0.0.1 -P 9306) if the older IDs (the same as in metadata.id) are present: select * from main1,dailydelta1,delta1 where id=123; (also use the actual id!)

Why am I getting a “cannot write current directory!” message?

Most of the piler binaries are setuid/setgid to piler:piler in order to read/write the store directory. It means that these commands run as user/group piler. The above error message is pretty self explaining: the piler user can't write the current directory. Solution: cd /tmp or cd /var/piler/imap, ie. cd to a directory where piler has rwx permissions.

What's the 'master branch'?

The master branch is the developer version of piler, kinda the nightly build. This contains all the latest (and probably unstable) code. It doesn't mean that the master branch is a buggy crap, however it does mean that the code is not in its final shape. Just with any other software, unless you are told otherwise, please use a stable version.

What exclusion (formerly: archive) and retention rule do I need to archive every email, and keep them for 5 years?

By default piler archives everything it receives, so you don't need any rules for that. In fact, the exclusion (formerly: archiving) rules define what NOT to archive. Also the default_retention_days value in piler.conf applies to all emails, unless it's overridden in the retention rules. So in this just set default_retention_days=1827, restart piler, and you are fine.

I can see lots of duplicates in the archive

The duplicate detection is based on the Message-ID header field. If the piler daemon detects an already known message-id, then it discards the email. As you can see, the Message-ID header is crucial. The piler daemon may archive a single message with a missing message-id field, and discards the subsequent messages having no message-id.

The pilerimport utility, however, accepts messages even without message-id, because it assumes you really want to archive those messages. So be sure to remove any duplicates before importing them with pilerimport, otherwise you'll have duplicates in the archive.

The only negative side effect is the extra resources required, eg. disk space, memory, etc.

Some internal email sending applications don't know how to create a valid email header, and neglect the message-id. They should be fixed ASAP!

I don't have a /var partition, how to make the disk space projection work?

Set the following in config-site.php:

$config['DATA_PARTITION'] = '/';

I've just sent a few test emails, however no search results in the gui

The piler daemon processes email as soon as they are received. However emails are indexed in every 30 mins. So an email is visible in the GUI 30 mins. You may adjust the indexing frequency by editing the cron jobs of piler.

Note that piler supports manticore as a drop-in replacement of sphinx. Also, both manticore and sphinx supports real-time (RT) index data. In that case you don't need the indexer jobs in crontab, and the archived emails are visible in the gui immediately.

I can't see any results, however the sphinx query in the maillog reports 0 hits, and >0 total found:

Aug 11 20:21:36 mailpiler piler-webui[10670]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('@(subject,body) whatever I need') ORDER BY `sent` DESC LIMIT 0,1000 OPTION max_matches=1000' in 0.29 s, 0 hits, 25508 total found

You use an outdated and not supported version of sphinx, typically 2.0.x on older Debian or Ubuntu releases. Make sure you have a recent 3.x version, or even manticore search 6.x. You may download it from http://sphinxsearch.com/downloads/release/.

I want to see more hits!

Edit config-site.php, and set MAX_SEARCH_HITS, eg.:

$config['MAX_SEARCH_HITS'] = 2000;

Note that you should keep this number to a sane value, the bigger it is the more resources you need for both searchd and php.

I forgot the admin@local password

Connect to the piler database, and set the crypted password manually, eg:

mysql> update user set password='$5$TXL7EX$s17XtxwbCs1MDAzuulF/STauTkH0h/KJGHudlNQt3R4' where uid=0;

The above crypted password sets the admin@local password to pilerrocks

How can I delete everything from the archive?

Follow the procedure below to do so, and it will reset the archive as if it were freshly installed.

#1: Stop searchd and piler:

/etc/init.d/rc.searchd stop
/etc/init.d/rc.piler stop

#2: Remove stored emails:

rm -rf /var/piler/store/00/*

#3: Drop and recreate the piler database:

mysql> drop database piler;
mysql> create database piler character set utf8;

mysql -u piler -p piler < /path/to/piler-source-directory/util/db-mysql.sql

#4: Reset the sphinx index data

rm -rf /var/piler/sphinx/*
su - piler
indexer --all

#5: Start searchd and piler:

/etc/init.d/rc.searchd start
/etc/init.d/rc.piler start 

Qmail (using as a smarthost) rejects emails from piler saying: 451 See http://pobox.com/~djb/docs/smtplf.html

Set the following in config-site.php:

$config['EOL'] = "\r\n";

I can see only today's emails in the archive and not any single previous emails.

Some Linux distributions (notably Debian and Ubuntu) have a daily cron job to reindex everything. Unfortunately this ruins the sphinx index files piler relies on. However the older emails are not lost you still have them, they are just disappeared from the sphinx index. To bring them back, perform the following steps.

1. Edit /etc/default/sphinxsearch, and set START="no".

I recommend you to use the piler shipped init.d/rc.searchd script to start searchd. You may call it from /etc/rc.local. (Note that it starts it as user piler, so make sure /var/piler/sphinx has proper ownership.) Or you may use the systemd services for piler, see the systemd dir in the piler source code.

2. Reindex old emails. After that older emails should appear after the next indexing is done.

cd /tmp
reindex -a 

How to add support for CJK (Chinese, Japanese, and Korean) languages?

Fix the sphinx indexer settings, ie. add the following two lines to all “index” sections in sphinx.conf:

ngram_len = 1
ngram_chars = U+3000..U+2FA1F
Finally chdir to /tmp, the reindex everything as the piler user.

cd /tmp
su piler
reindex -a

I get a lot of “Format a4 is redefined”, and other messages when importing.

These warning or errors come from the external helper utilities, eg. pdftotext.

They are usually harmless, since piler is interested in only the textual part. However in case of a worst case scenario the given attachment is simply not indexed, and thus is not searchable, but the email is archived, and can be retrieved without any problem.

Outlook 2007 can't open a downloaded EML file.

Check out http://www.msoutlook.info/question/354 for a registry hack.

You get the following error: “Error: SQLSTATE[HY000] [2003] Can't connect to MySQL server on '127.0.0.1' (111) on database: sphinx”

Start searchd

I have Centos / Redhat, and when I click on the “Apply” button in the gui, it says: “add the following to /etc/sudoers: 'www-data all=NOPASSWD: /etc/init.d/rc.piler reload'“

Add the following to /etc/sudoers:

apache ALL=NOPASSWD: /etc/init.d/rc.piler reload
Defaults:%apache !requiretty

I get the following warning in mysql logs:

[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE# SELECT is unsafe because the order in which rows are retrieved by the SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave. Statement: REPLACE INTO sph_counter (counter_id, max_doc_id) SELECT 1, MAX(id) FROM sph_index

Try setting MIXED format for binlog_format in the my.cnf file

binlog_format = MIXED

mysql -u piler -p piler < ./util/db-mysql.sql gives me the following error: ERROR 1071 (42000) at line 109: Specified key was too long; max key length is 1000 bytes”

Make sure you have enabled the InnoDB storage engine, and try to create innodb tables (not myisam!).

I have lots of email addresses (100+) in the search query, and the query in the mail log is truncated, not ending with LIMIT xx,yy

Increase the thread_stack variable, and restart the searchd daemon. Be sure to read https://sphinxsearch.com/docs/latest/conf-thread-stack.html!

After importing EML files extracted from a PST file, some emails have “From: 'Some Name' ” (or the To: field is “redacted”) in the email header.

It's not a piler bug, the problematic emails have bogus From and/or To headers. The solution is to fix them BEFORE importing. If you realise it only after importing, then purge the archive (if you can afford this) (see the FAQ item above on how), fix the emails, and import them again.

How much memory sphinx requires

Connect to sphinx, then run the following statements:

show index main1 status like 'ram_bytes';
show index dailydelta1 status like 'ram_bytes';
show index delta1 status like 'ram_bytes';

See the explantion at https://sphinxsearch.com/blog/2011/11/11/sphinx-memory-consumption/.