Things to check or fix before the migration

Mailman 2 was more lax about headers, and we found several problems that can hinder the migration.

Wrong date format

We found posts whose dates use GMT+00:00, which is not a proper timezone specification. You can easily fix this with the following one-liner:

sed -ri 's/\(GMT\+00:00\)/(GMT)/' /var/lib/mailman/archives/private/*.mbox/*.mbox
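The sed one-liner rewrites every matching line in the mbox, including message bodies. If you prefer to count the affected lines first and only touch Date headers, here is a minimal Python sketch. It assumes the bad header looks like `Date: ... +0000 (GMT+00:00)`; the function name `fix_gmt_comments` is hypothetical, and it only checks lines starting with `Date:` (exact case), so folded or lower-case headers are missed and an unquoted `Date:` line inside a forwarded message body would still be rewritten.

```python
import re

# The bogus zone comment, e.g. "Date: Tue, 2 Jan 2007 10:00:00 +0000 (GMT+00:00)".
BOGUS_ZONE = re.compile(rb"\(GMT\+00:00\)")


def fix_gmt_comments(path, dry_run=True):
    """Count Date: lines carrying "(GMT+00:00)"; rewrite them to "(GMT)" unless dry_run."""
    with open(path, "rb") as f:
        lines = f.readlines()

    count = 0
    for i, line in enumerate(lines):
        # Only rewrite lines that start with a "Date:" header name.
        if line.startswith(b"Date:") and BOGUS_ZONE.search(line):
            lines[i] = BOGUS_ZONE.sub(b"(GMT)", line)
            count += 1

    if count and not dry_run:
        with open(path, "wb") as f:
            f.writelines(lines)
    return count
```

Run it once with the default `dry_run=True` to see how many headers are affected, then again with `dry_run=False` on a copy of the archive.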

Missing Message-Id

Some messages lack a Message-Id header entirely, and the original value cannot be recovered. Without this header, a message cannot be imported.

Hyperkitty 1.2 and later fix this automatically, but earlier versions need the following workaround.

With this Ruby script (and its associated Gemfile) you can generate a fake but unique Message-Id for each post lacking one. The mapping is kept in the new_message_ids.yml file, so it is safe to run the script multiple times: the generated values stay stable (useful if you sync the list mbox regularly before the final switch to production). The mbox files are in the usual /var/lib/mailman/archives/private/ directory.

Procedure to run the script:

bundle install
bundle exec ./mailman2_archive_fix.rb
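The Ruby script itself is not reproduced here. To illustrate the same idea, here is a minimal Python sketch using only the standard `mailbox` module. It swaps the YAML mapping file for a different technique: the fake Message-Id is derived from a SHA-256 hash of the message's raw bytes, so running it again on a freshly synced copy of the same mbox yields the same IDs. The function name `add_missing_message_ids` and the `mm2-import.invalid` domain are hypothetical choices, and byte-identical duplicate messages would receive the same ID.

```python
import hashlib
import mailbox


def add_missing_message_ids(path, domain="mm2-import.invalid"):
    """Add a deterministic Message-ID to every message lacking one.

    Returns the number of messages that were modified.
    """
    box = mailbox.mbox(path)
    box.lock()
    try:
        changed = 0
        for key in box.keys():
            msg = box[key]
            if msg["Message-ID"] is not None:
                continue  # header already present (lookup is case-insensitive)
            # Hash the raw message bytes (without the "From " line) so that the
            # same message always gets the same ID, even on a fresh mbox copy.
            digest = hashlib.sha256(box.get_bytes(key)).hexdigest()[:32]
            msg["Message-ID"] = f"<{digest}@{domain}>"
            box[key] = msg  # the replacement keeps the original "From " line
            changed += 1
        if changed:
            box.flush()  # write the changes back to the mbox file
        return changed
    finally:
        box.unlock()
        box.close()
```

Call it once per archive file, ideally on a copy and while Mailman 2 is not writing to it.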

Cleaning the previous search index

If you attempted an import before, it is recommended to purge the previous index first: regenerating the index only adds data, and it can take up quite a lot of space.

rm -rf /var/www/mailman/fulltext_index
mkdir /var/www/mailman/fulltext_index
chown mailman_webui: /var/www/mailman/fulltext_index
chmod 0755 /var/www/mailman/fulltext_index

Launching the import process

The easiest way is the All-in-one-go method. If you need to rename lists, or just want to re-import (delete the existing archive first, or you will get duplicates), use the manual method instead.

When it’s done, don’t forget the post-import step.

The All-in-one-go method

To loop over every mailing list and simplify the process, we recommend the script written by the Fedora folks, which the mailman3 role installs:

/var/www/mailman/bin/import-mm2.py -d <mail-domain> /var/lib/mailman/

If you need to skip some lists, pass them as a comma-separated list to the --exclude option.

The manual method

For each list, you need to create it, import its configuration, and then import its archives.

Creating the list:

sudo -u mailman mailman3 create -d <list-name>@<mail-domain>

<list-name> is the name you want to use from now on; it does not have to match the pre-migration name (referred to as <old-list-name> below). Likewise, <mail-domain> may differ from the old domain.

Importing the config:

sudo -u mailman mailman3 import21 <list-name>@<mail-domain> <mm2-var-lib-mailman>/lists/<old-list-name>/config.pck

Importing the archive:

django-admin hyperkitty_import --pythonpath /var/www/mailman/config --settings settings -l <list-name>@<mail-domain> --no-sync-mailman --verbosity 2 --since 1970-01-01 <mm2-var-lib-mailman>/archives/private/<old-list-name>.mbox/<old-list-name>.mbox 2>&1 | tee mailman_migration__<list-name>.log
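When many lists need renaming, the three commands of the manual method can be generated from a mapping of old to new names. This is a minimal Python sketch, assuming the standard Mailman 2 layout (`lists/<name>/config.pck` and `archives/private/<name>.mbox/<name>.mbox`) and the same command-line options as above; `manual_import_commands`, the example list names, and the domain are hypothetical. It only builds the commands, it does not run them.

```python
import shlex
from pathlib import Path


def manual_import_commands(mm2_root, domain, old_name, new_name=None):
    """Build the create / import21 / hyperkitty_import commands for one list.

    mm2_root is the Mailman 2 data directory (e.g. /var/lib/mailman);
    new_name defaults to old_name when the list is not renamed.
    """
    new_name = new_name or old_name
    address = f"{new_name}@{domain}"
    root = Path(mm2_root)
    config_pck = root / "lists" / old_name / "config.pck"
    mbox = root / "archives" / "private" / f"{old_name}.mbox" / f"{old_name}.mbox"
    return [
        ["sudo", "-u", "mailman", "mailman3", "create", "-d", address],
        ["sudo", "-u", "mailman", "mailman3", "import21", address, str(config_pck)],
        ["django-admin", "hyperkitty_import",
         "--pythonpath", "/var/www/mailman/config", "--settings", "settings",
         "-l", address, "--no-sync-mailman", "--verbosity", "2",
         "--since", "1970-01-01", str(mbox)],
    ]


if __name__ == "__main__":
    # Hypothetical rename map: old Mailman 2 name -> new name (None = keep it).
    renames = {"old-dev": "devel", "users": None}
    for old, new in renames.items():
        for argv in manual_import_commands("/var/lib/mailman", "lists.example.org", old, new):
            print(shlex.join(argv))
```

You can review the printed lines and run them from a shell (adding `2>&1 | tee mailman_migration__<list-name>.log` to the import if you want the log), or pass each argument list to `subprocess.run(argv, check=True)`.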

Post Import

Afterwards, synchronize Hyperkitty with Mailman and regenerate the search index:

ionice -c3 django-admin mailman_sync --pythonpath /var/www/mailman/config --settings settings
ionice -c3 django-admin update_index --pythonpath /var/www/mailman/config --settings settings_admin

This can take many hours depending on the size of the imported data, but the installation can go to production without waiting for it to complete.

Solutions for migration problems

UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x?? in position ??: ordinal not in range(??)

This is caused by badly encoded mail headers. So far, in our experience, only spam has produced such broken emails.

On Hyperkitty 1.1.5, it is possible to skip these emails and continue importing the rest of the mailbox using this patch.
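Outside of internationalized email (RFC 6532), header bytes must be ASCII, with any other text MIME-encoded, so raw 8-bit bytes in a header are a plausible trigger for this error. A quick scan before importing can list the offending messages so you can inspect or remove them. This is a minimal Python sketch, not the patch mentioned above; `find_non_ascii_headers` is a hypothetical helper, and it assumes the headers end at the first blank line.

```python
import mailbox
import re

# Headers end at the first empty line (LF or CRLF line endings).
HEADER_END = re.compile(rb"\r?\n\r?\n")


def find_non_ascii_headers(path):
    """List (message key, byte offset) for messages with raw non-ASCII header bytes."""
    box = mailbox.mbox(path)
    suspects = []
    try:
        for key in box.keys():
            raw = box.get_bytes(key)  # raw bytes, without the "From " line
            headers = HEADER_END.split(raw, maxsplit=1)[0]
            try:
                headers.decode("ascii")
            except UnicodeDecodeError as exc:
                suspects.append((key, exc.start))
    finally:
        box.close()
    return suspects
```

Non-ASCII bytes in message bodies are not reported, since 8-bit bodies are legitimate.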

DataError: invalid byte sequence for encoding “UTF8”:…

This is a variant of the previous problem, but in this case the import script skips the bad email even though it prints a traceback.

Nevertheless, the previous patch is probably still necessary, as otherwise the import script is likely to stop processing the remaining lists.

RuntimeError: maximum recursion depth exceeded while calling a Python object

Hyperkitty links the posts of each thread so that you can navigate between them. If a thread is very long (more than 1000 posts), the program crashes; we hit this with archives of CI build notifications. You can raise the maximum using this patch.
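To see why a very long thread hits this limit, here is a toy model in Python; it is not Hyperkitty's code. Each message records its parent, and a recursive walk up the chain uses one stack frame per ancestor, so a chain longer than CPython's default recursion limit (1000) raises `RecursionError`. We assume the patch raises that limit with `sys.setrecursionlimit`; an iterative walk avoids the limit altogether. The names `depth_recursive`, `depth_iterative`, and `reply_chain` are hypothetical.

```python
import sys


def depth_recursive(parent_of, msg_id):
    """Number of ancestors of msg_id, using one Python frame per ancestor."""
    parent = parent_of.get(msg_id)
    if parent is None:
        return 0
    return 1 + depth_recursive(parent_of, parent)


def depth_iterative(parent_of, msg_id):
    """Same result as depth_recursive, but with a loop instead of recursion."""
    depth = 0
    parent = parent_of.get(msg_id)
    while parent is not None:
        depth += 1
        parent = parent_of.get(parent)
    return depth


def reply_chain(length):
    """Hypothetical thread where message i replies to message i - 1."""
    parent_of = {"m0": None}
    for i in range(1, length):
        parent_of[f"m{i}"] = f"m{i - 1}"
    return parent_of


if __name__ == "__main__":
    chain = reply_chain(3000)
    print(depth_iterative(chain, "m2999"))  # works at any thread length
    try:
        depth_recursive(chain, "m2999")
    except RecursionError:
        print("RecursionError with limit", sys.getrecursionlimit())
    sys.setrecursionlimit(10000)  # raise the maximum, as the patch presumably does
    print(depth_recursive(chain, "m2999"))
```

Raising the limit is the smaller change to existing code, but each extra frame still costs memory; rewriting the walk as a loop removes the ceiling entirely.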