12.4 mailbox -- Read various mailbox formats

This module defines a number of classes that allow easy and uniform access to mail messages in a (Unix) mailbox.

class UnixMailbox(fp[, factory])

Access to a classic Unix-style mailbox, where all messages are contained in a single file and separated by "From " (a.k.a. "From_") lines. The file object fp points to the mailbox file. The optional factory parameter is a callable that should create new message objects. factory is called with one argument, fp, by the next() method of the mailbox object. The default is the rfc822.Message class (see the rfc822 module and the note below).

Note: For reasons of this module's internal implementation, you will probably want to open the fp object in binary mode. This is especially important on Windows.

For maximum portability, messages in a Unix-style mailbox are separated by any line that begins exactly with the string 'From ' (note the trailing space) if preceded by exactly two newlines. Because of the wide range of variations in practice, nothing else on the From_ line should be considered. However, the current implementation doesn't check for the leading two newlines. This is fine for most applications.
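The portable rule above can be sketched in a few lines. This is an illustrative, standalone example in modern Python, not the module's own code; the function name split_on_from_lines and the sample data are hypothetical. Like the implementation described above, it does not check for the two preceding newlines.

```python
def split_on_from_lines(data):
    """Split raw mailbox bytes into messages at lines starting with b'From '."""
    messages = []
    current = None  # None until the first From_ line has been seen
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            # A new From_ line closes the previous message, if any.
            if current is not None:
                messages.append(b"".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first From_ line are ignored.
    if current is not None:
        messages.append(b"".join(current))
    return messages


# Hypothetical two-message mailbox used only for demonstration.
sample = (
    b"From alice@example.com Mon Jan  1 00:00:00 2001\n"
    b"Subject: one\n\nbody one\n\n"
    b"From bob@example.com Tue Jan  2 00:00:00 2001\n"
    b"Subject: two\n\nbody two\n"
)
parts = split_on_from_lines(sample)
```

The data is handled as bytes, in line with the note about opening the mailbox file in binary mode.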

The UnixMailbox class implements a stricter version of From_ line checking, using a regular expression that usually matches From_ delimiters correctly. It considers messages to be separated by "From name time" lines. For maximum portability, use the PortableUnixMailbox class instead. This class is identical to UnixMailbox except that individual messages are separated by only "From " lines.

For more information, see Configuring Netscape Mail on Unix: Why the Content-Length Format is Bad.

class PortableUnixMailbox(fp[, factory])

A less-strict version of UnixMailbox, which considers only the "From " at the beginning of the line separating messages. The "name time" portion of the From line is ignored, to protect against some variations that are observed in practice. This works because lines in the message which begin with 'From ' are quoted by mail handling software at delivery time.
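The difference between the strict and the portable check can be illustrated as follows. The regular expression here is a hypothetical simplification written for this example, not the pattern the module actually uses, and the helper names are likewise assumptions.

```python
import re

# Hypothetical strict pattern: "From", a sender name, then a ctime-style
# timestamp such as "Mon Jan  1 00:00:00 2001".
STRICT_FROM = re.compile(
    r"From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}(:\d{2})?\s+\d{4}\s*$"
)


def is_strict_delimiter(line):
    """True if the line looks like a full "From name time" line."""
    return STRICT_FROM.match(line) is not None


def is_portable_delimiter(line):
    """True if the line merely begins with 'From ' (trailing space included)."""
    return line.startswith("From ")


lines = [
    "From alice@example.com Mon Jan  1 00:00:00 2001",
    "From the desk of Bob",
    ">From quoted by the delivery agent",
]
results = [(is_strict_delimiter(line), is_portable_delimiter(line))
           for line in lines]
```

The second sample line shows why the portable check depends on quoting at delivery time: an unquoted body line that begins with 'From ' would be taken for a delimiter.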

class MmdfMailbox(fp[, factory])

Access an MMDF-style mailbox, where all messages are contained in a single file and separated by lines consisting of four control-A characters. The file object fp points to the mailbox file. Optional factory is as with the UnixMailbox class.
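A minimal sketch of the separator rule, assuming the mailbox contents are already in memory as bytes. The function name split_mmdf and the sample data are hypothetical, and the code is independent of the module's own implementation.

```python
MMDF_SEPARATOR = b"\x01\x01\x01\x01"  # four control-A characters


def split_mmdf(data):
    """Split raw MMDF mailbox bytes into messages at separator lines."""
    messages = []
    current = []
    for line in data.splitlines(keepends=True):
        if line.rstrip(b"\r\n") == MMDF_SEPARATOR:
            # A separator line ends the message collected so far, if any.
            if current:
                messages.append(b"".join(current))
                current = []
        else:
            current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


# Hypothetical sample: each message is bracketed by separator lines.
sample = (
    b"\x01\x01\x01\x01\n"
    b"Subject: one\n\nbody one\n"
    b"\x01\x01\x01\x01\n"
    b"\x01\x01\x01\x01\n"
    b"Subject: two\n\nbody two\n"
    b"\x01\x01\x01\x01\n"
)
messages = split_mmdf(sample)
```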

class MHMailbox(dirname[, factory])

Access an MH mailbox, a directory with each message in a separate file with a numeric name. The name of the mailbox directory is passed in dirname. factory is as with the UnixMailbox class.
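The directory layout described above can be explored with a short sketch. The helper list_mh_messages is hypothetical and only illustrates selecting numerically named files in numeric order; it is not taken from the module.

```python
import os
import re
import tempfile


def list_mh_messages(dirname):
    """Return paths of files with purely numeric names, in numeric order."""
    names = [n for n in os.listdir(dirname) if re.fullmatch(r"[0-9]+", n)]
    names.sort(key=int)  # numeric order, so "2" sorts before "10"
    return [os.path.join(dirname, n) for n in names]


# Demonstration with a temporary directory standing in for an MH folder.
with tempfile.TemporaryDirectory() as mh_dir:
    for name in ("10", "2", "1", ".mh_sequences"):
        with open(os.path.join(mh_dir, name), "w") as f:
            f.write("Subject: message %s\n\n" % name)
    ordered = [os.path.basename(p) for p in list_mh_messages(mh_dir)]
```

Sorting with key=int avoids the lexicographic order that a plain sort of the file names would give.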

class Maildir(dirname[, factory])

Access a Qmail mail directory. All new and current mail for the mailbox specified by dirname is made available. factory is as with the UnixMailbox class.
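As a rough illustration, the sketch below collects message files from a mail directory. It assumes the conventional maildir layout, in which new mail is stored in a new subdirectory and current mail in a cur subdirectory; the helper list_maildir_messages is hypothetical and not part of the module.

```python
import os
import tempfile


def list_maildir_messages(dirname):
    """Return paths of all messages under new/ and cur/ (assumed layout)."""
    paths = []
    for sub in ("new", "cur"):
        subdir = os.path.join(dirname, sub)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if not name.startswith("."):  # skip hidden files
                paths.append(os.path.join(subdir, name))
    return paths


# Demonstration with a temporary directory standing in for a maildir.
with tempfile.TemporaryDirectory() as root:
    for sub, name in (("new", "1002.host"), ("cur", "1001.host"),
                      ("tmp", "1003.host")):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        with open(os.path.join(root, sub, name), "w") as f:
            f.write("Subject: test\n\n")
    found = [os.path.relpath(p, root) for p in list_maildir_messages(root)]
```

Files under tmp are deliberately left out, since only new and current mail is of interest here.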

class BabylMailbox(fp[, factory])

Access a Babyl mailbox, which is similar to an MMDF mailbox. In Babyl format, each message has two sets of headers, the original headers and the visible headers. The original headers appear before a line containing only '*** EOOH ***' (End-Of-Original-Headers) and the visible headers appear after the EOOH line. Babyl-compliant mail readers will show you only the visible headers, and BabylMailbox objects will return messages containing only the visible headers. You'll have to do your own parsing of the mailbox file to get at the original headers. Mail messages start with the EOOH line and end with a line containing only '\037\014'. factory is as with the UnixMailbox class.
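The message boundaries described above might be sketched like this. The function visible_messages is a hypothetical helper that follows the prose description only (start at the EOOH line, stop at the '\037\014' line); it makes no claim to match the module's behaviour in edge cases such as an unterminated final message, which it simply drops.

```python
EOOH = b"*** EOOH ***"
END_OF_MESSAGE = b"\x1f\x0c"  # the '\037\014' line, in hex notation


def visible_messages(data):
    """Return the visible headers plus body of each Babyl message."""
    messages = []
    current = None  # None while outside a message
    for line in data.split(b"\n"):
        if current is None:
            # Everything up to and including the EOOH line is skipped,
            # which discards the original headers.
            if line == EOOH:
                current = []
        elif line == END_OF_MESSAGE:
            messages.append(b"\n".join(current) + b"\n")
            current = None
        else:
            current.append(line)
    return messages


# Hypothetical single-message sample.
sample = (
    b"Received: by relay.example.com\n"
    b"Subject: hello\n"
    b"*** EOOH ***\n"
    b"Subject: hello\n"
    b"\n"
    b"body text\n"
    b"\x1f\x0c\n"
)
visible = visible_messages(sample)
```

The data is split on b"\n" explicitly because the text-mode str.splitlines() would also treat the control characters in the end marker as line boundaries.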

Note that because the rfc822 module is deprecated, it is recommended that you use the email package to create message objects from a mailbox. (The default can't be changed for backwards compatibility reasons.) The safest way to do this is with a bit of code like this:

import email
import email.Errors
import mailbox

def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''

mbox = mailbox.UnixMailbox(fp, msgfactory)

The above wrapper is defensive against ill-formed MIME messages in the mailbox, but you have to be prepared to receive the empty string from the mailbox's next() method. On the other hand, if you know your mailbox contains only well-formed MIME messages, you can simplify this to:

import email
import mailbox

mbox = mailbox.UnixMailbox(fp, email.message_from_file)

See Also:

Description of the traditional "mbox" mailbox format.
Description of the "maildir" mailbox format.
A description of problems with relying on the Content-Length: header for messages stored in mailbox files.

