12.4 mailbox -- Read various mailbox formats
This module defines a number of classes that allow easy and uniform access to mail messages in a (Unix) mailbox.
- class UnixMailbox(fp[, factory])
Access to a classic Unix-style mailbox, where all messages are
contained in a single file and separated by "From " (a.k.a. "From_") lines. The file object fp points to the
mailbox file. The optional factory parameter is a callable that
should create new message objects. factory is called with one
argument, fp, by the next() method of the mailbox
object. The default is the rfc822.Message class (see the
rfc822 module and the note below).
Note: For reasons of this module's internal implementation, you will probably want to open the fp object in binary mode. This is especially important on Windows.
For maximum portability, messages in a Unix-style mailbox are separated by any line that begins exactly with the string 'From ' (note the trailing space) if preceded by exactly two newlines. Because of the wide range of variations in practice, nothing else on the From_ line should be considered. However, the current implementation doesn't check for the leading two newlines. This is usually fine for most applications.
The UnixMailbox class implements a stricter version of From_ line checking, using a regular expression that usually matches From_ delimiters correctly. It considers messages to be separated by "From name time" lines. For maximum portability, use the PortableUnixMailbox class instead. This class is identical to UnixMailbox except that individual messages are separated by only "From " lines.
For more information, see Configuring Netscape Mail on Unix: Why the Content-Length Format is Bad.
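The difference between the two delimiter rules can be illustrated without the mailbox module. The following is a minimal sketch only: the regular expression is an assumption made for illustration, not the pattern the module actually uses, and the helper names (is_strict_from, is_portable_from, split_messages) are hypothetical.

```python
import io
import re

# Assumed pattern for a strict "From name time" delimiter; the module's
# own regular expression may differ.
STRICT_FROM = re.compile(
    r"^From \S+\s+"                 # "From " followed by the sender name
    r"\w{3}\s+\w{3}\s+\d{1,2}\s+"   # weekday, month, day of month
    r"\d{2}:\d{2}(:\d{2})?\s+"      # time, seconds optional
    r"\d{4}\s*$"                    # four-digit year
)


def is_strict_from(line):
    """Return True if line looks like a "From name time" delimiter."""
    return STRICT_FROM.match(line) is not None


def is_portable_from(line):
    """Return True if line merely begins with 'From ' (trailing space)."""
    return line.startswith("From ")


def split_messages(text, is_delimiter):
    """Split mailbox text into messages, starting a new one at each delimiter."""
    messages = []
    current = None  # None until the first delimiter has been seen
    for line in io.StringIO(text):
        if is_delimiter(line):
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages
```

Given a body line such as "From here on it differs" that was not quoted at delivery, the portable rule starts a new message there while the stricter rule does not.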
- class PortableUnixMailbox(fp[, factory])
A less-strict version of UnixMailbox, which considers only the
"From " at the beginning of the line separating messages. The
"name time" portion of the From line is ignored, to
protect against some variations that are observed in practice. This
works since lines in the message which begin with 'From '
are quoted by mail handling software at delivery time.
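The quoting mentioned above is commonly done by prefixing the offending body line with ">". A minimal sketch of that convention follows; the function name quote_from_lines is hypothetical, and real delivery agents may apply slightly different rules.

```python
import io


def quote_from_lines(body):
    """Prefix every body line that starts with 'From ' with '>'."""
    quoted = []
    for line in io.StringIO(body):
        if line.startswith("From "):
            line = ">" + line  # no longer begins with 'From '
        quoted.append(line)
    return "".join(quoted)
```

After quoting, no line of the message body begins with 'From ', so a parser that looks only at the start of each line cannot mistake body text for a delimiter.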
- class MmdfMailbox(fp[, factory])
Access an MMDF-style mailbox, where all messages are contained in a single file and separated by lines consisting of 4 control-A characters. The file object fp points to the mailbox file. Optional factory is as with the UnixMailbox class.
- class MHMailbox(dirname[, factory])
Access an MH mailbox, a directory with each message in a separate file with a numeric name. The name of the mailbox directory is passed in dirname. factory is as with the UnixMailbox class.
- class Maildir(dirname[, factory])
Access a Qmail mail directory. All new and current mail for the mailbox specified by dirname is made available. factory is as with the UnixMailbox class.
- class BabylMailbox(fp[, factory])
Access a Babyl mailbox, which is similar to an MMDF mailbox. In
Babyl format, each message has two sets of headers, the
original headers and the visible headers. The original
headers appear before a line containing only '*** EOOH ***'
(End-Of-Original-Headers) and the visible headers appear after the EOOH
line. Babyl-compliant mail readers will show you only the visible headers, and BabylMailbox objects will return messages containing only the visible headers. You'll have to do your own parsing of the mailbox file to get at the original headers. Mail messages start with the EOOH line and end with a line containing only '\037\014'. factory is as with the UnixMailbox class.
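The Babyl layout described above can be sketched as a small line scanner. This is an illustration that follows the description literally (messages run from the EOOH line to a line containing only '\037\014'); the function name visible_messages is hypothetical, and real Babyl files may contain details this sketch does not handle.

```python
EOOH = "*** EOOH ***"
END_OF_MESSAGE = "\037\014"


def visible_messages(lines):
    """Collect the text between each EOOH line and the next end marker.

    lines is any iterable of text lines (for example an open file).
    Everything before an EOOH line (the original headers) is skipped.
    """
    messages = []
    current = None  # None while outside a message
    for line in lines:
        stripped = line.rstrip("\r\n")
        if current is None:
            if stripped == EOOH:
                current = []
        elif stripped == END_OF_MESSAGE:
            messages.append("".join(current))
            current = None
        else:
            current.append(line)
    if current is not None:
        # Input ended without a closing marker; keep what was collected.
        messages.append("".join(current))
    return messages
```

Each returned string starts with the visible headers, which matches what the text says BabylMailbox objects hand to the factory. Getting at the original headers would need a separate pass that keeps the lines before each EOOH marker.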
Note that because the rfc822 module is deprecated, it is recommended that you use the email package to create message objects from a mailbox. (The default can't be changed for backwards compatibility reasons.) The safest way to do this is with this bit of code:
    import email
    import email.Errors
    import mailbox

    def msgfactory(fp):
        try:
            return email.message_from_file(fp)
        except email.Errors.MessageParseError:
            # Don't return None since that will
            # stop the mailbox iterator
            return ''

    mbox = mailbox.UnixMailbox(fp, msgfactory)
The above wrapper is defensive against ill-formed MIME messages in the mailbox, but you have to be prepared to receive the empty string from the mailbox's next() method. On the other hand, if you know your mailbox contains only well-formed MIME messages, you can simplify this to:
    import email
    import mailbox

    mbox = mailbox.UnixMailbox(fp, email.message_from_file)
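When the defensive factory is used, the code consuming the mailbox has to skip the empty-string placeholders. The following is a minimal sketch of both sides, written so that it runs on its own: it uses the lower-case email.errors spelling found in current Python releases, the helper name real_messages is hypothetical, and a plain list stands in for the values that next() would return.

```python
import email
import email.errors
import io


def msgfactory(fp):
    """Parse one message from fp; return '' if parsing fails."""
    try:
        return email.message_from_file(fp)
    except email.errors.MessageParseError:
        # Returning '' rather than None keeps iteration going.
        return ""


def real_messages(results):
    """Drop the '' placeholders produced for ill-formed messages."""
    return [msg for msg in results if msg != ""]
```

For example, msgfactory(io.StringIO("Subject: hi\n\nbody\n")) yields a message object whose "Subject" header can be read with msg["Subject"].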
See Also:
- Description of the traditional "mbox" mailbox format.
- Description of the "maildir" mailbox format.
- A description of problems with relying on the Content-Length: header for messages stored in mailbox files.