Out of boredom, I put together a script that parses some mbox-format email files and counts the messages per month. The final step, coming later, is to take that data and make a line graph out of it. I’m sure it can be done a lot better, but at any rate, may I present the beginnings of mail-graph.py:

import mailbox
import sys
import os

months = {'Jan': '01', 'Feb': '02', 'Mar': '03',
          'Apr': '04', 'May': '05', 'Jun': '06',
          'Jul': '07', 'Aug': '08', 'Sep': '09',
          'Oct': '10', 'Nov': '11', 'Dec': '12'}
emails = {}
total = 0

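def accumulate_counts(email):
	# Tally messages per year+month key, e.g. '200901'.
	for message in email:
		date = message['date']
		if not date:
			# No Date header; skip rather than crash on None.
			continue
		# Assumes the layout 'Mon, 05 Jan 2009 10:00:00 +0000'.
		split_date = date.split()
		if len(split_date) < 4:
			continue
		month = split_date[2]
		year = split_date[3]
		monthnum = months.get(month)
		if monthnum is None:
			# Unknown month name (unexpected date layout); skip it.
			continue
		yearmonth = year + monthnum

		emails[yearmonth] = emails.get(yearmonth, 0) + 1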

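# The only argument is a directory containing mbox files.
if len(sys.argv) < 2:
	sys.exit('usage: mail-graph.py MBOX_DIR')
mbox_path = sys.argv[1]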

for root, dirs, fileNames in os.walk(mbox_path):
	for fileName in fileNames:
		path = os.path.join(root, fileName)

		mbox = mailbox.mbox(path)
		accumulate_counts(mbox)

# Keys look like 'YYYYMM', so sorting them gives chronological order.
for m, e in sorted(emails.items()):
	print("%s %s" % (m, e))
	total += e

print(total)
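
Splitting the Date header on whitespace only works when every message uses the 'Mon, 05 Jan 2009 ...' layout; a header without the leading day name shifts every field by one. One possible alternative is to let the standard library's email.utils.parsedate_tz do the parsing. The sketch below is only an illustration and not part of the script above; the helper name year_month is made up for this example.

```python
from email.utils import parsedate_tz


def year_month(date_header):
    """Return a 'YYYYMM' key for a Date header, or None if it can't be parsed.

    Hypothetical helper: it could stand in for the split()-based parsing
    and the months lookup table.
    """
    if not date_header:
        # Message had no Date header at all.
        return None
    parsed = parsedate_tz(date_header)
    if parsed is None:
        # parsedate_tz returns None for strings it cannot parse.
        return None
    year, month = parsed[0], parsed[1]
    return "%04d%02d" % (year, month)


if __name__ == "__main__":
    # Both layouts produce the same key.
    print(year_month("Mon, 05 Jan 2009 10:00:00 +0000"))
    print(year_month("5 Jan 2009 10:00:00 +0000"))
```

With something like this, accumulate_counts would only need to call year_month(message['date']) and skip the message when it gets None back.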

Comments are welcome; just be gentle. I know some variable names are short, but I’ll fix that in the next iteration.