The Conference mailing list server is down again. I’ve been monitoring disk utilization on the list server for a while now in an attempt to keep the server up until after the building move. Once I realized that we were going to run out of space again, I decided to take the server down preemptively.

GNU Mailman
We have a long-running history with this software. Sometime around 2003 I was tasked with setting up a mailing list solution for the Conference. Several of our local churches had also expressed interest, so I had to find something cheap, scalable and fast. GNU Mailman was the perfect choice. It’s free and open source software, you can keep throwing lists at it, and it supports lists of all sizes.
The list server is my oldest Linux installation. I was a lowly Windows admin at the time, so my good friend Alan Swartz helped me with the original Red Hat installation. My, how times have changed. Back then I had a brief list of commands to create new lists, reboot the server and perform a few basic administrative tasks. I wish I had kept a copy of that original handwritten list but alas, it is lost to the sands of time. This software has proven robust over the years as it has moved across several physical computers and 3-4 different Linux distributions.

Victims of our own success.
It would seem that too much of a good thing always leads to problems. The mailing list server maintains an archive of all of the e-mail that is sent over each of the mailing lists. These files grow as new messages are sent, and over time disk space can become a problem. It took us several years of constant usage to amass a corpus of around 80 gigabytes (GB). Mailman must have changed how it stores e-mail, because over the course of a year or so we shot up to around 280 GB. Maybe people realized that you can send attachments to the lists? Once things get back to normal I plan to dig into why these list archives are growing so quickly.
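When I do get around to digging into the archive growth, a first step could be to rank the lists by how much disk each archive uses. The following is only a rough sketch in Python. The function names are made up for illustration, and it assumes that each list's archive lives in its own subdirectory under a single archive root, which may not match how this server is laid out.

```python
import os


def directory_size(path):
    """Return the total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted twice.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                pass
    return total


def largest_archives(archive_root, top=10):
    """Return up to `top` (name, bytes) pairs, largest archive first.

    Assumption: every immediate subdirectory of archive_root holds the
    archive of exactly one mailing list.
    """
    sizes = []
    for entry in os.scandir(archive_root):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, directory_size(entry.path)))
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top]
```

Calling largest_archives() on the archive root (on many Mailman 2.x installs that is something like /var/lib/mailman/archives/private, but that path is an assumption and varies by distribution) would show which lists to look at first.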
Everyone loves a good history lesson, but why is the server down today? The simple answer is that the hard drive is full (again). Once it fills up, the mailman daemon stops responding. Since I am out of the office, it could take me a good while to discover that the system is down. That’s why I decided to go ahead and replace the drive.
On June 17th the system went down due to a full hard drive. With the building move coming up soon, I decided to try temporarily removing the larger archives from the internal mailing lists. This would free up enough hard disk space to keep the server running (hopefully) until well after the building move, when I could properly schedule an outage. I’ve been monitoring the disk utilization since then, moving archives as I can. Unfortunately, I’ve moved all of the larger ones and was forced to move forward with plans to replace the drive.
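For the curious, the kind of disk check described above can be automated fairly easily. This is a minimal sketch in Python, not the monitoring actually running on the list server. The 90 percent threshold and the function names are arbitrary choices for illustration.

```python
import shutil


def percent_used(total, used):
    """Return the used share of a filesystem as a percentage."""
    return 100.0 * used / total


def check_disk(path="/", threshold=90.0):
    """Return (is_over_threshold, percent) for the filesystem holding path.

    The default threshold of 90 percent is an assumption; pick whatever
    leaves enough time to react before the disk fills.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    return pct >= threshold, pct
```

Run from cron every hour or so, a check like this could send a warning well before the drive fills and Mailman stops responding.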
Last night I pulled the 320 GB drive and replaced it with a 1.5 terabyte (TB) drive. It takes a while to copy the archives back to the new drive, however. Overnight, 60 GB of the 280 GB data store copied. I expect progress all day and will bring the system back online as soon as I can. Hopefully this will buy us a good bit of time before I have to permanently retire some of the archives.
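To put a rough number on "as soon as I can", the remaining copy time can be estimated from the progress so far. The sketch below assumes the copy rate stays constant, which is optimistic when archives are made up of many small files. The 12 hours used in the example is only a guess at how long "overnight" was, not a measured figure.

```python
def hours_remaining(total_gb, copied_gb, elapsed_hours):
    """Estimate the hours left in a copy, assuming a constant rate."""
    if copied_gb <= 0 or elapsed_hours <= 0:
        raise ValueError("need some measured progress to estimate a rate")
    rate = copied_gb / elapsed_hours  # GB per hour so far
    return (total_gb - copied_gb) / rate


# 60 GB of 280 GB copied; the 12 hours is an assumed length for "overnight".
estimate = hours_remaining(total_gb=280, copied_gb=60, elapsed_hours=12)
print(f"about {estimate:.0f} hours to go")
```

Under that assumption the copy runs at 5 GB per hour, leaving 220 GB and roughly 44 more hours.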
Update: Friday, July 9th 2010 @ 12:17 PM
The list server is back up! We have plenty of available disk space now. I’m hoping that this one will last us a good while. I still need to research what is eating so much disk space but moving forward we should be in good shape!