I’m not sure what that middle option means but it sure seems scary…
We’ve all had days like this. Someone makes a boneheaded mistake and wipes out a ton of work. The backup system ran as planned, but the backups themselves turned out to be corrupt. I know I’ve been through this before. I’m sure you have too. Here’s an interesting video on Pixar’s experience and just how lucky they were to have survived it relatively unscathed.
A few months ago an e-mail hit the TriLUG mailing list advertising an open source conference in Columbia, SC. The Palmetto Open Source Software Conference (POSSCON), now in its fourth year, brings together a who’s who of the open source movement to discuss the latest technology trends with local professionals, students, academics and enthusiasts. It was very interesting to see groups of executives, developers, IT professionals and students all mingling together as a community. This more than anything drove home the breadth and depth of the open source community.
Having driven in from Raleigh to attend this conference, I had a rather high set of expectations. It’s a considerable investment to leave the office for three or four days and drive three and a half hours. The conference did not disappoint! Columbia is a wonderful place to hold a conference of this size. The hotels are an easy walk from the conference center, and there are a lot of excellent dining establishments in the same area. I didn’t have to go far to attend the conference, sleep or eat. I decided to drive this time but I would have been just as well off had I flown. I didn’t really need a car once I got here.
The support that this conference has gathered in its four years of existence is simply amazing. The sponsor list included companies such as Microsoft (yes, they were here), Oracle, Red Hat, Verizon, Linode.com, Google and many others. The support from the City of Columbia was also very impressive. Mayor Benjamin welcomed us on the first day and reinforced his excitement and support for the conference. It’s obvious that Columbia is making a big push to become a technology center.
Since I help produce a few conferences a year I spent some time looking over the visible POSSCON operations. I am always looking for better ways to put together our show. Here are a few lessons that I picked up this week:
At the end of the day I am very excited to have been able to attend this conference. Unless something similar pops up in Raleigh I will likely add this to my annual list. Thanks POSSCON for putting on such a great conference! I am excited to hear about what is in store for next year!
This is an excellent video introduction to the concepts of WordPress. Pay close attention to the open source principles and references that Matt makes. Those same ideas drive everything I do at work and online.
Two of my favorite Linux projects are forking!
Mandriva -> Mageia
Mandriva is becoming Mageia. My first successful foray into the world of open source software (way back in 2002!) was made using Mandriva (called Mandrake back then) as the operating system. At that time I was running the current version (Mandrake 9.2). I started building Linux servers using MandrakeLinux 10.0. There were some rough spots in the road for those of us hosting servers with this product. Upgrades had a nasty way of moving configuration files without warning, which would bring down mission critical systems until I could figure out where the files had gone. Once they changed the name to Mandriva and created a subscription-based Club membership I knew that my days were numbered. I hated the thought of having to pay for the club repository so that I could install the software that I wanted. I switched to Ubuntu about three years ago and never looked back.
After following Mandriva’s various staffing decisions and financial woes it would appear that a large part of the development team has decided to fork the project. Many of the management decisions that Mandriva (the company) made over the last several years have been disastrous for the end users of Mandriva (the Linux distribution). I for one am glad to see the community taking back control of this project. I am watching it very closely, having signed up for the mailing lists and spent a good bit of time in the IRC chatroom. If this project gets off the ground I will strongly consider switching back! You can start following the project at http://mageia.org.
OpenOffice -> LibreOffice
I must admit this announcement caught me a bit by surprise:
On the morning of September 28th, a community of developers and other volunteers announced that they were forming The Document Foundation to fulfil the promise of independence written in the original OpenOffice charter. According to the group, “The Foundation will be the cornerstone of a new ecosystem where individuals and organizations can contribute to and benefit from the availability of a truly free office suite. It will generate increased competition and choice for the benefit of customers and drive innovation in the office suite market. From now on, the OpenOffice.org community will be known as ‘The Document Foundation.’”
After OpenOffice.org was organized by Sun the project got off to a good start but then stagnated. Now there is not much development and the product is falling further and further behind. Since Oracle purchased Sun, it appears to be falling behind even faster. All of that changes with this announcement. I hope that Oracle steps up and does the right thing by donating the name (OpenOffice.org) to the community (The Document Foundation). This would put the project in a similar arrangement to the one between The Mozilla Foundation and Firefox. I will be paying close attention to the developments of this project too. You can follow along at http://www.documentfoundation.org/.
It’s a great day for freedom!
Since I had to figure out how to limit outbound traffic by domain today I thought I would post the procedure for everyone to enjoy. Listed below are the configuration changes that I made to our main postfix gateway server.
Add the following lines to /etc/postfix/master.cf. You could also copy the smtp line and rename it to something else. I use the term slow in this example.
# Outbound rate limiting
slow unix - - n - 1 smtp
Now add the following line to /etc/postfix/transport. You can rate limit as many individual domains as you wish using the transport file. Don’t forget to postmap transport when you are finished. You should also have transport_maps set in /etc/postfix/main.cf.
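The transport entry itself is not reproduced in this post. As a rough sketch only, assuming the usual transport(5) table layout of a domain followed by transport:nexthop, the small Python helper below prints entries that route a domain through the slow transport defined above. The function name and the example.com and example.org domains are placeholders of mine, not values from the original configuration.

```python
def transport_entries(domains, transport="slow"):
    """Return transport(5) table lines that route each domain via `transport`.

    The trailing colon leaves the next hop empty, so the destination is
    still looked up from the recipient domain as usual.
    """
    return [f"{domain.strip().lower()}\t{transport}:" for domain in domains]


if __name__ == "__main__":
    # Placeholder domains; substitute the domains you want to rate limit.
    for line in transport_entries(["example.com", "example.org"]):
        print(line)
```

Each printed line can be pasted into /etc/postfix/transport before running postmap on the file.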
The last step is to add the following block of code to /etc/postfix/main.cf:
# Outbound rate limiting
slow_destination_rate_delay = 120
slow_destination_concurrency_limit = 5
slow_destination_recipient_limit = 100
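# The three settings below are smtp(8) client parameters. As far as I can
# tell, Postfix documents the per-transport name prefix only for queue manager
# settings, so these may need to be -o smtp_... overrides in master.cf instead.
slow_connection_cache_time_limit = 0
slow_never_send_ehlo = yes
slow_connect_timeout = 5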
This configuration inserts a delay of 120 seconds between deliveries to the same destination over the slow transport. It also caps deliveries at five concurrent connections per destination. The current postfix default is 10. I’m not sure I would go lower than three for an organization of our size. Finally, it limits each delivery to 100 recipients. Don’t forget to restart the postfix daemon after making these changes!
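To get a feel for what these numbers mean in practice, here is a back-of-the-envelope calculation in Python. It assumes that a non-zero rate delay limits Postfix to at most one delivery per delay interval for each destination, which is how I read the postconf documentation, so treat the result as an upper bound rather than a measurement. The function name is my own.

```python
def hourly_ceiling(rate_delay_s: int, recipient_limit: int) -> tuple[int, int]:
    """Return (deliveries per hour, recipients per hour) for one destination.

    Assumes at most one delivery per `rate_delay_s` seconds and at most
    `recipient_limit` recipients per delivery.
    """
    if rate_delay_s <= 0:
        raise ValueError("rate delay must be a positive number of seconds")
    deliveries = 3600 // rate_delay_s
    return deliveries, deliveries * recipient_limit


if __name__ == "__main__":
    deliveries, recipients = hourly_ceiling(120, 100)
    print(f"{deliveries} deliveries/hour, up to {recipients} recipients/hour")
```

With the values above that works out to at most 30 deliveries and 3,000 recipients per hour for each rate-limited domain.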
I have been spending a good deal of time in our mailing list server archives trying to run down several permissions-related problems. After a great deal of searching I realized that nobody had published the full set of required permissions for the /var/lib/mailman/archives and /var/lib/mailman/lists folders. I spent a few hours today blindly stumbling through the permissions before I got them right, so I thought I would print them here for reference. This is by no means an official list. It is, however, what is working for me.
drwxrwsr-x 50 root mailman 4.0K Jul 26 13:17 . drwxrwx--- 312 root mailman 20K Jul 29 14:04 .. drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-April -rw-rw-r-- 1 root mailman 13K Jul 29 13:35 2010-April.txt drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-February -rw-rw-r-- 1 root mailman 8.7K Jul 29 13:35 2010-February.txt drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-January -rw-rw-r-- 1 root mailman 21K Jul 29 13:35 2010-January.txt drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-July -rw-rw-r-- 1 root mailman 34K Jul 29 13:35 2010-July.txt drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-June -rw-rw-r-- 1 root mailman 25K Jul 29 13:35 2010-June.txt drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-March -rw-rw-r-- 1 root mailman 24K Jul 29 13:35 2010-March.txt drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 2010-May -rw-rw-r-- 1 root mailman 22K Jul 29 13:35 2010-May.txt drwxrwxr-x 569 root mailman 20K Jul 29 13:35 attachments drwxrwx--- 2 root mailman 24K Jul 29 13:36 database -rw-rw-r-- 1 root mailman 38K Jul 29 13:36 index.html -rw-rw---- 1 root mailman 2.7K Jul 29 13:36 pipermail.pck
drwxrwxr-x 2 root mailman 4.0K Jul 29 13:36 . drwxrwxr-x 94 root mailman 12K Jul 29 13:36 .. -rw-rw-r-- 1 root mailman 2.5K Jul 29 13:36 002505.html -rw-rw-r-- 1 root mailman 2.2K Jul 29 13:36 002506.html -rw-rw-r-- 1 root mailman 2.5K Jul 29 13:36 002507.html -rw-rw-r-- 1 root mailman 4.4K Jul 29 13:36 author.html -rw-rw-r-- 1 root mailman 4.4K Jul 29 13:36 date.html lrwxrwxrwx 1 root mailman 11 Jul 29 13:35 index.html -> thread.html -rw-rw-r-- 1 root mailman 4.4K Jul 29 13:36 subject.html -rw-rw-r-- 1 root mailman 5.1K Jul 29 13:36 thread.html
drwxrwx--- 2 root mailman 24K Jul 29 13:36 . drwxrwxr-x 94 root mailman 12K Jul 29 13:36 .. -rw-rw---- 1 root mailman 31K Jul 29 13:36 2010-July-article -rw-rw---- 1 root mailman 4.4K Jul 29 13:36 2010-July-author -rw-rw---- 1 root mailman 3.9K Jul 29 13:36 2010-July-date -rw-rw---- 1 root mailman 4.6K Jul 29 13:36 2010-July-subject -rw-rw---- 1 root mailman 3.9K Jul 29 13:36 2010-July-thread
drwxrwsr-x 2 root mailman 4.0K Jul 29 13:17 . drwxrwsr-x 194 root mailman 12K Jul 6 21:51 .. -rw-rw---- 1 root mailman 1.7K Jul 6 21:51 admindbpreamble.html -rw-rw---- 1 root mailman 8.9K Jul 6 21:51 config.db -rw-rw---- 1 root mailman 8.9K Jul 6 21:51 config.db.last -rw-rw---- 1 apache mailman 14K Jul 29 13:17 config.pck -rw-rw---- 1 mailman mailman 14K Jul 29 00:54 config.pck.last -rw-rw---- 1 root mailman 12K Jul 27 18:42 digest.mbox -rw-rw---- 1 root mailman 189 Jul 6 21:51 handle_opts.html -rw-rw---- 1 root mailman 1.1K Jul 6 21:51 headfoot.html -rw-rw---- 1 root mailman 3.1K Jul 6 21:51 listinfo.html -rw-rw---- 1 root mailman 4.1K Jul 6 21:51 options.html -rw-rw---- 1 mailman mailman 46 Jul 29 00:54 pending.pck -rw-rw---- 1 root mailman 2 Jul 6 21:51 request.db -rw-rw---- 1 mailman mailman 13K Jul 6 21:51 request.pck -rw-rw---- 1 root mailman 1.2K Jul 6 21:51 roster.html -rw-rw---- 1 root mailman 198 Jul 6 21:51 subscribe.html
After setting these permissions the mailman server resumed normal operations. It looks like apache takes ownership of the files that are edited directly from the web interface. That should be OK. The main requirement is that mailman has read/write access to the files it needs to update and maintain the mailing list archives. Trust me, if mailman can’t access any of these files it will quietly move the message over to the /var/spool/mailman/shunt directory. Nobody wants that. Once you resolve any permissions issues be sure to restart the mailman daemon. To reprocess the e-mail sitting in the shunt directory run /usr/lib/mailman/bin/unshunt.
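Since every file and directory in the listings above is group writable, one quick way to spot stragglers is to walk the tree and report anything missing the group-write bit. The following Python sketch does that. It is my own helper rather than anything shipped with Mailman, it only reads permissions and never changes them, and it does not check the starting directory itself.

```python
import os
import stat


def not_group_writable(root):
    """Yield paths below `root` (files and directories) lacking group write."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                # Symlinks such as index.html -> thread.html carry no
                # meaningful permission bits of their own, so skip them.
                continue
            if not mode & stat.S_IWGRP:
                yield path


if __name__ == "__main__":
    # Path taken from this post; adjust for your installation.
    for path in not_group_writable("/var/lib/mailman/archives/private"):
        print(path)
```

Anything it prints is a candidate for a chmod g+w, after confirming the group owner is mailman.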
I have been battling a weird archives issue with our GNU Mailman mailing list server. We have some lists that archive properly when e-mail is sent to them. We have other lists where the e-mail is delivered but does not show up in the archives. We also have lists where e-mail sent to them disappears and is never heard from again. I have been hassling with this permissions issue literally for years now. I picked the baton up again today and decided to try to bring this one home. First I started in the mailman error logs:
Jul 26 12:25:43 2010 (2755) Uncaught runner exception: [Errno 13] Permission denied: '/var/lib/mailman/archives/private/listname/index.html'
Jul 26 12:25:43 2010 (2755) Traceback (most recent call last):
  File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 112, in _oneloop
    self._onefile(msg, msgdata)
  File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 170, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/usr/lib/mailman/Mailman/Queue/ArchRunner.py", line 73, in _dispose
    mlist.ArchiveMail(msg)
  File "/usr/lib/mailman/Mailman/Archiver/Archiver.py", line 217, in ArchiveMail
    h.close()
  File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 324, in close
    self.write_TOC()
  File "/usr/lib/mailman/Mailman/Archiver/HyperArch.py", line 1094, in write_TOC
    toc = open(os.path.join(self.basedir, 'index.html'), 'w')
IOError: [Errno 13] Permission denied: '/var/lib/mailman/archives/private/listname/index.html'
Jul 26 12:25:43 2010 (2755) SHUNTING: 1280155615.876646+a19c8dce602a83897d29592d36d618fc80195ec7
>> I ran several times check_perms -f and it says all is fixed.
>>
> check_perms is lying (actually, there are many files, as opposed to
> directories, that check_perms doesn’t check). The above file and all
> files in /var/lib/mailman/archives/private/ excluding those in
> /var/lib/mailman/archives/private/*/database/ need to be group
> writable.
>
> Once you fix these permissions, you could run bin/unshunt to add the
> shunted messages to the archive, but there is an issue in that the
> messages have been successfully added to
> /var/lib/mailman/archives/private/mylist.mbox/mylist.mbox, and
> unshunting will add them again.
>
> Rather than trying to fix archive permissions, I suggest you verify
> that /var/lib/mailman/archives/private/mylist.mbox/mylist.mbox
> contains all the lists posts from inception to date, and mayby verify
> there are no stray "From " lines in message bodies with bin/cleanarch,
> and then stop Mailman and rebuild the archive with
>
> bin/arch --wipe listname
>
> and then start Mailman. This way, the pipermail archive will be
> completely rebuilt with correct permissions.
>
> This is one reason why I always recommend when moving lists to just
> move the LISTNAME.mbox/LISTNAME.mbox file and build the archive on the
> new machine with bin/arch.
>
> Note if you do this, remove the shunted messages from qfiles/shunt/ so
> they don’t accidently get unshunted in the future.
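When several lists are affected it helps to pull every offending path out of the error log in one pass. Here is a minimal Python sketch that collects the unique paths named in "Permission denied" entries like the one in the traceback earlier in this post. The function name is mine, and the log location in the usage example is an assumption that will vary by distribution.

```python
import re

# Match the path quoted after "Permission denied:". Real logs use straight
# quotes; curly quotes are accepted too in case the text was pasted from a
# web page that "smartened" them.
DENIED = re.compile(r"Permission denied: ['\u2018]([^'\u2019]+)['\u2019]")


def denied_paths(log_text):
    """Return the unique denied paths found in log_text, in first-seen order."""
    seen = []
    for path in DENIED.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    import sys

    # Usage (log path is an assumption): python denied.py /var/log/mailman/error
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            for path in denied_paths(handle.read()):
                print(path)
```

The resulting list is a reasonable starting point for deciding which files need to be made group writable, or which lists are worth rebuilding with bin/arch.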
The Conference mailing list server is down again. I’ve been monitoring disk utilization on the list server for a while now in an attempt to keep the server up until after the building move. Once I realized that we were going to run out of space again I decided to take the server down preemptively.
We have a long running history with this software. Sometime around 2003 I was tasked with setting up a mailing list solution for the Conference. Several of our local churches had also expressed interest so I had to find something cheap, scalable and fast. GNU Mailman was the perfect choice. It’s free and open source software, you can continue throwing lists at it and it supports lists of all sizes.
The list server is my oldest Linux installation. I was a lowly Windows admin at the time so my good friend Alan Swartz helped me with the original Red Hat installation. My how times have changed. Back then I had a brief list of commands to create new lists, reboot the server and perform a few basic administrative tasks. I wish I had kept a copy of that original handwritten list but alas, it is lost to the sands of time. This software has proven robust over the years as it has moved across several physical computers and 3-4 different Linux distributions.
It would seem that too much of a good thing always leads to problems. The mailing list server maintains an archive of all of the e-mail that is sent over each of the mailing lists. These files grow as new messages are sent, and eventually disk space becomes a problem. It took us several years of constant usage to amass a corpus of around 80 gigabytes (GB). Mailman must have changed how it stores e-mail because over the course of a year or so we shot up to around 280 GB. Maybe people realized that you can send attachments to the lists? Once things get back to normal I plan to dig into why these list archives are growing so quickly.
Everyone loves a good history lesson but why is the server down today? The simple answer is that the hard drive is full (again). Once it fills up the mailman daemon stops responding. Since I am out of the office it could take me a good while to discover that the system is down. That’s why I decided to go ahead and replace it.
On June 17th the system went down due to a full hard drive. With the building move coming up soon I decided to try temporarily removing the larger archives from the internal mailing lists. This would free up enough hard disk space to keep the server running (hopefully) until well after the building move, when I could properly schedule an outage. I’ve been monitoring the disk utilization since then, moving archives as I can. Unfortunately, I’ve already moved all of the larger ones and was forced to go ahead with plans to switch the drive.
Last night I pulled the 320 GB drive and replaced it with a 1.5 terabyte (TB) drive. It takes a while to copy the archives back to the new drive, however. Overnight 60 GB of the 280 GB data store copied. I expect progress all day and will bring the system back online as soon as I can. Hopefully this will buy us a good bit of time before I have to permanently retire some of the archives.
Update: Friday, July 9th 2010 @ 12:17 PM
The list server is back up! We have plenty of available disk space now. I’m hoping that this one will last us a good while. I still need to research what is eating so much disk space but moving forward we should be in good shape!
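For the follow-up research into what is eating the disk, something along these lines would be a starting point. It is only a sketch using the Python standard library: it reports how full a filesystem is and ranks the immediate subdirectories of a folder by total size. The function names are mine, and the archive path in the usage example is an assumption borrowed from the permissions post.

```python
import os
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def tree_size(root):
    """Return the total size in bytes of regular files below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # do not count symlink targets
                total += os.path.getsize(path)
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (size_in_bytes, name) pairs, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    archives = "/var/lib/mailman/archives/private"  # assumed location
    if os.path.isdir(archives):
        print(f"{percent_used(archives):.1f}% of the disk is in use")
        for size, name in largest_subdirs(archives):
            print(f"{size / 1e9:8.2f} GB  {name}")
```

Run from cron with a threshold check on percent_used, this would have flagged the June outage before the drive filled up.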
Putting a crank-shaft on the XO laptop was a mistake, but the biggest mistake was not having Sugar run as an application “on a vanilla Linux laptop”, said OLPC founder and chairman Nicholas Negroponte.
“Sugar should have been an application [residing] on a normal operating system,” he told ZDNet Asia in an interview. “But what we did…was we had Sugar do the power management, we had Sugar do the wireless management–it became sort of an omelet. The Bios talked directly with Sugar, so Sugar became a bit of a mess.”
After spending several years working in IT as a career I have learned that there is at times a disconnect between the words of management and the actual inner workings of a product. This looked funny to me so I wondered what the actual people working behind the scenes thought of this. Turns out Sugar wasn’t as bad as advertised:
Here’s the problem: through a somewhat regrettable set of naming decisions, the name “Sugar” came to represent two entirely different things. It was the name for the new learning-oriented graphical interface that OLPC was building, but it was also the name for the entire XO operating system, one tiny part of which was Sugar the GUI, and the rest of which was mostly Fedora Linux.
Nicholas, evidently, still remains blissfully unaware of any of this. As is plain to see from his own words, what he considers to be the biggest mistake of the project has nothing to do with Sugar the GUI, and everything to do with the gross, hairy, complicated systems development work that OLPC was doing to support the XO’s special hardware features. And to be clear, I mean “short bus special”, not “shiny unicorn special”.
Let me explain something to you. For most of OLPC’s existence, we had about two guys working on Sugar the UI. They were GUI developers, with GNOME backgrounds. They were not at all the same people doing systems development work to support our hardware. No resources were taken away from systems development to do Sugar. If Sugar hadn’t happened at all, we would have still had to do all the systems work to get Linux working on the XO, and it would have still taken just as long. So if you’re looking for things to blame, Sugar is not the droid you are looking for.
In truth, the XO ships a pretty shitty operating system, and this fact has very little to do with Sugar the GUI. It has a lot to do with the choice of incompetent hardware vendors that provided half-assedly built, unsupported and unsupportable components with broken closed-source firmware blobs that OLPC could neither examine nor fix.
So we wound up with a keyboard whose keys get stuck. A dual-mode touchpad, capacitive and resistive, where one mode doesn’t work at all, and the other makes the cursor spontaneously jump around and sometimes shuts off the touchpad altogether, prompting OLPC kernel developers to beg for saner hardware in the next round. We had board engineering issues that made power management practically impossible. We had a custom display controller chip that was incomplete in some regards, and completely broken in others. We had an embedded controller that blocks keyboard events and stops machine suspend, and to which we, after a long battle, received the source, under strict NDA, only to find a jungle of nested if statements, twelve levels deep, and no code history. (The company that wrote the code doesn’t use version control, see. They put dates into code comments when they make changes, and the developers mail each other zip files with new versions.) And we had a wireless chip that is so far beyond fucked, it’s just about funny.
(Each of those words is a different link. Click them all, I dare you.)
Thinking back, there’s a hardware incident I remember particularly fondly: one of our vendors sent us a kernel driver patch which enhanced support for their component in our machine. They chose to implement the enhancement by setting up a hole which allowed any unprivileged user to take over the kernel, prompting our kernel guy to send a private e-mail to the OLPC tech team demanding that, in the future, we avoid buying hardware from companies whose programmers are, direct quote, “crack-smoking hobos”.
In the end, Nicholas’ bit of interview nonsense just doesn’t pass the smell test. Customers aren’t stupid. There’s close to a million XOs out there; if Sugar was OLPC’s biggest mistake, Windows on the XO would be selling like hotcakes. Let me remind you, then, that the number of Windows-based XOs that OLPC has sold is exactly zero.
So next time you hear Nicholas break out the egg metaphors and wave his hands about the Sugar that doomed it all, shrug and smile. Hell, If I were a meaner person, I’d ask Nicholas why it is that Windows — you know, the Windows from Microsoft, mercifully unstained with the mistake of Sugar — can’t even shut down an XO without throwing up a blue screen of death.
I honestly don’t know what to say to this. It’s a shame that the top-down management style of the OLPC project nearly killed it. I remember sitting around with my IT buddies, excited about the future of Sugar and the XO laptop. To be honest, most of us have moved on to something else. What a shame…