Saturday, January 10, 2009

PostgreSQL - Backup Speed Tests

Our backup script for pgsql dumps the databases out in plain text SQL format (my preferred method for a variety of reasons). The question was: do we leave it as plain text, or compress it, and if so, with which compressor?

...

Here are sample times for a plain-text SQL backup.

real 3m30.523s
user 0m14.053s
sys 0m5.132s

The raw database cluster is 22GB (22150820 KB), but that includes indexes. The resulting size of the backups is 794MB (812436 KB). The specific command line used is:

pg_dump -a -b -O -t $s.$t -x $d -f $DIR/$d/$s.$t.sql

($s, $t, $d and $DIR are variables denoting the schema, table, database, and base backup directory)

...

gzip (default compression level of "6")

pg_dump -a -b -O -t $s.$t -x $d | gzip -c > $DIR/$d/$s.$t.sql.gz

CPU usage is pegged at 100% on one of the four CPUs in the box during this operation (due to gzip compressing the streams). So we are bottlenecked by the somewhat slow CPUs in the server.

real 3m0.337s
user 2m17.289s
sys 0m6.740s

So we burned up a lot more CPU time (user 2m 17s) compared to the plain text dump (user 14s). But the overall operation still completed fairly quickly. So how much space did we save? The resulting backups are only 368MB (376820 KB), which is a good bit smaller.

(The savings would be better, but a large portion of our current database consists of various large "specialized" tables whose contents are extremely random and difficult to compress. I can't talk about the contents of those tables, but the data in them is generated by a PRNG.)

...

gzip (compression level of "9")

pg_dump -a -b -O -t $s.$t -x $d | gzip -c9 > $DIR/$d/$s.$t.sql.gz

We're likely to be even more CPU-limited here due to telling gzip to "try harder". The resulting backups are 369MB (376944 KB), which is basically the same size.

real 9m39.513s
user 7m28.784s
sys 0m12.585s

So we burn up 3.2x more CPU time, but we don't really change the backup size. Probably not worth it.

...

bzip2

pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c9 > $DIR/$d/$s.$t.sql.bz2

real 19m45.280s
user 3m52.559s
sys 0m11.709s

Interestingly, while bzip2 used roughly twice as much CPU time as gzip at its default compression level, it didn't burn nearly as much CPU as gzip at maximum compression. The resulting backup files are only 330MB (337296 KB), a decent improvement over gzip's output.

Now, the other interesting thing is that bzip2 took a lot longer to run than gzip in wall-clock time, but the server is pretty busy at the moment.

...

Ultimately, we ended up going with bzip2 for a variety of reasons.

- Better compression
- The additional CPU usage was not an issue
- We could use a smaller block size (bzip2 -2, i.e. 200k blocks) to be more friendly to rsync (see the sketch below)
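
For reference, the final per-table pipeline looks like the sketch below (same $s, $t, $d and $DIR variables as above); wrapping it in bash's time builtin produces real/user/sys numbers of the sort quoted in this post.

time pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2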

Thursday, January 08, 2009

PostgreSQL - Basic backup scheme

Here's a basic backup scheme. We're using pg_dump in plain-text mode, compressing the output with bzip2, and writing the results out to files named after the database, schema and table name. It's not the most efficient method, but it allows us to go back to:

- any of the past 7 days
- any Sunday within the past month
- the last week of each month in the past quarter
- the last week of each quarter within the past year
- the last week of each year

That works out to roughly 24-25 copies of the data stored on disk, so make sure the backup drive has enough space to hold all of them.

Most of the grunt work is handled by the include script; the daily / weekly / monthly backup scripts simply set up a few variables and then source the main include script.

backup_daily.sh
#!/bin/bash
# DAILY BACKUPS (writes to a daily folder each day)
DAYNR=`date +%w`
echo $DAYNR
DIR=/backup/pgsql/daily/$DAYNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_weekly.sh
#!/bin/bash
# WEEKLY BACKUPS
# Backups go to five directories based on the day of the month,
# converted into 1-5 (the week of the month). The fifth directory
# is only refreshed when the weekly run lands on day 29-31, so it
# can hold a stale copy for a while.
WEEKNR=`date +%d`
echo $WEEKNR
# Force base 10 so that days 08 and 09 aren't treated as invalid octal
let "WEEKNR = (10#$WEEKNR + 6) / 7"
echo $WEEKNR
DIR=/backup/pgsql/weekly/$WEEKNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_monthly.sh
#!/bin/bash
# MONTHLY BACKUPS
# Backups go to three directories based on the month of the year,
# converted into 1-3 with modulus arithmetic.
MONTHNR=`date +%m`
echo $MONTHNR
# Force base 10 so that months 08 and 09 aren't treated as invalid octal
let "MONTHNR = ((10#$MONTHNR - 1) % 3) + 1"
echo $MONTHNR
DIR=/backup/pgsql/monthly/$MONTHNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_quarterly.sh
#!/bin/bash
# QUARTERLY BACKUPS
# Backups go to four directories based on the quarter of the year,
# computed 1-4 from the month number.
QTRNR=`date +%m`
echo $QTRNR
# Force base 10 so that months 08 and 09 aren't treated as invalid octal
let "QTRNR = (10#$QTRNR + 2) / 3"
echo $QTRNR
DIR=/backup/pgsql/quarterly/$QTRNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_yearly.sh
#!/bin/bash
# ANNUAL BACKUPS
YEARNR=`date +%Y`
echo $YEARNR
DIR=/backup/pgsql/yearly/$YEARNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


include_backup_compressed.sh
#!/bin/bash
# Compressed backups to $DIR (expected to be set by the calling script)
echo $DIR

# All databases except the templates
DBS=$(psql -l | grep '|' | awk '{ print $1 }' | grep -vE '^-|^Name|template[01]')
for d in $DBS
do
    echo $d
    DBDIR=$DIR/$d
    if ! test -d $DBDIR
    then
        mkdir -p $DBDIR
    fi

    # All schemas except the system schemas
    SCHEMAS=$(psql -d $d -c '\dn' | grep '|' | awk '{ print $1 }' \
        | grep -vE '^-|^Name|^pg_|^information_schema')
    for s in $SCHEMAS
    do
        echo $d.$s
        TABLES=$(psql -d $d -c "SELECT schemaname, tablename FROM pg_catalog.pg_tables WHERE schemaname = '$s';" \
            | grep '|' | awk '{ print $3 }' | grep -vE '^-|^tablename')
        for t in $TABLES
        do
            echo $d.$s.$t
            # Tables in "public" are dumped without the schema prefix
            if [ $s = 'public' ]
            then
                pg_dump -a -b -O -t $t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2
            else
                pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2
            fi
        done
    done
done


We tried using gzip instead of bzip2, but found that bzip2 compressed a little better even though it uses more CPU. We use a bzip2 block size of only 200k (-2) in order to be more friendly to the rsync push to an external server.
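
For completeness, here's a sketch of how these scripts might be scheduled from cron; the paths and times are illustrative assumptions, not what we actually run. The monthly, quarterly and yearly jobs are placed near the end of their periods to match the retention scheme above.

# Illustrative root crontab entries (minute hour day-of-month month day-of-week command)
30 1 * * *          /root/bin/backup_daily.sh
45 2 * * 0          /root/bin/backup_weekly.sh
0  3 28 * *         /root/bin/backup_monthly.sh
0  4 28 3,6,9,12 *  /root/bin/backup_quarterly.sh
0  5 28 12 *        /root/bin/backup_yearly.sh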

Wednesday, January 07, 2009

Very basic rsync / cp backup rotation with hardlinks

Here's a very basic script that I use with rsync, which makes use of hard links to reduce the overall size of the backup folder. How it works, and its limitations:

- Every morning, a server copies the current version of all files across SSH (using scp) into a "current" folder. There are two folders on the source server that get backed up daily (/home and /local).

- Later on that day, we run the following script to rsync any new files into a daily folder (daily.0 through daily.6).

- In order to bootstrap those daily.# folders, you have to run "cp -al current/* daily.2/" against each one, which fills out the seven daily backup folders with hardlinks. Change the number in "daily.2" to 0-6 and run the command once for each of the seven days (see the sketch after this list). Do this after the "current" folder has been populated with data pushed by the source server.

- Ideally, the source server should be pushing changes to the "current" folder using rsync. But in our case, the source is an old Solaris 9 server without rsync, which means that our backups are likely to be about 2x to 3x larger than they should be.

- rdiff-backup may have been a better solution for this particular problem (and we may switch).

- This shows a good example of how to calculate the current day of week number (0-6) as well as calculating what the previous day number was (using modulus arithmetic).

- I make no guarantees that permissions or ownership will be preserved. But since the source server strips all of that information in the process of sending the files over the wire with scp, it's a moot point for our current situation. (rdiff-backup is probably a better choice for that.)
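
A minimal bootstrap sketch, assuming the same /backup/cfmc1/{home,local} layout used by the daily script below:

#!/bin/bash
# One-time bootstrap: seed daily.0 through daily.6 with hard links to "current"
for DIR in home local
do
    for N in 0 1 2 3 4 5 6
    do
        mkdir -p /backup/cfmc1/${DIR}/daily.${N}
        cp -al /backup/cfmc1/${DIR}/current/* /backup/cfmc1/${DIR}/daily.${N}/
    done
done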

#!/bin/bash
# DAILY BACKUPS (writes to a daily folder each day)
DAYNR=`date +%w`
echo DAYNR=${DAYNR}
let "PREVDAYNR = ((DAYNR + 6) % 7)"
echo PREVDAYNR=${PREVDAYNR}
DIRS="home local"

for DIR in ${DIRS}
do
    echo "----- ----- ----- -----"
    echo "Backup:" ${DIR}
    SRCDIR=/backup/cfmc1/$DIR/current/
    DESTDIR=/backup/cfmc1/$DIR/daily.${DAYNR}/
    PREVDIR=/backup/cfmc1/$DIR/daily.${PREVDAYNR}/
    echo SRCDIR=${SRCDIR}
    echo DESTDIR=${DESTDIR}
    echo PREVDIR=${PREVDIR}

    # Seed today's folder with hard links to yesterday's files
    cp -al ${PREVDIR}* ${DESTDIR}
    # Then bring it in sync with "current", removing deleted files afterwards
    rsync -a --delete-after ${SRCDIR} ${DESTDIR}

    echo "Done."
done


It's not pretty, but it will work better once the source server starts pushing the daily changes via rsync instead of completely overwriting the "current" directory every day.

The code should be pretty self-explanatory, but I'll explain the two key lines.

cp -al ${PREVDIR}* ${DESTDIR}

This brings today's folder (${DESTDIR}) up to match yesterday's, but does it by creating hard links rather than copying data. Files that have been deleted since the same day last week are left behind until the rsync step.

rsync -a --delete-after ${SRCDIR} ${DESTDIR}

This then brings today's folder up to date with any changes compared to the source directory (a.k.a. "current"). It also deletes any files in today's folder that don't exist in the source directory.

References:

Easy Automated Snapshot-Style Backups with Linux and Rsync

Local incremental snap shots with rsync

Tuesday, January 06, 2009

Setting up FreeNX/NX on CentOS 5

Quick guide to setting up FreeNX/NX. This is the approximate minimum on a fresh CentOS 5.1 box. We're limiting things to public-key authentication from the outside, and we already have a second ssh daemon running (listening on localhost, allowing password authentication).

Note: If you have ATRPMs configured as a repository, make sure that you exclude nx* and freenx*. (Add/edit the exclude= line in the ATRPMs .repo file.)
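
For example, the exclude line might look like the following (the .repo file name and repository section name will depend on your ATRPMs setup):

# In the ATRPMs .repo file, under the repository section:
exclude=nx* freenx*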

# yum install nx freenx
# cp /etc/nxserver/node.conf.sample /etc/nxserver/node.conf
# vi /etc/nxserver/node.conf

Change the following lines in the node.conf file:

ENABLE_SSH_AUTHENTICATION="1"
-- remove the '#' at the start of the line

ENABLE_SU_AUTHENTICATION="1"
-- remove the '#' at the start of the line
-- change the zero to a one

ENABLE_FORCE_ENCRYPTION="1"
-- remove the '#' at the start of the line
-- change the zero to a one

Change the server's public/private key pair:

# mv /etc/nxserver/client.id_dsa.key /etc/nxserver/client.id_dsa.key.OLD
# mv /etc/nxserver/server.id_dsa.pub.key /etc/nxserver/server.id_dsa.pub.key.OLD
# ssh-keygen -t dsa -N '' -f /etc/nxserver/client.id_dsa.key
# mv /etc/nxserver/client.id_dsa.key.pub /etc/nxserver/server.id_dsa.pub.key
# cat /etc/nxserver/client.id_dsa.key

You'll need to give the DSA Private Key information to people who should be allowed to use FreeNX/NX to access the server.

You'll also need to put the new public key into the authorized_keys2 file:

# cat /etc/nxserver/server.id_dsa.pub.key >> /var/lib/nxserver/home/.ssh/authorized_keys2

# vi /var/lib/nxserver/home/.ssh/authorized_keys2

Comment out the old key, then put the following at the start of the good key's line:

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/bin/nxserver"
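
The finished line in authorized_keys2 then looks roughly like this (the key material and trailing comment here are placeholders, not a real key):

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/bin/nxserver" ssh-dss AAAAB3N...rest-of-public-key... nx@yourserver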

Restart the FreeNX/NX service:

# service freenx-server restart

You should now be able to connect (assuming that you specify the proper SSH port and paste the private key into the configuration).

Setup sshd to run a second instance

In order to lock down the servers like I prefer to, yet still allow FreeNX/NX to work, I have to set up a second copy of the sshd daemon. The FreeNX/NX client requires that you have sshd running with password access (not just public key), but we prefer to only allow public-key access to our servers.

I did the following on CentOS 5, it should also work for Fedora or Red Hat Enterprise Linux (RHEL). But proceed at your own risk.

1) Create a hard link to the sshd program. This allows us to distinguish it in the process list. It also makes sure that our cloned copy stays up to date as the sshd program is patched.

# ln /usr/sbin/sshd /usr/sbin/sshd_nx

2) Copy /etc/init.d/sshd to a new name

This is the startup / shutdown script for the base sshd daemon. Make a copy of this script:

# cp -p /etc/init.d/sshd /etc/init.d/sshd_nx
# vi /etc/init.d/sshd_nx

Change the following lines:

# processname: sshd_nx
# config: /etc/ssh/sshd_nx_config
# pidfile: /var/run/sshd_nx.pid
prog="sshd_nx"
SSHD=/usr/sbin/sshd_nx
PID_FILE=/var/run/sshd_nx.pid
OPTIONS="-f /etc/ssh/sshd_nx_config -o PidFile=${PID_FILE} ${OPTIONS}"
[ "$RETVAL" = 0 ] && touch /var/lock/subsys/sshd_nx
[ "$RETVAL" = 0 ] && rm -f /var/lock/subsys/sshd_nx
if [ -f /var/lock/subsys/sshd_nx ] ; then

Note: The OPTIONS= line is probably new and will have to be added right after the PID_FILE= line in the file. There are also multiple lines that reference /var/lock/subsys/sshd; you will need to change all of them.

3) Copy the old sshd configuration file.

# cp -p /etc/ssh/sshd_config /etc/ssh/sshd_nx_config

4) Edit the new sshd configuration file and make sure that it uses a different port number.

Port 28822

5) Clone the PAM configuration file.

# cp -p /etc/pam.d/sshd /etc/pam.d/sshd_nx

6) Set the new service to start up automatically.

# chkconfig --add sshd_nx

...

Test it out

# service sshd_nx start
# ssh -p 28822 username@localhost

Check for errors in the log file:

# tail -n 25 /var/log/secure

...

At this point, I would go back and change the secondary configuration to listen only on the localhost addresses:

ListenAddress 127.0.0.1
ListenAddress ::1
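
After restarting the second daemon, you can confirm that it is only bound to the loopback addresses (the grep pattern assumes the port chosen earlier):

# service sshd_nx restart
# netstat -tlnp | grep :28822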

...

References:

How to Add a New "sshd_adm" Service on Red Hat Advanced Server 2.1

How to install NX server and client under Ubuntu/Kubuntu Linux (revised)

...

Follow-on notes:

This setup is specifically for cases where you do not want to allow password access to the server from outside. The way that the NXClient authenticates is basically:

1) NXClient uses a SSH public key pair to authenticate with the server

2) It then logs into the server using the supplied username/password via SSH.

So by setting things up this way, we keep public-key authentication as the only way to log in to the server - but the NX server daemon is able to cross-connect to a localhost-only SSH daemon.

Friday, January 02, 2009

Samba3: Upgrading to v3.2 on CentOS 5

CentOS 5 currently only has Samba 3.0.28 in their BASE repository. The DAG/RPMForge projects don't have updated Samba3 RPMs either (although I do see an OpenPkg RPM). So the question that I've been dealing with for the past few weeks is "where do I get newer Samba RPMs"?

Ideally, I would get these RPMs from a repository, so that I could be notified via "yum check-update" for when there are security / feature updates. While I don't mind the occasional source package in .tar.gz or .tar.bz2 format, they rapidly become a maintenance nightmare. Especially for security-sensitive packages like Samba which tend to be attack targets.

What I've found that looks promising is:

http://ftp.sernet.de/pub/samba/recent/centos/5/

Which has a .repo file and looks like it might be usable as a repository for yum. (See "Get the latest Samba from Sernet" for confirmation of this.)

# cd /etc/yum.repos.d/
# wget http://ftp.sernet.de/pub/samba/recent/centos/5/sernet-samba.repo

Now, the major change is that the RedHat/CentOS packages are named "samba.x86_64" while the sernet.de packages are named "samba3.x86_64". Also, the sernet.de folks don't sign their packages, so you will need to add "gpgcheck=0" to the end of the .repo file.

(At least, I don't think they do...)
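
A quick (if blunt) way to do that, assuming the .repo file is where wget put it:

# echo "gpgcheck=0" >> /etc/yum.repos.d/sernet-samba.repo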

Note: As always, before doing a major upgrade like this, make backups. At a minimum, make sure you have good backups of your Samba configuration files. We use FSVS with a SVN backend for all of our configuration files, which makes an excellent change tracking tool for Linux servers.

# yum remove samba.x86_64
# yum install samba3.x86_64
# service smb start

With luck, you should now be up and running with v3.2 of Samba. You can verify this by looking at the latest log file in the /var/log/samba/ directory.

Thursday, January 01, 2009

LVM Maximum number of Physical Extents

Working on a 15-disk system (750GB SATA drives) so this issue came up again:

What is the maximum number of physical extents in LVM?

The answer is that there's no limit on the number of PEs within a Physical Volume (PV) or Volume Group (VG). The limitation is, instead, the maximum number of PEs that can be formed into a Logical Volume (LV). So a 32MB PE size allows for LVs up to 2TB in size.

Note: Older LVM implementations may only allow a maximum of 65k PEs across the entire PV. The VG can be composed of multiple PVs however.

So if you want to be safe, make your PE size large enough that you end up with fewer than 65k PEs. Just remember that all PVs within a VG need to use the same PE size. So if you're planning for lots of expansion down the road, with very large PVs to be added later, you may wish to bump up your PE size by a factor of 4 or 8.

A good example of this is a Linux box using Software RAID across multiple 250GB disks. The plan down the road is to replace those disks with larger models, create new Software RAID arrays across the extended areas on the larger disks, then extend VGs across new PVs. At the start, you might only have a net space (let's say RAID6, 2 hot-spares, 15 total disks) of around 2.4TB. That's small enough that a safe PE size might be 64MB (about 39,000 PEs).

Except that down the road, disks are getting much larger (1.5TB drives are now easily obtainable). So if we had stuck with a 64MB PE size, our individual LVs could be no larger than 4TB. If we were to put in 2TB disks (net space of about 20TB), the number of PEs would end up growing by about 8x (312,000). We might even see 4TB drives in a 3.5" size, which would be closer to 40TB of net space.

A PE size of 256MB might have served us better when we set up that original PV area. It would allow individual LVs sized up to 16TB. The only downside is that you won't want to create LVs smaller than 256MB, and you'll want to make sure all LVs are multiples of 256MB.

Bottom line, when setting up your PE sizes, plan for a 4x or 8x growth.

Example:

The following shows how to create a volume group (VG) with 512MB physical extents instead of the default 4MB extent size (also known as "PE size" in the vgdisplay command output).

# vgcreate -s 512M vg6 /dev/md10
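
Afterwards, vgdisplay shows the PE size and extent counts so you can sanity-check the math (vg6 being the example VG created above):

# vgdisplay vg6 | grep -E 'PE Size|Total PE'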

References:

LVM Manpage - Talks about the limit in IA32.
Maximum Size Of A Logical Volume In LVM - Walker News (blog entry)
Manage Linux Storage with LVM and Smartctl - Enterprise Networking Planet

Wednesday, December 31, 2008

Backfilling

Going to do a lot of backfill posts based on what I've been working on in the past year. Most of it has to do with CentOS 5, PostgreSQL, LVM, Software RAID, with a smattering of other issues thrown in.

Tuesday, November 25, 2008

Xen - Issues with Windows DomU client clocks

Time is off by an hour in my XEN vm

Quote:

There is a RealTimeIsUniversal registry flag hidden in the windows registry that can be set (its not in by default) to let Windows interpret the RTC as UTC as well.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation] "RealTimeIsUniversal"=dword:00000001

Summary:

The ultimate solution is probably to run an NTP client inside the Windows environment so that the software clock stays properly synchronized.

Monday, November 24, 2008

Dovecot - Upgrading Notes

At the office, we're using a virtual Dovecot server where each person's mail folders are owned by a unique user in the Linux account system.

Dovecot: Virtual Users - covers the basics

Dovecot: UserIds - explains why we use different UIDs for different accounts

Dovecot: LDA - explains how to set up the "deliver" executable to deal with multiple user IDs.

Multiple UIDs

If you're using more than one UID for users, you're going to have problems running deliver. Most MTAs won't let you run deliver as root, so for now you'll need to make it setuid root. However it's insecure to make deliver setuid-root, especially if you have untrusted users in your system. You should take extra steps to make sure that untrusted users can't run it and potentially gain root privileges. You can do this by placing deliver into a directory where only your MTA has execution access.


All of which means that whenever we update Dovecot with "yum update", we need to make sure that we fix up the Dovecot "deliver" executable file (which uses setuid) to also match.

So let's figure out which "deliver" we need to fix up each time:

# find / -name deliver
/usr/local/libexec/dovecot/lda/deliver
/usr/libexec/dovecot/deliver


Alternately, look at Postfix's master.cf file:

# grep "deliver" /etc/postfix/master.cf
# grep "deliver" /etc/postfix/master.cf
# Many of the following services use the Postfix pipe(8) delivery
# The Cyrus deliver program has changed incompatibly, multiple times.
flags=R user=cyrus argv=/usr/lib/cyrus-imapd/deliver -e -m ${extension} ${user}
user=cyrus argv=/usr/lib/cyrus-imapd/deliver -e -r ${sender} -m ${extension} ${user}
# Other external delivery methods.
flags=DRhu user=vmail:vmail argv=/usr/local/libexec/dovecot/lda/deliver -f ${sender} -d ${recipient}
# flags=DRhu user=vmail:vmail argv=/usr/lib/dovecot/deliver -d ${recipient}


The key line in that jumble being:

flags=DRhu user=vmail:vmail argv=/usr/local/libexec/dovecot/lda/deliver -f ${sender} -d ${recipient}

If we take a look at the file size, ownership, attributes and security settings (for SELinux):

# cd /usr/libexec/dovecot/
# ls -la deliver
-rwxr-xr-x 1 root root 802824 Jul 24 06:32 deliver
# ls -lZ deliver
-rwxr-xr-x root root system_u:object_r:dovecot_deliver_exec_t deliver

# cd /usr/local/libexec/dovecot/
# ls -la
total 24
drwx------ 2 vmail vmail 4096 Jun 16 23:00 lda
# ls -lZ
drwx------ vmail vmail system_u:object_r:bin_t lda

# cd /usr/local/libexec/dovecot/lda/
# ls -la deliver
-rwsr-xr-x 1 root root 802824 Aug 12 18:12 deliver
# ls -lZ deliver
-rwsr-xr-x root root system_u:object_r:dovecot_deliver_exec_t deliver


What we see here tells us a couple of things about how the Dovecot LDA is set up.


  1. The Postfix master.cf file controls which "deliver" gets used for local delivery of e-mail. (The "deliver" executable is part of Dovecot, so we're using Dovecot for local delivery.)

  2. /usr/local/libexec/dovecot/lda/deliver - this is where our "setuid" version of the "deliver" executable is located

  3. The "lda" folder is owned by vmail:vmail (limited access) and only the vmail user can access the contents of the folder. Postfix knows to use the vmail user because that's what we told it to do in the master.cf file.

  4. Both the official "deliver" executable (in the /usr/libexec/dovecot/ directory) and our "setuid" copy have the same byte size, date/time, and are both labled as "system_u:object_r:dovecot_deliver_exec_t" for SELinux.



The steps that we take when we update Dovecot are as follows (a consolidated sketch follows the list):


  1. yum update dovecot - updates the Dovecot executables to the latest version over at the atrpms repository

  2. cp --no-preserve=all /usr/libexec/dovecot/deliver /usr/local/libexec/dovecot/lda/deliver - copies the new deliver executable over to the lda folder, where we set the setuid bit on it

  3. chmod u+s is what we use to set the setuid bit on the copy in the lda folder, but we shouldn't need to run it again once things are set up initially

  4. service dovecot restart - restarts the Dovecot service using the new executables

  5. grep "AVC" /var/log/audit/audit.log | tail -n 50 - look for any errors relating to Dovecot

Tuesday, November 11, 2008

New smartphone?

I'm sorta in the market for a new smartphone. What I have now is a Motorola Q from 2006. I've been mostly happy with it (except that it wasn't a touchscreen device), but it's been giving me more and more trouble lately. And the cell coverage by Verizon is, frankly, horrid at my current place of residence. Which makes the phone rather useless for me at home.

Back when I bought the MotoQ, I made the mistake of not researching software requirements before buying. I didn't understand that the non-touchscreen version of Windows Mobile was not the same as the touchscreen version of Windows Mobile. Which means that there are a lot of applications that I simply cannot run on the MotoQ (especially Pocket Quicken).

So this time around, I'm going to go in the opposite direction and look at the software that I want to run, then decide what devices will run it.

Pocket Quicken

Oh look, Pocket Quicken now supports the non-touchscreen phones like MotoQ. Basically, I can go with anything except an iPhone or a Blackberry.

CityTime Alarms

Only works with Windows Mobile 5 & 6, or the PPC 2003 version. But there are also versions for SmartPhones and Palm OS.

Agenda One

Windows Mobile 5 & 6.

IM+

BlackBerry, Windows Mobile, Symbian, J2ME, Palm OS and iPhone/iPod Touch!

zaTelnet Professional

MS Smartphones and Pocket PC

Tuesday, November 04, 2008

Update #1 to Current frustrations with Thunderbird

Well, I started up Thunderbird in safe mode. If you have installed Thunderbird in the default location, that's as easy as pressing [Windows-R] (or Start, Run...) and entering the following:

"C:\Program Files\Mozilla Thunderbird\thunderbird.exe" -safe-mode

What I found is that, although the error still occurred, Thunderbird was a lot better about not hanging up while retrieving e-mail from large IMAP mailboxes.

So I'm going to uninstall most of my add-ons and see if I can make things work better. Disabling the add-ons didn't have any effect, so I think I'll have to completely remove most of them instead.

To give you an idea of what I consider "large": I have an IMAP account on our mail server that contains 17GB (2,000,000 messages) worth of e-mail. I have another two accounts that contain 1.7GB (40,000 messages) and 2.5GB (200,000 messages) respectively. However, none of the individual IMAP folders have more than 60,000 messages each, and most of the large folders only have 15,000 to 30,000 messages.

Monday, October 27, 2008

Current frustrations with Thunderbird

I'm currently plagued by the following showing up in my error console in Thunderbird 2.0.0.17 (20080914).

Error: uncaught exception: [Exception... "Component returned failure code: 0x80550006 [nsIMsgFolder.getMsgDatabase]" nsresult: "0x80550006 ()" location: "JS frame :: chrome://messenger/content/mailWidgets.xml :: parseFolder :: line 2061" data: no]

The other thing that happens is that, eventually, Thunderbird stops talking to (hangs on) my IMAP mail server (over SSL). So I'm unable to send e-mail messages over SMTP/SSL (port 465), nor am I able to retrieve any messages from our IMAP (Dovecot over SSL) server until I restart Thunderbird.

It can take anywhere from 5 minutes to 5 hours for this problem to occur. Starting in safe mode fixes some of the issue, but Thunderbird still chokes up after I've hit a few dozen IMAP folders to get new headers and to download messages.

Tuesday, September 30, 2008

SELinux - dealing with exceptions

So we're seeing errors in our /var/log/messages like:

Sep 28 03:52:51 fvs-pri setroubleshoot: SELinux is preventing freshclam (freshclam_t) "read" to ./main.cld (var_t). For complete SELinux messages. run sealert -l 10ce7bfb-6c44-473e-94a1-4691c04d2bef
Sep 28 03:52:51 fvs-pri setroubleshoot: SELinux is preventing freshclam (freshclam_t) "write" to ./clamav (var_t). For complete SELinux messages. run sealert -l 276efeb4-6990-497f-bcf0-6df0327c6f52

It's fairly easy to write exceptions, using audit2allow.

For example:

# cd /usr/share/selinux/devel/

# egrep "(clam)" /var/log/audit/audit.log /var/log/audit/audit.log.1 | audit2allow -M clam20081230

# /usr/sbin/semodule -i clam20080930.pp

Note: This is the very quick and dirty way of dealing with exceptions - it really doesn't fix the underlying issue.

Friday, August 08, 2008

Debugging SSL connections

We're experiencing odd delays when talking to our mail server over SMTPS (SSL). I just found this post which helps us debug it.

How to debug SSL SMTP - by Sébastien Wains

$ openssl s_client -connect mail.example.com:465

Friday, August 01, 2008

ngg.js and fgg.js site infections

One of our users visited a website that was infected with the ngg.js and fgg.js scripts (they get injected into the HTML files on the server, towards the end of the page).

We've blocked it in our squid configuration by:

# squid.conf

acl blocked_urls dstdomain "/etc/squid/blocked_urls.squid"
acl blocked_regex urlpath_regex "/etc/squid/blocked_regex.squid"

# Block some URLs
http_access deny blocked_urls
http_access deny blocked_regex

# blocked_urls.squid
.bjxt.ru
.njep.ru
.uhwc.ru

# blocked_regex.squid
/fgg\.js
/ngg\.js

I won't explain this too much except to say that the blocked_urls file blocks entire domains (the leading dot matches subdomains too), while the blocked_regex file blocks URL paths using regular expressions.
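
After editing the files, tell the running Squid to reload its configuration (assuming a standard init-based install):

# squid -k reconfigure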

Monday, July 28, 2008

Dovecot - CMUSieve Errors

After upgrading our CentOS 5 box to the latest revisions this week (including Dovecot 1.1), we're seeing the following error message in the log files. Sieve was working fine with Dovecot 1.0.

# cat /var/vmail/dovecot-deliver.log

deliver(ruth@example.com): Jul 28 11:11:44 Error: dlopen(/usr/lib64/dovecot/lda/lib90_cmusieve_plugin.so) failed: /usr/lib64/dovecot/lda/lib90_cmusieve_plugin.so: undefined symbol: message_decoder_init
deliver(ruth@example.com): Jul 28 11:11:44 Fatal: Couldn't load required plugins

# ls -l /usr/libexec/dovecot/sievec
-rwxr-xr-x 1 root root 165152 Jun 11 03:21 /usr/libexec/dovecot/sievec

# yum list | grep "dovecot"
dovecot.x86_64 1:1.1.1-2_76.el5 installed
dovecot-sieve.x86_64 1.1.5-8.el5 installed
dovecot.x86_64 1:1.1.2-2_77.el5 atrpms
dovecot-devel.x86_64 1:1.1.2-2_77.el5 atrpms

Not sure yet what went wrong during the upgrade.

...

Update: The problem was that we had made a setuid copy of Dovecot's "deliver" executable to work with virtual-user local delivery. After the update, we forgot to refresh that copy of the executable.

Once we updated the setuid copy of "deliver", things worked fine.

Monday, July 14, 2008

fsvs urls or fsvs initialize results in No such file or directory (2) error

So I was setting up FSVS 1.1.16 on a new CentOS 5.1 box this week (one of the first things that I do as soon as possible before configuration starts). And I encountered the following error:

# fsvs -v urls svn+ssh://svn.example.com/sys-machinename

An error occurred at 14:40:31.865: No such file or directory (2)
in url__output_list
in url__work
in main: action urls failed

...

The fix is to create the "/etc/fsvs" folder

fsvs 1.1.16 was smart enough to remind me to create /var/spool/fsvs, but it apparently doesn't give a good error message when the "/etc/fsvs" folder does not exist.
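
For reference, the one-line fix (assuming the default /etc/fsvs location):

# mkdir -p /etc/fsvs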

Saturday, June 28, 2008

FSVS - Install on CentOS 5

(Note: This has been mostly superseded by my newer post FSVS: Install on CentOS 5.4)

The following should be enough (and is probably overkill) to install all of the dependencies that FSVS 1.1.16 needs on CentOS 5 (and CentOS 5.1):

# yum install subversion subversion-devel ctags apr apr-devel gcc gdbm gdbm-devel pcre pcre-devel apr-util-devel

(Without apr-util-devel installed, ./configure fails like this:)

# ./configure
configure: *** Now configuring FSVS ***
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
configure: "CFLAGS=-g -O2 -D_GNU_SOURCE=1 -D_FILE_OFFSET_BITS=64 -idirafter /usr/local/include -idirafter /usr/include -idirafter /openpkg/include -idirafter /usr/include/apr-1"
configure: "LDFLAGS= -L/usr/local/lib -L/openpkg/lib"
checking for pcre_compile in -lpcre... yes
checking for apr_md5_init in -laprutil-1... no
configure: error: Sorry, can't find APR.
See `config.log' for more details.

Note the addition of "apr-util-devel" at the end of the "yum install" line. This fixes the error when you run ./configure for FSVS and get the "can't find APR" error.

In older versions of CentOS 5, we did not need to also specify the apr-util-devel package.