Sunday, October 03, 2010

3ware 9650SE failing RAID6 array in Linux

I'm using a 3ware 9650SE 16-port controller in my Linux server, set up as one big RAID6 unit. However, it's currently operating in degraded mode due to at least one disk failure. See 3ware RAID maintenance with tw_cli for a link to the 3ware documentation PDF. See also Fixing a degraded disk on the 3ware raid (blur).

# tw_cli help

Copyright (c) 2009 AMCC
AMCC/3ware CLI (version 2.00.09.012)

Commands  Description
-------------------------------------------------------------------
show      Displays information about controller(s), unit(s) and port(s).
flush     Flush write cache data to units in the system.
rescan    Rescan all empty ports for new unit(s) and disk(s).
update    Update controller firmware from an image file.
commit    Commit dirty DCB to storage on controller(s).     (Windows only)
/cx       Controller specific commands.
/cx/ux    Unit specific commands.
/cx/px    Port specific commands.
/cx/phyx  Phy specific commands.
/cx/bbu   BBU specific commands.                               (9000 only)
/cx/ex    Enclosure specific commands.                       (9690SA only)
/ex       Enclosure specific commands.                      (9KSX/SE only)

Certain commands are qualified with constraints of controller type/model
support.  Please consult the tw_cli documentation for explanation of the
controller-qualifiers.

Type help <command> to get more details about a particular command.
For more detail information see tw_cli's documentation. 

So if we take a look at the drives installed:

# tw_cli show

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c6    9650SE-16ML  16        7        1       1       4       4      1  

The columns are as follows:

Ports - # of drive ports on the card
Drives - # of drives connected
Units - # of RAID units created on the card
NotOpt - "not optimal"
RRate - "rebuild rate"
VRate - "verify rate"
BBU - Battery backup

# tw_cli /c6 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    DEGRADED       -       -       64K     4889.37   OFF    OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     NOT-PRESENT      -      -           -             -
p1     NOT-PRESENT      -      -           -             -
p2     NOT-PRESENT      -      -           -             -
p3     NOT-PRESENT      -      -           -             -
p4     NOT-PRESENT      -      -           -             -
p5     NOT-PRESENT      -      -           -             -
p6     OK               u0     698.63 GB   1465149168    5QD4####            
p7     OK               u0     698.63 GB   1465149168    3QD0####            
p8     NOT-PRESENT      -      -           -             -
p9     OK               u0     698.63 GB   1465149168    3QD0####            
p10    NOT-PRESENT      -      -           -             -
p11    OK               u0     698.63 GB   1465149168    5QD3####            
p12    OK               u0     698.63 GB   1465149168    3QD0####            
p13    OK               u0     698.63 GB   1465149168    5QD4####            
p14    OK               u0     698.63 GB   1465149168    3QD0####            
p15    NOT-PRESENT      -      -           -             -

As you can see, the drives on port 8 and port 10 have failed, which means our RAID6 array is in dire shape. After testing, one of the drives had failed completely; the other was merely suspect and was put back into the array. I did the rebuild in the BIOS, but while it is rebuilding, you will see the following:

# tw_cli /c6/u0 show all
/c6/u0 status = DEGRADED-RBLD
/c6/u0 is rebuilding with percent completion = 13%(A)
/c6/u0 is not verifying, its current state is DEGRADED-RBLD
/c6/u0 is initialized.
/c6/u0 Write Cache = off
/c6/u0 volume(s) = 1
/c6/u0 name = vg6                  
/c6/u0 serial number = 5QD40L3K00005F00#### 
/c6/u0 Ignore ECC policy = off       
/c6/u0 Auto Verify Policy = off       
/c6/u0 Storsave Policy = protection  
/c6/u0 Command Queuing Policy = on        
/c6/u0 Parity Number = 2         

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-6    DEGRADED-RBLD  13%(A)  -       -     64K     4889.37   
u0-0     DISK      OK             -       -       p14   -       698.481   
u0-1     DISK      OK             -       -       p13   -       698.481   
u0-2     DISK      OK             -       -       p12   -       698.481   
u0-3     DISK      OK             -       -       p11   -       698.481   
u0-4     DISK      DEGRADED       -       -       p10   -       698.481   
u0-5     DISK      OK             -       -       p9    -       698.481   
u0-6     DISK      DEGRADED       -       -       -     -       698.481   
u0-7     DISK      OK             -       -       p7    -       698.481   
u0-8     DISK      OK             -       -       p6    -       698.481   
u0/v0    Volume    -              -       -       -     -       4889.37

Specifically, we can see that the array is 13% through with the rebuild after only 32 minutes. I have not yet replaced the drive on port 8, as I'm going to wait for the array to finish rebuilding before I jostle it again.
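
For reference, you can also kick off a rebuild from the CLI instead of the BIOS. This is a sketch based on the 9000-series tw_cli syntax; verify it against the 3ware documentation PDF for your firmware first (here p10 holds the suspect drive that was put back into the array):

# tw_cli /c6/u0 start rebuild disk=10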

Notes:

I strongly recommend that you feed the output of "tw_cli /c# show" and "tw_cli /c#/u# show all" into text files daily and parse them for issues. Or mail them to a monitoring email address. Being able to tell the technician to pull drive XYZ with a specific serial # helps eliminate errors. But that's hard to do if you don't keep track of your serial numbers.
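
As a rough sketch of that idea in cron-script form (the controller number, report path, and monitoring address are assumptions to adjust for your site):

#!/bin/sh
# /etc/cron.daily/tw_cli-report - dump 3ware status and mail it out
OUT=/reports/configuration/tw_cli-status.txt
tw_cli /c6 show > "$OUT"
tw_cli /c6/u0 show all >> "$OUT"
mail -s "3ware status for $(hostname)" monitoring@example.com < "$OUT"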

On the systems I administer, we have a /reports/configuration folder where we consolidate all those types of reports. Things like the output of pvscan, lvscan, df, ntpq, /proc/mdstat, etc. all get dumped into text files daily and then committed to the central SVN repository for the server with FSVS. When things go bad later, we can step back through the SVN repository and look at the various reports at previous points in time.
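
A minimal version of that daily report job might look like the following (trimmed-down report set; it assumes FSVS is already tracking /reports/configuration as described in the FSVS posts):

#!/bin/sh
# /etc/cron.daily/config-reports - snapshot system state for later diffing
cd /reports/configuration || exit 1
pvscan > pvscan.txt 2>&1
lvscan > lvscan.txt 2>&1
df -h > df.txt
cat /proc/mdstat > mdstat.txt
fsvs ci -m "daily configuration reports" /reports/configuration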

Example of bad capacitors from a few years ago

A few years back, we bought a bunch of GeForce 6200LE PCIe video cards for various uses in servers/desktops.  I liked them at the time because they are fanless (one less moving part to break).  However, in the last 2 years, we've had a lot of them fail due to bad capacitors.

And when one of these capacitors "pops", it sounds like a small firecracker going off in the room.  Very noticeable at the time.  It reminds me of the old miniature pop-caps: small paper bags holding a single grain of gunpowder (or flash powder?) mixed in with a few pea-sized rocks, which you could throw at a hard surface to make a popping noise.

Friday, October 01, 2010

FSVS: Updated install on CentOS 5.5

(Also see my older post on this: FSVS - Install on CentOS 5 or FSVS - Install on CentOS 5.4. Or the original post where I explained the power of FSVS for sysadmins.)

Once again, I'm starting with the assumption that this is a pretty bare-bones CentOS 5.5 server install, with only the "server" package group (or no package groups at all) being selected during the initial install.  The basic steps remain the same for situations where you merely want to use FSVS and SVN to keep track of changes to your system:


  1. Setup the RPMForge repository
  2. Install the packages needed for FSVS
  3. Download and compile FSVS
  4. Configure ignore patterns
  5. Do the base check-ins

For the most part, I try to do this process as early in the lifespan of the server as possible.  But there are always a few minor things that get done before I get this far (creating an initial user, doing a "yum update" to patch the system up, etc).  Even a mature server can benefit from adding FSVS, but you'll find it much more useful the longer you use it, as it gives you a quick index to "what changed and why did you change it".

Setting up RPMForge

In order to get the latest Subversion packages for your system, you'll have to add RPMForge as a source repository. The CentOS base repository only has Subversion 1.4.2 and the latest is currently 1.6.12. I recommend doing this in conjunction with the yum-priorities package.

# yum install yum-priorities

After installing the yum-priorities package, you should edit the CentOS-Base.repo file found under /etc/yum.repos.d/. For the base repositories, I recommend setting them to priority values of 1 through 3.  For example, in the "[base]" section, you would add (2) lines to the end of that section:

[base]
...
priority=1
exclude=subversion-*

The "priority=" tells yum-priorities that if it finds a package in multiple repositories, that [base] should take precedence.  The "exclude=" line keeps us from pulling Subversion from the [base] repository (and we'll instead pull it from RPMForge).

Now we can install the RPMForge repository (see Using RPMForge).  You'll need to look at the release folder in order to get the RPM name.

# cd /root/
# mkdir software
# cd software
# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm
# rpm -Uhv rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm
# cd /etc/yum.repos.d/

Now you should edit the rpmforge.repo file and insert a priority= line. I recommend a value of 10 or 25. In addition, I suggest telling the yum package manager to only pull in specific packages from RPMForge (or any 3rd party repository).  This is a bit of overkill, but yum-priorities is a bit of overkill in itself, and I've run into issues in the past where priorities alone weren't enough and I wished I had gone with "includepkgs" instead.  (Note that while "exclude" is used to exclude packages, the opposite term is "includepkgs" and not "include".)  This will make the end of your rpmforge.repo file look like:


[rpmforge]
...
priority=10
includepkgs=subversion-*

You can verify that you'll pull in the latest Subversion package by the following command:

# yum info subversion
Available Packages
Name       : subversion
Arch       : x86_64
Version    : 1.6.12
Release    : 0.1.el5.rf
Size       : 6.8 M
Repo       : rpmforge
Summary    : Modern Version Control System designed to replace CVS
URL        : http://subversion.tigris.org/
License    : BSD

Install the packages needed for FSVS

# yum install subversion subversion-devel ctags apr apr-devel gcc gdbm gdbm-devel pcre pcre-devel apr-util-devel

Download and compile FSVS

As always, you shouldn't compile code as the root user.

# su username
$ mkdir -p ~/software/fsvs
$ cd ~/software/fsvs
$ wget http://download.fsvs-software.org/fsvs-1.2.2.tar.bz2
$ tar xjf fsvs-1.2.2.tar.bz2
$ cd fsvs-1.2.2
$ ./configure
$ make
$ exit
# cp /home/username/software/fsvs/fsvs-1.2.2/src/fsvs /usr/local/bin/
# chmod 755 /usr/local/bin/fsvs

Creating the repository on the SVN server

This is how we setup users on our SVN server. Machine accounts are prefixed as "sys-" in front of the machine name. The SVN repository name matches the name of the machine. In general, only the machine account should have write access to the repository, although you may wish to add other users to the group so that they can gain read-only access.

# useradd -m sys-www-test
# passwd sys-www-test
# svnadmin create /var/svn/sys-www-test
# cd /var/svn
# chmod -R 750 sys-www-test
# chmod -R g+s sys-www-test/db
# chown -R sys-www-test:sys-www-test sys-www-test

Back on the source machine (our test machine), we'll need to create an SSH key that can be used on our SVN server. You may wish to use a slightly larger RSA key (3200 bits or 4096 bits) if you're working on an extra sensitive server. But a key size of 2048 bits should be secure for another decade for this purpose.

# cd /root/
# mkdir .ssh
# chmod 700 .ssh
# cd .ssh
# /usr/bin/ssh-keygen -N '' -C 'svn key for root@hostname' -t rsa -b 2048 -f root@hostname
# chmod 600 *
# cat root@hostname.pub


Copy this key into the clipboard or send it to the SVN server or the SVN server administrator. Then we'll need to create a ~/.ssh/config file to tell the user what account name, port and key file to use when talking to the SVN server.

# vi /root/.ssh/config
Host svn.tgharold.com
Port 22
User sys-www-test
IdentityFile /root/.ssh/root@hostname
# chmod 600 *


Back on the SVN server, you'll need to finish configuration of the user that will add files to the SVN repository.

# su - sys-www-test
$ cd ~/
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ cat >> authorized_keys
(paste in the SSH key from the other server)
$ chmod 600 *

Now you'll want to prepend the following in front of the key line in the authorized_keys file.

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding

That ensures (mostly) that the key can only be used to run the svnserve command and that it can't be used to access a command shell on the SVN server. Test the configuration back on the original server by issuing the "svn info" command. Alternately, you can try to ssh to the SVN repository server. Errors will usually either be logged in /var/log/secure on the source server or in the same log file on the SVN repository server. Here's an example of a successful connection:

# ssh svn.tgharold.com
( success ( 2 2 ( ) ( edit-pipeline svndiff1 absent-entries commit-revprops depth log-revprops partial-replay ) ) )

This shows that the key is restricted to running the "svnserve" command automatically.
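
The "svn info" test looks like this (assuming the subversion client package installed earlier); it should print the repository UUID and revision rather than an error:

# svn info svn+ssh://svn.tgharold.com/sys-www-test/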

Connect the system to the SVN repository

The very first command that you'll need to issue for FSVS is the "urls" (or "initialize") command. This tells FSVS what repository will be used to store the files.

# cd /
# mkdir /var/spool/fsvs
# mkdir /etc/fsvs/
# fsvs urls svn+ssh://svn.tgharold.com/sys-www-test/

You may see the following error, which means you need to create the /var/spool/fsvs folder, then reissue the fsvs urls command.

stat() of waa-path "/var/spool/fsvs/" failed. Does your local WAA storage area exist?

The following error means that you forgot to create the /etc/fsvs/ folder.

Cannot write to the FSVS_CONF path "/etc/fsvs/".

Configure ignore patterns and doing the base check-in

When constructing ignore patterns, generally work on adding a few directories at a time to the SVN repository. Everyone has different directories that they won't want to version, so you'll need to tailor the following to match your configuration. However, I generally recommend starting with the following (this is the output from "fsvs ignore dump", which you can pipe into a file, edit, then pipe back into "fsvs ignore load"):

group:ignore,./backup/
group:ignore,./bin/
group:ignore,./dev/
group:ignore,./etc/fsvs/
group:ignore,./etc/gconf/
group:ignore,./etc/gdm/
group:ignore,./home/
group:ignore,./lib/
group:ignore,./lib64/
group:ignore,./lost+found
group:ignore,./media/
group:ignore,./mnt/
group:ignore,./proc/
group:ignore,./root/
group:ignore,./sbin/
group:ignore,./selinux/
group:ignore,./srv/
group:ignore,./sys/
group:ignore,./tmp/
group:ignore,./usr/bin/
group:ignore,./usr/include/
group:ignore,./usr/kerberos/
group:ignore,./usr/lib/
group:ignore,./usr/lib64
group:ignore,./usr/libexec/
group:ignore,./usr/sbin/
group:ignore,./usr/share/
group:ignore,./usr/src/
group:ignore,./usr/tmp/
group:ignore,./usr/X11R6/
group:ignore,./var/cache/
group:ignore,./var/gdm/
group:ignore,./var/lib/
group:ignore,./var/lock/
group:ignore,./var/log/
group:ignore,./var/mail/
group:ignore,./var/run/
group:ignore,./var/spool/
group:ignore,./var/tmp/
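
To round-trip that list through FSVS, the dump/edit/load cycle looks like this (the filename is arbitrary):

# cd /
# fsvs ignore dump > /root/fsvs-ignore.txt
# vi /root/fsvs-ignore.txt
# fsvs ignore load < /root/fsvs-ignore.txt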

Then you'll want to either ignore (or encrypt) the SSH key files and other sensitive files.

# cd /
# fsvs ignore group:ignore,./root/.ssh
# fsvs ignore group:ignore,./etc/shadow*
# fsvs ignore group:ignore,./etc/ssh/ssh_host_key
# fsvs ignore group:ignore,./etc/ssh/ssh_host_dsa_key
# fsvs ignore group:ignore,./etc/ssh/ssh_host_rsa_key

You can check what FSVS is going to version by using the "fsvs status pathname" command (such as "fsvs status /etc"). Once you are happy with the selection in a particular path, you can do the following command:

# fsvs ci -m "base check-in" /etc

Repeat this for the various top level trees until you have checked everything in. Then you should do one last check-in at the root level that catches anything you might have missed.

Thursday, September 30, 2010

GRUB manual install into the boot sector

One of the useful things to know is how to install GRUB onto a disk when you replace one in the software RAID.

Installing GRUB natively

Caution: Installing GRUB's stage1 in this manner will overwrite the normal boot sector used by an OS. 

# grub
grub> device (hd0) /dev/sdf
grub> root (hd0,0)
grub> find /grub/stage1
grub> setup (hd0)
grub> quit
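
If you need to script this (for example, after swapping a disk in the software RAID), the same commands can usually be fed to the GRUB shell in batch mode. A sketch; adjust the device and partition to match your layout:

# grub --batch <<EOF
device (hd0) /dev/sdf
root (hd0,0)
setup (hd0)
quit
EOF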

Monday, September 20, 2010

Error: task md#_resync:### blocked for more than ### seconds

While building a new machine using CentOS 5.5, I found this in the log files.  It also gets spammed to the console every 2-3 minutes.

kernel: INFO: task md1_resync:434 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: md1_resync    D ffff8100bebbde30     0   434     35           437   433 (L-TLB)
kernel:  ffff8100bebbdd70 0000000000000046 0000000000000000 ffff8100bed6300c
kernel:  ffff8100bed6340c 0000000000000009 ffff8100beb9d100 ffff8100bff937a0
kernel:  00000011b743e63b 000000000005ad3b ffff8100beb9d2e8 0000000000000000
kernel: Call Trace:
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] md_do_sync+0x1d8/0x833
kernel:  [] enqueue_task+0x41/0x56
kernel:  [] __activate_task+0x56/0x6d
kernel:  [] dequeue_task+0x18/0x37
kernel:  [] thread_return+0x62/0xfe
kernel:  [] autoremove_wake_function+0x0/0x2e
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] md_thread+0xf8/0x10e
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] md_thread+0x0/0x10e
kernel:  [] kthread+0xfe/0x132
kernel:  [] child_rip+0xa/0x11
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] kthread+0x0/0x132
kernel:  [] child_rip+0x0/0x11

There's a Red Hat Bugzilla entry (#18061) on the topic.  Basically, it's because I have Linux software RAID in the process of synchronizing the freshly created arrays.  Each disk in the machine is divided up into (3) partitions, and each partition slice belongs to a different mdadm RAID array.  Since mdadm is smart enough not to thrash the disks, any arrays which have not yet been synchronized will be put into a "DELAYED" mode:

Personalities : [raid1]
md0 : active raid1 sdc1[2] sdb1[1] sda1[0]
      256896 blocks [3/3] [UUU]
     
md2 : active raid1 sdc3[2] sdb3[1] sda3[0]
      455892480 blocks [3/3] [UUU]
      [=>...................]  resync =  6.1% (27871296/455892480) finish=323.7min speed=22033K/sec
     
md1 : active raid1 sdc2[2] sdb2[1] sda2[0]
      32234304 blocks [3/3] [UUU]
        resync=DELAYED
     
unused devices: <none>

The error is basically a false positive and no harm is being done except for the message spam on the console and in the system log files.  It will happen every time that this system does the weekly array sync, however (until it gets fixed).

Monday, August 30, 2010

First impressions of ext4 with large file deletion

I've been using ext3 for a long time on my CentOS servers, but ext3 has one big flaw that has constantly affected me.  It takes forever and a day in order to delete large files.

On my home server, I have a very large array where I put raw video files prior to conversion into compressed/lossy formats.  This works well, but when I go to clean off the multi-gigabyte raw files, it will take 5-10 minutes to delete a few dozen gigabytes of data.  So I've been putting up with it this way for a long time.

Fast forward to 2010, and ext4 is finally close to production ready in CentOS 5.5.  (It may even already be marked as production-ready...)  I switched that scratch space over to ext4 last week and it's performing admirably.  Performance when deleting large multi-gigabyte files is much better, which helps the system not feel so slow while it's doing that delete.

I probably won't use ext4 for my primary volumes quite yet.  I still plan on leaving /home, / (root), /var, /boot as ext3 (ext2 for /boot), but I will probably put the larger user data file systems as ext4 from now on.  It's enough of a step forward to be worth it (and ext4 is pretty stable now).

Monday, August 02, 2010

Changing "ls" color so that directories don't show up in dark blue

One of the annoyances with Linux if you're using a color terminal (or a color terminal emulator like SecureCRT or PuTTY) is that the "ls" (list directory contents) command has a default of dark blue for directory entries.  Which is fine if you're using a white background, but doesn't work well if you prefer an old-school black background for your terminal.

Fortunately, this is configurable.  On OS X and the BSDs, "ls" colors are controlled by the LSCOLORS environment variable (the default string is "exfxcxdxbxegedabagacad" on OS X).  On Linux, GNU "ls" uses the LS_COLORS variable instead, which is typically generated from /etc/DIR_COLORS by the dircolors command.

On my CentOS 5.4 machines, DIR_COLORS lists the entry for directories as "DIR 01;34", which means that it uses a bold blue on the default background.  One option to ensure a bit more contrast would be to change this to always print on a white background.  So you would change this entry to look like:

DIR 01;34;47 # NEW default is Bold blue with White background
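
If you'd rather not touch the system-wide file, the same change should also work per-user; on CentOS the login scripts will pick up ~/.dir_colors if it exists (a sketch):

$ cp /etc/DIR_COLORS ~/.dir_colors
$ vi ~/.dir_colors
(change "DIR 01;34" to "DIR 01;34;47")
$ eval "$(dircolors ~/.dir_colors)"
(regenerates LS_COLORS for the current shell)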

References:

change ls colour (color) in bash - GlowingApple explains, back in 2005 on the MacOSXHints forum, how to change colors

COLORS Lscolors - another page that explains the LSCOLORS variable

Thursday, June 24, 2010

Server migration to LogicWeb

Due to JTLNetworks being clueless about how to reliably deliver e-mail, I've severed my use of their services and moved to LogicWeb. But, Blogger.com has also changed how their service works, so all of my old blog URLs are going to result in 404 errors for a bit.

The new URL for this blog is now:

http://techblog.tgharold.com/

(So far, I'm a lot happier with LogicWeb than with JTL. Mostly because LogicWeb gives me secure IMAP storage of e-mail over SSL.)

Friday, May 28, 2010

Migrate from Direct Partitions to LVM Volumes

Weekend Project: Migrate from Direct Partitions to LVM Volumes by Nathan Willis

This tutorial covers the basics of migrating from regular disk partitions to LVM logical volumes (LVs). It demonstrates some of the strengths of LVM, such as resizing logical volumes or moving them to another physical device in the same volume group.

Saturday, April 24, 2010

Setting up NX Client access to Linux servers running FreeNX/NX

In order to talk to a Linux box running FreeNX/NX server...

1) Grab a copy of the NX client from NoMachine's download page. For Windows users, look for "NX Client for Windows".

2) Install the client software.

3) Start up "NX Client for Windows".

4) The "Login" field will be your username on the Linux server (i.e. "thomas"). The "Password" field is, obviously, your login password on the Linux server. In the "Session" field, enter a descriptive name for the server that you are trying to connect to. (This name does not have to match the server's DNS name or IP address; it can be any name or nickname.)

5) As you press [Tab] to leave the "Session" field, you will be prompted whether you want to "Create" a new session or "Rename" the existing session. In most cases, you'll want to pick "Create" so that you can set up a new session.

6) You will now be taken to the advanced configuration screen.

a) Server Host: Enter the fully qualified DNS name, or the IP address.
b) Server Port: Most servers run SSH on port 22.
c) Desktop: I usually use "Unix / Gnome"
d) Speed: ISDN or ADSL is a good first choice
e) Display: Choose full-screen or setup a window of any size that you want.

7) Before you click "Save" or "OK", check with your server administrator and find out whether you need to provide the NX client with a custom key. This key is located on the Linux server and is usually at the following location:

/etc/nxserver/client.id_dsa.key

If you need this key, be sure to obtain it, then use the "Server, Key" button on the advanced configuration screen. Just paste the key into the provided box and you'll be able to talk to the server.

8) Click "Save" then "OK".

9) If everything is configured properly, you can now enter your login password and press the "Login" button to access the remote server.

...

When you are finished with your session on the remote server, you can either "Terminate" (logout) or "Suspend" it if you want to leave things running for later.

Thursday, March 25, 2010

Why I don't trust JTL Networks

Here's an example of the spurious error messages I get from the ever-so-helpful people over at JTL Networks.

For the record - they've been ignoring my requests to have this error looked into and addressed for over a year now. Very very poor customer service and technical support over there.

Return-Path: <>
Received: (qmail 26238 invoked for bounce); 13 Jun 2010 23:45:18 -0000
Date: 13 Jun 2010 23:45:18 -0000
From: MAILER-DAEMON@apache.org
To: announce-return-133-@httpd.apache.org
Subject: failure notice

Hi. This is the qmail-send program at apache.org.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

<tgh@tgharold.com>:
69.36.9.168 does not like recipient.
Remote host said: 556 Address unavailable.
Giving up on 69.36.9.168.

Sunday, December 06, 2009

SNMP and MRTG: Interesting OIDs in net-snmp

These can all be found via the "snmpwalk" command in CentOS 5.4 (or RHEL 5.4).

# snmpwalk -v 1 -c public localhost | less

The above assumes that you have configured the SNMP agent on the server to allow read-only access to SNMP v1 clients via the "public" community string.

Approximate number of users logged in

HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 3

Number of logged in users. As you can see, this is a gauge value which means (in SNMP terms) that it is a value that can increase or decrease over time. By default, MRTG assumes that the value is monotonically increasing.

Note: Since MRTG only samples once every 5 minutes, this value is very approximate.

Approximate number of system processes

HOST-RESOURCES-MIB::hrSystemProcesses.0 = Gauge32: 171

Current number of processes running. This often makes a good second number to pair up with the number of users. Or you could choose to display them on separate graphs.

Note: Same issue as with the number of users: MRTG only samples every 5 minutes, which makes this an estimate at best.

MRTG: Reporting on processes and users

Here's a fragment from my MRTG configuration file that shows how I reported on the number of users and processes. I could not get MRTG to resolve the plain names to OIDs automatically, so I had to put in the full numeric OIDs.

### PROCESSES & USERS
Options[_]: gauge, integer, noborder, noinfo, nolegend, noo, nopercent, pngdate, printrouter, transparent
WithPeak[_]: ymw
Legend2[_]:
Legend3[_]:
Legend4[_]:

#Target[localhost.system.users]: hrSystemNumUsers.0&hrSystemNumUsers.0:public@localhost
Target[localhost.system.users]: .1.3.6.1.2.1.25.1.5.0&.1.3.6.1.2.1.25.1.5.0:public@localhost
MaxBytes[localhost.system.users]: 50
YLegend[localhost.system.users]: Users
LegendI[localhost.system.users]: Users
Legend1[localhost.system.users]: Approximate number of users logged in
ShortLegend[localhost.system.users]: ~
Title[localhost.system.users]: firewall:Users - Approximate System Users
PageTop[localhost.system.users]: <h1>firewall: Approximate System Users</h1>
    <div id="sysdetails">
    </div>

#Target[localhost.system.processes]: hrSystemProcesses.0&hrSystemProcesses.0:public@localhost
Target[localhost.system.processes]: .1.3.6.1.2.1.25.1.6.0&.1.3.6.1.2.1.25.1.6.0:public@localhost
MaxBytes[localhost.system.processes]: 5000
YLegend[localhost.system.processes]: Processes
LegendI[localhost.system.processes]: Processes
Legend1[localhost.system.processes]: Approximate number of processes
ShortLegend[localhost.system.processes]: ~
Title[localhost.system.processes]: firewall:Procs - Approximate System Processes
PageTop[localhost.system.processes]: <h1>firewall: Approximate System Processes</h1>
    <div id="sysdetails">
    </div>


Real Memory in Use

HOST-RESOURCES-MIB::hrStorageType.2 = OID: HOST-RESOURCES-TYPES::hrStorageRam
HOST-RESOURCES-MIB::hrStorageDescr.2 = STRING: Real Memory
HOST-RESOURCES-MIB::hrStorageAllocationUnits.2 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageSize.2 = INTEGER: 8043628
HOST-RESOURCES-MIB::hrStorageUsed.2 = INTEGER: 7962536


Swap (Virtual) Memory in Use

HOST-RESOURCES-MIB::hrStorageType.3 = OID: HOST-RESOURCES-TYPES::hrStorageVirtualMemory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Swap Space
HOST-RESOURCES-MIB::hrStorageAllocationUnits.3 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageSize.3 = INTEGER: 4021814
HOST-RESOURCES-MIB::hrStorageUsed.3 = INTEGER: 8292
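
Following the same pattern as the users/processes fragments above, real memory in use can be graphed from the numeric OID for hrStorageUsed.2 (.1.3.6.1.2.1.25.2.3.1.6.2 in the HOST-RESOURCES MIB). A sketch that relies on the gauge Options[_] defaults above, with MaxBytes taken from the hrStorageSize value:

Target[localhost.memory.real]: .1.3.6.1.2.1.25.2.3.1.6.2&.1.3.6.1.2.1.25.2.3.1.6.2:public@localhost
MaxBytes[localhost.memory.real]: 8043628
YLegend[localhost.memory.real]: KBytes
LegendI[localhost.memory.real]: Used
Legend1[localhost.memory.real]: Real memory in use (1024-byte units)
ShortLegend[localhost.memory.real]: KB
Title[localhost.memory.real]: firewall:Mem - Real Memory in Use
PageTop[localhost.memory.real]: <h1>firewall: Real Memory in Use</h1>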


Processor Utilization

First, we need to find the OIDs of the CPUs.

# snmpwalk -v 1 -c public localhost | grep "HOST-RESOURCES" | egrep "hrDeviceProcessor"
HOST-RESOURCES-MIB::hrDeviceType.768 = OID: HOST-RESOURCES-TYPES::hrDeviceProcessor
HOST-RESOURCES-MIB::hrDeviceType.769 = OID: HOST-RESOURCES-TYPES::hrDeviceProcessor
HOST-RESOURCES-MIB::hrSWRunParameters.32755 = STRING: "hrDeviceProcessor"


That gives us 768 and 769 to look at.

# snmpwalk -v 1 -c public localhost | grep "HOST-RESOURCES" | egrep "(768|769)"
HOST-RESOURCES-MIB::hrDeviceIndex.768 = INTEGER: 768
HOST-RESOURCES-MIB::hrDeviceIndex.769 = INTEGER: 769
HOST-RESOURCES-MIB::hrDeviceType.768 = OID: HOST-RESOURCES-TYPES::hrDeviceProcessor
HOST-RESOURCES-MIB::hrDeviceType.769 = OID: HOST-RESOURCES-TYPES::hrDeviceProcessor
HOST-RESOURCES-MIB::hrDeviceDescr.768 = STRING: AuthenticAMD: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
HOST-RESOURCES-MIB::hrDeviceDescr.769 = STRING: AuthenticAMD: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
HOST-RESOURCES-MIB::hrDeviceID.768 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrDeviceID.769 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorFrwID.768 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorFrwID.769 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorLoad.768 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.769 = INTEGER: 1


So by looking at the hrProcessorLoad for nodes 768 & 769, we can track the CPU utilization on this PC. But unless you can get MRTG to load the MIBs, you'll need to use the numeric OID format.

# snmpwalk -v 1 -c public localhost -On | egrep "(768|769)" | grep "INTEGER"
.1.3.6.1.2.1.25.3.3.1.2.768 = INTEGER: 9
.1.3.6.1.2.1.25.3.3.1.2.769 = INTEGER: 17
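
Those two numeric OIDs drop straight into an MRTG target. A sketch; note the per-target Options[] line, since the global "noo" option above would otherwise suppress the second CPU's curve:

Options[localhost.cpu]: gauge, integer, nopercent, transparent
Target[localhost.cpu]: .1.3.6.1.2.1.25.3.3.1.2.768&.1.3.6.1.2.1.25.3.3.1.2.769:public@localhost
MaxBytes[localhost.cpu]: 100
YLegend[localhost.cpu]: Percent
LegendI[localhost.cpu]: CPU 768
LegendO[localhost.cpu]: CPU 769
Legend1[localhost.cpu]: Load on first CPU
Legend2[localhost.cpu]: Load on second CPU
ShortLegend[localhost.cpu]: %
Title[localhost.cpu]: firewall:CPU - Processor Load
PageTop[localhost.cpu]: <h1>firewall: Processor Load</h1>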

Friday, November 27, 2009

FSVS: Install on CentOS 5.4

(Also see my older post on this: FSVS - Install on CentOS 5. Or the original post where I explained the power of FSVS for sysadmins.)

I'm going to start with the assumption that this is a base CentOS 5.4 install without *any* package groups selected during the initial install. In my case, this is a DomU that I'm setting up under Xen to serve as a testing server for web development. The only things I've done so far are setting the root password and configuring it to use a static IP address.

The basic steps will be:
  1. Setup the RPMForge repository
  2. Install the packages needed for FSVS
  3. Download and compile FSVS
  4. Configure ignore patterns
  5. Do the base check-ins

Setting up RPMForge

In order to get the latest Subversion packages for your system, you'll have to add RPMForge as a source repository. The CentOS base repository only has Subversion 1.4.2 and the latest is currently 1.6.6. I recommend doing this in conjunction with the yum-priorities package.

# yum install yum-priorities

After installing the yum-priorities package, you should edit the CentOS-Base.repo file found under /etc/yum.repos.d/. For the base repositories, I recommend setting them to priority values of 1 through 3. For example:

[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-5
priority=1
exclude=subversion-*


I generally give the [base], [updates], [addons], [extras] repositories a priority of "1", with [centosplus] and [contrib] getting a priority of "3". In addition, you'll need to add or edit the "exclude=" line in the [base] repository section to exclude Subversion from being sourced from that repository. This will allow the Yum package manager to look in other repositories to find Subversion. Now we can install the RPMForge repository (see Using RPMForge).

# cd /root/
# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm
# rpm -Uhv rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm
# cd /etc/yum.repos.d/


Now you should edit the rpmforge.repo file and insert a priority= line. I recommend a value of 10 or 25. You can now verify that you'll pull in the latest Subversion package by the following command:

# yum info subversion
Available Packages
Name : subversion
Arch : x86_64
Version : 1.6.6
Release : 0.1.el5.rf
Size : 6.8 M
Repo : rpmforge


Install the packages needed for FSVS

# yum install subversion subversion-devel ctags apr apr-devel gcc gdbm gdbm-devel pcre pcre-devel apr-util-devel

Download and compile FSVS

As always, you shouldn't compile code as the root user.

# su username
$ mkdir ~/fsvs
$ cd ~/fsvs
$ wget http://download.fsvs-software.org/fsvs-1.2.1.tar.bz2
$ tar xjf fsvs-1.2.1.tar.bz2
$ cd fsvs-1.2.1
$ ./configure
$ make
$ exit
# cp /home/username/fsvs/fsvs-1.2.1/src/fsvs /usr/local/bin/
# chmod 755 /usr/local/bin/fsvs


Creating the repository on the SVN server

This is how we setup users on our SVN server. Machine accounts are prefixed as "sys-" in front of the machine name. The SVN repository name matches the name of the machine. In general, only the machine account should have write access to the repository, although you may wish to add other users to the group so that they can gain read-only access.

# useradd -m sys-www-test
# passwd sys-www-test
# svnadmin create /var/svn/sys-www-test
# cd /var/svn
# chmod -R 750 sys-www-test
# chmod -R g+s sys-www-test/db
# chown -R sys-www-test:sys-www-test sys-www-test


Back on the source machine (our test machine), we'll need to create an SSH key that can be used on our SVN server. You may wish to use a slightly larger RSA key (3200 bits or 4096 bits) if you're working on an extra sensitive server. But a key size of 2048 bits should be secure for another decade for this purpose.

# cd /root/
# mkdir .ssh
# chmod 700 .ssh
# cd .ssh
# /usr/bin/ssh-keygen -N '' -C 'svn key for root@hostname' -t rsa -b 2048 -f root@hostname
# cat root@hostname.pub


Copy this key into the clipboard or send it to the SVN server or the SVN server administrator. Then we'll need to create a ~/.ssh/config file to tell the user what account name, port and key file to use when talking to the SVN server.

# vi /root/.ssh/config
Host svn.tgharold.com
Port 22
User sys-www-test
IdentityFile /root/.ssh/root@hostname


Back on the SVN server, you'll need to finish configuration of the user that will add files to the SVN repository.

# su - sys-www-test
$ cd ~/
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ cat >> authorized_keys
(paste in the SSH key from the other server)
$ chmod 600 authorized_keys


Now you'll want to prepend the following in front of the key line in the authorized_keys file.

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding

That ensures (mostly) that the key can only be used to run the svnserve command and that it can't be used to access a command shell on the SVN server. Test the configuration back on the original server by issuing the "svn info" command. Alternately, you can try to ssh to the SVN repository server. Errors will usually either be logged in /var/log/secure on the source server or in the same log file on the SVN repository server. Here's an example of a successful connection:

# ssh svn.tgharold.com
( success ( 2 2 ( ) ( edit-pipeline svndiff1 absent-entries commit-revprops depth log-revprops partial-replay ) ) )


This shows that the key is restricted to running the "svnserve" command automatically.

Connect the system to the SVN repository

The very first command that you'll need to issue for FSVS is the "urls" (or "initialize") command. This tells FSVS what repository will be used to store the files.

# cd /
# mkdir /var/spool/fsvs
# mkdir /etc/fsvs/
# fsvs urls svn+ssh://svn.tgharold.com/sys-www-test/


You may see the following error, which means you need to create the /var/spool/fsvs folder, then reissue the fsvs urls command.

stat() of waa-path "/var/spool/fsvs/" failed. Does your local WAA storage area exist?

The following error means that you forgot to create the /etc/fsvs/ folder.

Cannot write to the FSVS_CONF path "/etc/fsvs/".

Configure ignore patterns and doing the base check-in

When constructing ignore patterns, generally work on adding a few directories at a time to the SVN repository. Everyone has different directories that they won't want to version, so you'll need to tailor the following to match your configuration. However, I generally recommend starting with the following:

# cd /
# fsvs ignore group:ignore,./dev
# fsvs ignore group:ignore,./etc/fsvs/
# fsvs ignore group:ignore,./etc/gconf/
# fsvs ignore group:ignore,./etc/gdm/
# fsvs ignore group:ignore,./home/
# fsvs ignore group:ignore,./lost+found
# fsvs ignore group:ignore,./media/
# fsvs ignore group:ignore,./mnt/
# fsvs ignore group:ignore,./proc
# fsvs ignore group:ignore,./root/.gconf
# fsvs ignore group:ignore,./root/.nautilus
# fsvs ignore group:ignore,./selinux/
# fsvs ignore group:ignore,./srv
# fsvs ignore group:ignore,./sys
# fsvs ignore group:ignore,./tmp/
# fsvs ignore group:ignore,./usr/tmp/
# fsvs ignore group:ignore,./var/gdm/
# fsvs ignore group:ignore,./var/lib/mlocate/
# fsvs ignore group:ignore,./var/lock/
# fsvs ignore group:ignore,./var/log/
# fsvs ignore group:ignore,./var/mail/
# fsvs ignore group:ignore,./var/run/
# fsvs ignore group:ignore,./var/spool/
# fsvs ignore group:ignore,./var/tmp/


Then you'll want to either ignore (or encrypt) the SSH key files and other sensitive files.

# cd /
# fsvs ignore group:ignore,./root/.ssh
# fsvs ignore group:ignore,./etc/shadow*
# fsvs ignore group:ignore,./etc/ssh/ssh_host_key
# fsvs ignore group:ignore,./etc/ssh/ssh_host_dsa_key
# fsvs ignore group:ignore,./etc/ssh/ssh_host_rsa_key


You can check what FSVS is going to version by using the "fsvs status pathname" command (such as "fsvs status /etc"). Once you are happy with the selection in a particular path, you can do the following command:

# fsvs ci -m "base check-in" /etc

Repeat this for the various top level trees until you have checked everything in. Then you should do one last check-in at the root level that catches anything you might have missed.

Saturday, November 21, 2009

FSVS ignore patterns (1.2.0)

Here's an example of a more complex FSVS ignore/take pattern.

On our mail server, we store all mail in MailDir folders under the structure of:

/var/vmail/domainname/username/

We keep our user-specific Sieve scripts in a "Home" folder under that location.

/var/vmail/domainname/username/Home/

So obviously, we want to version the Home folder under each user. But we don't want to version the other MailDir folders at all. The trick to this is that because our folder structure is predictable, we can do it in a handful of FSVS ignore patterns.

# cd /
# fsvs ignore dump >> /root/fsvs-ignore-yyyymmdd.txt


That makes a backup of your current rules, just in case you decide that you don't like your changes (they can be reloaded with "fsvs ignore load < filename").

# cd /
# fsvs ignore group:ignore,./var/vmail/lost+found
# fsvs ignore group:take,./var/vmail/*
# fsvs ignore group:take,./var/vmail/*/*
# fsvs ignore group:take,./var/vmail/*/*/Home
# fsvs ignore group:take,./var/vmail/*/*/Home/**
# fsvs ignore group:ignore,./var/vmail/**


Line 1 "group:ignore,./var/vmail/lost+found": In our setup /var/vmail is a separate mount point, so we'll want to ignore the lost+found folder.

Line 2 "group:take,./var/vmail/*": This tells FSVS to version the entries directly below /var/vmail (the per-domain directories).

Line 3 "group:take,./var/vmail/*/*": Grabs the next directory level (the per-user directories) and files below /var/vmail.

Line 4 "group:take,./var/vmail/*/*/Home": Now we grab just the "Home" folder inside of the user's MailDir directory. This lets us ignore the new|cur|tmp folders as well as the other hidden MailDir folders (such as .Junk).

Line 5 "group:take,./var/vmail/*/*/Home/**": Grab everything inside of Home and below that point. This will grab all of the Sieve scripts or other files that are located there. If you wanted to exclude certain files in Home, you would insert that ignore rule above this line.

Line 6 "group:ignore,./var/vmail/**": This is the clean-up rule. Anything not explicitly mentioned above here will now be ignored. This keeps us from versioning the messages inside the user's MailDir folders.

Friday, November 20, 2009

Getting started with GPG4Win

GNU Privacy Guard for Windows Home Page (GPG4Win) - The GPG4Win project recently released version 2.0.1 of their product, so I figured it was a good time to reexamine GPG4Win. There have been a few changes since version 1, most notable for me is that WinPT is no longer part of the GPG4Win distribution.

Installation

For getting started, I strongly recommend using the gpg4win-light package at first, as you probably won't need Kleopatra or the German-only manuals. As for the optional modules, I'd recommend installing GPA and GPGEx at a minimum. Note that GPGOL is still only compatible with Outlook 2003 and Outlook 2007, so you may wish to skip that module if you use other versions of Microsoft Outlook. In addition, you probably won't need Claws Mail at first.

By default, GPG4Win puts your key files under the following location (or wherever your HOMEPATH environment variable points to):

C:\Documents and Settings\USERNAME\Application Data\gnupg

Make sure you include this location in any backup programs that you are using. Your public and secret keyrings are stored in this folder and need to be backed up regularly.

Public Key Pairs

Now we get into the theoretical realm.  GPG now supports RSA signing and encryption keys (in addition to the older DSA-for-signing and Elgamal-for-encryption methods). DSA signing keys are limited to 1024-bit lengths, while RSA signing keys can be much longer (512 to 4096 bits are commonly used). The only restriction that you should keep in mind for RSA keys is that you should never sign with the same key that you use for encryption (and vice-versa). In GnuPG v2, the default is now to create (2) RSA keys for the account, one for encryption and one for signing.

Typically, you'll want signing keys to have a very long lifespan (at least 5 years, maybe as long as 20 or more). This allows you to build a much larger web of trust before your key can no longer be used to sign other keys. However, you should really expire your encryption key after a few years. Then, a bit before your encryption key expires, you should add a new encryption subkey to your key with a new expiration date.

Unfortunately, the default creation options in GnuPG will assign the same expiration to both the signing key and the encryption keys. But this can be fixed using the "gpg --edit-key" command.
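
The sequence for that is roughly the following (a hedged sketch, using the example key ID that appears later in this post; the addkey prompts walk you through the key type, size and expiration just like initial key generation):

gpg --edit-key AAFA2876
Command> addkey
(choose "RSA (encrypt only)", a key size, and the new expiration date)
Command> save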

Creating a GPG key

gpg --gen-key
gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
Your selection?


Unless you have a strong reason to use DSA/Elgamal, you may as well use the defaults in GPG v2 and pick "RSA and RSA".

RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)


If you're creating a key that will expire in the next 5 years, I recommend 2048 bits. For longer durations, you may wish to use 3172 or 4096 bits.

Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)


For an initial key where you're not protecting anything super critical, I suggest starting with a 25 year (entered as "25y") expiration date. You will be asked to confirm the expiration date (enter "y" to continue).

GnuPG needs to construct a user ID to identify your key.

Real name:


For personal use, I suggest just entering your name (i.e. "Thomas Harold"). But if you're creating a key for corporate/business use, I suggest adding a bit more information in this field to make things easier for others if they have more than one key with similar names. I recommend against using parentheses in this field as they can be confusing later on. Square brackets "[]", curly braces "{}", or angle brackets "<>" are all good choices to set elements off from each other. Some examples:

Thomas Harold, Acme Inc.
Thomas Harold [Acme]
Thomas Harold
Thomas Harold {Example LTD}

Remember that this and the next two fields are all public information that will be visible to everyone who uses your public key to send you things, or who uses your signing key to verify a signature.

Email address:

This is very simple: you should enter the primary email address that you want associated with this key (i.e. "tgh@tgharold.com"). If you need to add additional email addresses, you can do that later using the "gpg --edit-key" command.

Comment:

The comment field is a public field and will be seen by others. I recommend putting website information here, or the full company name, or a combination of the two. Keep in mind that the contents of this field are typically displayed enclosed in parentheses, so avoid using parentheses or brackets/braces here. Some examples:

www.tgharold.com
Acme Corporation - www.acme.corp
Example LTD, www.example.com

You selected this USER-ID:
"Thomas Harold [Acme] (Acme Corporate Sales - www.acme.corp) "

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?


After entering those three values, you will be presented with how it might look to another user. As you can see, the comment gets wrapped in parentheses while the email address gets presented inside of angle brackets. Once you are satisfied with how it looks, enter "O" for "Okay" to continue.

GnuPG will then pop-up a window that prompts you for a passphrase. This is extremely important. The passphrase that protects your key from unauthorized use is the weakest link of the entire GnuPG encryption chain. Pick something lengthy, yet easy to type, that is extremely difficult for someone to guess or attack. Write it down if you want, but keep that slip of paper secure in a safe or safety deposit box.

You will eventually be presented with something that looks like:

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 2 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: next trustdb check due at 2009-12-16
pub 3200R/AAFA2876 2009-11-21 [expires: 2009-12-16]
Key fingerprint = 0324 917E C27D 2FB0 DDEF ABFA 4DEE 71F0 AAFA 2876
uid Thomas Harold [Acme] (Acme Corporate Sales - www.acme.corp)
sub 3200R/1972B360 2009-11-21 [expires: 2009-12-16]


This means that GnuPG has finished generating your key and has saved it to your keyring. This sample key (both the encryption key and the signing key) will expire Dec 16, 2009.

The key fingerprint is an important piece of information that should be given to your contacts over a secure channel. It will allow them to verify that they have the correct key and that they are not subject to a man-in-the-middle (MitM) attack when they use the key. You can find out the fingerprints of keys in your keyring using the "gpg --fingerprint" command. Typically, you would send them your public encryption key via email or some other digital method while telling them the key's fingerprint over an entirely different medium such as a telephone call or a physical piece of paper (letter / package).

Editing your key

In order to edit your key using GnuPG, you must know the 8-digit key ID. In the above example it is listed on the line that starts with "pub". For example, the key that I just created has a key ID of "AAFA2876":

pub 3200R/AAFA2876 2009-11-21 [expires: 2009-12-16]

In order to edit the key, you will use the following command:

gpg --edit-key aaFa2876

As you can see, the key ID is not case sensitive as it is merely an 8-digit hexadecimal string.

gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.

pub 3200R/AAFA2876 created: 2009-11-21 expires: 2009-12-16 usage: SC
trust: ultimate validity: ultimate
sub 3200R/1972B360 created: 2009-11-21 expires: 2009-12-16 usage: E
[ultimate] (1). Thomas Harold [Acme] (Acme Corporate Sales - www.acme.corp)

Command>


This shows us a bunch of information. The line that starts with "pub" gives us the following information:

pub - indicates that this is the primary key (you will also see "sub" for subkeys)
3200R - this is a 3200 bit RSA key (R=RSA, D=DSA, g=Elgamal)
AAFA2876 - the key ID (or subkey ID)
created: / expire(d|s): - the creation and expiration dates
usage: - indicates how the key can be used (S=sign, C=certify, E=encrypt)

Useful commands at this point are:

fpr - show key fingerprint
list - list key and user IDs
quit - exit without making changes

Changing the expiration date

By default, all operations will occur to the primary key (the "pub" line) in this keyset. So before you edit a subkey, you need to tell GnuPG to work with that key. These keys are simply numbered 1-N as they are shown in the list.

Command> key 1

pub 3200R/AAFA2876 created: 2009-11-21 expires: 2009-12-16 usage: SC
trust: ultimate validity: ultimate
sub* 3200R/1972B360 created: 2009-11-21 expires: 2009-12-16 usage: E


This puts an asterisk by the "sub*" line telling us that we're going to work on the subkey with ID "1972B360".

Command> expire
Changing expiration time for a subkey.
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0) 6m
Key expires at 05/19/10 20:28:31 Eastern Daylight Time
Is this correct? (y/N) y

You need a passphrase to unlock the secret key for
user: "Thomas Harold [Acme] (Acme Corporate Sales - www.acme.corp) "
3200-bit RSA key, ID AAFA2876, created 2009-11-21

pub 3200R/AAFA2876 created: 2009-11-21 expires: 2009-12-16 usage: SC
trust: ultimate validity: ultimate
sub* 3200R/1972B360 created: 2009-11-21 expires: 2010-05-20 usage: E


As you can see, the subkey's expiration date changed from "2009-12-16" to "2010-05-20". If we had wanted to change the primary key's expiration date, we would've entered "key 0" then "expire" at the "Command>" prompt.

Once you are happy with the new expiration dates, enter "save" to save and quit the key editor.

Adding another User ID to the key

Let's say that you want to add a second email address to your key pairs. As before, you're going to use the "gpg --edit-key" command to do this.

gpg --edit-key AaFa2876

Then you'll issue the "adduid" command.

Command> adduid
Real name: Thomas Harold [Example]
Email address: tgh@example.com
Comment: www.example.com
You selected this USER-ID:
"Thomas Harold [Example] (www.example.com) "

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O


Your key will now look like:

pub 3200R/AAFA2876 created: 2009-11-21 expires: 2012-11-20 usage: SC
trust: ultimate validity: ultimate
sub 3200R/1972B360 created: 2009-11-21 expires: 2010-05-20 usage: E
[ultimate] (1) Thomas Harold [Acme] (Acme Corporate Sales - www.acme.corp)
[ unknown] (2). Thomas Harold [Example] (www.example.com)


Now that we have two User IDs associated with this key, we should flag one of them as the primary.

Command> uid 2
Command> primary
Command> uid 0

pub 3200R/AAFA2876 created: 2009-11-21 expires: 2012-11-20 usage: SC
trust: ultimate validity: ultimate
sub 3200R/1972B360 created: 2009-11-21 expires: 2010-05-20 usage: E
[ultimate] (1) Thomas Harold [Example] (www.example.com)
[ultimate] (2). Thomas Harold [Acme] (Acme Corporate Sales - www.acme.corp)


The asterisk by the number in parentheses marks the currently selected user ID. If you see a dot/period after the number in parentheses, that indicates which user ID is the primary.

Backing up your key

The following command allows you to export your secret key to an ASCII armored text file.

gpg -a --export-secret-keys aafa2876 >> my-secret-key.asc

You should also export your currently usable public encryption key.

gpg -a --export aafa2876 >> my-public-key.asc

You should print these files out as well as keeping an electronic copy in a secure location such as a safe or safety deposit box. Don't leave the secret key ASCII file lying around. A sealed security envelope with a phrase and the current date written across the sealed flap and then covered with transparent tape is a good countermeasure to detect tampering.

Monday, November 09, 2009

CentOS 5, ClamAV 0.95 and /etc/sysconfig/clamav

I've been trying to configure the new ClamAV 0.95 as a milter for our Postfix install this week, so I've been doing some digging into the configuration files. Here's what I've found so far.

In order to get the newer ClamAV 0.95 for Red Hat Enterprise Linux 5 (RHEL5) and CentOS 5, I had to use the RPMForge repository.

The old clamav-milter package is outdated and should not be installed (use the newer clamav 0.95 or later package).

The /etc/rc.d/init.d/clamav script is still from 2008 and is very old. It references /etc/sysconfig/clamav which has an outdated setting called "CLAMAV_MILTER=yes". In ClamAV 0.95+, the milter was rewritten and now uses a configuration file (/etc/clamav-milter.conf) instead of command-line arguments. The init.d script that manages the clamd daemon is still for the older milter. It works fine for starting and stopping clamd, but you should not use the "CLAMAV_MILTER=yes" setting in the sysconfig file.

If you were using the /etc/sysconfig/clamav file to turn on the milter in RHEL5, then you will probably see the following error when you upgrade to ClamAV 0.95 or later:

Starting clamav-milter: clamav-milter: unrecognized option `--max-children=10'
ERROR: Unknown option passed
ERROR: Can't parse command line options


You'll need to convert your old command line options into configuration file options.
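
As a rough example of that conversion (option names taken from the sample clamav-milter.conf shipped with 0.95; the socket paths here are assumptions to check against your own install), the old "--max-children=10" becomes a MaxThreads line:

# /etc/clamav-milter.conf (fragment)
MaxThreads 10
MilterSocket /var/run/clamav/clamav-milter.socket
ClamdSocket unix:/var/run/clamav/clamd.sock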

Thursday, October 22, 2009

Flash memory price check

Flash memory prices have finally dropped below $2.00/GB (around $1.90 at the moment).

SDHC cards:

$7 2GB
$9 4GB
$16 8GB
$30 16GB
$80 32GB

The sweet spot right now is $30 for 16GB.

What we see is that at the lower end of the scale, there's a minimum price point. Manufacturers don't like to sell units below $6-$7, or maybe retailers don't like to stock units below that price point. You'll see something similar in hard drive prices, where there are very few drives below $40-$50 on the market.

SSDs (MLC)

$95 32GB
$145 64GB
$280 128GB
$680 256GB

The sweet spot is either 64GB for $145 or 128GB for $280. SSDs are still a bit above the $2/GB price point, probably due to the extra circuitry and packaging of a 2.5" SSD.

Hopefully, by this time next year they'll break the $1/GB mark. Magnetic 2.5" drives are down around $0.17/GB for 500GB drives.

Monday, September 07, 2009

SNMP: Finding OIDs and MIBs

The key tool in the toolbox for exploring MIBs and finding things in SNMP is either "snmpwalk" or looking at the actual MIB text definitions. On CentOS 5 (and RHEL 5), the net-snmp package installs a default set of MIBs to "/usr/share/snmp/mibs/".

# snmpwalk -v 2c -c public localhost diskIONReadX

That particular command uses version "2c" of the SNMP protocol to talk to the "public" community on the localhost and looks for "diskIONReadX" (which is a 64-bit counter value column from the diskIOTable).

# snmptranslate -m +ALL -IR -Td diskIONReadX

Here, we use "snmptranslate" to report on full details (-Td) of the diskIONReadX property. When looking up SNMP attributes by labels, you'll want to use the above format, but you can change "-Td" to other "-T" options or a "-O" option. Some common choices are:

# snmptranslate -m +ALL -IR -Td diskIONReadX
UCD-DISKIO-MIB::diskIONReadX
diskIONReadX OBJECT-TYPE
-- FROM UCD-DISKIO-MIB
SYNTAX Counter64
MAX-ACCESS read-only
STATUS current
DESCRIPTION "The number of bytes read from this device since boot."
::= { iso(1) org(3) dod(6) internet(1) private(4) enterprises(1) ucdavis(2021) ucdExperimental(13) ucdDiskIOMIB(15) diskIOTable(1) diskIOEntry(1) 12 }

# snmptranslate -m +ALL -IR -On diskIONReadX
.1.3.6.1.4.1.2021.13.15.1.1.12

# snmptranslate -m +ALL -IR -Of diskIONReadX
.iso.org.dod.internet.private.enterprises.ucdavis.ucdExperimental.
ucdDiskIOMIB.diskIOTable.diskIOEntry.diskIONReadX

# snmptranslate -m +ALL -IR -Ou diskIONReadX
enterprises.ucdavis.ucdExperimental.ucdDiskIOMIB.diskIOTable.
diskIOEntry.diskIONReadX

Monday, August 03, 2009

Second Copy 7 vs Samba v3

One of the tools that we use on our desktop machines is Second Copy 7, which is a very useful, user-friendly tool for doing file-level backups. It has a mode where it mirrors the source directory tree to the remote location, along with putting older copies of the files in a second remote location.

However, if the clocks on the two machines disagree, you'll find that Second Copy will end up making repeated copies of files in the "older copies" location every time the profile runs.

The primary cause of this is the Windows desktop's clock not exactly matching the server's clock. You will see this problem frequently if you use "time.windows.com" as your clock source. (In Windows XP: Control Panel -> Date and Time -> Internet Time tab.) The "time.windows.com" clock source is generally horribly inaccurate compared to the time that your Linux boxes running Samba get their time from (usually the pool.ntp.org servers).

So the solution is either to sync your Windows boxes to a better clock source (such as "us.pool.ntp.org" or an internal NTP time server), or to adjust Second Copy to be much more tolerant of time differences. SC's default is a 2 sec time difference allowance. You may wish to increase this to as much as 30 or 60 seconds. This is a hidden option in the Second Copy profiles.dat file.
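
For example (see the addendum below for the value I settled on), the hidden option goes under [Options] in profiles.dat, with the allowance given in seconds:

[Options]
IgnoreTimeDifference=30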

Setting up a Linux box to poll the pool.ntp.org servers and provide time to the internal network is a much preferred solution. You can also set up Samba to provide time to clients that belong to your domain.
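
The Samba side of that is a single smb.conf parameter, after which domain clients can sync with the "net time" command (a sketch; "servername" is a placeholder):

[global]
time server = yes

C:\> net time \\servername /set /yes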

References:

Q10169 - INFO: How does Second Copy handle file time stamps when copying the files between different file systems?

Addendum:

- After a bit of playing around with Samba options, I finally gave up and increased the "IgnoreTimeDifference=N" value under [Options] in the "profiles.dat" file from 2 seconds to 15 seconds. The Windows XP desktop machine, even though it was getting its time from the Linux Samba server, wasn't staying within a 2-second variation. But after loosening up the tolerance to 15 seconds, things are working much better.

- If your Windows boxes are actually joined to the Samba Domain as client machines (only possible with Windows XP Pro, or the pro/business versions of Vista/Win7), then they might keep better synchronization with the Samba server's time.

- I'm pretty sure that the problem was not due to my referencing the backup location using UNC naming (i.e. \\servername\share\path).

- This issue mostly comes into play when you are backing up from one machine to another (such as a share location on another desktop or a server share). This is not something that you'll normally run into if you're backing up to a drive hooked directly to the machine.

Wednesday, July 22, 2009

3ware SATA RAID - Reboot and Select proper Boot device

Reboot and Select proper Boot device
or Insert Boot Media in selected Boot device and press a key


In the process of setting up my 15-disk array, I kept encountering the above error message while attempting to boot into CentOS 5 off of the 3ware 9650SE array. If I turned off the offending drive (or removed it), the error went away and CentOS would boot properly.

This particular error message seems to be generated by an MS-DOS / Windows boot loader on the hard drive. Some motherboards seem to prefer this particular MBR, rather than loading GRUB/LILO as desired. So, when scanning the drives, they tend to fixate on whatever boot loader they find first.

If you hook the drive to a regular SATA port (or put it in an external USB enclosure), you can overwrite the first few gigabytes of the drive with zeroes. At that point, adding the drive to the array will no longer cause the problem. Attempting to remedy the issue while the drive is connected to the 3ware controller will not work, as the 3ware controller seems to hide the first few cylinders of the drive.
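
A quick way to do that zeroing once the drive is on a plain SATA port or in a USB enclosure (the device name /dev/sdX is a placeholder; triple-check it first, since dd will happily destroy the wrong disk):

# dd if=/dev/zero of=/dev/sdX bs=1M count=4096
(writes zeroes over the first 4 GB of the drive)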

(It might work in JBOD mode, but you can't set up a JBOD disk on the fly using the tw_cli command on a 9650SE controller. So an external USB enclosure / tray / drive toaster is the best route.)

Updates:

- After playing with it a little more, it seems to be a BIOS issue where the BIOS gets confused if the 3ware controller presents more than 12 units to the BIOS.

- The 3ware controller does not "hide the first few cylinders of the drive". Instead, it seems to store it's metadata for the drive at the end. Aside from the problem of losing data in the last few cylinders, you can take a drive configured as a "single-unit" and hook it up to a regular SATA controller with no issues.