Our backup script for pgsql dumps the databases out in plain-text SQL format (my preferred method for a variety of reasons). The question was: do we leave it as plain text, and if not, which compressor do we use?
...
Here are sample times for a plain-text SQL backup.
real 3m30.523s
user 0m14.053s
sys 0m5.132s
The raw database cluster is 22GB (22150820 KB), but that includes indexes. The resulting size of the backups is 794MB (812436 KB). The specific command line used is:
pg_dump -a -b -O -t $s.$t -x $d -f $DIR/$d/$s.$t.sql
($s, $t, $d and $DIR are variables denoting the schema, table, database, and base backup directory)
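For context, here's a rough sketch of how that invocation might be wrapped in a per-table loop. The actual backup script isn't shown here, so the shebang, table list, and paths below are assumptions for illustration only:

#!/bin/bash
# Hypothetical wrapper around the pg_dump invocation above.
DIR=/var/backups/pgsql        # assumed base backup directory
d=mydb                        # assumed database name
mkdir -p $DIR/$d
# tables.txt is assumed to hold "schema table" pairs, one per line
while read s t; do
    # data only (-a), include blobs (-b), no owners (-O), no privileges (-x),
    # one table per plain-text SQL file; bash's "time" prints real/user/sys
    time pg_dump -a -b -O -t $s.$t -x $d -f $DIR/$d/$s.$t.sql
done < tables.txt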
...
gzip (default compression level of "6")
pg_dump -a -b -O -t $s.$t -x $d | gzip -c > $DIR/$d/$s.$t.sql.gz
CPU usage is pegged at 100% on one of the four CPUs in the box during this operation (due to gzip compressing the streams). So we are bottlenecked by the somewhat slow CPUs in the server.
real 3m0.337s
user 2m17.289s
sys 0m6.740s
So we burned up a lot more CPU time (user 2m 17s) than the plain-text dump (user 14s), but the overall operation still completed fairly quickly. How much space did we save? The resulting backups are only 368MB (376820 KB), less than half the size of the plain-text backups.
(The ratio would be better, but a large portion of our current database consists of various large "specialized" tables whose contents are extremely random and therefore hard to compress. I can't talk about the contents of those tables, but the data in them is generated by a PRNG.)
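As a quick illustration of why PRNG-generated data compresses so poorly, here's a small experiment using /dev/urandom as a stand-in (the real tables obviously aren't reproduced here):

# ~10MB of pseudo-random bytes: gzip barely shrinks it (it usually grows a little)
head -c 10485760 /dev/urandom | gzip -c | wc -c
# ~10MB of highly repetitive text: gzip shrinks it by orders of magnitude
yes 'the same line over and over' | head -c 10485760 | gzip -c | wc -c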
...
gzip (compression level of "9")
pg_dump -a -b -O -t $s.$t -x $d | gzip -c9 > $DIR/$d/$s.$t.sql.gz
We're likely to be even more CPU-limited here, since we're telling gzip to "try harder". The resulting backups are 368MB (376944 KB), essentially the same size as the default-level run.
real 9m39.513s
user 7m28.784s
sys 0m12.585s
So we burn up about 3.3x the CPU time (user 7m 29s vs. 2m 17s), but we don't really change the backup size. Probably not worth it.
...
bzip2
pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c9 > $DIR/$d/$s.$t.sql.bz2
real 19m45.280s
user 3m52.559s
sys 0m11.709s
Interestingly, while bzip2 burned roughly 1.7x the CPU time of gzip at its default compression level (user 3m 53s vs. 2m 17s), it was still far cheaper than gzip's maximum compression option. The resulting backup files are only 330MB (337296 KB), a decent improvement over either gzip run.
The other thing to note is that bzip2 took a lot longer in wall-clock time than gzip, but the server is pretty busy at the moment, so the real time is likely inflated.
...
Ultimately, we ended up going with bzip2, for a few reasons:
- Better compression
- The additional CPU usage was not an issue
- We could switch to a smaller bzip2 block size (e.g. -c2 instead of -c9) to be more friendly to rsync, as sketched below
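A minimal sketch of what that rsync-friendlier pipeline would look like, using the same variables as above (the final command isn't shown in the post, and -c2 is just the option mentioned in the list):

# bzip2 -2 compresses in 200k blocks instead of the 900k default (-9), so a
# change in the dump touches fewer compressed blocks and rsync re-transfers less
pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2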