Wednesday, February 20, 2008

High File Compression

I was toying with 7-zip yesterday to see how far I could compress a file.

From the tests I did, 7-zip is capable of a compression ratio of about 7088:1. Here's the run:
mj@mj-evil-station:~/share$ dd bs=1073741824 count=10 if=/dev/zero | 7z a -mx9 -si haha.zip

7-Zip 4.43 beta Copyright (c) 1999-2006 Igor Pavlov 2006-09-15
p7zip Version 4.43 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Updating archive haha.zip

Compressing [Content] 0%
10+0 records in
10+0 records out
10737418240 bytes (11 GB) copied, 2465.64 seconds, 4.4 MB/s

mj@mj-evil-station:~/share$ ls -l
-rw-r--r-- 1 mj mj 1514800 2008-02-19 13:03 haha.zip

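As you can see, I stuffed 10GB into a 1.4MB file. (It's 10, not 11: the block size parameter is exactly 2^30 bytes, so 10 blocks make 10GB in binary units, while dd reports the total in decimal gigabytes and rounds up.) This was on a laptop. At home, my dual-core box stuffed 100GB into 14.4MB in 3 hours, 39 minutes and 30 seconds, while I was also watching a video that consumed some CPU cycles.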
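
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the numbers from the byte counts in the dd and ls output above. It assumes a shell with 64-bit integer arithmetic (bash, for instance), and the variable names are just ones I made up for illustration.

```shell
# Byte counts taken from the dd and ls output of the run above.
original_bytes=$((10 * 1073741824))   # 10 blocks of bs=1073741824 bytes
compressed_bytes=1514800              # size of haha.zip reported by ls -l

# Size in binary gigabytes (2^30 bytes each), using integer division.
gib=$((original_bytes / 1073741824))

# Compression ratio, truncated to a whole number.
ratio=$((original_bytes / compressed_bytes))

echo "input: ${original_bytes} bytes (${gib} GB in binary units)"
echo "ratio: ${ratio}:1"
```

The division is integer-only, so the fractional part of the ratio is dropped.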

I could probably squeeze out more by tweaking 7-zip's parameters. Unfortunately, that requires a lot of RAM, which I don't have (I'm limited to 2GB).

The tests simply consisted of compressing a long run of zeroes. This is an ideal case: every byte is the same, so the input is as redundant as it can get and produces the highest compression ratio (provided you tweak your program to the max).
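
To see why all-zero input is the ideal case, here is a small-scale sketch that compresses 1 MiB of zeroes and 1 MiB of random bytes and compares the results. Note the assumptions: it uses gzip as a stand-in for 7-zip (only because gzip is installed almost everywhere), it needs /dev/zero and /dev/urandom, and the 1 MiB size and variable names are arbitrary choices of mine, not part of the original tests.

```shell
# Number of input bytes for each test (1 MiB, chosen arbitrarily).
size=1048576

# Compress a run of zero bytes and count the compressed bytes.
zero_size=$(head -c "$size" /dev/zero | gzip -9 | wc -c | tr -d ' ')

# Compress the same amount of random bytes for comparison.
random_size=$(head -c "$size" /dev/urandom | gzip -9 | wc -c | tr -d ' ')

echo "zeroes: ${size} -> ${zero_size} bytes"
echo "random: ${size} -> ${random_size} bytes"
```

The zeroes should shrink to a tiny fraction of their size, while the random data should barely compress at all, since it has no repetition for the compressor to exploit.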

I have a couple of machines at work that are quad-core with 6GB of RAM. Unfortunately, they run Windows, so I don't have a device to pump zeroes into 7-zip directly. Instead, I have to create a file of the size I need and then have 7-zip compress it, which is quite inconvenient.
