
Why Disk Compression Works

Disk compression takes advantage of (at least) two characteristics of most files. First is the fact that most files have a large amount of redundant information, with patterns that tend to repeat. By using "placeholders" that are smaller than the pattern they represent, the size of the file can be reduced.

For example, take the sentence "In fact, there are many theories to explain the origin of man." If you look closely, you will see that the string " the" (space plus "the") appears in this sentence three times. Compression software can replace this string with a token, for example "#", and store the phrase as "In fact,#re are many#ories to explain# origin of man." It then reverse-translates the "#" back to " the" when the file is read back. Further, it can replace the string " man" with "$" and reduce the sentence to "In fact,#re are$y#ories to explain# origin of$." Just replacing those two patterns reduces the size of the sentence by 24% (from 62 characters to 47), and this is just a simple example of what full compression algorithms can do, working with a large number of patterns and rules.
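The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular disk compression product works: the token table is hypothetical, and the sketch assumes the characters "#" and "$" never appear in the original text.

```python
# Hypothetical token table: each one-character token stands in for a
# four-character pattern. Assumes "#" and "$" never occur in the input.
TOKENS = {"#": " the", "$": " man"}

def compress(text):
    """Replace each repeated pattern with its shorter token."""
    for token, pattern in TOKENS.items():
        text = text.replace(pattern, token)
    return text

def decompress(text):
    """Reverse-translate each token back to its pattern."""
    for token, pattern in TOKENS.items():
        text = text.replace(token, pattern)
    return text

sentence = "In fact, there are many theories to explain the origin of man."
packed = compress(sentence)
saving = 1 - len(packed) / len(sentence)

print(packed)
print(f"{len(sentence)} -> {len(packed)} characters ({saving:.0%} smaller)")
```

Running this prints the shortened sentence from the example and reports a drop from 62 to 47 characters, which is the 24% figure quoted above. A real compressor would also have to store or agree on the token table, and would choose the patterns automatically.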

The other characteristic of many files that disk compression makes use of is the fact that while each character in a file takes up one byte, most characters don't require the full byte to store them. Each byte can hold one of 256 different values, but if you have a text file, there will be very long sequences containing only letters, numbers, and punctuation, which use only a fraction of those values. Compression software uses encoding schemes that pack information like text more tightly, so that it makes full use of the 256 values that each byte can hold.
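One simple way to picture this packing is a fixed-width code: count the distinct characters in the text, work out how many bits are needed to number them, and store each character in that many bits instead of eight. The sketch below does exactly that. It is an assumption-laden illustration (the function names are made up, and the alphabet table would also have to be stored alongside the data); practical compressors generally use variable-length codes such as Huffman coding rather than this fixed-width scheme.

```python
def pack(text):
    """Pack text using the fewest whole bits per character its alphabet allows."""
    alphabet = sorted(set(text))
    codes = {ch: i for i, ch in enumerate(alphabet)}
    # Bits needed to number every distinct character (at least 1).
    width = max(1, (len(alphabet) - 1).bit_length())
    bits = 0
    for ch in text:
        bits = (bits << width) | codes[ch]
    nbytes = (len(text) * width + 7) // 8
    # Pad the final byte with zero bits so the result is a whole number of bytes.
    bits <<= nbytes * 8 - len(text) * width
    return bits.to_bytes(nbytes, "big"), alphabet, width

def unpack(data, alphabet, width, count):
    """Recover `count` characters from data produced by pack()."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << width) - 1
    chars = []
    for i in range(count):
        shift = total - (i + 1) * width
        chars.append(alphabet[(bits >> shift) & mask])
    return "".join(chars)

sentence = "In fact, there are many theories to explain the origin of man."
data, alphabet, width = pack(sentence)
print(f"{len(alphabet)} distinct characters -> {width} bits each")
print(f"{len(sentence)} bytes -> {len(data)} bytes")
```

The example sentence uses only 21 distinct characters, so 5 bits per character are enough, and its 62 bytes pack into 39.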

The combination of these two effects results in text files often being compressed by a factor of 2 to 1 or even 3 to 1. Data files can often be compressed even more: take a look at some spreadsheet or database files and you will find long sequences of blanks and zeros, sometimes hundreds or thousands in a row. These files can often be compressed 5 to 1, 10 to 1 or even more.
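Long runs of blanks and zeros are the easiest case of all to compress. A minimal sketch of run-length encoding shows why: each run is stored as a count followed by the repeated byte. The record below is invented for illustration, and run-length encoding is used here only because it is easy to follow; it is not claimed to be the method any specific disk compression product uses.

```python
def rle_encode(data):
    """Encode bytes as (count, value) pairs, with each count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(packed):
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        out += bytes([packed[k + 1]]) * packed[k]
    return bytes(out)

# Made-up database-style record: a little data padded with zeros and blanks.
record = b"ACME" + b"\x00" * 1000 + b"42" + b" " * 500
encoded = rle_encode(record)
print(f"{len(record)} bytes -> {len(encoded)} bytes")
```

This artificial record shrinks from 1,506 bytes to 24 because it is almost entirely padding. Real data files are less uniform, so ratios like the 5 to 1 or 10 to 1 mentioned above are more typical.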

Finally, compression is also useful in battling slack. If you have 1,000 files on a hard disk that uses 16,384-byte clusters, and each of these files is 500 bytes in size, you are using 16 MB of disk space to store less than 500 KB of data. The reason is that each file must be allocated a full cluster, and only 500 of the 16,384 bytes actually hold any data; the rest (97%!) is slack. If you put all of those files into a compressed file like a ZIP file, not only will they probably be reduced in size greatly, but the ZIP file will have a maximum of 16,383 bytes of slack by itself, resulting in a large amount of saved disk space. The advanced features of DriveSpace 3 volume compression will in fact reduce slack even if file compression isn't enabled.
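The slack arithmetic in this example is easy to check. The sketch below assumes the simplest allocation rule, where every file occupies a whole number of clusters, and treats the combined archive as exactly the sum of the file sizes with no compression and no archive header overhead, which is a simplification.

```python
CLUSTER = 16_384  # bytes per cluster, as in the example

def allocated(size, cluster=CLUSTER):
    """Disk space a file occupies when rounded up to whole clusters."""
    return -(-size // cluster) * cluster  # ceiling division

files = [500] * 1000  # 1,000 files of 500 bytes each

data = sum(files)
separate = sum(allocated(size) for size in files)
slack = separate - data
print(f"Stored separately: {separate:,} bytes used for {data:,} bytes of data")
print(f"Slack: {slack:,} bytes ({slack / separate:.0%})")

# Same data in a single file (assumed uncompressed, overhead ignored).
combined = allocated(data)
print(f"Stored as one file: {combined:,} bytes used, {combined - data:,} bytes of slack")
```

Stored separately, the files occupy 16,384,000 bytes with about 97% slack. Combined into one file, the same data occupies 507,904 bytes with only 7,904 bytes of slack, before any compression is applied.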




The PC Guide (http://www.PCGuide.com)
Site Version: 2.2.0 - Version Date: April 17, 2001
Copyright 1997-2004 Charles M. Kozierok. All Rights Reserved.
