# Mark a directory for compression
setfattr -h -v 0x00000800 -n system.ntfs_attrib_be directory-name
# On little-endian computers when the above is not possible
setfattr -h -v 0x00080000 -n system.ntfs_attrib directory-name
# Disable compression for files to be created in a directory
setfattr -h -v 0x00000000 -n system.ntfs_attrib directory-name
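The two compression values above are the same 32-bit attribute word written in opposite byte orders. As a minimal sketch, the byte swap can be checked with plain shell arithmetic; the helper name `swap32` is hypothetical and not part of ntfs-3g or setfattr.

```shell
# Hypothetical helper (not part of ntfs-3g): byte-swap a 32-bit value
# and print it as 0xXXXXXXXX.
swap32() {
    v=$(( $1 ))
    printf '0x%08x\n' $(( ((v & 0xff) << 24) | ((v & 0xff00) << 8) | ((v >> 8) & 0xff00) | ((v >> 24) & 0xff) ))
}

swap32 0x00000800   # prints 0x00080000
```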
NTFS compression is based on the public domain algorithm LZ77 (Ziv and
Lempel, 1977). It is faster than most widely used compression methods,
and accessing a random part of a file does not require decompressing
the file from its beginning, but its compression ratio is moderate.
The file to compress is split into 4096-byte blocks, and compression
is applied to each block independently. In each block, when a
sequence of three bytes or more appears twice, the second occurrence is
replaced by the position and length of the first one. A block can thus
be decompressed, provided its beginning can be located, by locating the
references to a previous sequence and replacing the references by the
designated bytes.
If such a block compresses to 4094 bytes or less, two bytes
giving the new size are prepended to the block. If it does not, the
block is stored uncompressed and two bytes giving a count of 4096 are
prepended.
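As an illustration of the rule above, the following sketch computes how many bytes one 4096-byte block occupies once its two-byte header is added. The function name `stored_block_size` is hypothetical, and the compressed size is assumed to come from some compressor that is not shown here.

```shell
# Hypothetical helper: bytes occupied by one 4096-byte block, header
# included, given the size the compressor produced for it.
stored_block_size() {
    csize=$1
    if [ "$csize" -le 4094 ]; then
        echo $(( csize + 2 ))   # 2-byte size header + compressed data
    else
        echo $(( 4096 + 2 ))    # 2-byte header + block stored uncompressed
    fi
}

stored_block_size 1000   # prints 1002
stored_block_size 4095   # prints 4098
```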
Several compressed blocks representing 16 clusters of uncompressed
data are then concatenated. If the total compressed size is 15 clusters
or less, the needed clusters are written and marked as used, and the
remaining ones are marked as unneeded. If the uncompressed data only
contains zeroes, all 16 clusters are marked as unneeded. If 16 or 17
clusters are needed, no compression is done and the 16 clusters are
filled with uncompressed data. The cluster size is defined when
formatting the volume (generally 512 bytes for small volumes and 4096
for big volumes).
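The decision for one set of 16 clusters can be sketched as follows. The function name `clusters_written` is hypothetical, and representing an all-zero set by a total size of 0 is an assumption of this sketch, not something ntfs-3g is documented to do.

```shell
# Hypothetical helper: clusters written for one set of 16 clusters.
# $1 = total size in bytes of the concatenated blocks, headers included
#      (0 is used here to model a set that only contains zeroes)
# $2 = cluster size in bytes
clusters_written() {
    total=$1
    csz=$2
    needed=$(( (total + csz - 1) / csz ))   # round up to whole clusters
    if [ "$needed" -le 15 ]; then
        echo "$needed"   # compressed: the remaining clusters are unneeded
    else
        echo 16          # 16 or 17 needed: the set is stored uncompressed
    fi
}

clusters_written 20000 4096   # prints 5
clusters_written 65568 4096   # prints 16
```

The second call shows where a need for 17 clusters can come from: with 4096-byte clusters, 16 blocks that all fail to compress take 16 x 4098 = 65568 bytes, which is just over 16 clusters.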
Only the allocated clusters in a set of 16 or less are identified in
the allocation tables, with neighbouring ones being grouped. When
seeking to a random byte for reading, the first cluster in the relevant
set is directly located. If the set is found to contain 16 allocated
clusters, it is not compressed and the requested byte is directly
located. If it contains 15 clusters or less, it contains blocks of
compressed data, and the first two bytes of each block indicate its
compressed size, so that the relevant block can be located. That block
then has to be decompressed to access the requested byte.
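The arithmetic behind such a seek can be sketched as below. The function name `locate_byte` and its output format are hypothetical; cluster numbers are relative to the start of the file, and finding the block inside a compressed set by walking the two-byte block headers is not shown.

```shell
# Hypothetical helper: locate a byte offset within a compressed file.
# $1 = byte offset in the uncompressed file
# $2 = cluster size in bytes
locate_byte() {
    offset=$1
    csz=$2
    set_bytes=$(( 16 * csz ))             # uncompressed bytes per set
    set_index=$(( offset / set_bytes ))   # which set of 16 clusters
    in_set=$(( offset % set_bytes ))      # offset inside that set
    echo "set=$set_index first_cluster=$(( set_index * 16 )) block=$(( in_set / 4096 )) byte=$(( in_set % 4096 ))"
}

locate_byte 100000 4096   # prints set=1 first_cluster=16 block=8 byte=1696
```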
When ntfs-3g appends data to a compressed file, the data is first
written uncompressed, until 16 clusters are filled, which implies
the 16 clusters are allocated to the file. When the set of 16 clusters
is full, data is read back and compressed. Then, if compression is
effective, the needed clusters are written again and the unneeded ones
are deallocated.
When the file is closed, the last set of clusters is compressed, and
if the file is opened again for appending, the set is decompressed
for merging the new data.
To report any problem, please post to the support forum
hosted by Tuxera.