This is the mail archive of the cygwin-apps mailing list for the Cygwin project.



on supporting lzma for packages


Based on some early movement in other projects[1] I decided to take a
look at what we might potentially gain by supporting lzma in setup.exe
and using it for packages.  The motivating factor here is that lzma
compresses better than bzip2 while at the same time having faster
decompression[2], so on paper it's nearly a complete win.

Of course, the problem is that if we start uploading .tar.lzma files
onto the mirrors then we implicitly require users to be using a version
of setup.exe new enough to contain this support, which means a forced
upgrade.  Obviously we have to move slowly with these sorts of changes,
so I don't see us being able to actually start uploading lzma packages
for some time.  However, to me what this says is that it's important to
get that support into the setup code sooner rather than later, so that
at the point where we might consider actually using it, there have been
enough releases of setup.exe that most users will have upgraded.

But all of this fuss only makes sense if there is actually a worthwhile
gain, which is still not clear.  So I ran an experiment, recompressing
the entire[3] distro with lzma, with the following results:

                   .tar      .tar.bz2          .tar.lzma
                 +----------------------------------------------
1083 binary pkgs | 3209 MB    877 MB (27.3%)    720 MB (22.4%)
 549 source pkgs | 2266 MB   1002 MB (44.2%)    947 MB (41.8%)
   w/deep repack |                              768 MB (33.9%)

(Percentages are compressed size relative to the uncompressed .tar.)

The reduction in binary package size compared to bz2 is about 5% of the
uncompressed size (27.3% -> 22.4%, i.e. files roughly 18% smaller than
the .tar.bz2 ones), which is modest, but coupled with faster
decompression speed it starts to look attractive.
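
The arithmetic behind those figures can be sketched as follows.  The
sizes are copied from the table; the names SIZES and summarize are made
up for illustration only.

```python
# Sizes in MB, copied from the table above.
SIZES = {
    "binary": {"tar": 3209, "bz2": 877, "lzma": 720},
    "source": {"tar": 2266, "bz2": 1002, "lzma": 947},
    "source, deep repack": {"tar": 2266, "bz2": 1002, "lzma": 768},
}


def summarize(row):
    """Return (bz2 ratio, lzma ratio, point gain, relative saving), in percent."""
    bz2_ratio = 100.0 * row["bz2"] / row["tar"]    # compressed / uncompressed
    lzma_ratio = 100.0 * row["lzma"] / row["tar"]
    points = bz2_ratio - lzma_ratio                # difference in percentage points
    relative = 100.0 * (row["bz2"] - row["lzma"]) / row["bz2"]  # vs. the bz2 file
    return bz2_ratio, lzma_ratio, points, relative


if __name__ == "__main__":
    for name, row in SIZES.items():
        b, l, p, r = summarize(row)
        print(f"{name:20s} bz2 {b:.1f}%  lzma {l:.1f}%  "
              f"gain {p:.1f} points  ({r:.1f}% smaller than bz2)")
```

The "point gain" is what the percentages in the table differ by; the
"relative saving" is how much smaller the lzma files are than the bz2
files a mirror or user would otherwise have to transfer.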

The compression savings for source code are potentially greater, but
this is complicated by the fact that source packages made by cygport or
the generic build script (as opposed to the older "prepatched tree with
reverse patch in CYGWIN-PATCHES/" method) consist of just a .patch and
.sh file along with an inner .tar.bz2 file.  Just recompressing the
outer tar does not leave much opportunity for gain with those packages,
so the slight savings in the second row (1002 MB -> 947 MB) were due
mainly to the few large packages that use the prepatched-tree method
(like Cygwin itself).

But if you consider the possibility of using lzma for the inner tar file
as well, things look a lot better, and that is what the third row shows:
a reduction of roughly 10% of the uncompressed size (44.2% -> 33.9%,
i.e. about 23% smaller than the .tar.bz2 files).  In practice this is
harder to achieve since it would require maintainers to patch all those
old copies of the g-b-s, or else move to cygport (which would also have
to be updated, but at least that's just one place).
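
To make the "deep repack" idea concrete, here is a minimal sketch in
Python.  It is not the script used for the measurement; the function
names are hypothetical, and it assumes the inner tarball can be
recognised purely by a .tar.bz2 suffix.  lzma.FORMAT_ALONE is Python's
name for the legacy .lzma container.

```python
import bz2
import io
import lzma
import tarfile


def to_lzma(data):
    """Compress bytes into the legacy .lzma ("LZMA alone") container."""
    return lzma.compress(data, format=lzma.FORMAT_ALONE, preset=9)


def deep_repack(src_tar_bz2):
    """Turn the bytes of a source .tar.bz2 into the bytes of a .tar.lzma,
    converting any inner *.tar.bz2 member to *.tar.lzma along the way."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_tar_bz2), mode="r:bz2") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src:
            if not member.isfile():
                dst.addfile(member)  # directories, links, etc. carry no data
                continue
            payload = src.extractfile(member).read()
            if member.name.endswith(".tar.bz2"):
                # Assumption: this member is the upstream tarball.
                payload = to_lzma(bz2.decompress(payload))
                member.name = member.name[:-len("bz2")] + "lzma"
                member.size = len(payload)
            dst.addfile(member, io.BytesIO(payload))
    return to_lzma(out.getvalue())
```

Comparing len(deep_repack(data)) against len(data) for each source
package would give numbers of the kind shown in the third row.  The
sketch holds everything in memory, which is fine for a test run but not
something you would want for the largest packages.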

So to summarize: binary - modest but measurable gain; source -
potentially bigger gain if tools are updated; both potentially also
benefit from faster decompression speed.

I'm not sure yet what conclusion to draw from this, but as I said before
adding lzma support to setup is better done now rather than later --
even if we have no intention of using it at present -- so that it's
there if we want it later.  So far I haven't been impressed with the
portability of the lzma code, as it seems to be just a lone directory
that is grafted out of the 7zip program's source tree, but if free time
permits I intend to see what would be involved with importing it to the
setup codebase.

And again let me just emphasize that this is nothing but a preliminary
kicking of the tires and solicitation for comments -- I don't want to
imply that we should or could make any changes yet.

Brian


[1] Recent automake has added support for dist-lzma, and coreutils
snapshots are now being offered in lzma format.

[2] from lzma(1): 

       lzma  provides  notably better compression ratio than bzip2 espe-
       cially with files having other than plain text content. The other
       advantage  of  lzma  is  fast  decompression  which is many times
       quicker than bzip2. The major disadvantage is that achieving  the
       highest  compression  ratios  requires extensive amount of system
       resources, both CPU time and RAM. Also software  to  handle  LZMA
       compressed  files  is  not installed by default on most distribu-
       tions.

[3] I didn't actually do the entire distro, but instead took the two
most recently modified .tar.bz2 files from each directory (normally
bin+src of one version) in order not to bias the measurement towards
those packages that have a million versions sitting on the mirror.  This
still resulted in about 1600 .tar.bz2 files totaling around 1.6 GB, so
it's a large-scale test nonetheless.
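
A selection rule like the one in [3] could be sketched as below.  This
is an illustration rather than the script actually used; the function
name and its parameters are hypothetical.

```python
import os


def newest_archives(root, per_dir=2, suffix=".tar.bz2"):
    """Yield the `per_dir` most recently modified archives in each directory."""
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [os.path.join(dirpath, f)
                 for f in filenames if f.endswith(suffix)]
        # Newest first, judged by modification time.
        paths.sort(key=os.path.getmtime, reverse=True)
        yield from paths[:per_dir]
```

Pointing it at a local mirror of the release tree, e.g.
list(newest_archives("release")), would give the sample to recompress.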

