
File handling in setup.exe


This is regarding the *.cwp stuff that was discussed last month.  It was
agreed that my initial patch had good ideas, but that as long as I was
in there, I might as well clean up the code some.  I've looked into the
code, and have realized that I need some input before proceeding.

My initial idea when I agreed to take this on was to just refactor and
OOP-ify the code around tar.cc some.  I can do that, but some comments
from Robert Collins got me on the track of looking into handling
alternate sources for package files.

This implies some kind of link between archive handling and the current
NetIO hierarchy.  It would also require changes to geturl.cc and to the
code that calls into geturl.cc.  The foremost question is: should I be
chasing this at all, or should I simply refactor the tar handling
mechanism as it exists right now?

If we want a Grand Refactoring and not just some reworking of tar.cc and
friends, here's my proposal:

I assume that reading packages from the network would be useful for
allowing setup.exe to install directly from the network, without writing
the packages out to disk first as it does today.  Yet, we need to keep
that "caching" mechanism somehow, because it's useful.  Currently, file
handling logic exists in geturl.cc, nio-file.cc, tar.cc, and probably
other places.  To deal with all that, I have in mind something like
this:

class Source {
public:
	Source(const char* out_pathname);	// 0 => don't cache to a file
	virtual ~Source();
	virtual int read(char* buffer, size_t size);
	virtual int write(const char* buffer, size_t size);

	...
private:
	Source() { }	// no default construction

	FILE* fp_out;
};

class HTTPSource : public Source {
public:
	HTTPSource(const char* in_url, const char* out_pathname = 0);
	...
};


By default, Source reads data from a file, with the option of caching
the data it reads out to another file.  (If out_pathname == 0, the data
isn't cached to a file as it's read.)  Subclasses supply their own
constructors and override read() to retrieve data from various network
sources (HTTP, FTP, WinInet.dll, etc.).  When reading straight from a
file, you'd make the Source non-caching, but when reading via HTTP, you
could elect either to cache the data to a file or to simply read it in
without caching it.
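
To make the caching behavior concrete, here's a rough sketch of what
the base class read() might boil down to -- basically a tee.  (fp_in is
a name I've made up here for the input file handle; it's not in the
sketch above.)

#include <stdio.h>

// Hypothetical sketch: pull bytes from the input file, teeing a copy
// out to the cache file if one was requested.
int Source::read(char* buffer, size_t size)
{
	size_t n = fread(buffer, 1, size, fp_in);	// fp_in: assumed input handle
	if (n > 0 && fp_out)
		fwrite(buffer, 1, n, fp_out);	// tee to the cache file
	return (int) n;
}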

This implies a fairly major refactoring all by itself.  As I stated
above, there's a lot of code that assumes that it can write data out to
disk and read it back.  My proposal would mean that everything deals
with Source objects.  Because the data may not be cached, you'd want to
keep the data pipeline simple: in the HTTP case, you'd read the data
from the network, pass it to the gz/bz unpacker, and pass that stream to
the tar file unpacker.  That is, go from initial network connection open
to final unpacking, all in one operation.
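
In code terms, the chain is just nested read() calls.  For instance,
the tar layer's bulk read (using names from the sketches below, with
decomp as an assumed member) might be nothing more than:

// Sketch: the archive layer pulls decompressed bytes from the Decomp
// layer, which in turn pulls raw bytes from the Source, so nothing
// has to touch the disk between the stages.
int TarArchive::read(char* buf, size_t len)
{
	return decomp->read(buf, len);	// Decomp::read() calls Source::read()
}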

This implies two other class hierarchies:

class Decomp {	// a cleaned-up version of class gzbz from tar.cc
public:
	// this is decomp_factory(), from my original patch
	static Decomp* factory(Source*);

	virtual ~Decomp();	// gzbz::close()

	virtual int read(char* buf, size_t len) = 0;
	virtual off_t tell() = 0;

protected:
	Decomp(Source*);	// subclasses only; use factory()

	Source* src;	// where the compressed bytes come from
};

class GZDecomp : public Decomp ...
class BZDecomp : public Decomp ...
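
To show where I'm going with factory(), here's a sketch: the two
signatures below are the standard gzip (0x1f 0x8b) and bzip2 ("BZh")
magic numbers.  How the peeked bytes get back into the stream (a small
pushback buffer in Source, say) is glossed over.

// Hypothetical sketch: sniff the first bytes of the stream and pick
// the matching decompressor.
Decomp* Decomp::factory(Source* src)
{
	unsigned char magic[3];
	if (src->read((char*) magic, 3) != 3)
		return 0;	// stream too short to identify
	if (magic[0] == 0x1f && magic[1] == 0x8b)
		return new GZDecomp(src);	// gzip signature
	if (magic[0] == 'B' && magic[1] == 'Z' && magic[2] == 'h')
		return new BZDecomp(src);	// bzip2 signature
	return 0;	// unrecognized format
}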


class Archive {
public:
	Archive(Decomp*);
	virtual ~Archive();	// tears down the Decomp/Source chain too

	virtual int read(char* buf, size_t len) = 0;
	virtual off_t tell() = 0;
	virtual const char* next_file_name() = 0;
};

class TarArchive : public Archive ...
class RPMArchive : public Archive ...
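
For flavor, here's a (very) hedged sketch of how TarArchive might
implement next_file_name().  Tar headers are 512-byte blocks with the
file name at offset 0 and the size at offset 124 as an octal string.
(header, decomp, and file_size are assumed members; short reads and
the end-of-archive double zero block are glossed over.)

#include <stdlib.h>	// strtol

// Hypothetical sketch: read the next 512-byte tar header and pull the
// file name and size out of it.
const char* TarArchive::next_file_name()
{
	if (decomp->read(header, 512) != 512 || header[0] == '\0')
		return 0;	// end of archive (or a read error)
	file_size = strtol(header + 124, 0, 8);	// size field is octal text
	return header;	// bytes 0..99 of the block hold the name
}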


These are just "sketches" to give you an idea of where I'm headed with
all this.  Don't worry about critiquing the actual member names or even
the minor structures.  The main thing is the overall chain of classes.

As you can see, you create a Source object to retrieve (and optionally
cache) the data, then you create a Decomp object to read data from the
Source and decompress it, and finally an Archive object to parse the
data from the Decomp object, extracting the files and other things
found in tar/rpm/deb/whatever archives.

The get_url_*() functions can't exist in this scheme: all they know how
to do is read data in, which is exactly the job of what I'm calling
Sources, so the new classes subsume them.  I haven't traced the code
out beyond the get_url_* functions to find out how the data within the
archives is dealt with.  My idea, however, is to make all that code
look something like this:

	// Given the URL, the options the user picked, and whether
	// we have the file locally already or not, create a Source
	// subclass to read the archive in.
	Source* source = open_source(url);

	Archive* arch = new TarArchive(Decomp::factory(source));
	while (arch->next_file_name()) {
		// munch on the archive, update the UI, spit files
		// out to disk...
	}
	delete arch;	// closes the cache file (if any) as well
			// as network connections, etc.
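
The open_source() call above is hand-waving, of course.  Here's a
sketch of what it might do; FTPSource and local_cache_name() are
made-up names, and the last branch glosses over how a plain Source
finds its input file (the class sketch above leaves that open):

#include <string.h>	// strncmp

// Hypothetical sketch: pick a Source subclass based on the URL scheme.
// Local-cache lookups and the user's options are elided.
Source* open_source(const char* url)
{
	if (strncmp(url, "http://", 7) == 0)
		return new HTTPSource(url, local_cache_name(url));
	if (strncmp(url, "ftp://", 6) == 0)
		return new FTPSource(url, local_cache_name(url));
	return new Source(0);	// already a local file: read it, don't re-cache
}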

I'm leaving the issue here until I hear back from the people whose
opinions matter.  :)  I don't want to jump in and start all this rework
if this idea is somehow broken, or simply too grandiose w.r.t. where
people want to see setup.exe go.

I'm thinking this will take a week of ideal hacking time, which is a lot
considering that I'm doing all this in my spare time here at work.  In
real terms, this may take a month or more.
--
'Net Address: http://www.cyberport.com/~tangent/
ICBM Address: 36.8274040 N, 108.0204086 W, alt. 1714m

