This is the mail archive of the cygwin-apps@cygwin.com mailing list for the Cygwin project.


Re: File handling in setup.exe

To: <cygwin-apps at cygwin dot com>
Subject: Re: File handling in setup.exe
From: "Robert Collins" <robert dot collins at itdomain dot com dot au>
Date: Fri, 5 Oct 2001 11:51:03 +1000
References: <3BBD05EB.2357D53A@etr-usa.com> <20011004212030.C1118@redhat.com>

----- Original Message -----
From: "Christopher Faylor" <cgf@redhat.com>
To: <cygwin-patches@cygwin.com>
Cc: <cygwin-apps@cygwin.com>
Sent: Friday, October 05, 2001 11:20 AM
Subject: Re: File handling in setup.exe

> FWIW, I really like what you've proposed.  It feels right.

Ditto.

> Although, I guess we should wait for a little more input first.

I've got some; see inline below.

> >This implies some kind of link between archive handling and the current
> >NetIO hierarchy.  This would also require changes to geturl.cc and the
> >code that calls functions in geturl.cc.  The foremost issue is, should I
> >be chasing this at all, or should I simply refactor the tar handling
> >mechanism as it exists right now?

I think that refactoring the tar handling on its own is really just bit
twiddling. IMO, bringing it all together and _then_ handling the
magic-number issue can be done cleanly.

> >I assume that reading packages from the network would be useful for
> >allowing setup.exe to install directly from the network, without writing
> >the packages out to disk first as it does today.  Yet, we need to keep
> >that "caching" mechanism somehow, because it's useful.  Currently, file
> >handling logic exists in geturl.cc, nio-file.cc, tar.cc, and probably
> >other places.  To deal with all that, I have in mind something like
> >this:
> >
> >class Source {
> >public:
> > Source(const char* out_pathname);
> > virtual int read(void* buffer, size_t size);
> > virtual int write(const void* buffer, size_t size);
> >
> > ...
> >private:
> > Source() { } // can't create Source objects directly
> >
> > FILE* fp_out;
> >};
> >
> >class HTTPSource : public Source {
> >public:
> > HTTPSource(const char* in_url, const char* out_pathname = 0);
> > ...
> >};
> >

All good...

> >By default, Source reads data from a file and has the option to cache
> >the data it reads out to another file.  (If out_pathname == 0, the data
> >isn't cached to a file as it's read.)  Subclasses override the
> >constructor and read() to retrieve data from various network sources
> >(HTTP, FTP, WinInet.dll, etc.).  When reading straight from a file, you
> >would set the Source to non-cacheable, but when reading via HTTP, you
> >could elect to either cache the data to a file, or simply read the data
> >in without caching it.
> >
> >This implies a fairly major refactoring all by itself.  As I stated
> >above, there's a lot of code that assumes that it can write data out to
> >disk and read it back.  My proposal would mean that everything deals
> >with Source objects.  Because the data may not be cached, you'd want to
> >keep the data pipeline simple: in the HTTP case, you'd read the data
> >from the network, pass it to the gz/bz unpacker, and pass that stream to
> >the tar file unpacker.  That is, go from initial network connection open
> >to final unpacking, all in one operation.
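A minimal C++ sketch of the optional caching described in the quoted proposal, assuming std::istream and std::ostream stand in for the real file and network handles. The stream-based constructor and the read_all() helper are hypothetical illustrations, not part of the proposed interface; passing nullptr for the cache plays the role of out_pathname == 0.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch: a Source pulls bytes from an input stream and,
// if a cache stream was supplied, copies everything it reads there too.
class Source {
public:
    // 'cache' may be nullptr, meaning "don't cache" (the out_pathname == 0 case).
    Source(std::istream& in, std::ostream* cache = nullptr)
        : in_(in), cache_(cache) {}
    virtual ~Source() = default;

    // Reads up to 'size' bytes into 'buffer'; returns the byte count, 0 at end.
    virtual std::size_t read(char* buffer, std::size_t size) {
        in_.read(buffer, static_cast<std::streamsize>(size));
        std::size_t got = static_cast<std::size_t>(in_.gcount());
        if (cache_ && got > 0)
            cache_->write(buffer, static_cast<std::streamsize>(got));
        return got;
    }

private:
    std::istream& in_;
    std::ostream* cache_;
};

// Drains a Source completely, returning everything it produced.
std::string read_all(Source& src) {
    std::string out;
    char buf[4];  // deliberately tiny so several read() calls are needed
    std::size_t n;
    while ((n = src.read(buf, sizeof buf)) > 0)
        out.append(buf, n);
    return out;
}
```

For example, wrapping a std::istringstream holding "hello, world" with a std::ostringstream cache yields the full text from read_all() and leaves the same bytes in the cache; without a cache the data simply passes through once.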

Here's the bit I want to comment on. I think this point was missed in the
prior discussion (if it wasn't, or if it's simply wrong or illogical, feel
free to say so!).

Let me restate what you've said to be sure I understand you correctly:
You're proposing something like

read from Source
write to Decomp
read from Decomp
write to Archive
while nextfilename()
  read from Archive
  write to filename
wend

(Sure, this could be written as
  foo = new Source (...);
  bar = new Decomp (foo);
  new Archive (bar);
but that's a presentation thing, not really important.)
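The chained form above can be sketched in C++ as a set of pull-style readers, each wrapping the stage before it, so bytes flow from the source through the decompressor only when the last stage asks for them. The Reader, StringSource and RleDecomp names are hypothetical, and the toy run-length format stands in for gzip/bzip2 purely to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base for every stage in the pipeline: a pull-style reader.
class Reader {
public:
    virtual ~Reader() = default;
    // Stores the next byte in 'out'; returns false at end of data.
    virtual bool get(char& out) = 0;
};

// Stand-in for a network or file Source: serves bytes from a string.
class StringSource : public Reader {
public:
    explicit StringSource(std::string data) : data_(std::move(data)) {}
    bool get(char& out) override {
        if (pos_ >= data_.size()) return false;
        out = data_[pos_++];
        return true;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Stand-in "decompressor" for a toy run-length format: each pair of input
// bytes (count, value) expands to 'count' copies of 'value'. Real code would
// wrap zlib or libbzip2 here instead.
class RleDecomp : public Reader {
public:
    explicit RleDecomp(std::unique_ptr<Reader> upstream)
        : up_(std::move(upstream)) {}
    bool get(char& out) override {
        while (remaining_ == 0) {  // fetch pairs until one expands to >0 bytes
            char count, value;
            if (!up_->get(count) || !up_->get(value)) return false;
            remaining_ = static_cast<unsigned char>(count);
            value_ = value;
        }
        --remaining_;
        out = value_;
        return true;
    }
private:
    std::unique_ptr<Reader> up_;
    unsigned remaining_ = 0;
    char value_ = 0;
};

// Pulls everything out of the last stage; each byte flows through the
// whole chain on demand, so nothing is staged on disk in between.
std::string drain(Reader& r) {
    std::string out;
    char c;
    while (r.get(c)) out += c;
    return out;
}
```

Because RleDecomp depends only on the Reader interface, another RleDecomp (or any other stage) can be stacked on top of it without changes.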

I don't like this, because all three classes perform both read and write
(and Archive is the only one of them that is able to generate multiple
streams, as it should be :]).

I propose the following modification to your class hierarchy.
class Stream {
public:
  virtual ~Stream () {}
  /* Create a new stream from an existing one; used to get decompressed
   * data or to open archives.
   * Returns NULL if there is no sub-stream available, i.e. peek() didn't
   * match any known magic number and next_file_name() == NULL.
   */
  static Stream * factory (Stream *);
  /* read data (duh!) */
  virtual ssize_t read(void *buffer, size_t len);
  /* provide data to the stream (double duh!) */
  virtual ssize_t write(const void *buffer, size_t len);
  /* read data without removing it from the class's internal buffer */
  virtual ssize_t peek(void *buffer, size_t len);
  /* Find out the next stream name,
   * i.e. for foo.tar.gz at offset 0, next_file_name() is foo.tar;
   * for foobar that is an archive, next_file_name() is the next
   * extractable filename.
   */
  virtual const char* next_file_name() = 0;
};
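As a sketch of how factory() might use peek() to sniff a magic number without consuming any data, the check below uses the documented signatures: gzip streams start with the bytes 0x1f 0x8b, bzip2 streams with "BZh", and POSIX ustar archives carry "ustar" at offset 257 of the first 512-byte header. The Format enum and detect_format() are hypothetical names, not part of the proposed class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical labels for what a Stream::factory() could wrap next.
enum class Format { Gzip, Bzip2, Tar, Unknown };

// Classifies data from a peek() of its first bytes, without consuming them.
// 'head' holds whatever peek() returned; 'len' is how many bytes that was.
Format detect_format(const unsigned char* head, std::size_t len) {
    // Check the tar header first: its signature sits deep in the block,
    // so a file name that happens to start with "BZh" can't fool it.
    if (len >= 262 && std::memcmp(head + 257, "ustar", 5) == 0)
        return Format::Tar;                        // POSIX ustar header
    if (len >= 2 && head[0] == 0x1f && head[1] == 0x8b)
        return Format::Gzip;                       // gzip member header
    if (len >= 3 && std::memcmp(head, "BZh", 3) == 0)
        return Format::Bzip2;                      // bzip2 stream header
    return Format::Unknown;  // factory() would return NULL here
}
```

Old pre-POSIX (V7) tar files carry no magic string, so a real factory would need a fallback such as validating the header checksum.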

So Source becomes:
class Source : public Stream {
public:
 Source(const char *out_pathname);
...

and likewise for Archive and Decomp.

This minor change will immediately allow archives-within-archives,
double-compressed files, and whathaveyou, without having to write code
to handle each of those cases.
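To illustrate why nesting comes for free, here is a minimal sketch in which a loop keeps calling factory() until it returns nothing. It assumes a toy "Z:" prefix as the only recognised format and an in-memory Stream; both are hypothetical stand-ins, and a real factory() would dispatch on gzip, bzip2 and tar signatures instead. It returns std::unique_ptr rather than a raw Stream * so ownership of each layer is explicit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Minimal hypothetical Stream: in-memory data with read() and peek().
class Stream {
public:
    explicit Stream(std::string data) : data_(std::move(data)) {}
    virtual ~Stream() = default;

    // Copies up to 'len' bytes into 'buf' without consuming them.
    std::size_t peek(char* buf, std::size_t len) const {
        std::size_t n = std::min(len, data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, n);
        return n;
    }
    // Copies up to 'len' bytes into 'buf' and consumes them.
    std::size_t read(char* buf, std::size_t len) {
        std::size_t n = peek(buf, len);
        pos_ += n;
        return n;
    }
    // Everything not yet consumed.
    std::string rest() const { return data_.substr(pos_); }

    // Returns a stream over the decoded contents if the data starts with a
    // recognised magic number, or nullptr if there is nothing to unwrap.
    // Toy format only: the prefix "Z:" followed by the payload.
    static std::unique_ptr<Stream> factory(Stream& s) {
        char magic[2];
        if (s.peek(magic, 2) == 2 && magic[0] == 'Z' && magic[1] == ':') {
            s.read(magic, 2);  // consume the header
            return std::make_unique<Stream>(s.rest());
        }
        return nullptr;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Peels off as many layers as factory() recognises and reports how many
// there were. Nothing here knows the nesting depth in advance.
std::pair<std::string, int> unwrap_all(std::unique_ptr<Stream> s) {
    int layers = 0;
    while (auto inner = Stream::factory(*s)) {
        s = std::move(inner);
        ++layers;
    }
    return {s->rest(), layers};
}
```

Given "Z:Z:payload", unwrap_all() strips two layers and returns the payload, while plain data passes through with zero layers removed.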

Rob

References:
- Re: File handling in setup.exe
  - From: Christopher Faylor
