This is the mail archive of the cygwin-apps mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch/rebase] Add a rebase database to keep track of DLL addresses


On 7/6/2011 4:41 AM, Corinna Vinschen wrote:
> On Jul  5 15:38, Charles Wilson wrote:
>> img_info_cmp: what if a or b is NULL? do we care?  Ditto for a->name,
>> b->name -- POSIX doesn't mandate the behavior of strcmp if args are null...
> 
> Neither the a/b pointers, nor the name strings can become NULL.

So, the "protection" is algorithmic -- you simply ensure that this
function is not used in cases where those pointers are null.

(And there ARE cases where they CAN be null: e.g. the newly-allocated
entries at the tail of the list -- since you allocate in rounded-up
chunks of 100(?); the name pointers are, of course, explicitly SET to
null when reading data in from the db, prior to reading and allocating
storage for the strings themselves, etc).

>> Since the image bases are stored (in our database and data structures)
>> as ULONG, how do we support 64bit systems and rebasing above 4G?  Or is
>> that fodder for a future patch?
> 
> I was thinking about that.  First of all, imagehelper does not support
> 64 bit files so far.  What we need is a ReBaseImage64 implementation.
> Alternatively, we use Windows ReBaseImage64 and drop imagehelper.

Well, isn't the whole reason for imagehelper that ReBaseImage(32) wasn't
sufficient?  According to imagehelper's README:

The main reasons for this rewrite are, that
1. WinMe does not support the RebaseImage()
2. the recent binutils strip command corrupts dll's linked with
debugging informations.

OK, so #1 doesn't apply anymore (technically msys and MinGW still kinda
maybe support Win9x [*] in the "if you're lucky it'll work but don't
come crying to us if it doesn't").  I'm not really sure what the story
is behind #2...

[*] Officially, mingw compilers 4.x and above only support WinNT+. If
they work on 9x, great, but no guarantees.  Same with (current) msys,
since its purpose is solely to support the mingw compiler: msys is based
on a cygwin from an era when cygwin DID support 9x, but we're not too
fussed if we've broken that support.

Anyway, now imagehelper uses cygwin pathname conversion functions and
the like, so it does a bit more than simply re-implementing ReBaseImage.

> Without this decision first, I didn't want to clutter the database
> file format with stuff it doesn't need.

OK.

> What's more, the rebase addresses of 32 and 64 bit DLLs have nothing to
> do with each other.  Since they never share the same VM space, they are
> disjunct.  I mulled over a combined database format for both versions,
> but it doesn't make sense.  I think, what we should do is to keep two
> database files, one for 32 and one for 64 bit DLLs, and handle them
> independently.  Change the magic number and you're all set.

That sounds reasonable. 'Course, you're still talking about a single
*program* handling both, aren't you (maybe run twice, in two separate
'modes'?)  Or would you actually need a *64bit* rebase.exe to handle the
64bit dlls, and a 32bit one to handle the 32bit dlls?

>> Perhaps something like...
>>
>> #define xstr(s) str(s)
>> #define str(s) #s
>> #if defined __CYGWIN__ || defined __MSYS__
>> # define IMG_INFO_FILE xstr(SYSCONFDIR) "/rebase.image_info"
>> # char tmp_file[] = xstr(SYSCONFDIR) "/rebase.image_info.XXXXXX";
>> #else
>> /* psuedo-relocatable for native mingw build */
>> # define IMG_INFO_FILE "/../etc/rebase.image_info"
>> # char tmp_file[] = xstr(SYSCONFDIR) "/../etc/rebase.image_info.XXXXXX";
> 
> I don't understand.  What exactly is "pseudo-relocatable" in this
> drive-absolute expression?  I added that to my patch, though.

cut-n-paste error on my part. The last two lines should have read

# define IMG_INFO_FILE "/../etc/rebase.image_info"
# char tmp_file[] = "/../etc/rebase.image_info.XXXXXX";

e.g. both paths are *fragments* in the mingw case -- specifying the bit
that should be tacked on to the directory of the exe (found via
GetModuleFilePath).  I did not show where that path manipulation step
should be done -- 'cause I don't know yet. <g>

>> ...with appropriate -DSYSCONFDIR flags added in Makefile.in (for the
>> cygwin|msys case), as well as a bit of code elsewhere, to prepend the
>> installation directory of rebase.exe itself for the mingw case.
> 
> That's something which somebody else will have to add.

OK. I'll take a look after your patch is committed.

>> Or maybe, for the mingw case, the database should be in cwd()? or
>> directly in same directory as rebase.exe?
> 
> That's something which somebody else will have to add.  I really don't
> care at all for the mingw case.

Maybe so, but it's not correct to apply a patch to ANY cross-platform
tool that deliberately -- or uncaringly -- breaks one of those other
platforms.  It is the responsibility of contributor to ensure (or at
least make best effort to ensure, *without* actually compiling or
testing on those other targets) that her patch First, Does No Harm --
even to platforms she doesn't care about.

Otherwise, the patch should be rejected absent specific agreement
otherwise by the maintainer(s) -- which in this case is NOT me, but Jason.

I know that you know all this...I'm just getting it on the record.


In *this* case, I'll fix up the remaining mingw warts relative to "where
should the db file go" -- and probably more.  But in general "I don't
care for the mingw case" != "I can commit patches that break mingw".


>>> +  if (write (fd, &hdr, sizeof hdr) < 0)
>>> +  else if (write (fd, img_info_list, img_info_size * sizeof (img_info_t)) < 0)
>>
>> Hmm...is sizeof(struct) the same, on both 64bit and 32bit windows, and
>> between cygwin|msys|mingw, and with/without -mms-bitfields and or
>> __atrribute__((ms_struct))?
> 
> Skipping the 64 bit issue which needs another decision first, I don't
> understand the question.  Whatever the sizeof img_info_t is, it's a
> constant expression on the system for which rebase has been compiled.

You're assuming that nobody will try to use cygwin's rebase on their
C:\MinGW tree, or their mingw rebase on their cygwin tree, or some other
combination.

Sure, it'd be stupid, don't do that (*, **). AND most likely the "mingw
rebase" would actually be using a *different db file* (C:/MinGW/etc/foo
vs C:/cygwin/etc/foo) than the cygwin one, even if the targetted DLLs
were, for instance, under the cygwin tree in both cases.

(*) unless you are SPECIFICALLY trying to change the base address of the
cygwin dll itself for some reason, which can't be done using the cygwin
rebase tool -- but you CAN use the mingw rebase tool to do it. 'Course,
in that specific case you wouldn't want to use the db anyway.

[[[ see revision, below ]]]
(**) except that the using mingw-rebase for both the msys and mingw
trees, or using the msys-rebase for both trees, is quite common
already...because in this scenerio you quite often DO have a chain of
fork/exec that begins with an msys shell, configure script, msys-make,
what have you, and eventually ends with mingw gcc.exe.  In that case,
you want to ensure that ALL the dlls -- some of which are msys, and some
of which are "native" -- have non-overlapping image base addresses.

So, I'm actually thinking that mingw-rebase and msys-rebase SHOULD share
/exactly/ the same db file (where it should go, and how each version of
rebase.exe should 'find' it, is a question I'll try to answer later).
But...if that is the case, then the on-disk format should be the same in
each case.

It's not hard to ensure that it be so -- so why NOT do it?

[[[revision]]]
Wait.  The only time the fork/exec thing matters is when the current
process is, itself, a cygwin/msys one.  So the ONLY dlls you care about,
for the fork/exec reason, are cygwin/msys ones -- you do NOT care about
(non-system) mingw dlls, because only cygwin/msys processes ever fork().

Once you exec() the target -- if it is a mingw exe -- then (a) you don't
care where its mingw dlls are mapped, and (b) you'll never fork() again.
 You might call CreateProcess, but we don't care about that.  So, you
really should never need to rebase the msys DLLs *and* the mingw DLLs
into a single non-overlapping set.  You only need to worry that the msys
ones are non-overlapping -- you don't CARE where the mingw ones go.

So, there's *no* reason to use, say, msys-rebase on both msys/* and
mingw/*.  Ditto mingw-rebase on msys/* and mingw/* together -- at least,
not with the same db file.

So...no reason to share the dbs -- so ensuring the format is "the same"
is not important.  This decision does have usage implications regarding
the 'database dumper' program (see below), if I were to write one:
basically you'd have to use the 'correct' dumper for each platform.
That's ok; it would only be a debugging tool anyway.

>> This would force MS-style struct padding and bitfield sizing for all
>> three platforms...
> 
> Why?  You don't want to share the stuff.

See above.

>> So...ok.  But there should probably be some documentation up at the top
>> of rebase.c for the database format.
> 
> Ok, I add comments to the structure layout.  It's really not complicated
> enough to ask for more.
> 
>> OK, so now that I look at it, I wonder about a few more things:
>>
>> 1) should the header have a db_version field -- what if we change the
>> format, either by rearranging, changing the "official" encoding for the
>> name strings, or add new fields to the records?
> 
> Sure, why not.

<g>

>> 2) Should we worry about the on-disk format being "the same" between
>> cygwin and msys and native?
> 
> No.

I only mildly disagree, mostly for aesthetics at this point -- and not
enough to object.

>> 3) It's nice to have constant size records for all DLLs "up front" and
>> then the variable-sized strings "at the end", since that allows for
>> random-access.
> 
> Actually, it has nothing to do with random access but with ease of
> programming.
> 
>> But...we don't DO random access, so that's not much of
>> an issue (although it does allow for easy direct inspection of a hexdump
>> of the db).  The cost tho is that (a) we store a useless pointer in the
>> DB, and (b) the data for the Nth DLL is separated from the NAME of that
>> DLL, so it's hard to tell which DLL actually uses any given record.
> 
> a) 4 bytes per record.  The entire size of the database file is just
>    a tiny bit over 20K on my machine, with 311 DLLs installed, most of
>    it the string table.  When was the last time you had to worry about
>    a waste of 1244 bytes on your hard drive?

Not worried about on-disk *size* per se -- just extra *time*.  Of
course, reading/writing it all in a big chunk direct to a memory buffer
-- even if 4 bytes per entry is "wasted" -- is so much faster than
iterating thru the data structure and only writing/reading the
"necessary" fields one. at. a. time. for. each. entry. iteratively. --
that your way is still faster.

Heck, opening all the dll's to get their actual image base and size data
takes longer.

> b) Why do you want to look into the file?  That's what rebase -s -i
>    is for.
> 
>> Maybe if rebase.exe had an explicit "--dump-db-for-debuggging" option
> 
> Still, rebase -s -i

But I thought that would, in the output, present the current DLL data in
preference to the database contents -- so you never REALLY know what is
stored IN the db.

> Again, why would you want to do that?  There's no information to gain.

Debugging only. You've never had to inspect raw database entries as part
of tracking down a problem?

> rebase -s -i prints the database content, unless the real DLL differs
> from the database storage.  If so, and if DLLs now overlap, you get
> the information printed on the screen.  That's what you need to decide
> if some more rebasing might be necessary.  What crucial information do
> you gain from printing the database as is, if it doesn't match reality?

Absolutely agreed that USERS should never care about anything but -s -i.
 I'm only thinking of rebase developers, if/when there is some
hard-to-track problem and you need a sanity check.

What if a user's db gets scrogged -- how can you tell?  rebase -s -i
will simply show a giant list of (currently installed DLLs) and will
show their actual image base, and probably mark all of them as
'different from (the scrogged) db' (if can actually match up the names,
which might also be scrogged).  That's not helpful in tracking down a
corrupted db.

Maybe the db impl should be in a separate source file, and then we could
have a separate, very dumb tool, that just dumps the contents. It'd be
packaged in a "rebase-devel" package and nobody (but you, me, and Jason,
and some user who we are trying to help) would ever install it...

That's for a separate patch tho. Maybe I'll do that as part of my
mingw/msys "cleanup". Or another other patch.

>> And here's where the version number would be checked.  If we had
>> multiple versions, I guess you'd dispatch to the appropriate 'reader'
>> routine for each version here as well.
> 
> Ok.

Thanks.

>> endsWithI allows to handle both "/usr/bin/cygwin1.dll" and
>> "/bin/cygwin1.dll" -- ...I for case-insensitive?
> 
> The /usr/bin path and the cygwin DLL name are always lower case.  The
> code in collect_image_info converts the path so that all stuff under
> /usr/bin will show up as /usr/bin, even if you used /bin on the command
> line.  There's no reason to check for /bin.

OK, thanks for explaining that.

>> but perhaps that is made unnecessary by the "back and forth to POSIX"
>> name manipulations; I'm not sure.
> 
> Right.
> 
>> Probably for clarity, all the places that deal with the cygwin DLL
>> itself should be marked
>>
>> #if defined(__CYGWIN__) || defined(__MSYS__)
> 
> Any chance I can concentrate on the Cygwin stuff and you add MSYS
> later?

My only worry with that is, it's possible to code something in such a
way that /deeply/ embeds modern cygwinisms, so that it becomes very
difficult to retrofit support for msys or native.

So far, that doesn't seem to be the case...so as long as you don't
*deliberately* go out of your way to break msys or native, I guess I can
come along behind with a broom and dustpan.  It's really up to Jason,
tho -- because what you're REALLY asking is
	"Can I add new functionality for cygwin, that will make
	 rebase.exe either non- or sub-functional (or worse,
	 non-compilable) on msys and mingw" -- until somebody else
	 makes a separate cleanup patch for those platforms?"
Now, *I* certainly wouldn't expect you to compile and test on those
other platforms; only just make your best effort to not deliberately
break stuff.  But again, it's really up to Jason.

Jason?

This patch is OK with me, given the understanding that "somebody" --
e.g. me -- will need to present another patch to fixup the msys and
mingw bits.

> +#if !defined (__CYGWIN__) && !defined (__MSYS__)
> +#define SYSCONFDIR "/../etc"
> +#endif
> +#define IMG_INFO_FILE SYSCONFDIR "/rebase.image_info"
> +char tmp_file[] =     SYSCONFDIR "/rebase.image_info.XXXXXX";

'sokay for now.  Obviously needs a bit of fixup (e.g. the extra "prepend
the exe's dir" code) for mingw, but I'll address that.

--
Chuck


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]