This is the mail archive of the cygwin-apps mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch/rebase] Add a rebase database to keep track of DLL addresses


On 7/2/2011 4:41 PM, Corinna Vinschen wrote:
> On Jul  2 09:15, Corinna Vinschen wrote:
>> On Jul  2 01:14, Charles Wilson wrote:
>>> I'll take a look next week.

Comments inline.

> New patch attached with the following changes:
> 
> - I introduced a last minute bug into load_image_info.  Fixed.
> - merge_image_info now eliminates duplicate DLLs given on the command
> - print_image_info now also eliminates duplicates and handles the files> 
> - What's more important, print_image_info now checks for overlapping DLLs

img_info_cmp: what if a or b is NULL? do we care?  Ditto for a->name,
b->name -- POSIX doesn't mandate the behavior of strcmp if args are null...

Also, what encoding do we end up with (on cygwin)? does that affect the
use of strcmp() to compare names?


img_info_name_cmp: ditto.

Since the image bases are stored (in our database and data structures)
as ULONG, how do we support 64bit systems and rebasing above 4G?  Or is
that fodder for a future patch?

> Index: rebase.c
> ===================================================================

> +#define IMG_INFO_FILE	"/etc/rebase.image_info"
> +char tmp_file[] = "/etc/rebase.image_info.XXXXXX";

Hmm.  I don't think this will work as expected in the native MinGW
build, given the whole absolute pathname "problem".  Then, even on
cygwin/msys it'd also be nice to have some sort of support for configure
--sysconfdir and --prefix...

Perhaps something like...

#define xstr(s) str(s)
#define str(s) #s
#if defined __CYGWIN__ || defined __MSYS__
# define IMG_INFO_FILE xstr(SYSCONFDIR) "/rebase.image_info"
# char tmp_file[] = xstr(SYSCONFDIR) "/rebase.image_info.XXXXXX";
#else
/* psuedo-relocatable for native mingw build */
# define IMG_INFO_FILE "/../etc/rebase.image_info"
# char tmp_file[] = xstr(SYSCONFDIR) "/../etc/rebase.image_info.XXXXXX";
#endif

...with appropriate -DSYSCONFDIR flags added in Makefile.in (for the
cygwin|msys case), as well as a bit of code elsewhere, to prepend the
installation directory of rebase.exe itself for the mingw case.

Or maybe, for the mingw case, the database should be in cwd()? or
directly in same directory as rebase.exe?

>  #ifdef __CYGWIN__
>  ULONG cygwin_dll_image_base = 0;
>  ULONG cygwin_dll_image_size = 0;
> @@ -75,11 +108,18 @@ main (int argc, char *argv[])
>    BOOL status;
>  
>    setlocale (LC_ALL, "");
> +  progname = (progname = strrchr (argv[0], '/')) ? progname + 1 : argv[0];

And I guess here would be the place to set/compute the executable's
'dirname'. Use GetModuleFileName() if original progname has no dir info?

> +  if (write (fd, &hdr, sizeof hdr) < 0)
> +  else if (write (fd, img_info_list, img_info_size * sizeof (img_info_t)) < 0)

Hmm...is sizeof(struct) the same, on both 64bit and 32bit windows, and
between cygwin|msys|mingw, and with/without -mms-bitfields and or
__atrribute__((ms_struct))?


Maybe both structs should be defined as:

#ifdef __GNUC__
#define STRUCT_FMT __attribute__((ms_struct))
#endif

typedef struct _img_info_hdr
{
  char  magic[4];
  ULONG base;
  ULONG offset;
  BOOL  down_flag;
  ULONG count;
} img_info_hdr_t STRUCT_FMT ;

typedef struct _img_info
{
  char *name;
  ULONG name_size;
  ULONG base;
  ULONG size;
  ULONG slot_size;
  struct {
    unsigned needs_rebasing : 1;
  } flag STRUCT_FMT ;
} img_info_t STRUCT_FMT ;

This would force MS-style struct padding and bitfield sizing for all
three platforms...


> +      for (i = 0; i < img_info_size; ++i)
> +	if (write (fd, img_info_list[i].name,
> +		   strlen (img_info_list[i].name) + 1) < 0)

So, you're using a binary format database, and assuming that all sizes
are constant regardless of the bitness of the target.  I guess that's
ok, since (a) we're talking about Windows only here --
cygwin|msys|native -- so it's always little-endian, and (b) the structs
are defined using ULONG types and similar, which are constant width on
32- and 64- bit platforms.

The obvious alternative is to use the xdr_* functions, but (a) those are
not available on msys or native without an additional library
dependency, and (b) it'd really be nice to avoid additional library deps.

So...ok.  But there should probably be some documentation up at the top
of rebase.c for the database format.  Something like this:

Database format:
   NAME      SIZE (bytes)     TYPE    comments
header---
   MAGIC        4             char    "rBiI"
   BASE         4             ULONG   LE32 format
   OFFSET       4             ULONG   LE32 format
   DOWN_FLAG    4             BOOL    nonzero = true (down)
   COUNT        4             ULONG   LE32 format

repeating entries, one per dll
   NAME         4             pointer  --- value ignored --- [*]
   NAME_SIZE    4             ULONG   LE32 (length of dll name string
PLUS ONE for trailing null)
   BASE         4             ULONG   LE32 (ImageBase for this dll)
   SIZE         4             ULONG   LE32 (ImageSize for this dll)
   SLOT_SIZE    4             ULONG   LE32  ???
   FLAGS        ????
(uh-oh. What about struct padding? Does that differ for 32bit and 64bit?
 Bitfield formats?  Require -mno-msbitfields when compiling natively?
use the attribute marker as above?  Anyway, hard to document the format
without forcing a particular struct packing regime...)

repeating entries, one per dll
   (sequence of null-terminated strings, in same order as previous
repeating group.  These are in what encoding -- current locale?
multibyte? utf-8?  What about native mingw or msys?)


OK, so now that I look at it, I wonder about a few more things:

1) should the header have a db_version field -- what if we change the
format, either by rearranging, changing the "official" encoding for the
name strings, or add new fields to the records?

2) Should we worry about the on-disk format being "the same" between
cygwin and msys and native?  I suspect cygwin can "go its own way", but
I think msys and native should coordinate, since it's likely that the
two versions may be used interchangeably in a typical MinGW/MSYS
installation.

3) It's nice to have constant size records for all DLLs "up front" and
then the variable-sized strings "at the end", since that allows for
random-access.  But...we don't DO random access, so that's not much of
an issue (although it does allow for easy direct inspection of a hexdump
of the db).  The cost tho is that (a) we store a useless pointer in the
DB, and (b) the data for the Nth DLL is separated from the NAME of that
DLL, so it's hard to tell which DLL actually uses any given record.

Maybe if rebase.exe had an explicit "--dump-db-for-debuggging" option
that reads in the db and prints it out (that is, unlike -s, it would
totally ignore the actual DLLs), then there's no real need to "inspect"
the on-disk data, and we can continue to use this nice -- fast to read,
fairly compact with only 4 wasted bytes per entry -- binary format.

> +int
> +load_image_info ()
...
> +  /* Check the header. */
> +  if (memcmp (hdr.magic, IMG_INFO_MAGIC, 4) != 0)
> +    {
> +      fprintf (stderr, "%s: \"%s\" is not a valid rebase database.\n",
> +	       progname, IMG_INFO_FILE);
> +      close (fd);
> +      return -1;
> +    }

And here's where the version number would be checked.  If we had
multiple versions, I guess you'd dispatch to the appropriate 'reader'
routine for each version here as well.

> +merge_image_info ()
> +{
> +  int i, end;
> +  img_info_t *match;
> +  ULONG floating_image_base;
> +
> +  /* Sort new files from command line by name. */
> +  qsort (img_info_list + img_info_rebase_start,
> +	 img_info_size - img_info_rebase_start, sizeof (img_info_t),
> +	 img_info_name_cmp);
> +  /* Iterate through new files and eliminate duplicates. */
> +  for (i = img_info_rebase_start; i + 1 < img_info_size; ++i)
> +    if ((img_info_list[i].name_size == img_info_list[i + 1].name_size
> +	 && !strcmp (img_info_list[i].name, img_info_list[i + 1].name))



> +#ifdef __CYGWIN__
> +	|| !strcmp (img_info_list[i].name, "/usr/bin/cygwin1.dll")
> +#endif

replace with something like

#if defined __MSYS__
	|| endsWith (img_info_list[i].name, "/msys-1.0.dll")
#elif defined __CYGWIN__
        || endsWith (img_info_list[i].name, "/cygwin1.dll")
#endif

endsWithI allows to handle both "/usr/bin/cygwin1.dll" and
"/bin/cygwin1.dll" -- ...I for case-insensitive?

int endsWithI(char *s, char *p)
{
   int slen, plen, offset;
   slen = strlen(s);
   plen = strlen(p);
   offset = slen - plen;
   if (offset < 0)
      return 0;
   return (stricmp(string+offset, end) == 0);
}

but perhaps that is made unnecessary by the "back and forth to POSIX"
name manipulations; I'm not sure.

> -  exit (0);
> +  /* Now sort the old part of the list by base address. */
> +  if (img_info_rebase_start)
> +    qsort (img_info_list, img_info_rebase_start, sizeof (img_info_t),
> +	   img_info_cmp);
> +  /* Perform several tests on the information fetched from the database
> +     to match with reality. */
> +  for (i = 0; i < img_info_rebase_start; ++i)
> +    {
> +      ULONG cur_base, cur_size, slot_size;
> +
> +      /* Files with the needs_rebasing flag set have been checked already. */
> +      if (img_info_list[i].flag.needs_rebasing)
> +	continue;
> +      /* Check if the files in the old list still exist.  Drop non-existant
> +	 or unaccessible files. */
> +      if (access (img_info_list[i].name, F_OK) == -1

There's a problem with access() and native mingw (when running on
Vista+), but only affects the X_OK flag...

http://gcc.gnu.org/ml/gcc/2007-05/msg00228.html

Not an issue here, since you're only using F_OK.  Just FYI...

> +  /* Try to fit all DLLs with base address 0 into the given list. */
> +  /* FIXME: This loop only implements the top-down case.  Implement a
> +     bottom-up case, too, at one point. */

So does this mean that rebase.exe is broken unless -d is supplied? (Oh,
nevermind: I see that -s implies -d.  So, for now, you just can't use
the database without -d.)

> +  floating_image_base = image_base;
> +  end = img_info_size - 1;
> +  while (img_info_list[0].base == 0)
> +    {
> +      ULONG new_base;
> +
> +      /* Skip trailing entries as long as there is no hole. */
> +       while (img_info_list[end].base + img_info_list[end].slot_size + offset
> +	     >= floating_image_base)
> +	{
> +	  floating_image_base = img_info_list[end].base;
> +	  --end;
> +	}
> +      /* Test if one of the DLLs with address 0 fits into the hole. */
> +      for (i = 0, new_base = 0; img_info_list[i].base == 0; ++i, new_base = 0)
> +	{
> +	  new_base = floating_image_base - img_info_list[i].slot_size - offset;
> +	  if (new_base >= img_info_list[end].base
> +			  + img_info_list[end].slot_size

Probably for clarity, all the places that deal with the cygwin DLL
itself should be marked

#if defined(__CYGWIN__) || defined(__MSYS__)

so long as...

> +#ifdef __CYGWIN__
> +	      /* Don't overlap the Cygwin DLL. */
> +	      && (new_base >= cygwin_dll_image_base + cygwin_dll_image_size
> +		  || new_base + img_info_list[i].slot_size
> +		     <= cygwin_dll_image_base)
> +#endif

...the 'cygwin_dll_image_base' and 'cygwin_dll_image_size' vars are
initialized "correctly".  That is:

#if defined __MSYS__
  GetImageInfos ("/bin/msys-1.0.dll", &cygwin_dll_image_base,
                                      &cygwin_dll_image_size);
  /* adjustments to size and base. Not sure what this should be; we'll
   * figure it out later.
   */
#elif defined __CYGWIN__
  /* Fetch the Cygwin DLLs data to make sure that DLLs aren't rebased
     into the memory area taken by the Cygwin DLL. */
  GetImageInfos ("/bin/cygwin1.dll", &cygwin_dll_image_base,
                                     &cygwin_dll_image_size);
  /* Take the three shared memory areas preceeding the DLL into account. */
  cygwin_dll_image_base -= 3 * ALLOCATION_SLOT;
  /* Add a slack of 8 * 64K at the end of the Cygwin DLL.  This leave a
     bit of room to install newer, bigger Cygwin DLLs, as well as room to
     install non-optimized DLLs for debugging purposes.  Otherwise the
     slightest change might break fork again :-P */
  cygwin_dll_image_size += 3 * ALLOCATION_SLOT + 8 * ALLOCATION_SLOT;
#endif

> +	     )
> +	    break;
> +	}
> +      /* Found a match.  Mount into list. */
> +      if (new_base)
> +	{
> +	  img_info_t tmp = img_info_list[i];
> +	  tmp.base = new_base;
> +	  memmove (img_info_list + i, img_info_list + i + 1,
> +		   (end - i) * sizeof (img_info_t));
> +	  img_info_list[end] = tmp;
> +	  continue;
> +	}
> +      /* Nothing matches.  Set floating_image_base to the start of the
> +	 uppermost DLL at this point and try again. */
> +#ifdef __CYGWIN__

  || defined __MSYS__, for clarity

> +      if (floating_image_base >= cygwin_dll_image_base + cygwin_dll_image_size
> +	  && img_info_list[end].base < cygwin_dll_image_base)
> +	floating_image_base = cygwin_dll_image_base;
> +      else
> +#endif
>      }


>  void
>  print_image_info ()

> +  /* Now sort by address. */
>    qsort (img_info_list, img_info_size, sizeof (img_info_t), img_info_cmp);
>    for (i = 0; i < img_info_size; ++i)
> -    printf ("%-47s base 0x%08lx size 0x%08lx\n",
> -	    img_info_list[i].name,
> -	    img_info_list[i].base,
> -	    img_info_list[i].size);

0x%08lx is big enough for 32bit ULONG .base, but if we allow 64bit...

>  BOOL
> Index: rebaseall.in
> ===================================================================
> RCS file: /sourceware/projects/cygwin-apps-home/cvsfiles/rebase/rebaseall.in,v
> retrieving revision 1.3
> diff -u -p -r1.3 rebaseall.in
> --- rebaseall.in	28 Jun 2011 19:43:19 -0000	1.3
> +++ rebaseall.in	2 Jul 2011 20:39:45 -0000
> @@ -13,7 +13,7 @@
>  #
>  # Written by Jason Tishler <jason-/+CHcdCtaLzR7s880joybQ@public.gmane.org>
>  #
> -# $Id: rebaseall.in,v 1.3 2011/06/28 19:43:19 corinna Exp $
> +# $Id: rebaseall.in,v 1.2 2011/06/21 15:40:10 corinna Exp $
>  #
>  
>  # Define constants
> @@ -46,7 +46,7 @@ cleanup()
>  trap cleanup 1 2 15
>  
>  # Set defaults
> -BaseAddress=$DefaultBaseAddress
> +BaseAddress=""
>  Offset=$DefaultOffset
>  Verbose=$DefaultVerbose
>  FileList=$DefaultFileList
> @@ -82,6 +82,17 @@ do
>      esac
>  done
>  
> +# Check if rebase database already exists.
> +database_exists="no"
> +[ -f "/etc/rebase.image_info" ] && database_exists="yes"

In this case, we can unconditionally use

[ -f "@SYSCONFDIR@/rebase.image_info" ]

because the scripts are only useful with cygwin and msys.

--
Chuck


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]