This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode)

To: cygwin at cygwin dot com
Subject: Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode)
From: Pavel Tsekov <ptsekov at syntrex dot com>
Date: Tue, 23 Oct 2001 18:21:40 +0200
References: <20011021225629.17478.qmail@foolabs.com> <20011022150648.B10383@redhat.com> <3BD556FE.46350849@syntrex.com> <20011023141012.G31068@cygbert.vinschen.de>

Upon further investigation this problem seems to
be much bigger than I've initially thought and
there is no easy patch to it (at least I think
so) so I'll describe what've found about it. I
started a patch though and will would like to know
if you like my approach or will suggest another
one.

So here is what I've found:

1. The described problem appears only when
you have buffering enabled.

2. It basically has to do with a intended
optimisation which involves not reading 
the same buffered region twice, if the
fseek call  requests a offset which is
in the buffered region. This optiomisation
is performed only if the file is opened in
READONLY mode.

3. The problem is to happen only if you fseek
backwards from the current file position and
the difference between the current file position
and the new position is not greater than the 
buffer size used to buffer reads. A requirement
for the problem to appear is to have two or more fseeks
between no other operation between them.

4. This problem is likely to affect all BSD
flavour OSes - the newlib stdio code is derived
from BSD. I looked at the fseek code in OpenBSD
and FreeBSD and I think it has the error.

Now let me point you to the code location which
introduces the problem:

--> Target is where do we wont to go

 if (whence == SEEK_SET)
    target = offset;
  else
    {
      if (_fstat_r (ptr, fp->_file, &st))
        goto dumb;
      target = st.st_size + offset;
    }

--> What is actually done here up to the next --> mark
    is that fseek tries ot determine the position where 
    the last read occured. fp->_r is number if bytes
    left in the buffer to be read, n = fp->_p - fp->_bf._base
    is the bytes from the buffer already read. n + fp->_r
    is the size of the buffer. By decrementing the current 
    file positon with n + fp->_r the code expects to find
    the position where the last read occured.

  if (!havepos)
    {
      if (fp->_flags & __SOFF)
        curoff = fp->_offset;
      else
        {
          curoff = (*seekfn) (fp->_cookie, 0L, SEEK_CUR);
          if (curoff == POS_ERR)
            goto dumb;
        }
      curoff -= fp->_r;
      if (HASUB (fp))
        curoff -= fp->_ur;
    }

  /*
   * Compute the number of bytes in the input buffer (pretending
   * that any ungetc() input has been discarded).  Adjust current
   * offset backwards by this count so that it represents the
   * file offset for the first byte in the current input buffer.
   */

  if (HASUB (fp))
    {
      curoff += fp->_r;       /* kill off ungetc */
      n = fp->_up - fp->_bf._base;
      curoff -= n;
      n += fp->_ur;
    }
  else
    {
      n = fp->_p - fp->_bf._base;
      curoff -= n;
      n += fp->_r;
    }

--> Here we try to determine if the position we request
    is in the buffered region. If we determnine erronosly
    that the requested position is in this buffer we got the
    error which started this thread.

  /*
   * If the target offset is within the current buffer,
   * simply adjust the pointers, clear EOF, undo ungetc(),
   * and return.  (If the buffer was modified, we have to
   * skip this; see fgetline.c.)
   */

  if ((fp->_flags & __SMOD) == 0 &&
      target >= curoff && target < curoff + n)
    {
      register int o = target - curoff;

      fp->_p = fp->_bf._base + o;
      fp->_r = n - o;
      if (HASUB (fp))
        FREEUB (fp);
      fp->_flags &= ~__SEOF;
      return 0;
    }

Let me show you with some digits whats going on:

We have 2048 bytes file. We have bufsize 1024.
We fread 1024 bytes - this fills exactly the
internal buffer thus  'n' will become 1024,
fp->_r will become 0. We request then file
position 0 from the end of file. ftell 
reports 2048. Now we want to go back to 1024
and read 8 bytes. fseek tries to optiomize since
we are in read only mode ... it does this

curoff = 2048 (current position)

curoff -= r; 2048 - 0 = 2048
curoff -= n; 2048 - 1024 = 1024 Determines where last read occured

n + r = 1024 + 0; Determines the upper limit of the internal buffer

now the final check 

target >= curoff && target < curoff + n
1024 >= 1024 && 1024 < 1024 + 1024 is TRUE

we read from the old buffer which is error.

I think to fix the situation so this optimisation
really works the best thing is to add a new
member to FILE which is the last known  read
position and replace curoff in the last check
with this member. However this requires much more
work than just a simple patch - So do you think
there is another (better) way to do this. Or perhaps
just remove this optimisation at all ? Which is easy
and will require chabges just to fseek.

Thanks.

Corinna Vinschen wrote:
> 
> On Tue, Oct 23, 2001 at 01:39:42PM +0200, Pavel Tsekov wrote:
> > I think I found the problem - is there anyone who can
> > test a patch ?
> 
> Just send it to the cygwin-patches mailing list, please.
> 
> Thanks,
> Corinna
>

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

Follow-Ups:
- Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode)
  - From: Christopher Faylor

References:
- 1.3.3-2: fseek fails on multiples of 1024 (binary mode)
  - From: Derek B. Noonburg
- Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode)
  - From: Christopher Faylor
- Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode)
  - From: Pavel Tsekov
- Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode)
  - From: Corinna Vinschen

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]