This patch fixes a resource starvation condition in Rx. The
problem arises, for instance, when more than 4 daemons try to
prefetch chunks of the same file at once. The fifth daemon is
stuck in MAKECALL_WAITING state, never getting a chance to run,
because the other 4 daemons never yield to the scheduler after
releasing the call, and just grab the call back again.
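
The starvation pattern above can be reproduced in a toy model. This is only a sketch and not Rx code: the slot count, the daemon count and the `simulate` function are assumptions made for illustration, and the "yield" is modelled as handing the slot directly to the parked daemon.

```c
#include <stdbool.h>

#define NSLOTS   4   /* concurrent calls allowed on one connection (assumed) */
#define NDAEMONS 5   /* daemons competing for those call slots */

/*
 * Toy model, not Rx code. In each round every daemon that holds a call
 * slot finishes one call and releases the slot. Without yielding it grabs
 * the slot straight back; with yielding it hands the slot to the parked
 * daemon and waits in its place.
 */
static void simulate(int rounds, bool yield_on_release,
                     int calls_done[NDAEMONS])
{
    bool holding[NDAEMONS];
    int waiter = NSLOTS;    /* the fifth daemon starts out waiting */
    int i, r;

    for (i = 0; i < NDAEMONS; i++) {
        holding[i] = (i < NSLOTS);
        calls_done[i] = 0;
    }

    for (r = 0; r < rounds; r++) {
        for (i = 0; i < NDAEMONS; i++) {
            if (!holding[i])
                continue;
            calls_done[i]++;            /* call finished, slot released */
            if (yield_on_release) {
                holding[i] = false;     /* step aside ... */
                holding[waiter] = true; /* ... so the waiter can run */
                waiter = i;
            }
            /* otherwise the releaser keeps the slot for its next call */
        }
    }
}
```

With `yield_on_release` false the fifth daemon never completes a call, no matter how many rounds are run; with it true every daemon makes progress.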
afs_RemoveCellEntry holds afs_xcell; setserverprefs modified the same
structure but did not hold the lock, which was problematic if something
changed out from under it
an ext3 journal in the vice cache (root of the partition) is allowable.
we have no useful way to tell ext2 from ext3 without groveling in fstab,
so just allow it
"My theory of what happened is roughly as follows:
Process tries to read data from AFS (as part of a page fault);
issues a new Rx call on an Rx connection to the fileserver.
The server transmits some data back to the client, but some packet
is lost.
Something tries to garbage-collect/destroy the connection; since
there is an active call, it can't do so, but issues an rx_AckAll
anyway, which acknowledges all packets transmitted by the server
as having been received. Server flushes its retransmit queue.
Client waits forever for the lost packet to arrive, but since the
server has already flushed the transmit queue, it cannot possibly
retransmit it.
All this is happening while the client has read-locked its address
space (since the read is part of a page fault). /proc accesses that
try to poke into that process's address space hang waiting for said
lock, causing the lossage we actually observed."
(as originally discovered by ted@mit.edu)
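
The failure mode in the theory above can be sketched with a small model. It is not Rx code: `server_model`, `client_model` and every function name here are hypothetical, and `ack_received` is included only for contrast, not as a description of the actual fix.

```c
#include <stdbool.h>

#define NPACKETS 5   /* packets in the modelled reply (arbitrary) */

struct server_model {
    bool queued[NPACKETS];    /* still held for possible retransmission */
};

struct client_model {
    bool received[NPACKETS];  /* actually arrived at the client */
};

/* Server has transmitted everything and keeps it all for retransmission. */
static void server_send_all(struct server_model *s)
{
    int i;
    for (i = 0; i < NPACKETS; i++)
        s->queued[i] = true;
}

/* "Ack all": the server flushes its whole retransmit queue. */
static void ack_all(struct server_model *s)
{
    int i;
    for (i = 0; i < NPACKETS; i++)
        s->queued[i] = false;
}

/* Contrast case: acknowledge only what the client really has. */
static void ack_received(struct server_model *s,
                         const struct client_model *c)
{
    int i;
    for (i = 0; i < NPACKETS; i++)
        if (c->received[i])
            s->queued[i] = false;
}

/* The call can finish only if every missing packet is still resendable. */
static bool call_can_complete(const struct server_model *s,
                              const struct client_model *c)
{
    int i;
    for (i = 0; i < NPACKETS; i++)
        if (!c->received[i] && !s->queued[i])
            return false;
    return true;
}
```

If one packet is lost and `ack_all` runs, `call_can_complete` turns false and stays false, which corresponds to the client waiting forever.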
"This fix deals with the following lose case:
Client starts a call that, for some reason, takes a long time on the
server. While the client waits for the server to finish, client and
server usually send each other keep alive packets. If something
causes those packets to be delayed or dropped, then the client will
conclude that the call has failed or finished (usually failed), while
the server is still *busy* doing the call.
In this circumstance, the client will initiate another call and the
server will correctly respond that it is busy. Unfortunately, if the
callNumber of a received packet doesn't match the callNumber of the
outstanding call, then the client never sees that the server says it's
busy. Instead the server appears as a black hole to the client.
This fix ensures that the client sees the busy packets when its
callNumber is reasonably out of sync with the server."
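
A rough sketch of the difference between dropping and honouring a busy packet with a mismatched callNumber follows. Everything in it is an assumption for illustration: the types, the function names, and in particular `MAX_SKEW`, which stands in for whatever "reasonably out of sync" means in the real code.

```c
#include <stdint.h>

enum pkt_type_model { PKT_DATA, PKT_BUSY };          /* hypothetical */
enum verdict { VERDICT_DROP, VERDICT_DELIVER, VERDICT_SERVER_BUSY };

struct pkt_model {
    enum pkt_type_model type;
    uint32_t call_number;
};

#define MAX_SKEW 1   /* made-up tolerance, not taken from Rx */

/* Old behaviour as described: any callNumber mismatch is silently dropped. */
static enum verdict classify_strict(const struct pkt_model *p,
                                    uint32_t current)
{
    if (p->call_number != current)
        return VERDICT_DROP;
    return p->type == PKT_BUSY ? VERDICT_SERVER_BUSY : VERDICT_DELIVER;
}

/* Sketch of the new behaviour: busy packets survive a small mismatch. */
static enum verdict classify_tolerant(const struct pkt_model *p,
                                      uint32_t current)
{
    int64_t skew = (int64_t)p->call_number - (int64_t)current;

    if (skew == 0)
        return p->type == PKT_BUSY ? VERDICT_SERVER_BUSY : VERDICT_DELIVER;
    if (p->type == PKT_BUSY && skew >= -MAX_SKEW && skew <= MAX_SKEW)
        return VERDICT_SERVER_BUSY;  /* client learns the server is busy */
    return VERDICT_DROP;             /* other mismatches are still ignored */
}
```

In this model a busy packet stamped with the previous callNumber is dropped by `classify_strict` (the black hole case) but reported as busy by `classify_tolerant`.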
this caused a call to pdflush to happen at the wrong time; the change should
fix the zero-filled files problem and the osi_assert(cred) problem, and make
the execsorwriters == 0 warnings go away
if you're not using ufs logging it's ok to replace solaris fsck with vfsck,
except that it sometimes exits with status 40, which the solaris scripts do
not treat as a failure.
make it so for us also
This patch makes sure that in-kernel aliases to nonexistent names aren't
accidentally created due to case mismatch (e.g. "athena" being created as
a symlink to "athena.MIT.EDU", while "athena.mit.edu" is the real cell
that already exists). It also lowercases cell names in AFSDB lookups,
otherwise the same problem appears in userspace (e.g. "aklog athena" tries
to obtain tokens for cell "athena.MIT.EDU").
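
As a minimal sketch of the normalization described above, a cell name can be lowercased into a caller-supplied buffer before it is compared or looked up. The function name `lowercase_cell_name` is hypothetical and is not taken from the patch.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copy `in` to `out` in lowercase, writing at most `outlen` bytes
 * including the terminating NUL. Longer names are truncated.
 */
static void lowercase_cell_name(const char *in, char *out, size_t outlen)
{
    size_t i;

    if (outlen == 0)
        return;
    for (i = 0; i + 1 < outlen && in[i] != '\0'; i++)
        out[i] = (char)tolower((unsigned char)in[i]);  /* cast avoids UB */
    out[i] = '\0';
}
```

Applied to "athena.MIT.EDU" this yields "athena.mit.edu", so both spellings resolve to the same cell entry.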
This patch fixes a problem with 64-bit pointers being munged by the
background daemons (by separating sizes and pointers into separate
variables -- this bug was apparently introduced by the 64-bit file
support patch), and makes the background daemons handle requests in
the order they came in. The latter will mostly be useful for some
prefetching and fine-grained dcache-locking patches.
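
The two ideas in this change, keeping pointers and sizes in separately typed fields and serving requests in arrival order, can be sketched as below. The struct layout, the field names and the ring-buffer queue are assumptions for illustration; the real daemon code may order requests differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPARMS 4   /* parameters per request (assumed) */
#define QLEN   8   /* queue capacity (assumed) */

struct request_model {
    void     *ptr_parm[NPARMS];   /* pointers stay in pointer-typed slots */
    long long size_parm[NPARMS];  /* sizes and offsets get their own slots */
};

struct request_queue {
    struct request_model slots[QLEN];
    size_t head;    /* index of the oldest queued request */
    size_t count;   /* number of queued requests */
};

static void queue_init(struct request_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Append at the tail; fails if the queue is full. */
static bool enqueue(struct request_queue *q, const struct request_model *r)
{
    if (q->count == QLEN)
        return false;
    q->slots[(q->head + q->count) % QLEN] = *r;
    q->count++;
    return true;
}

/* Remove the oldest request, so requests are handled first in, first out. */
static bool dequeue(struct request_queue *q, struct request_model *out)
{
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return true;
}
```

Because a pointer is never stored in an integer field, its width cannot be silently truncated on a 64-bit platform.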
the krb version of the module should be built completely in AFS_KERBEROS_ENV
====================
This delta was composed from multiple commits as part of the CVS->Git migration.
The checkin message with each commit was inconsistent.
The following are the additional commit messages.
====================
clean up spacing
Currently nothing clears the CLIENTDELETED flag in hosts, so once
a client has been deleted, h_TossStuff_r() will keep getting called
with every host release. This patch clears the CLIENTDELETED flag
every time we take care of deleted clients.
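
The effect of clearing the flag can be shown with a small model. `host_model`, `toss_stuff`, `release_host` and the bit value are stand-ins chosen for illustration; they are not the fileserver's real definitions.

```c
#define CLIENTDELETED_MODEL 0x80u  /* stand-in bit, not the real flag value */

struct host_model {
    unsigned int flags;
    int toss_calls;   /* how many times the cleanup has run */
};

/* Models the cleanup of deleted clients for one host. */
static void toss_stuff(struct host_model *h)
{
    h->toss_calls++;
    /* ... deleted clients would be freed here ... */
    h->flags &= ~CLIENTDELETED_MODEL;  /* the step this patch adds */
}

/* Models a host release: cleanup runs only while the flag is set. */
static void release_host(struct host_model *h)
{
    if (h->flags & CLIENTDELETED_MODEL)
        toss_stuff(h);
}
```

In this model, repeated releases of the same host run the cleanup once instead of on every release.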