Master does not track a particular version number.
For Windows builds on master, reset the version to
0.0.0 so that the builds are not confused with the actual
1.5.7600.
Change-Id: I3c84bb117418284de0d65e2a4069b88908b91659
Reviewed-on: http://gerrit.openafs.org/6698
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Modify AFSRemoveFcb to use InterlockedComparePointerExchange
to ensure that only one thread can remove and deallocate an
AFSFcb structure.
Change-Id: I27d27b6a99806bee2fc2cfc04c2ac04d975a553d
Reviewed-on: http://gerrit.openafs.org/6696
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Provide examples of the direct volume access syntax, using the
fictitious example.com cell.
Change-Id: Ia2ea592531e29f6b744d0bd6993d598d78a799c4
Reviewed-on: http://gerrit.openafs.org/6691
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
In AFSSetRenameInfo(), the rename to itself check was performed
after the name collision check. Move the check earlier in the
routine to ensure that we catch the no-op before any real work
is done.
Change-Id: I580dd9958a259d4a1819c6bd882dae8067d2853d
Reviewed-on: http://gerrit.openafs.org/6692
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Currently in h_Alloc_r, we h_Lock_r the host, so we have it locked on
return. However, h_Lock_r drops the host glock, which is bad in this
situation since we have already added the host to the global hash
table, so other threads may see it. This can mean that by the time
h_Alloc_r returns, the returned host may have HOSTDELETED set, and/or
the addresses associated with the host may be completely different.
h_Alloc_r's caller, h_GetHost_r, seems to assume that the host is
still associated with the address of the passed-in connection. When
this is not true, this can result in the host structure getting into a
strange state, such as the primary addr/port may not be hashed. The
host may also have HOSTDELETED set, in which case we're not supposed
to be dealing with it at all.
To avoid these problems, lock host->lock directly in h_Alloc_r,
without going through h_Lock_r and dropping H_LOCK. Also do it as one
of the first things we do to initialize the host, just to make sure
that if anybody else happens to see the host, it is locked by us when
they do.
Change-Id: Ia99cb84ad94f3e143ed0bae33485a88d60ff5b27
Reviewed-on: http://gerrit.openafs.org/6389
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
For replication writes at the remote site, we will want to call
this without a host structure.
Change-Id: I9cdef18f35229c9ab162cc07f6d60fe443204654
Reviewed-on: http://gerrit.openafs.org/6674
Reviewed-by: Simon Wilkinson <simonxwilkinson@gmail.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Add missing BlockSize registry value
Correct AFSRedirector\NetworkProvider registry key description
Add note that LanAdapter value is ignored if SMB mode is not in use.
Change-Id: I449988f1f6841c1b254d73b08a6ee53ca2dbaeda
Reviewed-on: http://gerrit.openafs.org/6685
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
OpenAFS reparse points represent mount points, symlinks, and dfs
referrals. All of which are file system objects that represent
another named entity in the system. As a result the reparse tag
field must include the Reparse Tag Surrogate bit (0x20000000) set.
This permits the IsReparseTagNameSurrogate() macro provided in
winnt.h to be used to determine if the reparse point is a surrogate
or not.
See
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365197%28v=vs.85%29.aspx
Change-Id: I2561823e23371c2fdf01941da99fe848ca1fa11d
Reviewed-on: http://gerrit.openafs.org/6668
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Add some basic definitions that will be needed to handle RW
replicas.
A new volume type RWREPL is added. Replicas will share the same
volume ID as the RW volume, so the array of volume IDs by volume
type is unchanged, as is the VLDB entry format.
A new flag bit ITSRWREPL/VLSF_RWREPLICA for serverFlags identifies
RW replica sites in VLDB entries.
Change-Id: I882b238f34e682ebea782e11dc418ae1340d4546
Reviewed-on: http://gerrit.openafs.org/6676
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Simon Wilkinson <simonxwilkinson@gmail.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
OPENAFS_VOL_STATS has been unconditionally defined since the IBM days.
Adjust the code to assume it is set.
Change-Id: I3b5ff99a469e6865ff1e10405a7f77d8c3890f59
Reviewed-on: http://gerrit.openafs.org/5551
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
With newer Solaris Studio (sometime in the 12.* series), cc started
adding SSE instructions to optimized x86 code, which is invalid for
kernel code and can generate panics. There appears to be no way to
turn this off currently (-xvector=%none is non-functional), so default
to not optimizing kernel code.
Change-Id: I5fdedb11219df68e0146b8e0cee9010c2eb4067e
Reviewed-on: http://gerrit.openafs.org/6671
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
We no longer include rx_packet.h from rx.h, so rx_kcommon.h was not
picking up some packet-related definitions. Some files
(SOLARIS/rx_knet.c, IRIX/rx_knet.c) were using packet-related defines
(e.g. RX_HEADER_SIZE) while just including rx_kcommon.h. Include
rx_packet.h in those files to get the relevant definitions.
Change-Id: Ib012f295d8e324dd8b38eb0b89933eac392a9583
Reviewed-on: http://gerrit.openafs.org/6670
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
For many vfs ops to the cache, we currently pass &afs_osi_cred for our
credentials, which is a mostly zeroed-out credential structure. In
some modern versions of Solaris (Solaris 11), at least some parts of
this structure need to not be NULL (cr_zone), or we will panic.
The Solaris kernel provides a 'kcred' credentials structure for the
purpose of using "kernel" credentials for i/o. So just use that
instead, since kcred has existed at least since Solaris 8.
Change-Id: Ia5252580d2de6dd7adfa1a1929148362d1da6360
Reviewed-on: http://gerrit.openafs.org/6669
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Use InterlockedCompareExchangePointer to assign the DirNode to
ObjectInfo->Specific.Directory.PIOCtlDirectoryCB. Otherwise,
one thread could race with another thread when allocating the
pioctl object.
Change-Id: Ic5b1a0ff2e44f2c4520cc7f5e536bd876bc83a65
Reviewed-on: http://gerrit.openafs.org/6661
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
If the Fcb reference count hits 0 while the service is called
it is possible that the Fcb can be garbage collected prior to
the completion of the call.
Change-Id: I32c3c5e3debb246fe63ac6f6cc5625b493ee47a9
Reviewed-on: http://gerrit.openafs.org/6660
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
NSIS installers are no longer up to date and do not support 64-bit
builds. OpenAFS no longer distributes them for 1.7 and beyond.
Stop building them by default.
Change-Id: I6b8c2b46ccc30654cfb4661c9bde50483bc99785
Reviewed-on: http://gerrit.openafs.org/6664
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Add a utility function that invalidates all buffers for a
cm_scache_t object.
Change-Id: Ib10139fb2aefa03d597d5afd494652fade40432e
Reviewed-on: http://gerrit.openafs.org/6651
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
In cm_DirOpDelBuffer() the data version field for a buffer
in cm_dirOp_t.buffers[] can be CM_BUF_VERSION_BAD if the buffer
was added to the buffer list but was never fetched from the file
server. If the buffer was recycled by buf_Get() an attempt to
remove an entry from the directory will be failed as opposed to
fetching the buffer from the file server and performing the local
removal.
Change-Id: Id9af5180f2176c2a90ef9907ae84139e66ffe5d6
Reviewed-on: http://gerrit.openafs.org/6650
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
In cm_MergeStatus, always set cm_scache_t.bufDataVersionLow
to the new data version because the cm_dir package does not
support version ranges. All modified dir buffers have their
dataVersion field set to the current data version value.
Failure to update the bufDataVersionLow field can result in
B+ Trees being constructed from out of date directory information.
Change-Id: Ic6bb6f78275de9c6c7960f2fc7c06c507b1144c1
Reviewed-on: http://gerrit.openafs.org/6649
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
B+Tree key strings were changed to wchars for unicode support,
the debugging printf format patterns were not updated to match.
Do so now.
Change-Id: I70619d2e3fbc007f3f21eaf56cc5d61503203818
Reviewed-on: http://gerrit.openafs.org/6648
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Perform the shutdown check earlier in AFSCommonCreate() to prevent
a request from being processed after the service indicates that
a shutdown has begun.
Change-Id: I8959141b5e2161ffe960e93a500b1153d9594a28
Reviewed-on: http://gerrit.openafs.org/6647
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Turn on bug checking by default via the installation.
This permits sites to disable the functionality but will allow
us to capture more meaningful minidump output.
Change-Id: I62b6d0ce5deed2c8798c9afb09565a8846c32a8c
Reviewed-on: http://gerrit.openafs.org/6646
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Do not call AFSNotifyDelete after the reference count on the
DirEntry->ObjectInformation is given up.
Log the Parent FID and file name since that is what are passed
to the service to perform a delete. Log the actual FID of the
object being deleted and not the address of the FID fields.
Change-Id: Ic02e2cec625258356d1b08e03a02a7a9c4eb4ce7
Reviewed-on: http://gerrit.openafs.org/6645
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Not all volumes are lower case. Do not lowercase the string.
Change-Id: Icb5f5ee9865bd856775486dffb1849f17f9b23f7
Reviewed-on: http://gerrit.openafs.org/6644
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
On machines lacking a libintl, _intlize() currently fails to initialize
the output error string--leading to tools (e.g., translate_et) returning
a null string; make afs_com_err fall back to returning the en/US canonical
error text when we don't have any i18n support...
Change-Id: I333745fb0a16e5bc9adb0755591d80de010d4d31
Reviewed-on: http://gerrit.openafs.org/6638
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Commit 267934d0e6 introduced
probing code to deal with the renameing of simple_fsync
inside the linux-kernel.
This test does not take different parameter-lists
for noop_fsync or simple_fsync resp. into account.
Fix this.
Change-Id: Ib490f0bb7e8098acc83fce001a43c08f478ad582
Reviewed-on: http://gerrit.openafs.org/6628
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Tested-by: Derrick Brashear <shadow@dementix.org>
Add man pages for two new Windows only commands
fs getverify
fs setverify -verify {on, off}
Change-Id: Id784608fba35147a4e33f22e43c7cd50a2307b9e
Reviewed-on: http://gerrit.openafs.org/6632
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
The size of the afs redirector worker thread pools should be
made configurable but for now just increase the pool size to
be in parity with the default worker pool created by the
afsd service.
Change-Id: Ib3c9356783162620112041582fa3d9dbaf8fce37
Reviewed-on: http://gerrit.openafs.org/6627
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Do not allow a worker thread to sleep until the task queue is
empty. It is better for the running thread to pick up and process
a task then to sleep this thread and wait for another one to wake
up to perform the work.
Change-Id: I776bb9408ab054b045acb9bc003b88436cc4266b
Reviewed-on: http://gerrit.openafs.org/6626
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
The afs redirector used notification events to wake up worker
threads when a task was added to a work queue. Notification
events when signalled wake up all threads instead of just one.
Instead, use synchronization events to wake up a single thread at
a time and restructure the code to permit workers to wake up
additional workers if there is additional work to be performed
or during library shutdown.
Thanks to Peter Scott for his assistance.
Change-Id: I0fb9d8578035f606f03170622fc9c50a1dbfee3a
Reviewed-on: http://gerrit.openafs.org/6595
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
If the buffer passed to DriveSubstitution is too small the
resulting file path will end up being truncated. At the very
least log the fact that truncation is occurring. In addition
return the fact that truncation occurred to the caller.
In NPGetUniversalName allocate a 4K buffer on the heap instead
of calculating a buffer based on the local name buffer size.
The local name buffer size has no relationship with the required
buffer size for the expanded unc or device path.
FIXES 130548
Change-Id: I86fbb9db4aa6a438dbb5e793678ec52283d5546b
Reviewed-on: http://gerrit.openafs.org/6618
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
The afsredirlib.sys library driver is unloaded when the afsd_service
stops and is reloaded when the afsd_service restarts. During the
shutdown window any objects known to the kernel are preserved by
afsredir.sys. When the afsd_service restarts, there are no valid
callbacks on any objects so the afsredirlib.sys must invalidate all
status info to permit the service to request a callback from the
file server on next use.
Change-Id: I3e8fa9513f435ff5cd1a8cfb8daa766aa30dd8c1
Reviewed-on: http://gerrit.openafs.org/6617
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Invalidation requests were being processed in an inconsistent
manner because different rules were being applied to volume root
directories and other objects and whether or not the invalidation
was a whole volume invalidation or not.
This patchset consolidates all invalidation logic for an object
in the new AFSInvalidateObject function. AFSInvalidateObject
is then called from AFSInvalidateCache and AFSInvalidateVolume
as necessary.
AFSInvalidateVolume executes AFSInvalidateObject on all objects
in the volume object tree. As a result, whole volume invalidations
whether triggered by the file server or "fs flushvolume" now work.
Change-Id: I83f110b0987eb153794b6803a1fe48247090277f
Reviewed-on: http://gerrit.openafs.org/6616
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Group the definitions of server flags for VLDB entries in one place,
and rename VLSERVER_FLAG_UUID to make its name consistent with the
other flags.
This makes it easier to see the complete set of flags and avoid
conflicts.
Change-Id: I3b326e3d97bc297c0314cfc48f0a066c3ff0415e
Reviewed-on: http://gerrit.openafs.org/6615
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
*) Remove all LWP specific code from the fileserver, and make pthread
the default
*) Build the pthreaded fileserver in the 'viced' directory, rather than
in tviced
*) Move the DAFS specific files from tviced to viced (arguably, these
should move into dviced, but there are currently no source files in
that directory)
*) Remove tviced from the build
Change-Id: I6e186c9fad6d9dccd04cf1317a80c087587ef25f
Reviewed-on: http://gerrit.openafs.org/5816
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Currently SYNC clients will "disable" themselves on certain error
patterns. For example, if the server end closes its file descriptor
too many times, or takes too long and then closes the fd, the SYNC
client will return an error and set fatal_error. On any subsequent
SYNC requests, the request will immediately fail without contacting
the server, often making SYNC client programs effectively useless
until they are restarted.
There isn't really any reason to cause future requests to fail.
Transient problems in the fileserver can easily make this situation
possible (e.g. a fileserver can crash but still take several minutes
to close the SYNC fd while the core is written to disk), and so while
we may return an error for a specific problematic request, future
requests may be fine.
So, just remove everything related to fatal_error, so future SYNC
requests can continue to be attempted. Adjust some log messages to
reflect the new behavior.
Change-Id: I4b8bfe53f591a9e8541cd5a98c909208df5bcbac
Reviewed-on: http://gerrit.openafs.org/6548
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
keep pool of connections to use for replicated volumes,
so we can have a separate idle time setting
Change-Id: I61ed62c652c924b33fde920fac766c4ca0043826
Reviewed-on: http://gerrit.openafs.org/6546
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
The lock reader history on osi_rwlock is proving to be too
expensive. Only use it for DEBUG builds. Leave the data
structures the same so that DEBUG builds can be mixed with
a RELEASE build of afsd_service.exe.
Change-Id: If0eeddb63c8f9919cdb5e119f31cde77974447b6
Reviewed-on: http://gerrit.openafs.org/6559
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Over the lifetime of OpenAFS a number of bugs have been discovered
that can result in data corruption. This new mode (Windows only)
will double check that the data received by the file server does
in fact match the data that was written by the cache manager.
After a successful StoreData and status merge but before the BIOD
is released, a fetchdata is issued to read the data written by the
cache manager. If the data fails to match, the StoreData operation
is repeated.
Data verification mode can be queried with "fs getverify" and set
with "fs setverify {on, off}". The default value can be set with
the TransarcAFSDaemon\Parameters DWORD "VerifyData" registry value.
By default verification is disabled.
Change-Id: Ic99c1692e6e78790e65ae600c3e428a79df59370
Reviewed-on: http://gerrit.openafs.org/6601
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Releasing the BIOD permits the accumulated buffers to be accessed.
Releasing the BIOD before the cm_MergeStatus() call creates a
window where the buffer data version is larger than the cm_scache
data version. Release the BIOD after the status merge.
Change-Id: I023413cd41fbbd2d844d79a3b29c087792fffa24
Reviewed-on: http://gerrit.openafs.org/6598
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
when we are going to hit the backend storage, disable keepalives.
the net effect of this is that no idle dead time is needed; instead,
the normal dead time will result in a connection with no activity
simply dying naturally if i/o blocks forever.
it's important that keepalives be enabled during callback breaks,
so that is done.
Change-Id: I1a7bfe0bc62a092ca7dd6dbc4710f1b8254ca9a1
Reviewed-on: http://gerrit.openafs.org/6515
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
FS_STATS_DETAILED has been unconditionally defined since the IBM days.
Adjust the code to assume it is set.
Change-Id: If7fb913bbb42dba5d749e7c30b8d9b7d81e4b4f8
Reviewed-on: http://gerrit.openafs.org/5550
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
When a file server returns the VBUSY error for an RPC the
cache manager records the 'srv_busy' state in the cm_serverRef_t
structure binding that file server to the active cm_volume_t
object. The 'srv_busy' was never cleared which prevents the
volume from being accessed.
Clear the 'srv_busy' flag whenever cm_Analyze() receives a
CM_ERROR_ALLBUSY error which means that all replicas have
been tried or whenever the error is not VBUSY or VRESTARTING.
FIXES 130537
Change-Id: I5020198e4f0ded1df0f64e228e699852f9de7c4d
Reviewed-on: http://gerrit.openafs.org/6563
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
RX_CALL_IDLE has been treated the same as RX_CALL_DEAD which is
a fatal error that results in the server being marked down. This
is not the appropriate behavior for an idle dead timeout error
which should not result in servers being marked down.
Idle dead timeouts are locally generated and are an indication
that the server:
a. is severely overloaded and cannot process all
incoming requests in a timely fashion.
b. has a partition whose underlying disk (or iSCSI, etc) is
failing and all I/O requests on that device are blocking.
c. has a large number of threads blocking on a single vnode
and cannot process requests for other vnodes as a result.
d. is malicious.
RX_CALL_IDLE is distinct from RX_DEAD_CALL in that idle dead timeout
handling should permit failover to replicas when they exist in a
timely fashion but in the non-replica case should not be triggered
until the hard dead timeout. If the request cannot be retried, it
should fail with an I/O error. The client should not retry a request
to the same server as a result of an idle dead timeout.
In addition, RX_CALL_IDLE indicates that the client has abandoned
the call but the server has not. Therefore, the client cannot determine
whether or not the RPC will eventually succeed and it must discard
any status information it has about the object of the RPC if the
RPC could have altered the object state upon success.
This patchset splits the RX_CALL_DEAD processing in cm_Analyze() to
clarify that only RX_CALL_DEAD errors result in the server being marked
down. Since Rx idle dead timeout processing is per connection and
idle dead timeouts must differ depending upon whether or not replica
sites exist, cm_ConnBy*() are extended to select a connection based
upon whether or not replica sites exist. A separate connection object
is used for RPCs to replicated objects as compared to RPCs to non-replicated
objects (volumes or vldb).
For non-replica connections the idle dead timeout is set to the hard
dead timeout. For replica connections the idle dead timeout is set
to the configured idle dead timeout.
Idle dead timeout events and whether or not a retry was triggered
are logged to the Windows Event Log.
cm_Analyze() is given a new 'storeOp' parameter which is non-zero
when the execute RPC could modify the data on the file server.
Change-Id: Idef696b15a8161335aa48907c15a4dc37f918bdb
Reviewed-on: http://gerrit.openafs.org/6118
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>