Since the library creates its own background thread, the library must
load its own reference to itself to prevent the library from being
unloaded behind its back.
remove the conditionally compiled code that gives up callbacks in
response to stat cache recycling. the performance impact is described
in the commit for DELTA windows-give-up-callbacks-20070627
This large patch adds support for giving up callbacks in response to three
events:
1. power management suspend
2. power management shutdown
3. stat cache object recycling
The third item is conditionally compiled when GIVE_UP_CALLBACKS is
defined. Properly handling callback give ups and the associated race
conditions with revokes and fetch status requests requires a great deal of
overhead. The first attempt used one GiveUpCallBacks RPC for each callback
that was being dropped as the stat cache object was recycled. This resulted
in a 27% performance drop in the MIT stress test. The code that is being
committed maintains a callback give up list on each server object. Each
callback is added to the list as it is dropped, and the list is then
sent to the server in bulk by the background daemon thread if the
server is known to be UP after a ping. Logic is added to the
EndCallbackRequest and CallbackRevoke operations to ensure that race
conditions are addressed. With all of this, there is a 17% performance drop
in the MIT stress test.
As a result, it is my conclusion that the client side costs associated with
optimizing the load on the server are simply too high. I am committing this
code to ensure that it is not lost. I will remove this support in the next
patch while leaving the support for giving up all callbacks in response
to suspend and shutdown events.
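The per-server give up list described above can be sketched roughly as
follows. This is only an illustration: fid_t, server_t, GIVEUP_MAX,
giveup_enqueue() and giveup_drain() are hypothetical names rather than the
committed code, and the locking and the revoke/fetch-status race handling
are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define GIVEUP_MAX 64   /* hypothetical per-server queue capacity */

/* Hypothetical stand-ins; the real cm_fid_t and cm_server_t differ. */
typedef struct { unsigned long volume, vnode, unique; } fid_t;

typedef struct {
    int    is_up;                 /* result of the most recent ping */
    size_t n_pending;             /* callbacks queued for give up */
    fid_t  pending[GIVEUP_MAX];
} server_t;

/* Called as a stat cache object is recycled: queue the callback instead
 * of issuing one RPC per callback. Returns 0, or -1 if the queue is full. */
int giveup_enqueue(server_t *sp, const fid_t *fidp)
{
    assert(sp != NULL && fidp != NULL);
    if (sp->n_pending == GIVEUP_MAX)
        return -1;
    sp->pending[sp->n_pending++] = *fidp;
    return 0;
}

/* Called from the background daemon: move up to cap queued callbacks
 * into out so they can be sent in one bulk request. Nothing is drained
 * while the server is not known to be up. Returns the number moved. */
size_t giveup_drain(server_t *sp, fid_t *out, size_t cap)
{
    size_t i, n;

    assert(sp != NULL && out != NULL);
    if (!sp->is_up)
        return 0;
    n = sp->n_pending < cap ? sp->n_pending : cap;
    for (i = 0; i < n; i++)
        out[i] = sp->pending[i];
    for (i = n; i < sp->n_pending; i++)   /* keep any remainder queued */
        sp->pending[i - n] = sp->pending[i];
    sp->n_pending -= n;
    return n;
}
```

Batching trades one RPC per recycled object for one per server per daemon
pass, which is the part of the design the measurements above refer to.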
FIXES 63763
probe for something else for 2.4 and older
====================
This delta was composed from multiple commits as part of the CVS->Git migration.
The checkin message with each commit was inconsistent.
The following are the additional commit messages.
====================
i'll spare you
return an error when the cm_fid_t * is NULL since we can't look up
the volume to obtain a server list without knowing which volume we
should be looking up
if the fidp is known to be NULL, don't call cm_GetServerList()
Add name and ID hash tables for cell lookups. cell lookups occur on
every request, sometimes multiple times. replacing the walk of the
cell list with a hash lookup when there are dozens of cells decreases
cpu utilization and increases throughput.
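A minimal sketch of the idea, assuming chained hash tables keyed by cell
name and by cell ID. cell_t, CELL_HASH_SIZE and the cell_* functions are
hypothetical and are not the cm_cell_t code; name comparison here is exact,
which may not match the real lookup rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CELL_HASH_SIZE 64   /* hypothetical table size */

typedef struct cell {
    const char  *name;
    int          id;
    struct cell *name_next;   /* chain within the name bucket */
    struct cell *id_next;     /* chain within the ID bucket */
} cell_t;

static cell_t *name_table[CELL_HASH_SIZE];
static cell_t *id_table[CELL_HASH_SIZE];

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % CELL_HASH_SIZE;
}

static unsigned hash_id(int id)
{
    return (unsigned)id % CELL_HASH_SIZE;
}

/* Link the cell into both tables so either key finds it. */
void cell_insert(cell_t *cp)
{
    unsigned n, i;

    assert(cp != NULL && cp->name != NULL);
    n = hash_name(cp->name);
    i = hash_id(cp->id);
    cp->name_next = name_table[n];
    name_table[n] = cp;
    cp->id_next = id_table[i];
    id_table[i] = cp;
}

/* Only the one bucket is walked, not the whole cell list. */
cell_t *cell_find_by_name(const char *name)
{
    cell_t *cp;

    assert(name != NULL);
    for (cp = name_table[hash_name(name)]; cp != NULL; cp = cp->name_next)
        if (strcmp(cp->name, name) == 0)
            return cp;
    return NULL;
}

cell_t *cell_find_by_id(int id)
{
    cell_t *cp;

    for (cp = id_table[hash_id(id)]; cp != NULL; cp = cp->id_next)
        if (cp->id == id)
            return cp;
    return NULL;
}
```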
there were two sets of registry values that could be used to configure
the daemon thread check intervals. keep the one that was documented
in the release notes and discard the other.
Add a registry value "daemonCheckOfflineVolInterval" to configure the
offline volume check interval.
Ensure that the cm_GetConn... functions initialize the output variables
to NULL on error.
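The pattern, sketched with a hypothetical get_conn() and conn_t rather
than the real cm_GetConn... signatures: clear the output pointer before
doing anything that can fail, so every error path leaves it NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int dummy; } conn_t;   /* hypothetical stand-in for a connection */

/* Returns 0 and sets *connpp on success; returns nonzero and leaves
 * *connpp == NULL on failure. 'available' models whether a connection
 * could be obtained. */
int get_conn(conn_t *available, conn_t **connpp)
{
    assert(connpp != NULL);
    *connpp = NULL;             /* the caller never sees a stale pointer */
    if (available == NULL)
        return -1;
    *connpp = available;
    return 0;
}
```

A caller that ignores the return code and only tests the pointer then
still behaves correctly.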
When we are faking the status data we can use the vnode value to determine
if the object should be treated as a directory or file. odd is a directory
and even is a file. This works even when we have never successfully
obtained status data for the object.
If we can match up the host address from which the revoke was received
with one of our cm_server_t objects, then we know which cell the revoke
has been received from. With that information we can ensure that we only
revoke the status of cm_scache_t objects belonging to that cell.
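A rough sketch of scoping a revoke by cell. server_t, scache_t,
cell_for_host() and revoke_volume() are hypothetical simplifications, and
falling back to all cells when the host is unknown is an assumption made
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for cm_server_t and cm_scache_t. */
typedef struct { unsigned long addr; int cell_id; } server_t;
typedef struct { int cell_id; unsigned long volume; int has_callback; } scache_t;

/* Return the cell of the server with this address, or -1 if unknown. */
int cell_for_host(const server_t *servers, size_t n, unsigned long addr)
{
    size_t i;

    assert(servers != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (servers[i].addr == addr)
            return servers[i].cell_id;
    return -1;
}

/* Drop callbacks on entries of the given volume. When cell_id >= 0 only
 * entries in that cell are touched; the same volume ID can exist in more
 * than one cell. Returns the number of entries revoked. */
size_t revoke_volume(scache_t *cache, size_t n, unsigned long volume, int cell_id)
{
    size_t i, revoked = 0;

    assert(cache != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (cache[i].volume != volume || !cache[i].has_callback)
            continue;
        if (cell_id >= 0 && cache[i].cell_id != cell_id)
            continue;
        cache[i].has_callback = 0;
        revoked++;
    }
    return revoked;
}
```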
Reverse the order of the allCellsp list. Append new cells onto the end
of the list. This ensures that the workstation cell will always be the
first in the list. Adding additional cells will not degrade the performance
to the workstation cell.
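As an illustration of the ordering, here is a tail append on a singly
linked list. cell_t and cell_list_t are hypothetical; whether the real
allCellsp list keeps a tail pointer is not stated above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell { const char *name; struct cell *next; } cell_t;
typedef struct { cell_t *head; cell_t *tail; } cell_list_t;

/* Append at the tail so the first cell added (the workstation cell)
 * stays at the head no matter how many cells follow. */
void cell_list_append(cell_list_t *lp, cell_t *cp)
{
    assert(lp != NULL && cp != NULL);
    cp->next = NULL;
    if (lp->tail == NULL)
        lp->head = cp;
    else
        lp->tail->next = cp;
    lp->tail = cp;
}
```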
No longer permit cm_GetCell() or cm_FindCellByID() to return NULL simply
because cm_UpdateCell() failed. The cm_cell_t object still exists and
is valid even if the vlServersp list is empty.
Modify the lock management in cm_GetCell_Gen() to ensure we drop all the
locks.
In cm_Analyze() update the volume status when one of the servers reports
VBUSY or VRESTARTING.
fix deadlock on cm_volumeLock introduced by last week's work
in cm_Analyze, make sure we get a cm_cell_t reference otherwise we
won't find the cm_volume_t we are searching for when ALLOFFLINE or
ALLBUSY.
VMWare adapters have proven unreliable replacements for the Microsoft
loopback adapter. Registering AFS often results in a name space collision.
Add cm_DumpCells() function and dump the cells as part of "fs memdump"
Dump all cm_scache_t and cm_volume_t regardless of reference counts
Fix cm_GetCell_Gen() to not allocate a new cm_cell_t when evaluating
mount points to aliases. Instead, after looking up the alias successfully
search the allCellsp list for the fullname of the cell. If found, use
the existing entry and cleanup the one we were about to allocate.
Use read locks whenever possible instead of write locks when searching
the allCellsp list.
Don't assume that WM_DESTROY is the final message received by a
window. Verify dialog data structures when handling messages and
reset the window data field when freeing the data structure.
Zero should be considered a valid credentials type identifier in
Network Identity Manager.
When checking if an identity is configured to obtain a token for a
specific cell, don't go through the list of cells if AFS tokens
are disabled for the identity.
Similarly, when removing a token for a specific cell from all
identities, don't bother modifying identities for whom AFS tokens
are disabled.
Keep track of whether a specific cell was added to the list of
cells to authenticate for an identity because it was listed in the
configuration or because a token for the cell already existed.
Correct an off-by-one error when calculating buffer sizes for
multi strings which failed to account for a double NULL
terminator.
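The size calculation can be sketched as below. The helper names are
hypothetical, and plain char is used to keep the example portable; the
affected code may well use wide characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters needed for a multi string: every string plus its own
 * terminator, plus one more terminator that ends the list. */
size_t multi_string_size(const char *const *strings, size_t count)
{
    size_t i, total = 0;

    assert(strings != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += strlen(strings[i]) + 1;
    return total + 1;           /* the extra terminator is the easy one to forget */
}

/* Pack the strings into buf. Returns the number of characters written,
 * or 0 if cap is too small. */
size_t multi_string_pack(char *buf, size_t cap,
                         const char *const *strings, size_t count)
{
    size_t i, off = 0, need = multi_string_size(strings, count);

    assert(buf != NULL);
    if (cap < need)
        return 0;
    for (i = 0; i < count; i++) {
        size_t len = strlen(strings[i]) + 1;
        memcpy(buf + off, strings[i], len);
        off += len;
    }
    buf[off] = '\0';            /* list terminator */
    return need;
}
```

For the two strings "ab" and "c" the result is 3 + 2 + 1 = 6 characters; a
buffer of 5 is one short.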
Don't update the cell->identity mapping if a token for that cell
could not be obtained.
If the list of cells to authenticate for an identity is empty, we
still need to write the empty string to the configuration.
Otherwise, removing all the tokens from an identity will not
result in a configuration change reflecting that.
fix cm_IoctlPathAvailability to return the current volume state:
0, CM_ERROR_ALLBUSY, CM_ERROR_ALLDOWN, or CM_ERROR_ALLOFFLINE.
modify fs.c to generate messages when the errors are received.
When the system's IP address list changes we invalidate the existing
RX connections and probe all of the servers. A better algorithm is
to probe all vldb servers, invalidate the rx connections, and then
probe all file servers.
update the lwp version of rxi_sendmsg to return the same error, -1,
returned by the pthread version.
replace errno with WSAGetLastError() in the Windows blocks so that
the correct error value is checked.
FIXES 61906
2.6.21.1 introduces an additional .parent pointer in the middle of
the structure. As the OpenAFS code just initialises the structure
with a list, this causes it to assign the value intended
for .proc_handler to .parent
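One way to make such an initializer robust against inserted members is a
C99 designated initializer. The sketch below uses a hypothetical,
much-reduced struct ctl_entry, not the kernel's actual structure, and is
not necessarily the fix that was applied.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(int);

/* Hypothetical, simplified entry with a member added in the middle. */
struct ctl_entry {
    int         id;
    const char *name;
    void       *parent;        /* the newly inserted member */
    handler_t   proc_handler;
};

int sample_handler(int x)
{
    return x + 1;
}

/* A positional list { 1, "example", sample_handler } would now put the
 * handler into .parent. Naming the members avoids that: .parent is left
 * zeroed and the handler lands in .proc_handler. */
const struct ctl_entry entry = {
    .id = 1,
    .name = "example",
    .proc_handler = sample_handler,
};

/* Call the entry's handler, insisting that one was actually set. */
int call_handler(const struct ctl_entry *e, int arg)
{
    assert(e != NULL && e->proc_handler != NULL);
    return e->proc_handler(arg);
}
```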
* re-write cm_Analyze to make better use of the known volume
status. VL_Server queries cannot result in CM_ERROR_ALLOFFLINE
messages.
* renamed cm_CheckBusyVolumes to cm_CheckOfflineVolumes.
busy volumes will be reset to srv_non_busy by the function
but there is no mechanism for querying the busy state other
than by attempting to access the resource.
* cm_Analyze will query the state of an offline volume before
deciding whether or not to retry when all volume instances
are offline.