openafs/doc/txt/vm-integration
Michael Meffie 90acda692a relocate old afs docs to doc/txt
Move the afs/DOC files to the top-leve doc/txt directory, since this has
become the home for developer oriented documentation.

Change-Id: I128d338c69534b4ee6043105a7cfd390b280afe3
Reviewed-on: https://gerrit.openafs.org/12662
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
2017-07-26 19:39:43 -04:00

106 lines
5.6 KiB
Plaintext

Copyright 2000, International Business Machines Corporation and others.
All Rights Reserved.
This software has been released under the terms of the IBM Public
License. For details, see the LICENSE file in the top-level source
directory or online at http://www.openafs.org/dl/license10.html
Here's a quick guide to understanding the AFS 3 VM integration. This
will help you do AFS 3 ports, since one of the trickiest parts of an
AFS 3 port is the integration of the virtual memory system with the
file system.
The issues arise because in AFS, as in any network file system,
changes may be made from any machine while references are being made
to a file on your own machine. Data may be cached in your local
machine's VM system, and when the data changes remotely, the cache
manager must invalidate the old information in the VM system.
Furthermore, in some systems, there are pages of virtual memory
containing changes to the files that need to be written back to the
server at some time. In these systems, it is important not to
invalidate those pages before the data has made it to the file system.
In addition, such systems often provide mapped file support, with read
and write system calls affecting the same shared virtual memory as is
used by the file should it be mapped.
As you may have guessed from the above, there are two general styles
of VM integration done in AFS 3: one for systems with limited VM
system caching, and one for more modern systems where mapped files
coexist with read and write system calls.
For the older systems, the function osi_FlushText exists. Its goal is
to invalidate, or try to invalidate, caches where VM pages might cache
file information that's now obsolete. Even if the invalidation is
impossible at the time the call is made, things should be setup so
that the invalidation happens afterwards.
I'm not going to say more about this type of system, since fewer and
fewer exist, and since I'm low on time. If I get back to this paper
later, I'll remove this paragraph. The rest of this note talks about
the more modern mapped file systems.
For mapped file systems, the function osi_FlushPages is called from
various parts of the AFS cache manager. We assume that this function
must be called without holding any vnode locks, since it may call back
to the file system to do part of its work.
The function osi_FlushPages has a relatively complex specification.
If the file is open for writing, or if the data version of the pages
that could be in memory (vp->mapDV) is the current data version number
of the file, then this function has no work to do. The rationale is
that if the file is open for writing, calling this function could
destroy data written to the file but not flushed from the VM system to
the cache file. If mapDV >= DataVersion, then flushing the VM
system's pages won't change the fact that we can still only have pages
from data version == mapDV in memory. That's because flushing all
pages from the VM system results in a post condition that the only
pages that might be in memory are from the current data version.
If neither of the two conditions above occur, then we actually
invalidate the pages, on a Sun by calling pvn_vptrunc. This discards
the pages without writing any dirty pages to the cache file. We then
set the mapDV field to the highest data version seen before we started
the call to flush the pages. On systems that release the vnode lock
while doing the page flush, the file's data version at the end of this
procedure may be larger than the value we set mapDV to, but that's
only conservative, since a new could have been created from the
earlier version of the file.
There are a few times that we must call osi_FlushPages. We should
call it at the start of a read or open call, so that we raise mapDV to
the current value, and get rid of any old data that might interfere
with later reads. Raising mapDV to the current value is also
important, since if we wrote data with mapDV < DataVersion, then a
call to osi_FlushPages would discard this data if the pages were
modified w/o having the file open for writing (e.g. using a mapped
file). This is why we also call it in afs_map. We call it in
afs_getattr, since afs_getattr is the only function guaranteed to be
called between the time another client updates an executable, and the
time that our own local client tries to exec this executable; if we
fail to call osi_FlushPages here, we might use some pages from the
previous version of the executable file.
Also, note that we update mapDV after a store back to the server
completes, if we're sure that no other versions were created during
the file's storeback. The mapDV invariant (that no pages from earlier
data versions exist in memory) remains true, since the only versions
that existed between the old and new mapDV values all contained the
same data.
Finally, note a serious incompleteness in this system: we aren't
really prepared to deal with mapped files correctly. In particular,
there is no code to ensure that data stored in dirty VM pages ends up
in a cache file, except as a side effect of the segmap_release call
(on Sun 4s) that unmaps the data from the kernel map, and which,
because of the SM_WRITE flag, also calls putpage synchronously to get
rid of the data.
This problem needs to be fixed for any system that uses mapped files
seriously. Note that the NeXT port's generic write call uses mapped
files, but that we've set a flag (close_flush) that ensures that all
dirty pages get flushed after every write call. It is also something
of a performance hit, since it would be better to write those pages to
the cache asynchronously rather than after every write, as happens
now.